Posts
Mar, 15
Melia: A MapReduce Framework on OpenCL-based FPGAs
MapReduce, originally developed by Google for search applications, has recently become a popular programming framework for parallel and distributed environments. This paper presents an energy-efficient architecture design for MapReduce on Field Programmable Gate Arrays (FPGAs). The major goal is to enable users to program FPGAs with simple MapReduce interfaces, and meanwhile to embrace automatic performance […]
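For readers new to the model, the sketch below shows the two user-supplied functions MapReduce is built around (a mapper and a reducer) and the grouping step between them, using word count as the example. It is plain single-process Python, not Melia's OpenCL/FPGA interface. `map_reduce`, `word_mapper` and `sum_reducer` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal in-memory MapReduce: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: fold each key's value list into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    # Emit (word, 1) for every whitespace-separated token, case-folded.
    for word in line.split():
        yield word.lower(), 1

def sum_reducer(key, values):
    # Word count: total occurrences of one word.
    return sum(values)

counts = map_reduce(["the quick fox", "The lazy dog"], word_mapper, sum_reducer)
```

Because each mapper call and each per-key reduction is independent, a framework is free to run them in parallel. That independence is what makes the model attractive for parallel hardware.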
Mar, 15
Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs
Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massive multicore parallelism and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity for accelerating MF much further when appropriately exploiting the GPU architectural characteristics. This paper presents cuMF, a CUDA-based matrix factorization library that implements memory-optimized […]
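As a rough illustration of what matrix factorization computes, the sketch below fits a low-rank approximation R ≈ P·Qᵀ to a few observed (user, item, rating) triples using stochastic gradient descent with L2 regularization. This is a small CPU sketch of one common MF fitting method, not cuMF's GPU algorithm. The toy ratings, the hyperparameters (`rank`, `lr`, `reg`, `epochs`) and the function names are all illustrative assumptions.

```python
import math
import random

def predict(P, Q, u, i):
    # Predicted rating = dot product of user u's and item i's latent factors.
    return sum(p * q for p, q in zip(P[u], Q[i]))

def factorize(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.01,
              epochs=3000, seed=0):
    """Fit R ~ P Q^T on observed (user, item, rating) triples by SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error plus L2 penalty.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    # Root-mean-square error over the observed entries only.
    sq = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))

# Hypothetical 3 users x 3 items; entries (0, 2) and (1, 1) are unobserved.
ratings = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 4.0), (1, 2, 5.0),
           (2, 0, 3.0), (2, 1, 1.5), (2, 2, 3.75)]
P, Q = factorize(ratings, n_users=3, n_items=3)
```

After fitting, `predict(P, Q, 0, 2)` gives an estimate for a rating that was never observed, which is how MF is used in collaborative filtering.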
Mar, 14
2nd IEEE International Conference on Computer and Communications (ICCC), 2016
Submission date: before July 1. History: Good news! All papers from ICCC 2015 have been included in IEEE Xplore. Supported by: ICCC 2016 is hosted by IEEE and the Sichuan Institute of Electronics, and co-organized by Southwest Jiaotong University and Xihua University. Publication: All accepted papers must be written in English and will be published in the conference […]
Mar, 14
The First Int. Conference on Multimedia and Image Processing (ICMIP), 2016
ICMIP 2016 is organized by the University of Brunei Darussalam, Brunei Darussalam. Publication: After a careful reviewing process, all accepted papers will be published in the Conference Proceedings and submitted to EI Compendex for review. Invited speakers from prestigious international universities: Prof. Amine Bermak, IEEE Fellow, Hong Kong University of Science and Technology, Hong Kong […]
Mar, 14
6th Int. Workshop on Computer Science and Engineering (WCSE), 2016
All accepted papers of WCSE 2016 will be published in the conference proceedings, which will be indexed by EI and Scopus. Keynote & Plenary Speakers: Prof. Hayato Ohwada, Tokyo University of Science, Japan; Prof. Taku Harada, Tokyo University of Science, Japan; Prof. Akiko Aizawa, National Institute of Informatics, Japan; Prof. Hiroyuki Nishiyama, Tokyo University of Science, Japan. Conference Program […]
Mar, 12
Machine Learning at the Limit
Many systems have been developed for machine learning at scale. Performance has steadily improved, but there has been relatively little work on explicitly defining or approaching the limits of performance. In this paper we describe the application of roofline design, an approach borrowed from computer architecture, to large-scale machine learning. In roofline design, one exposes […]
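The roofline idea referenced here can be stated in one line: attainable throughput is the lesser of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (operations per byte moved). The sketch below encodes that bound. The hardware numbers are hypothetical and are not taken from the paper.

```python
def roofline(peak_flops, bandwidth, intensity):
    """Attainable FLOP/s = min(peak compute, intensity * memory bandwidth).

    peak_flops: peak compute rate in FLOP/s
    bandwidth:  memory bandwidth in bytes/s
    intensity:  arithmetic intensity of the kernel in FLOP/byte
    """
    return min(peak_flops, intensity * bandwidth)

def ridge_point(peak_flops, bandwidth):
    # Intensity at which a kernel stops being memory-bound.
    return peak_flops / bandwidth

# Hypothetical machine: 1 TFLOP/s peak, 100 GB/s memory bandwidth.
PEAK, BW = 1e12, 1e11
low = roofline(PEAK, BW, 0.25)   # memory-bound kernel
high = roofline(PEAK, BW, 50.0)  # compute-bound kernel
```

A kernel whose intensity sits below the ridge point (10 FLOP/byte on this hypothetical machine) can only get faster by moving fewer bytes, not by doing arithmetic faster.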
Mar, 12
SGO: An ultrafast engine for atomic structure global optimization by differential evolution
This paper presents a fast method for global search of atomic structures. The structure global optimization (SGO) engine consists of a high-efficiency differential evolution algorithm, accelerated local relaxation methods and an ultrafast density functional theory plane-wave code run on GPU machines. It can search for the global minimum configuration of crystals, two-dimensional materials and quantum clusters […]
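For context on the search strategy named above, the sketch below implements classic differential evolution (the DE/rand/1/bin variant) on a toy objective. In SGO the objective would be a structure's energy from local relaxation and DFT; here it is a simple sphere function, and the population size, `F`, `CR` and generation count are assumed values, not the paper's settings.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over a box using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][d]  # crossover keeps the target's gene
                trial.append(v)
            ft = f(trial)
            # Greedy selection: the trial replaces the target if no worse.
            if ft <= fitness[i]:
                pop[i], fitness[i] = trial, ft
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

def sphere(x):
    # Toy objective with its global minimum 0 at the origin.
    return sum(v * v for v in x)

best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

Every trial vector in a generation can be evaluated independently, which is why the expensive objective evaluations parallelize well.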
Mar, 12
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the […]
Mar, 12
Clinically applicable Monte Carlo-based biological dose optimization for the treatment of head and neck cancers with spot-scanning proton therapy
Purpose: To demonstrate the feasibility of fast Monte Carlo (MC) based inverse biological planning for the treatment of head and neck tumors in spot-scanning proton therapy. Methods: Recently, a fast and accurate Graphics Processing Unit (GPU)-based MC simulation of proton transport was developed and used as the dose calculation engine in a GPU-accelerated IMPT optimizer. […]
Mar, 12
Automatic and Explicit Parallelization Approaches for Mathematical Simulation Models
The move from single-core and single-processor systems to multi-core and many-processor systems comes with the requirement of implementing computations in a way that can utilize these multiple units efficiently. This task of writing efficient multi-threaded algorithms will not be possible without improving programming languages and compilers to provide the mechanisms to do so. Computer aided mathematical modeling […]
Mar, 10
Accelerating Wright-Fisher Forward Simulations on the Graphics Processing Unit
Forward Wright-Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the CPU, thus limiting their usefulness. The single-locus Wright-Fisher forward algorithm is, however, exceedingly parallelizable, with many steps which are so-called embarrassingly parallel, consisting of a vast number of individual computations that are all […]
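To make the single-locus forward algorithm concrete, the sketch below advances one biallelic locus in a population of N diploids: each generation it applies a simple genic selection step and then resamples 2N allele copies binomially (as 2N Bernoulli draws). The selection formula p(1+s)/(1+ps), the parameter names and the replicate loop are illustrative assumptions, not the paper's GPU implementation; it assumes s > -1.

```python
import random

def wright_fisher(n, p0, generations, s=0.0, seed=None):
    """Forward-simulate allele frequency at one locus; returns the trajectory.

    n: number of diploid individuals (2n allele copies)
    p0: starting allele frequency in [0, 1]
    s: selection coefficient (assumed > -1); s = 0 means neutral drift
    """
    rng = random.Random(seed)
    two_n = 2 * n
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Deterministic selection step (simple genic selection).
        p_sel = p * (1 + s) / (1 + p * s)
        # Random drift: binomial resampling of 2n allele copies.
        count = sum(1 for _ in range(two_n) if rng.random() < p_sel)
        p = count / two_n
        trajectory.append(p)
    return trajectory

# Independent replicates (or loci) share no state, so each could run on its
# own GPU thread; here they simply run one after another.
replicates = [wright_fisher(100, 0.5, 50, seed=k) for k in range(8)]
```

Loss (p = 0) and fixation (p = 1) are absorbing: once reached, the frequency never changes, which the per-generation resampling reproduces exactly.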
Mar, 10
Automatic Data Layout Generation and Kernel Mapping for CPU+GPU Architectures
The ubiquity of hybrid CPU+GPU architectures has led to renewed interest in automatic data layout generation owing to the fact that data layouts have a large impact on performance, and that different data layouts yield the best performance on CPUs vs. GPUs. Unfortunately, current programming models still fail to provide an effective solution to the […]
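The layout choice at stake is usually between an array of structures (AoS), where each record's fields sit next to each other, and a structure of arrays (SoA), where each field is stored contiguously across all records. The sketch below converts between the two for a hypothetical particle record with fields `x`, `y`, `z`, `mass`, using Python's `array` module for contiguous storage. It only illustrates the two layouts; it is not the paper's automatic layout generator.

```python
from array import array

FIELDS = ("x", "y", "z", "mass")  # hypothetical record layout

def aos_to_soa(aos):
    """Split interleaved records [x0, y0, z0, m0, x1, ...] into one array per field."""
    n_fields = len(FIELDS)
    # A strided slice picks out one field from every record.
    return {name: aos[k::n_fields] for k, name in enumerate(FIELDS)}

def soa_to_aos(soa):
    """Interleave per-field arrays back into one flat array of records."""
    n = len(soa[FIELDS[0]])
    out = array("d")
    for i in range(n):
        for name in FIELDS:
            out.append(soa[name][i])
    return out

aos = array("d", [0.0, 1.0, 2.0, 10.0,    # particle 0
                  3.0, 4.0, 5.0, 20.0])   # particle 1
soa = aos_to_soa(aos)
total_mass = sum(soa["mass"])  # touches only the contiguous mass array
```

As a general rule of thumb, SoA gives unit-stride access when a kernel reads one field across many records, which tends to favor GPU memory coalescing. AoS keeps a whole record together, which can suit CPU caches when most fields of a record are used at once.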