Posts
Mar, 15
DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices
Breakthroughs from the field of deep learning are radically changing how sensor data are interpreted to extract the high-level information needed by mobile apps. It is critical that the gains in inference accuracy that deep models afford become embedded in future generations of mobile apps. In this work, we present the design and implementation of […]
Mar, 15
DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model
The rising pressure for simultaneously improving performance and reducing power is driving more diversity into all aspects of computing devices. An algorithm that is well-matched to the target hardware can run multiple times faster and more energy efficiently than one that is not. The problem is complicated by the fact that a program’s input also […]
Mar, 15
Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code
The current trend in next-generation exascale systems is towards integrating a wide range of specialized (co-)processors into traditional supercomputers. However, the integration of different specialized devices increases the degree of heterogeneity and the complexity of programming such systems. Due to the efficiency of heterogeneous systems in terms of Watt and FLOPS per surface […]
Mar, 15
Melia: A MapReduce Framework on OpenCL-based FPGAs
MapReduce, originally developed by Google for search applications, has recently become a popular programming framework for parallel and distributed environments. This paper presents an energy-efficient architecture design for MapReduce on Field Programmable Gate Arrays (FPGAs). The major goal is to enable users to program FPGAs with simple MapReduce interfaces, and meanwhile to embrace automatic performance […]
Mar, 15
Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs
Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with its massively multicore design and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity for accelerating MF much further when appropriately exploiting the GPU architectural characteristics. This paper presents cuMF, a CUDA-based matrix factorization library that implements memory-optimized […]
Mar, 14
2nd IEEE International Conference on Computer and Communications (ICCC), 2016
Submission Date: Before July 1. History: Good news! All papers from ICCC 2015 have been included in IEEE Xplore. Supported by: ICCC 2016 is hosted by IEEE and Sichuan Institute of Electronics, co-organized by Southwest Jiaotong University and Xihua University. Publication: All accepted papers must be written in English and will be published in conference […]
Mar, 14
The First Int. Conference on Multimedia and Image Processing (ICMIP), 2016
ICMIP 2016 is organized by University of Brunei Darussalam, Brunei Darussalam. Publication: After a careful reviewing process, all accepted papers will be published in the Conference Proceedings and sent to be reviewed by EI Compendex. Invited Speakers from Prestigious International Universities: Prof. Amine Bermak, IEEE Fellow, Hong Kong University of Science and Technology, Hong Kong […]
Mar, 14
6th Int. Workshop on Computer Science and Engineering (WCSE), 2016
All accepted papers of WCSE 2016 will be published in the conference proceedings, which will be indexed by EI and Scopus. Keynote & Plenary Speakers: Prof. Hayato Ohwada, Tokyo University of Science, Japan; Prof. Taku Harada, Tokyo University of Science, Japan; Prof. Akiko Aizawa, National Institute of Informatics, Japan; Prof. Hiroyuki Nishiyama, Tokyo University of Science, Japan. Conference Program […]
Mar, 12
Machine Learning at the Limit
Many systems have been developed for machine learning at scale. Performance has steadily improved, but there has been relatively little work on explicitly defining or approaching the limits of performance. In this paper we describe the application of roofline design, an approach borrowed from computer architecture, to large-scale machine learning. In roofline design, one exposes […]
Mar, 12
SGO: An ultrafast engine for atomic structure global optimization by differential evolution
This paper presents a fast method for global search of atomic structures. The structures global optimization (SGO) engine consists of a high-efficiency differential evolution algorithm, accelerated local relaxation methods and an ultrafast density functional theory plane-wave code run on GPU machines. It can search the global minimum configuration of crystals, two-dimensional materials and quantum clusters […]
Mar, 12
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the […]
Mar, 12
Clinically applicable Monte Carlo-based biological dose optimization for the treatment of head and neck cancers with spot-scanning proton therapy
Purpose: To demonstrate the feasibility of fast Monte Carlo (MC) based inverse biological planning for the treatment of head and neck tumors in spot-scanning proton therapy. Methods: Recently, a fast and accurate Graphics Processing Unit (GPU)-based MC simulation of proton transport was developed and used as the dose calculation engine in a GPU-accelerated IMPT optimizer. […]

