high performance computing on graphics processing units: hgpu.org

Posts

Apr, 19

LightScan: Faster Scan Primitive on CUDA Compatible Manycore Processors

Scan (or prefix sum) is a fundamental and widely used primitive in parallel computing. In this paper, we present LightScan, a faster parallel scan primitive for CUDA-enabled GPUs, which investigates a hybrid model combining intra-block computation and inter-block communication to perform a scan. Our algorithm employs warp shuffle functions to implement fast intra-block computation and […]

CUDA
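
As a rough illustration of the two-level structure the LightScan abstract mentions (a scan inside each block, plus a value passed between blocks), here is a minimal sequential sketch in Python. It is not the LightScan algorithm: there are no warp shuffles and no GPU here, and the names `blocked_inclusive_scan` and `block_size` are hypothetical, chosen only for this example.

```python
from itertools import accumulate


def blocked_inclusive_scan(values, block_size=4):
    """Inclusive prefix sum computed block by block (sequential emulation).

    Assumption: this only mimics the intra-block / inter-block split;
    a GPU implementation would process the blocks concurrently.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    result = []
    carry = 0  # sum of all elements in the blocks already processed
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        local = list(accumulate(block))          # intra-block scan
        result.extend(carry + x for x in local)  # add the offset from earlier blocks
        carry += local[-1]                       # hand this block's total to the next block
    return result


if __name__ == "__main__":
    print(blocked_inclusive_scan([1, 2, 3, 4, 5, 6, 7], block_size=3))
```

The block size does not change the result, only how the work is partitioned, which is why the same function can be checked against a plain running sum.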

Apr, 16

GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server

Large-scale deep learning requires huge computational resources to train a multi-layer neural network. Recent systems propose using 100s to 1000s of machines to train networks with tens of layers and billions of connections. While the computation involved can be done more efficiently on GPUs than on more traditional CPU cores, training such networks on a […]

CUDA

Apr, 16

Fluid Simulation by the Smoothed Particle Hydrodynamics Method: A Survey

This paper presents a survey of Smoothed Particle Hydrodynamics (SPH) and its use in computational fluid dynamics. As a truly mesh-free particle method based upon the Lagrangian formulation, SPH has been applied to a variety of different areas in science, computer graphics and engineering. It has been established as a popular technique for fluid based […]

CUDA

Apr, 16

Efficiency of general Krylov methods on GPUs – An experimental study

This paper compares different Krylov methods based on short recurrences with respect to their efficiency when implemented on GPUs. The comparison includes BiCGSTAB, CGS, QMR, and IDR using different shadow space dimensions. These methods are known for their good convergence characteristics. For a large set of test matrices taken from the University of Florida Matrix […]

CUDA • OpenCL

Apr, 16

pocl: A Performance-Portable OpenCL Implementation

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left […]

OpenCL

Apr, 16

Parallel data mining algorithms for multi-dimensional points on GPUs

Data mining tasks such as clustering, outlier detection and similarity search typically employ a series of algorithms to operate on a large set of data, making them amenable to parallelization. Thus parallelization of data mining operations such as distance computation has been extensively studied in the literature. In recent years, the use of Graphics Processing […]

OpenCL

Apr, 14

The 5th International Conference on Information and Knowledge Management (ICIKM), 2016

Index: Ei Compendex, Inspec, DOAJ, CPCI (Web of Science) and Scopus. Committee: Conference Chairs: Prof. Chen-Huei Chou, College of Charleston, USA; Prof. Yongsheng Ma, University of Alberta, Canada; Prof. Jiangping Wang, Webster University, USA. Local Chair: Dr. Xiaoyu Zeng, Beijing Wuzi University, China. Agenda: July 22, 2016 – Registration & Conference Materials Collection; July […]

Apr, 14

International Conference on Engineering Design and Analysis (ICEDA), 2016

The conference committees consist of professors, specialists and distinguished researchers from all over the world. Publication: All registered papers will be published in the Conference Proceedings, which will be indexed by Ei Compendex and Scopus. Some excellent papers will be recommended to the International Journal of Engineering and Technology (IJET) and the International Journal of Mechanical Engineering and […]

Apr, 14

International Conf. on Computational Biology and Biological Engineering (ICCBB), 2016

Publication: All accepted papers of ICCBB 2016 will be published in the Conference Proceedings, which will be indexed by Ei Compendex and Scopus. Some of the registered papers will be recommended for publication in the International Journal of Bioscience, Biochemistry and Bioinformatics (IJBBB). Supported by: University of Malaya, Malaysia; Universiti Teknologi MARA, Malaysia; Universiti Teknologi Malaysia, Malaysia […]

Apr, 14

Breadth First Search Vectorization on the Intel Xeon Phi

Breadth First Search (BFS) is a building block for graph algorithms and has recently been used for large scale analysis of information in a variety of applications including social networks, graph databases and web searching. Due to its importance, a number of different parallel programming models and architectures have been exploited to optimize the BFS. […]
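
For readers unfamiliar with the primitive, a minimal queue-based BFS is sketched below in Python. It shows only the basic traversal, not the vectorized Xeon Phi implementation the paper describes; the function name `bfs_levels` and the adjacency-dictionary graph format are assumptions made for this example.

```python
from collections import deque


def bfs_levels(adjacency, source):
    """Return a dict mapping each reachable vertex to its BFS level.

    Assumption: `adjacency` maps a vertex to an iterable of its neighbours.
    """
    levels = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in levels:            # first visit fixes the level
                levels[neighbor] = levels[node] + 1
                queue.append(neighbor)
    return levels


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs_levels(graph, 0))
```

Parallel variants typically process all vertices of one level at a time, which is the part that optimizations of this kind target.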

Apr, 14

High-level GPU programming in Julia

GPUs are popular devices for accelerating scientific calculations. However, as GPU code is usually written in low-level languages, it breaks the abstractions of high-level languages popular with scientific programmers. To overcome this, we present a framework for CUDA GPU programming in the high-level Julia programming language. This framework compiles Julia source code for GPU execution, […]

CUDA

Apr, 14

GPU-FV: Realtime Fisher Vector and Its Applications in Video Monitoring

The Fisher vector has been widely used in many multimedia retrieval and visual recognition applications with good performance. However, its computational complexity prevents its use in real-time video monitoring. In this work, we propose and implement GPU-FV, a fast Fisher vector extraction method with the help of modern GPUs. The challenge of implementing Fisher vector on […]

CUDA