high performance computing on graphics processing units: hgpu.org

Posts

Mar, 21

Shallow water simulations on multiple GPUs

We present a state-of-the-art shallow water simulator running on multiple GPUs. Our implementation is based on an explicit high-resolution finite volume scheme suitable for modeling dam breaks and flooding. We use row domain decomposition to enable multi-GPU computations and perform traditional CUDA block decomposition within each GPU for further parallelism. Our implementation shows near-perfect […]

CUDA
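The row domain decomposition mentioned above can be illustrated with a minimal sketch. The grid is split into contiguous row blocks, one per GPU, and each block carries ghost (halo) rows that must be refreshed from its neighbours before every step. This is not the authors' code: the function names are hypothetical, plain lists stand in for device memory, and the default halo width of one row is an assumption, since a high-resolution scheme may need more than one ghost row.

```python
# Row domain decomposition with ghost (halo) rows, modelled in plain Python.
# Each subdomain stands in for the slice of the grid one GPU would own.

def split_rows(ny, n_parts):
    """Partition rows 0..ny-1 into n_parts contiguous (start, stop) ranges of near-equal size."""
    base, extra = divmod(ny, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def make_subdomains(grid, n_parts, halo=1):
    """Copy each row block plus `halo` ghost rows above and below it.

    Ghost rows outside the global grid are zero-filled placeholders; a real
    solver would fill them from its physical boundary conditions instead.
    """
    ny, nx = len(grid), len(grid[0])
    subs = []
    for start, stop in split_rows(ny, n_parts):
        subs.append([list(grid[r]) if 0 <= r < ny else [0.0] * nx
                     for r in range(start - halo, stop + halo)])
    return subs

def exchange_halos(subs, halo=1):
    """Refresh interior ghost rows from each neighbour's edge rows (the inter-GPU copy)."""
    for upper, lower in zip(subs, subs[1:]):
        for h in range(halo):
            lower[h] = list(upper[-2 * halo + h])     # lower's top ghosts <- upper's last interior rows
            upper[-halo + h] = list(lower[halo + h])  # upper's bottom ghosts <- lower's first interior rows
```

In a multi-GPU setting, `exchange_halos` corresponds to the only communication per time step. Within each subdomain, the interior rows can then be updated independently, for example by the CUDA block decomposition the abstract describes.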

Mar, 20

Multicore and GPU Programming Models, Languages and Compilers Workshop, PLC 2012

Co-located with the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012). This workshop provides a forum for the presentation of research on all aspects of GPU and multicore processor programming models, compiler optimizations, language extensions, and software tools for GPU and multicore processor platforms. Areas of interest include, but are not limited to, the […]

Mar, 20

G-Node Workshop on Neuronal GPU Computing

Graphics processing units (GPUs) offer a low-cost approach to parallel high-performance computing. Neuronal simulations can be parallelized efficiently and are particularly well suited for implementation on GPUs. There is also great potential for GPU-based high-throughput analysis of neuronal data. The field is progressing at a rapid pace and has reached a point where it may strongly […]

Mar, 19

Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware

The CUDA model for graphics processing units (GPUs) presents the programmer with a plethora of programming options. These include different memory types, different memory access methods, and different data types. Identifying which options to use, and when, is a non-trivial exercise. This paper explores the effect of these different options on the performance of […]

CUDA
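The abstract does not say which sparse storage formats the paper compares. As one illustration of the data structures involved, here is a minimal sketch of a sparse matrix-vector product in the common CSR (compressed sparse row) format. In a simple CUDA kernel, each row would typically be handled by one thread, and the outer loop below marks that unit of work. The function names are hypothetical.

```python
# Sparse matrix-vector product y = A x with A stored in CSR form.

def csr_from_dense(dense):
    """Build CSR arrays (values, column indices, row pointers) from a dense row-major matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i + 1]]
    return values, col_idx, row_ptr

def csr_spmv(values, col_idx, row_ptr, x):
    """Return A x; each outer iteration is the work of one thread in a one-thread-per-row kernel."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # gather from x via the column index
        y.append(acc)
    return y
```

The indirect read `x[col_idx[k]]` is the access pattern for which a GPU implementation has to choose among memory types, such as global versus texture memory.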

Mar, 19

Spatial Join with R-Tree on Graphics Processing Units

Spatial operations such as spatial join combine two sets of objects based on spatial predicates. Spatial join differs from relational join because the objects are multidimensional, and it requires a large amount of execution time. Recently, many researchers have sought methods to reduce this execution time; parallel spatial join is one such method. Comparison between […]

CUDA
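To make the idea of an index-assisted spatial join concrete, here is a minimal sketch of a join on bounding rectangles. Objects are grouped into leaf nodes, each with a minimum bounding rectangle (MBR), and pairs of leaves whose MBRs do not overlap are skipped without testing their contents. This toy two-level structure is an assumption made for brevity. It is not a full R-tree, which also has internal levels and insertion and split rules, and it is not the paper's GPU algorithm. The function names are hypothetical.

```python
# Toy MBR-based spatial join: rectangles are (xmin, ymin, xmax, ymax).

def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def mbr(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def build_leaves(rects, fanout=4):
    """Toy bulk load: sort by x-centre, chunk into leaves of `fanout` (id, rect) entries."""
    items = sorted(enumerate(rects), key=lambda e: (e[1][0] + e[1][2]) / 2)
    leaves = []
    for i in range(0, len(items), fanout):
        chunk = items[i:i + fanout]
        leaves.append((mbr([r for _, r in chunk]), chunk))
    return leaves

def spatial_join(rects_r, rects_s, fanout=4):
    """Return sorted (i, j) pairs where rects_r[i] overlaps rects_s[j]."""
    result = []
    for mbr_r, entries_r in build_leaves(rects_r, fanout):
        for mbr_s, entries_s in build_leaves(rects_s, fanout):
            if not intersects(mbr_r, mbr_s):
                continue  # filter step: no pair inside these two leaves can overlap
            for i, a in entries_r:
                for j, b in entries_s:
                    if intersects(a, b):  # refinement on the actual rectangles
                        result.append((i, j))
    return sorted(result)
```

The leaf-pair tests are independent of one another, which is what makes this kind of join a candidate for parallel execution.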

Mar, 19

Analysis of the Performance of the Fish School Search Algorithm Running in Graphic Processing Units

Fish School Search (FSS) is a computational intelligence technique invented by Bastos-Filho and Lima-Neto in 2007 and first presented in Bastos-Filho et al. (2008). FSS was conceived to solve search problems and is based on the social behavior of schools of fish. In the FSS algorithm, the search space is bounded and each possible […]

CUDA
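As a rough guide to what FSS computes, the sketch below implements a simplified sequential version of its four operators as they are commonly described: individual movement, feeding, collective-instinctive movement and collective-volitive movement. The parameter values, the linear step decay and the sphere test function are illustrative assumptions. This is not the implementation evaluated in the paper, and `fss_minimise` is a hypothetical name.

```python
import math
import random

def sphere(x):
    """Benchmark objective to minimise: sum of squares, optimum 0 at the origin."""
    return sum(v * v for v in x)

def fss_minimise(f, dim=2, n_fish=20, iters=100, lo=-5.0, hi=5.0,
                 step_ind=0.5, step_vol=0.5, w_scale=500.0, seed=0):
    """Simplified Fish School Search (minimisation) on the bounded box [lo, hi]^dim."""
    rng = random.Random(seed)

    def clamp(v):
        return min(hi, max(lo, v))  # keep fish inside the bounded search space

    fish = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fish)]
    weights = [w_scale / 2.0] * n_fish
    best = list(min(fish, key=f))
    for t in range(iters):
        decay = 1.0 - t / iters  # linearly shrinking step sizes (assumed schedule)
        s_ind, s_vol = step_ind * decay, step_vol * decay
        # 1. Individual movement: random step, kept only if fitness improves.
        dx, df = [], []
        for k, x in enumerate(fish):
            cand = [clamp(v + rng.uniform(-1.0, 1.0) * s_ind) for v in x]
            gain = f(x) - f(cand)  # > 0 means improvement when minimising
            if gain > 0:
                dx.append([c - v for c, v in zip(cand, x)])
                df.append(gain)
                fish[k] = cand
                if f(cand) < f(best):
                    best = list(cand)
            else:
                dx.append([0.0] * dim)
                df.append(0.0)
        # 2. Feeding: weight grows with normalised improvement, kept in [1, w_scale].
        old_total = sum(weights)
        max_df = max(df)
        if max_df > 0:
            weights = [min(w_scale, max(1.0, w + d / max_df)) for w, d in zip(weights, df)]
        # 3. Collective-instinctive movement: shared drift along successful displacements.
        sum_df = sum(df)
        if sum_df > 0:
            drift = [sum(d[i] * g for d, g in zip(dx, df)) / sum_df for i in range(dim)]
            fish = [[clamp(v + m) for v, m in zip(x, drift)] for x in fish]
        # 4. Collective-volitive movement: contract toward the weighted barycentre
        #    if the school gained weight, otherwise expand away from it.
        total = sum(weights)
        bary = [sum(x[i] * w for x, w in zip(fish, weights)) / total for i in range(dim)]
        sign = -1.0 if total > old_total else 1.0
        for k, x in enumerate(fish):
            dist = math.dist(x, bary)
            if dist > 0:
                fish[k] = [clamp(v + sign * s_vol * rng.random() * (v - b) / dist)
                           for v, b in zip(x, bary)]
        # Track the best position seen so far.
        cur = min(fish, key=f)
        if f(cur) < f(best):
            best = list(cur)
    return best, weights
```

The per-fish loops in steps 1, 2 and 4 are independent across fish, which is what a GPU version would typically parallelise. The sums in steps 3 and 4 become parallel reductions.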

Mar, 19

GPU Enhanced Simulation of Angiogenesis

In this paper, we present the use of graphics processing units to accelerate the most time-consuming stages of a simulation of angiogenesis and tumor growth. By using advanced CUDA mechanisms such as shared memory, textures, and atomic operations, we managed to speed up the CUDA kernels by a factor of 57. However, in […]

CUDA

Mar, 19

Parallelization of Particle Filter Algorithms

This paper presents the parallelization of the particle filter algorithm in a single-target video tracking application. We demonstrate the process by which we parallelized the particle filter algorithm, beginning with a MATLAB implementation. The final CUDA program provided an approximately 71x speedup over the initial MATLAB implementation.

CUDA
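For readers unfamiliar with the algorithm being parallelized, the sketch below shows one particle filter cycle: predict, weight, resample. It assumes a 1-D position with a random-walk motion model and Gaussian measurement noise. These are illustrative choices and not the paper's video-tracking model. Systematic resampling is one common resampling scheme, chosen here for its O(n) cost. The function names are hypothetical.

```python
import math
import random

def systematic_resample(weights, rng):
    """Return indices of particles to keep: one random offset, n evenly spaced pointers."""
    n = len(weights)
    total = sum(weights)
    if total == 0:
        return list(range(n))  # degenerate case: keep the particles unchanged
    u0 = rng.random() / n
    indices, i, cum = [], 0, weights[0] / total
    for k in range(n):
        u = u0 + k / n
        while u > cum and i < n - 1:  # advance to the particle whose CDF interval contains u
            i += 1
            cum += weights[i] / total
        indices.append(i)
    return indices

def particle_filter_step(particles, z, rng, motion_sd=1.0, meas_sd=1.0):
    """One predict-weight-resample cycle for a 1-D position observed as z."""
    # Predict: propagate each particle through a random-walk motion model.
    moved = [p + rng.gauss(0.0, motion_sd) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((z - p) / meas_sd) ** 2) for p in moved]
    # Resample: duplicate likely particles and drop unlikely ones.
    return [moved[i] for i in systematic_resample(weights, rng)]
```

The predict and weight steps are independent per particle and map directly onto one GPU thread per particle. Resampling needs a prefix sum over the weights, which is usually the harder part to parallelise.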

Mar, 19

Computational Intelligence on Consumer Games and Graphics Hardware CIGPU-2012

The fifth International Workshop and Tutorial on Computational Intelligence on Consumer Games and Graphics Hardware (CIGPU 2012) will be held as a hybrid special session of the IEEE WCCI 2012 conference in Brisbane, 10-15 June 2012. WCCI 2012, the IEEE World Congress on Computational Intelligence, joins together three international conferences: IJCNN 2012, FUZZ-IEEE 2012 and […]

Mar, 18

Improving Cache Locality for Ray Casting with CUDA

In this paper, we present an acceleration method for texture-based ray casting on CUDA (Compute Unified Device Architecture) compatible graphics processing units (GPUs). Since ray casting is a memory-intensive application, our method increases the hit rate of the texture cache during rendering. To achieve this, our method dynamically selects the width and height of […]

CUDA
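To show where the memory traffic in ray casting comes from, the sketch below composites one ray front to back through a scalar volume, with early ray termination. Nearest-neighbour sampling, a linear opacity transfer function, grayscale emission and the function name `cast_ray` are all illustrative assumptions. This is not the paper's renderer, which samples 3-D textures on the GPU.

```python
# Front-to-back compositing along a single ray through a scalar volume.

def cast_ray(volume, origin, direction, step=0.5, max_steps=1000, opacity_scale=0.1):
    """Return (intensity, opacity) accumulated along one ray through volume[z][y][x].

    Densities are expected in [0, 1]; `origin` should lie inside the volume.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    x, y, z = origin
    dx, dy, dz = direction
    intensity, alpha = 0.0, 0.0
    for _ in range(max_steps):
        ix, iy, iz = round(x), round(y), round(z)  # nearest-neighbour sample
        if not (0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz):
            break  # the ray has left the volume
        s = volume[iz][iy][ix]
        a = min(1.0, s * opacity_scale * step)  # opacity of this sample
        intensity += (1.0 - alpha) * a * s      # emission weighted by remaining transparency
        alpha += (1.0 - alpha) * a
        if alpha > 0.99:
            break  # early ray termination: later samples are hidden
        x, y, z = x + dx * step, y + dy * step, z + dz * step
    return intensity, alpha
```

On a GPU, each pixel's ray would typically be one thread. Rays for neighbouring pixels read nearby voxels, so how rays are grouped into thread blocks generally affects how much of that data is reused through the texture cache.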

Mar, 18

Towards user transparent parallel multimedia computing on GPU-clusters

The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of MMCA problems, the use of High Performance Computing (HPC) techniques is essential. As most MMCA researchers are not HPC experts, there is an urgent need […]

CUDA

Mar, 18

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments

The lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments poses a challenge for the GPU acceleration of large-scale scientific and engineering computations. A particular challenge is the efficient transfer of noncontiguous data to and from GPU memory. MPI supports such transfers through the use of datatypes; however, an efficient means of utilizing datatypes for […]

CUDA
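A typical noncontiguous transfer is a strided layout such as a matrix column. MPI describes this layout with `MPI_Type_vector(count, blocklength, stride, oldtype, newtype)`, and the data must be packed into a contiguous buffer before it can be moved efficiently. The sketch below emulates that pack/unpack in plain Python to show the index arithmetic. The helper names are hypothetical, and a GPU implementation would perform the gather and scatter in a kernel on device memory rather than on a Python list.

```python
# Pack/unpack for the layout described by an MPI vector datatype:
# `count` blocks of `blocklength` elements, block starts `stride` elements apart.

def pack_vector(buf, count, blocklength, stride, offset=0):
    """Gather the strided blocks of `buf` into a new contiguous list."""
    packed = []
    for b in range(count):
        start = offset + b * stride
        packed.extend(buf[start:start + blocklength])
    return packed

def unpack_vector(packed, buf, count, blocklength, stride, offset=0):
    """Scatter a contiguous list back into the strided layout of `buf`, in place."""
    for b in range(count):
        start = offset + b * stride
        buf[start:start + blocklength] = packed[b * blocklength:(b + 1) * blocklength]
```

For example, column 1 of a 4x4 row-major matrix stored as `list(range(16))` is `pack_vector(m, count=4, blocklength=1, stride=4, offset=1)`, which yields `[1, 5, 9, 13]`.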