Posts
Mar, 22
Hierarchical N-body simulations with auto-tuning for heterogeneous systems
Algorithms designed to efficiently solve this classical problem of physics fit very well on GPU hardware, and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for many other applications amenable to an N-body formulation. Adding features such as auto-tuning makes multipole-type algorithms ideal for heterogeneous computing environments.
Mar, 22
GPU-based parallel collision detection for fast motion planning
We present parallel algorithms to accelerate collision queries for sample-based motion planning. Our approach is designed for current many-core GPUs and exploits data-parallelism and multi-threaded capabilities. In order to take advantage of the high number of cores, we present a clustering scheme and collision-packet traversal to perform efficient collision queries on multiple configurations simultaneously. Furthermore, […]
Mar, 22
Multi-target vectorization with MTPS C++ generic library
This article introduces a C++ template library dedicated to vectorizing algorithms for different target architectures: Multi-Target Parallel Skeleton (MTPS). Skeletons describing the data structures and algorithms are provided and allow MTPS to generate code with optimized memory access patterns for the chosen architecture. MTPS currently supports x86-64 multicore CPUs and CUDA-enabled GPUs. On […]
Mar, 22
High Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing
Magnetic Resonance Imaging (MRI) is one of the fields in which compressed sensing theory is used to reduce scan time significantly, leading to faster imaging or higher-resolution images. It has been shown that a small fraction of the overall measurements is sufficient to reconstruct images with the combination of compressed sensing and […]
Mar, 21
Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs
Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO […]
Mar, 21
Parallel Two-Stage Least Squares algorithms for Simultaneous Equations Models on GPU
Today it is usual to have computational systems formed by a multicore together with one or more GPUs. These systems are heterogeneous, due to the different types of memory in the GPUs and to the different speeds of computation of the cores in the CPU and the GPU. To accelerate the solution of […]
Mar, 21
Fast Antenna Characterization Using the Sources Reconstruction Method on Graphics Processors
The Sources Reconstruction Method (SRM) is a non-invasive technique for, among other applications, antenna characterization. The SRM is based on obtaining a distribution of equivalent currents that radiate the same field as the antenna under test. The computation of these currents requires solving a linear system, usually ill-posed, that may be very computationally demanding for […]
Mar, 21
CPU-GPU Hybrid Parallel Binomial American Option Pricing
We present in this paper a novel parallel binomial algorithm that computes the price of an American option. The algorithm partitions a binomial tree constructed for the pricing into blocks of multiple levels of nodes, and assigns each such block to multiple processors. Each of the processors then computes the option’s values at its assigned […]
Mar, 21
Shallow water simulations on multiple GPUs
We present a state-of-the-art shallow water simulator running on multiple GPUs. Our implementation is based on an explicit high-resolution finite volume scheme suitable for modeling dam breaks and flooding. We use row domain decomposition to enable multi-GPU computations, and perform traditional CUDA block decomposition within each GPU for further parallelism. Our implementation shows near perfect […]
Mar, 20
Multicore and GPU Programming Models, Languages and Compilers Workshop, PLC 2012
Co-located with the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012). This workshop provides a forum for the presentation of research on all aspects of GPU and multicore processor programming models, compiler optimizations, language extensions, and software tools for GPU and multicore processor platforms. Areas of interest include but are not limited to the […]
Mar, 20
G-Node Workshop on Neuronal GPU Computing
Graphics processing units (GPUs) offer a low-cost approach to parallel high-performance computing. Neuronal simulations can be parallelized efficiently and are particularly well suited for implementation on GPUs. There is also great potential for GPU-based high-throughput analysis of neuronal data. The field is progressing at a rapid pace, and has reached a point where it may strongly […]
Mar, 19
Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware
The CUDA model for graphics processing units (GPUs) presents the programmer with a plethora of different programming options. These include different memory types, different memory access methods, and different data types. Identifying which options to use and when is a non-trivial exercise. This paper explores the effect of these different options on the performance of […]