Posts
Mar, 23
Advanced Programming Platform for efficient use of Data Parallel Hardware
Graphics processing units (GPUs) have evolved from specialized hardware for rendering high-quality graphics in games into commodity hardware for efficiently processing blocks of data in parallel. This evolution is particularly interesting for scientific groups, which have traditionally used the CPU as their workhorse and can now profit from the […]
Mar, 23
A Co-Prime Blur Scheme for Data Security in Video Surveillance
This paper presents a novel Coprime Blurred Pair (CBP) model for visual data-hiding for security in camera surveillance. While most previous approaches have focused on completely encrypting the video stream, we introduce a spatial encryption scheme by blurring the image/video contents to create a CBP. Our goal is to obscure detail in public video streams […]
Mar, 22
Impact of asynchronism on GPU accelerated parallel iterative computations
We study the impact of asynchronism on parallel iterative algorithms in the particular context of local clusters of workstations including GPUs. The test application is a classical PDE problem of advection-diffusion-reaction in 3D. We propose an asynchronous version of a previously developed PDE solver using GPUs for the inner computations. The algorithm is tested with […]
Mar, 22
Hierarchical N-body simulations with auto-tuning for heterogeneous systems
Algorithms designed to efficiently solve this classical problem of physics fit very well on GPU hardware, and exhibit excellent scalability on many GPUs. Their computational intensity makes them a promising approach for many other applications amenable to an N-body formulation. Adding features such as auto-tuning makes multipole-type algorithms ideal for heterogeneous computing environments.
Mar, 22
GPU-based parallel collision detection for fast motion planning
We present parallel algorithms to accelerate collision queries for sample-based motion planning. Our approach is designed for current many-core GPUs and exploits data-parallelism and multi-threaded capabilities. In order to take advantage of the high number of cores, we present a clustering scheme and collision-packet traversal to perform efficient collision queries on multiple configurations simultaneously. Furthermore, […]
Mar, 22
Multi-target vectorization with MTPS C++ generic library
This article introduces a C++ template library dedicated to vectorizing algorithms for different target architectures: Multi-Target Parallel Skeleton (MTPS). Skeletons describing the data structures and algorithms are provided and allow MTPS to generate code with optimized memory access patterns for the chosen architecture. MTPS currently supports x86-64 multicore CPUs and CUDA-enabled GPUs. On […]
Mar, 22
High Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing
Magnetic Resonance Imaging (MRI) is one of the fields in which compressed sensing theory is put to good use: it reduces scan time significantly, leading to faster imaging or higher-resolution images. It has been shown that a small fraction of the overall measurements is sufficient to reconstruct images with the combination of compressed sensing and […]
Mar, 21
Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs
Many-core GPUs provide high computing power and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs remains a difficult but worthwhile task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO […]
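For readers unfamiliar with the operation, here is a minimal sequential sketch of SpMV on a matrix stored in COO (coordinate) form, the baseline format named in the excerpt. It is only an illustration: the function name `spmv_coo` and the sample matrix are assumptions, and it does not implement HYB-R, whose layout the excerpt does not describe.

```python
def spmv_coo(rows, cols, vals, x, n_rows):
    """Compute y = A @ x for a sparse matrix A stored in COO form.

    rows[k], cols[k], vals[k] describe the k-th nonzero entry of A.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        # Each nonzero contributes v * x[c] to output row r.
        y[r] += v * x[c]
    return y


# Sample matrix (assumed for the example):
# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_coo(rows, cols, vals, x, n_rows=3)
print(y)
```

The indirect access `x[c]` and the scattered update `y[r] += ...` are what make SpMV an irregular workload: memory access depends on the sparsity pattern rather than on a fixed stride.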
Mar, 21
Parallel Two-Stage Least Squares algorithms for Simultaneous Equations Models on GPU
Today it is usual to have computational systems formed by a multicore together with one or more GPUs. These systems are heterogeneous, due to the different types of memory in the GPUs and to the different speeds of computation of the cores in the CPU and the GPU. To accelerate the solution of […]
Mar, 21
Fast Antenna Characterization Using the Sources Reconstruction Method on Graphics Processors
The Sources Reconstruction Method (SRM) is a non-invasive technique for, among other applications, antenna characterization. The SRM is based on obtaining a distribution of equivalent currents that radiate the same field as the antenna under test. The computation of these currents requires solving a linear system, usually ill-posed, that may be very computationally demanding for […]
Mar, 21
CPU-GPU Hybrid Parallel Binomial American Option Pricing
We present in this paper a novel parallel binomial algorithm that computes the price of an American option. The algorithm partitions a binomial tree constructed for the pricing into blocks of multiple levels of nodes, and assigns each such block to multiple processors. Each of the processors then computes the option’s values at its assigned […]
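As background, the sequential computation that such a parallel scheme would distribute can be sketched as follows. This is a plain Cox-Ross-Rubinstein binomial pricer for an American put, assumed here as a baseline; it is not the block-partitioned algorithm of the paper, and the function name and parameter values are illustrative only.

```python
import math


def american_put_binomial(s0, strike, rate, sigma, maturity, steps):
    """Price an American put on a Cox-Ross-Rubinstein binomial tree.

    Sequential baseline for illustration; names and parameters are assumptions.
    """
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))      # up factor per step
    d = 1.0 / u                              # down factor per step
    p = (math.exp(rate * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-rate * dt)              # one-step discount factor

    # Payoff at maturity; node j has seen j up moves and (steps - j) down moves.
    values = [max(strike - s0 * u**j * d**(steps - j), 0.0)
              for j in range(steps + 1)]

    # Backward induction: each level depends only on the level after it.
    for level in range(steps - 1, -1, -1):
        values = [
            max(
                disc * (p * values[j + 1] + (1.0 - p) * values[j]),  # hold
                strike - s0 * u**j * d**(level - j),                 # exercise now
            )
            for j in range(level + 1)
        ]
    return values[0]


price = american_put_binomial(s0=100.0, strike=100.0, rate=0.05,
                              sigma=0.2, maturity=1.0, steps=200)
print(round(price, 4))
```

Within one level, every node can be evaluated independently, but each level needs the values of the level after it. That dependency is why a parallel version has to decide how to group levels and nodes into units of work.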
Mar, 21
Shallow water simulations on multiple GPUs
We present a state-of-the-art shallow water simulator running on multiple GPUs. Our implementation is based on an explicit high-resolution finite volume scheme suitable for modeling dam breaks and flooding. We use row domain decomposition to enable multi-GPU computations, and perform traditional CUDA block decomposition within each GPU for further parallelism. Our implementation shows near perfect […]