high performance computing on graphics processing units: hgpu.org

Posts

Jan, 11

CUDA-based real time surgery simulation

In this paper we present a general software platform that enables real time surgery simulation on the newly available compute unified device architecture (CUDA) from NVIDIA. CUDA-enabled GPUs harness the power of 128 processors which allow data parallel computations. Compared to the previous GPGPU, it is significantly more flexible with a C language interface. We report […]

CUDA

Jan, 11

Fast binding site mapping using GPUs and CUDA

Binding site mapping refers to the computational prediction of the regions on a protein surface that are likely to bind a small molecule with high affinity. The process involves flexibly docking a variety of small molecule probes and finding a consensus site that binds most of those probes. Due to the computational complexity of flexible […]

CUDA

Jan, 11

High-performance CUDA kernel execution on FPGAs

In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL, to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common […]

CUDA

Jan, 11

Quality-score guided error correction for short-read sequencing data using CUDA

Recently introduced new sequencing technologies can produce massive amounts of short-read data. Detection and correction of sequencing errors in this data is an important but time-consuming pre-processing step for de-novo genome assembly. In this paper, we demonstrate how the quality-score value associated with each base-call can be integrated in a CUDA-based parallel error correction algorithm. […]

CUDA

Jan, 10

High-speed volume ray casting with CUDA

Volume ray casting has experienced renewed interest in the last decade, largely due to graphics hardware, which enabled real-time implementations competitive in speed with slicing. However, these implementations need specialized shader languages and are forced to use graphics APIs. This makes implementation of advanced methods difficult and hinders performance, bending the programming and execution […]

CUDA

Jan, 10

Parallel drainage network computation on CUDA

Drainage network determination from Digital Elevation Models (DEMs) has been a widely studied problem in the last three decades. During this time, satellite technology has been improving and optimizing digitalized images, and computers have been increasing their capabilities to manage such a huge quantity of information. The rapid growth of CPU power and memory size […]

CUDA

Jan, 10

Canny edge detection on NVIDIA CUDA

The Canny edge detector is a very popular and effective edge feature detector that is used as a pre-processing step in many computer vision algorithms. It is a multi-step detector which performs smoothing and filtering, non-maxima suppression, followed by a connected-component analysis stage to detect “true” edges, while suppressing “false” non-edge filter responses. While […]

CUDA

Jan, 10

Molecular Dynamics Simulations on Commodity GPUs with CUDA

Molecular dynamics simulations are a common and often repeated task in molecular biology. The need for speeding up this treatment comes from the requirement for large system simulations with many atoms and numerous time steps. In this paper we present a new approach to high performance molecular dynamics simulations on graphics processing units. Using modern […]

CUDA

Jan, 10

Comparing Hardware Accelerators in Scientific Applications: A Case Study

Multi-core processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application’s performance and programmability on a variety of platforms including CUDA with Nvidia GPUs, Brook+ […]

CUDA • OpenCL

Jan, 10

Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA

This paper describes several parallel algorithmic variations of the Neville elimination. This elimination solves a system of linear equations making zeros in a matrix column by adding to each row an adequate multiple of the preceding one. The parallel algorithms are run and compared on different multi- and many-core platforms using parallel programming techniques as […]

CUDA
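
The elimination step described in the Neville elimination entry above (zeroing a matrix column by adding to each row a multiple of the preceding one) can be illustrated with a small serial sketch. This is not the paper's parallel implementation: the function names `neville_eliminate`, `back_substitute`, and `solve` are hypothetical, the code assumes no row exchanges are needed, and it uses exact rational arithmetic only to keep the example easy to check.

```python
from fractions import Fraction


def neville_eliminate(a, b):
    """Reduce the augmented system [a | b] to upper-triangular form.

    Assumption: no row exchanges are required; the sketch raises an
    error instead of reordering rows when a zero pivot is met.
    """
    n = len(a)
    # Augmented copy in exact rational arithmetic.
    m = [[Fraction(x) for x in row] + [Fraction(rhs)]
         for row, rhs in zip(a, b)]
    for k in range(n - 1):
        # Go bottom-up so that row i-1 still holds its column-k entry
        # when it is used to zero the entry in row i.
        for i in range(n - 1, k, -1):
            if m[i][k] == 0:
                continue  # already zero, nothing to eliminate
            if m[i - 1][k] == 0:
                raise ZeroDivisionError(
                    "row exchange needed; not handled in this sketch")
            factor = m[i][k] / m[i - 1][k]
            # Row i gets a multiple of the preceding row added to it.
            m[i] = [x - factor * y for x, y in zip(m[i], m[i - 1])]
    return m


def back_substitute(m):
    """Solve the upper-triangular augmented system produced above."""
    n = len(m)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def solve(a, b):
    """Solve a x = b by Neville elimination plus back substitution."""
    return back_substitute(neville_eliminate(a, b))
```

Unlike Gaussian elimination, which subtracts multiples of a single pivot row from all rows below it, each update here only involves two adjacent rows. For example, `solve([[2, 1, 1], [4, 3, 3], [8, 7, 9]], [7, 19, 49])` yields the solution with components 1, 2, and 3.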

Jan, 10

Importance sampling algorithms for first passage time probabilities in the infinite server queue

This paper applies importance sampling simulation for estimating rare event probabilities of the first passage time in the infinite server queue with renewal arrivals and general service time distributions. We consider importance sampling algorithms which are based on large deviations results of the infinite server queue, and we consider an algorithm based on the cross-entropy […]

Jan, 10

Dense optical flow by iterative local window registration

We study dense optical flow estimation using iterative registration of local windows, also known as iterative Lucas-Kanade (LK) [B. Lucas et al., 1981]. We show that the usual iterative-warping scheme encounters divergence problems and propose a modified scheme with better behavior. It yields good results with a much lower cost than the exact dense LK […]

CUDA