high performance computing on graphics processing units: hgpu.org

Posts

Nov, 5

An intelligent semi-automatic application porting system for application accelerators

Work involving the use of application acceleration devices is showing great promise; however, there are still major obstacles preventing their widespread adoption. Currently, the process of porting applications to an accelerator requires expertise in both the computer science and application domains, due to the lack of abstraction available. We present our work associated with the […]

Nov, 5

Interactive machinability analysis of free-form surfaces using multiple-view image space techniques on the GPU

In this paper we present a set of graphics hardware accelerated algorithms to interactively evaluate the machinability of complex free-form surfaces. These algorithms work in image space and easily interface with all common formats available on CAD systems. The running time of these algorithms is independent of the complexity of the surface to be analyzed […]

Nov, 5

An experimental approach to performance measurement of heterogeneous parallel applications using CUDA

Heterogeneous parallel systems using GPU devices for application acceleration have garnered significant attention in the supercomputing community. However, to realize the full potential of GPU computing, application developers will require tools to measure and analyze accelerator performance with respect to the parallel execution as a whole. A performance measurement technology for the NVIDIA CUDA platform […]

CUDA

Nov, 5

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping

Because of their tremendous computing power and remarkable cost efficiency, GPUs (graphics processing units) have quickly emerged as an influential platform for high performance computing. However, as GPUs are designed for massive data-parallel computing, their performance is sensitive to the presence of conditional statements in a GPU application. On a conditional branch where […]

CUDA

Nov, 5

GPU-Accelerated Nearest Neighbor Search for 3D Registration

Nearest Neighbor Search (NNS) is employed by many computer vision algorithms. Its computational complexity is high and poses a challenge for real-time capability. The basic problem is rapidly processing a huge amount of data, which is often addressed by means of highly sophisticated search methods and parallelism. We show that NNS-based vision algorithms […]

CUDA

Nov, 5

Debugging GPU stream programs through automatic dataflow recording and visualization

We present a novel framework for debugging GPU stream programs through automatic dataflow recording and visualization. Our debugging system can help programmers locate errors that are common in general purpose stream programs but very difficult to debug with existing tools. A stream program is first compiled into an instrumented program using a compiler. This instrumenting […]

CUDA

Nov, 5

GPU for Parallel On-Board Hyperspectral Image Processing

Hyperspectral analysis algorithms exhibit inherent parallelism at multiple levels, and map nicely onto high performance systems such as massively parallel clusters and networks of computers. Unfortunately, these systems are generally expensive and difficult to adapt to onboard data processing scenarios, in which low-weight and low-power integrated components are desirable to reduce mission payload. An exciting […]

Nov, 5

RenderAnts: Interactive REYES Rendering on GPUs

We present RenderAnts, the first system that enables interactive REYES rendering on GPUs. Taking RenderMan scenes and shaders as input, our system first compiles RenderMan shaders to GPU shaders. Then all stages of the basic REYES pipeline, including bounding/splitting, dicing, shading, sampling, compositing and filtering, are executed on GPUs using carefully designed data-parallel algorithms. Advanced […]

Nov, 5

Accelerating MATLAB Image Processing Toolbox functions on GPUs

In this paper, we present our effort in developing an open-source GPU (graphics processing unit) code library for the MATLAB Image Processing Toolbox (IPT). We ported a dozen representative functions from IPT and, based on their inherent characteristics, grouped these functions into four categories: data independent, data sharing, algorithm dependent and data dependent. […]

CUDA • OpenCL

Nov, 5

Accelerating advanced MRI reconstructions on GPUs

Computational acceleration on graphics processing units (GPUs) can make advanced magnetic resonance imaging (MRI) reconstruction algorithms attractive in clinical settings, thereby improving the quality of MR images across a broad spectrum of applications. This paper describes the acceleration of such an algorithm on NVIDIA […]

CUDA

Nov, 5

Efficient computation of sum-products on GPUs through software-managed cache

We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of such algorithms. We apply this […]

CUDA

Nov, 5

Iterative induced dipoles computation for molecular mechanics on GPUs

In this work, we present a first step towards the efficient implementation of polarizable molecular mechanics force fields with GPU acceleration. The computational bottleneck of such applications is found in the treatment of electrostatics, where higher-order multipoles and a self-consistent treatment of polarization effects are needed. We have coded these sections, for the case of […]

CUDA