high performance computing on graphics processing units: hgpu.org

Posts

Dec, 8

GPU based extraction of moving objects without shadows under intensity changes

This paper proposes a GPU-based algorithm for extracting moving objects in real time. The entire process runs on the GPU, which dramatically increases processing speed. The method uses the a* and b* components of the CIELAB color space without extracting shadow areas as […]

Dec, 8

Overview of implementation of DARPA GPU program in SAIC

This paper reviews the implementation of the DARPA MTO STAP-BOY program for both Phase I and II, conducted at Science Applications International Corporation (SAIC). The STAP-BOY program develops fast covariance factorization and tuning techniques for space-time adaptive processing (STAP) algorithm implementation on graphics processing unit (GPU) architectures for embedded systems. The first part of our presentation […]

Dec, 8

Fast Deformable Registration on the GPU: A CUDA Implementation of Demons

In the medical imaging field, fast deformable registration methods are needed, especially in time-critical intra-operative settings. Image registration studies based on graphics processing units (GPUs) provide fast implementations. However, only a small number of these GPU-based studies concentrate on deformable registration. We implemented Demons, a widely used deformable image […]

CUDA

Dec, 8

A survey of medical image registration on graphics hardware

The rapidly increasing performance of graphics processors, improving programming support, and an excellent price-performance ratio make graphics processing units (GPUs) a good option for a variety of computationally intensive tasks. Within this survey, we give an overview of GPU-accelerated image registration. We address both GPU-experienced readers with an interest in accelerated image registration, as […]

Dec, 7

The 2011 International Conference on High Performance Computing & Simulation, HPCS 2011

The conference aims to address, explore, and exchange information on the state of the art in high performance and large scale computing systems, and their use in modeling, simulation, and data-intensive applications. We encourage papers with either an application or a technology flavor (and their multidisciplinary integration). The scope covers architecture, performance, algorithms, middleware, and applications. Work on […]

Dec, 7

Performance evaluation of image processing algorithms on the GPU

The graphics processing unit (GPU), which was originally used exclusively for visualization purposes, has evolved into an extremely powerful co-processor. In the meantime, through the development of elaborate interfaces, the GPU can be used to process data and deal with computationally intensive applications. The speed-up factors attained compared to the central processing unit (CPU) are […]

CUDA

Dec, 7

Fast support vector machine training and classification on graphics processors

Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high performance implementations of machine learning algorithms. We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection heuristic, which achieves speedups of 9-35x over […]

CUDA

Dec, 7

Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

Most GPU performance “hypes” have focused on tightly-coupled applications with small memory bandwidth requirements, e.g., N-body, but GPUs are also commodity vector machines sporting substantial memory bandwidth; however, effective programming methodologies for them have been poorly studied. Our new 3-D FFT kernel, written in NVIDIA CUDA, achieves nearly 80 GFLOPS on a top-end GPU, being more […]

CUDA

Dec, 7

BSGP: bulk-synchronous GPU programming

We present BSGP, a new programming language for general purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. […]

CUDA

Dec, 7

Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor

Moore’s Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically “fuses” existing CPU and GPU designs. Pangaea introduces (1) […]

Dec, 7

A single-pass GPU ray casting framework for interactive out-of-core rendering of massive volumetric datasets

We present an adaptive out-of-core technique for rendering massive scalar volumes employing single-pass GPU ray casting. The method is based on the decomposition of a volumetric dataset into small cubical bricks, which are then organized into an octree structure maintained out-of-core. The octree contains the original data at the leaves, and a filtered representation of […]

Dec, 7

Vector graphics depicting marbling flow

We present an efficient framework for generating marbled textures that can be exported into a vector graphics format based on an explicit surface tracking method (see Figure 1). The proposed method enables artists to create complex and realistic marbling textures that can be used for design purposes. Our algorithm is unique in that the marbling […]

CUDA