high performance computing on graphics processing units: hgpu.org

Posts

Nov, 6

Graphics Hardware-Based Level-Set Method for Interactive Segmentation and Visualization

This paper presents an efficient graphics hardware-based method to segment and visualize level-set surfaces at interactive rates. Our method is composed of a memory manager, a level-set solver, and a volume renderer. The memory manager, which runs on the CPU, generates the page table, inverse page table, and available page stack, and processes the activation and inactivation of […]

Nov, 6

PacketShader: a GPU-accelerated software router

We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively parallel processing power of the GPU to address the CPU bottleneck in current software routers. Combined with our high-performance packet I/O engine, PacketShader outperforms existing software routers by more than a factor of four, forwarding […]

CUDA

Nov, 5

An Introduction to GPU Accelerated Surgical Simulation

Modern graphics processing units (GPUs) have recently become fully programmable. Thus, a powerful and cost-efficient new computational platform for surgical simulations has emerged. A broad selection of publications has shown that scientific computations obtain a significant speedup if ported from the CPU to the GPU. To take advantage of the GPU, however, one must understand […]

OpenGL

Nov, 5

Multi-Level Graph Layout on the GPU

This paper presents a new algorithm for force directed graph layout on the GPU. The algorithm, whose goal is to compute layouts accurately and quickly, has two contributions. The first contribution is proposing a general multi-level scheme, which is based on spectral partitioning. The second contribution is computing the layout on the GPU. Since the […]

OpenGL

Nov, 5

GPU’s for event reconstruction in the FairRoot framework

FairRoot is the simulation and analysis framework used by the CBM and PANDA experiments at FAIR/GSI. The use of graphics processing units (GPUs) for event reconstruction in FairRoot will be presented. The fact that CUDA (Nvidia’s Compute Unified Device Architecture) development tools work alongside the conventional C/C++ compiler makes it possible to mix GPU code with […]

CUDA

Nov, 5

The GPU as numerical simulation engine

Many computer graphics applications require high-intensity numerical simulation. The question arises whether such computations can be performed efficiently on the GPU, which has emerged as a full function streaming processor with high floating point performance. We show in this paper that this is indeed the case using two basic, broadly useful, computational kernels as examples. […]

Nov, 5

Using GPUs for Machine Learning Algorithms

Using dedicated hardware to do machine learning typically ends up in disaster because of cost, obsolescence, and poor software. The popularization of Graphics Processing Units (GPUs), which are now available on every PC, provides an attractive alternative. We propose a generic 2-layer fully connected neural network GPU implementation which yields over 3X speedup for both […]
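As a rough illustration of what a 2-layer fully connected network computes, here is a minimal CPU sketch in plain Python. It is not the paper's GPU implementation: the sigmoid activation and the names `dense`, `sigmoid`, and `forward` are assumptions chosen for the example. Each layer is a matrix-vector product plus a bias, which is the part a GPU would parallelize across output units.

```python
import math


def dense(weights, bias, inputs):
    """One fully connected layer: out[j] = sum_i weights[j][i] * inputs[i] + bias[j]."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def sigmoid(values):
    """Element-wise logistic activation (an assumed choice for this sketch)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def forward(x, w1, b1, w2, b2):
    """Forward pass: input -> hidden layer -> output layer."""
    hidden = sigmoid(dense(w1, b1, x))
    return sigmoid(dense(w2, b2, hidden))
```

Every output unit of `dense` depends only on its own weight row, so the rows can be computed independently, one per thread.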

Nov, 5

Clustering billions of data points using GPUs

In this paper, we report our research on using GPUs to accelerate clustering of very large data sets, which are common in today’s real-world applications. While many published works have shown that GPUs can be used to accelerate various general-purpose applications with respectable performance gains, few attempts have been made to tackle very […]

CUDA

Nov, 5

Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing

This paper presents an effective scheme for clustering a huge data set using a PC cluster system, in which each PC is equipped with a commodity programmable graphics processing unit (GPU). The proposed scheme is devised to achieve three-level hierarchical parallel processing of massive data clustering. The divide-and-conquer approach to parallel data clustering is employed […]

Nov, 5

On Dynamic Load Balancing on Graphics Processors

To get maximum performance on many-core graphics processors, it is important to have an even balance of the workload so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically […]

CUDA

Nov, 5

Coarse grain parallelization of evolutionary algorithms on GPGPU cards with EASEA

This paper presents a straightforward implementation of a standard evolutionary algorithm that evaluates its population in parallel on a GPGPU card. Tests done on a benchmark and a real-world problem using an old NVidia 8800GTX card and a newer but not top-of-the-range GTX260 card show a roughly 30x (resp. 100x) speedup […]
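The idea of evaluating a whole population in parallel can be sketched on the CPU as follows. This is a minimal illustration, not EASEA code: the `sphere` fitness function, the truncation selection, the Gaussian mutation, and all parameter values are assumptions made for the example, and a thread pool stands in for the GPU, where each individual would typically map to its own thread.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sphere(individual):
    """Hypothetical benchmark fitness: sum of squares (lower is better)."""
    return sum(g * g for g in individual)


def evaluate_population(population, fitness, workers=4):
    """Evaluate every individual independently; results keep population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))


def evolve(generations=20, pop_size=32, genes=5, sigma=0.1, seed=0):
    """Run a simple elitist evolutionary loop and return the best score per generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # The only parallel step: fitness evaluation of the whole population.
        scores = evaluate_population(population, sphere)
        history.append(min(scores))
        ranked = [ind for _, ind in
                  sorted(zip(scores, population), key=lambda pair: pair[0])]
        # Truncation selection: the better half survives unchanged.
        parents = ranked[:pop_size // 2]
        # Children are Gaussian-mutated copies of randomly chosen parents.
        children = [[g + rng.gauss(0.0, sigma) for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return history
```

Selection and variation stay sequential here; only evaluation is farmed out, which pays off when the fitness function dominates the running time.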

Nov, 5

MPI within a GPU

GPUs offer high-performance floating-point computation at commodity prices, but their usage is hindered by programming models which expose the user to irregularities in the current shared-memory environments and require learning new interfaces and semantics. This thesis will demonstrate that the message-passing paradigm can be conceptually cleaner than the current data-parallel models for programming GPUs because […]

CUDA