
Posts

Apr, 2

Throughput-Effective On-Chip Networks for Manycore Accelerators

As the number of cores and threads in manycore compute accelerators such as graphics processing units (GPUs) increases, so does the importance of on-chip interconnection network design. This paper explores throughput-effective networks-on-chip (NoCs) for future manycore accelerators that employ bulk-synchronous parallel (BSP) programming models such as CUDA and OpenCL. A hardware optimization is “throughput-effective” if […]
Apr, 2

MARC: A Many-Core Approach to Reconfigurable Computing

We present a Many-core Approach to Reconfigurable Computing (MARC), enabling efficient high-performance computing for applications expressed using parallel programming models such as OpenCL. The MARC system exploits abundant specialized FPGA resources such as distributed block memories and DSP blocks to implement complete, high-efficiency single-chip many-core microarchitectures. The key benefits of MARC are that […]
Apr, 2

Real-time particle filtering with heuristics for 3D motion capture by monocular vision

Particle filtering is known as a robust approach to motion tracking by vision, at the cost of heavy computation in a high-dimensional pose space. In this work, we describe a number of heuristics that, as we demonstrate, jointly improve robustness and real-time performance for motion capture. 3D human motion capture by monocular vision without markers […]
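
As a rough illustration of the per-particle work behind the heavy computation mentioned above, the CUDA sketch below shows only the importance-weighting step of a generic particle filter with a hypothetical one-dimensional observation model; it is not the authors' code, and the prediction step, resampling, and the paper's heuristics are omitted.

    // Hedged sketch: weight-update step of a generic particle filter.
    // One thread per particle evaluates an (unnormalized) Gaussian likelihood
    // of the observation given that particle's predicted measurement.
    // All identifiers here are assumptions of the example.
    __global__ void update_weights(const float *predicted, // predicted measurement per particle
                                   float observation,      // actual measurement
                                   float *weights,         // importance weights, updated in place
                                   float sigma, int num_particles)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p < num_particles) {
            float d = predicted[p] - observation;
            weights[p] *= expf(-0.5f * d * d / (sigma * sigma));
        }
    }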
Apr, 2

Parallel discrete wavelet transform using the Open Computing Language: a performance and portability study

The discrete wavelet transform (DWT) is a powerful signal processing technique used in the JPEG 2000 image compression standard. The multi-resolution sub-band encoding provided by DWT allows for higher compression ratios, avoids blocking artifacts and enables progressive transmission of images. However, these advantages come at the expense of additional computational complexity. Achieving real-time or interactive […]
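
For readers unfamiliar with sub-band encoding, the sketch below shows one level of a 1-D Haar DWT as a CUDA kernel. This is purely illustrative: JPEG 2000 actually uses the CDF 5/3 and 9/7 wavelets, and the paper's implementation is written in OpenCL.

    // Illustrative one-level 1-D Haar DWT (CUDA). Each thread produces one
    // low-pass (approximation) and one high-pass (detail) coefficient.
    __global__ void haar_dwt_1d(const float *in, float *approx, float *detail, int half)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // half = input length / 2
        if (i < half) {
            float a = in[2 * i];
            float b = in[2 * i + 1];
            const float s = 0.70710678f;   // 1/sqrt(2), keeps the transform orthonormal
            approx[i] = (a + b) * s;       // low-pass sub-band
            detail[i] = (a - b) * s;       // high-pass sub-band
        }
    }

Repeating the kernel on the approximation output yields the multi-resolution decomposition the abstract refers to.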
Apr, 2

Parallel implementation of the Finite-Difference Time-Domain method in Open Computing Language

In this paper we evaluate the usability and performance of the Open Computing Language (OpenCL) for implementing the Finite-Difference Time-Domain (FDTD) method. The simulation speed was compared to implementations based on alternative parallel processor programming techniques. Moreover, the portability of the OpenCL FDTD code between modern computing architectures was assessed. The average speed of […]
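
As a minimal picture of the stencil updates that make FDTD parallelize well, the CUDA sketch below performs one leapfrog step of a 1-D Yee scheme. The coefficient and array names are assumptions of the example; the paper's own code targets OpenCL.

    // 1-D FDTD (Yee) update, illustrative only. ez and hy are field arrays of
    // length n; ce and ch fold the time step, grid spacing and material
    // constants into single update coefficients.
    __global__ void update_e(float *ez, const float *hy, float ce, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i > 0 && i < n)
            ez[i] += ce * (hy[i] - hy[i - 1]);
    }

    __global__ void update_h(float *hy, const float *ez, float ch, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n - 1)
            hy[i] += ch * (ez[i + 1] - ez[i]);
    }

One simulated time step launches update_e and then update_h over the whole grid, which is exactly the data-parallel pattern both OpenCL and CUDA express naturally.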
Apr, 2

Speeding-up Pearson Correlation Coefficient calculation on graphical processing units

The sample correlation coefficient is widely used for finding signal similarity in data processing, multimedia, pattern recognition and artificial intelligence applications. The Pearson correlation coefficient is the most common measure of correlation between discrete signals. Similarity search in huge pattern databases requires a fast way of calculating the correlation coefficient between numerical vectors. In this […]
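
The single-pass formulation of the Pearson coefficient, r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)), maps naturally onto one GPU thread per database vector. The CUDA sketch below is an illustration of that mapping, not the paper's implementation; all identifiers are assumptions of the example.

    // One thread computes the Pearson correlation between the query vector
    // and one row of the database (row-major, num_vecs x dim).
    __global__ void pearson_all(const float *query, const float *database,
                                float *r, int num_vecs, int dim)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= num_vecs) return;

        const float *y = database + (size_t)v * dim;
        float sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < dim; ++i) {
            float xi = query[i], yi = y[i];
            sx  += xi;       sy  += yi;
            sxx += xi * xi;  syy += yi * yi;  sxy += xi * yi;
        }
        float num = dim * sxy - sx * sy;
        float den = sqrtf((dim * sxx - sx * sx) * (dim * syy - sy * sy));
        r[v] = (den > 0.0f) ? num / den : 0.0f;   // guard against zero-variance vectors
    }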
Apr, 2

GPU-Enabled AI

GPU-enabled AI is a subset of so-called general-purpose GPU computing (GPGPU). But it promises to be one of the fastest-growing subsets. The rise of cloud computing, recent high-powered graphics-chip releases by AMD’s competitor Nvidia, and the growing acceptance of the OpenCL programming platform have all converged to allow GPU-enabled AI to take off in […]
Apr, 2

Uncertainty-Aware Guided Volume Segmentation

Although direct volume rendering is established as a powerful tool for the visualization of volumetric data, efficient and reliable feature detection is still an open topic. Usually, a tradeoff between fast but imprecise classification schemes and accurate but time-consuming segmentation techniques has to be made. Furthermore, the issue of uncertainty introduced with the feature detection […]
Apr, 2

A characterization and analysis of PTX kernels

General purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications. It has been driven by the introduction of C-based programming environments such as NVIDIA’s CUDA, OpenCL, and Intel’s Ct. While significant effort has been focused on developing and evaluating applications and software tools, comparatively […]
Apr, 2

Parallel computing with CUDA

Summary form only given. NVIDIA’s CUDA architecture provides a powerful platform for writing highly parallel programs. By providing simple abstractions for hierarchical thread organization, memories, and synchronization, the CUDA programming model allows programmers to write scalable programs without the burden of learning a multitude of new programming constructs. The CUDA architecture can support many languages […]
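
The abstractions listed above (hierarchical thread organization, per-block memories, and synchronization) are easiest to see in a small kernel. The sketch below is a generic block-wise sum reduction offered as an illustration, not material from the talk itself.

    // Each 256-thread block reduces one tile of the input into a single
    // partial sum using shared memory and a barrier between tree levels.
    __global__ void block_sum(const float *in, float *block_out, int n)
    {
        __shared__ float tile[256];              // per-block shared memory
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid; // grid > block > thread hierarchy

        tile[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                         // barrier within the block

        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride) tile[tid] += tile[tid + stride];
            __syncthreads();
        }
        if (tid == 0) block_out[blockIdx.x] = tile[0];
    }

Launched as block_sum<<<num_blocks, 256>>>(in, partial, n), the same source scales across GPUs with different core counts, which is the scalability point the abstract makes.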
Apr, 1

Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

New high performance computing (HPC) applications increasingly have to address scalability across a growing number of nodes as well as the programming of specialized accelerator hardware. The hybrid composition of large computing systems leads to a new dimension of complexity in software development. This paper presents a novel approach to gaining insight into accelerator interaction and utilization without […]
Apr, 1

Message Passing Interface support for the runtime adaptive multi-processor system-on-chip RAMPSoC

Parallel processor architectures are a promising solution for providing the required computing performance for current and future high performance applications. Of course, the computational power actually achieved by such a parallel computer system depends on the inherent parallelism of the algorithm to be implemented. The implementation of an algorithm onto a parallel computer architecture, […]

