high performance computing on graphics processing units: hgpu.org

Posts

Jan, 9

Raising the level of many-core programming with compiler technology: meeting a grand challenge

Modern GPUs and CPUs are massively parallel, many-core processors. While application developers for these many-core chips are reporting 10X-100X speedup over sequential code on traditional microprocessors, the current practice of many-core programming based on OpenCL, CUDA, and OpenMP puts strain on software development, testing and support teams. According to the semiconductor industry roadmap, these processors […]

CUDA

OpenCL

Jan, 8

Acceleration of FDTD mode solver by high-performance computing techniques

A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the benchmark finite-difference (FD) eigenmode solver. By taking advantage of the […]

CUDA

Jan, 8

CBESW: sequence alignment on the Playstation 3

BACKGROUND: The exponential growth of available biological data has caused bioinformatics to move rapidly towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many […]

CUDA

Jan, 8

Four styles of parallel and net programming

This paper reviews the programming landscape for parallel and network computing systems, focusing on four styles of concurrent programming models, and example languages/libraries. The four styles correspond to four scales of the targeted systems. At the smallest coprocessor scale, Single Instruction Multiple Thread (SIMT) and Compute Unified Device Architecture (CUDA) are considered. Transactional memory is […]

CUDA

Jan, 8

Quick-CULLIDE: fast inter- and intra-object collision culling using graphics hardware

We present a fast collision culling algorithm for performing inter- and intra-object collision detection among complex models using graphics hardware. Our algorithm is based on CULLIDE and performs visibility queries on the GPUs to eliminate a subset of geometric primitives that are not in close proximity. We present an extension to CULLIDE to perform intra-object […]

OpenGL

Jan, 8

A constant-space belief propagation algorithm for stereo matching

In this paper, we consider the problem of stereo matching using loopy belief propagation. Unlike previous methods which focus on the original spatial resolution, we hierarchically reduce the disparity search range. By fixing the number of disparity levels on the original resolution, our method solves the message updating problem in a time linear in the […]

Jan, 8

Improving energy and power efficiency using NComputing and approaches for predicting reliability of complex computing systems

Following the computing-design philosophy that the best way to reduce power consumption and increase energy efficiency is to reduce waste, we propose an architecture that is simple to implement: an NComputing device serves multiple users while only one computer is needed. Intuitively, this can save energy, space as well as […]

Jan, 8

Real-time stereo matching using orthogonal reliability-based dynamic programming

A novel algorithm is presented in this paper for estimating reliable stereo matches in real time. Based on the dynamic programming-based technique we previously proposed, the new algorithm can generate semi-dense disparity maps using as few as two dynamic programming passes. The iterative best path tracing process used in traditional dynamic programming is replaced by […]

Jan, 8

Techniques for efficient, real-time, 3D visualization of multi-modality cardiac data using consumer graphics hardware

We exploit consumer graphics hardware to perform real-time processing and visualization of high-resolution, 4D cardiac data. We have implemented real-time, realistic volume rendering, interactive 4D motion segmentation of cardiac data, visualization of multi-modality cardiac data and 3D display of multiple series cardiac MRI. We show that an ATI Radeon 9700 Pro can render a 512x512x128 […]

Jan, 8

GPUMCD: a new GPU-oriented Monte Carlo dose calculation platform

PURPOSE: Monte Carlo methods are considered the gold standard for dosimetric computations in radiotherapy. Their execution time is, however, still an obstacle to the routine use of Monte Carlo packages in a clinical setting. To address this problem, a completely new Monte Carlo dose calculation package designed from the ground up for the GPU […]

CUDA

Jan, 8

Parallel algorithms to a parallel hardware: Designing vision algorithms for a GPU

The GPU has become an affordable solution for accelerating a slow process on a commercial system. Most achievements using it for non-rendering problems are exact re-implementations of existing algorithms designed for a serial CPU. We study the conditions for a good parallel algorithm, and show that it is possible to design an algorithm […]

CUDA

Jan, 8

ClearPath: highly parallel collision avoidance for multi-agent simulation

We present a new local collision avoidance algorithm between multiple agents for real-time simulations. Our approach extends the notion of velocity obstacles from robotics and formulates the conditions for collision-free navigation as a quadratic optimization problem. We use a discrete optimization method to efficiently compute the motion of each agent. The resulting algorithm can […]

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training

Large Language Model Powered C-to-CUDA Code Translation: A Novel Auto-Parallelization Framework

GigaAPI: a user-space API that simplifies multi-GPU programming, bridging the gap between the capabilities of parallel GPU systems and the ability of developers to harness their full potential

GigaAPI for GPU Parallelization

Coccinelle: a C code transformation engine using SmPL for matches, refactorings, and bug fixing

Advances in Semantic Patching for HPC-oriented Refactorings with Coccinelle

DuoReduce: MLIR's benchmark

Hardware-Assisted Software Testing and Debugging for Heterogeneous Computing

* * *

Recent source codes

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Data-efficient LLM Fine-tuning for Code Generation
