high performance computing on graphics processing units: hgpu.org

Posts

Dec, 8

A survey of medical image registration on graphics hardware

The rapidly increasing performance of graphics processors, improving programming support, and an excellent performance-price ratio make graphics processing units (GPUs) a good option for a variety of computationally intensive tasks. In this survey, we give an overview of GPU-accelerated image registration. We address both GPU-experienced readers with an interest in accelerated image registration, as […]

Dec, 7

The 2011 International Conference on High Performance Computing & Simulation, HPCS 2011

The conference aims to address, explore, and exchange information on the state of the art in high performance and large scale computing systems, their use in modeling and simulation, and data intensive applications. We encourage papers with either an application or a technology flavor (and their multidisciplinary integration). The scope covers architecture, performance, algorithms, middleware, and applications. Work on […]

Dec, 7

Performance evaluation of image processing algorithms on the GPU

The graphics processing unit (GPU), which was originally used exclusively for visualization purposes, has evolved into an extremely powerful co-processor. Meanwhile, through the development of elaborate interfaces, the GPU can be used to process data and deal with computationally intensive applications. The speed-up factors attained compared to the central processing unit (CPU) are […]

CUDA

Dec, 7

Fast support vector machine training and classification on graphics processors

Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high performance implementations of machine learning algorithms. We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection heuristic, which achieves speedups of 9-35x over […]

CUDA

Dec, 7

Bandwidth intensive 3-D FFT kernel for GPUs using CUDA

Most GPU performance “hypes” have focused on tightly-coupled applications with small memory bandwidth requirements, e.g., N-body, but GPUs are also commodity vector machines sporting substantial memory bandwidth; however, effective programming methodologies for them have been poorly studied. Our new 3-D FFT kernel, written in NVIDIA CUDA, achieves nearly 80 GFLOPS on a top-end GPU, being more […]

CUDA

Dec, 7

BSGP: bulk-synchronous GPU programming

We present BSGP, a new programming language for general purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. […]

CUDA

Dec, 7

Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor

Moore’s Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically “fuses” existing CPU and GPU designs. Pangaea introduces (1) […]

Dec, 7

A single-pass GPU ray casting framework for interactive out-of-core rendering of massive volumetric datasets

We present an adaptive out-of-core technique for rendering massive scalar volumes employing single-pass GPU ray casting. The method is based on the decomposition of a volumetric dataset into small cubical bricks, which are then organized into an octree structure maintained out-of-core. The octree contains the original data at the leaves, and a filtered representation of […]

Dec, 7

Vector graphics depicting marbling flow

We present an efficient framework for generating marbled textures that can be exported into a vector graphics format based on an explicit surface tracking method (see Figure 1). The proposed method enables artists to create complex and realistic marbling textures that can be used for design purposes. Our algorithm is unique in that the marbling […]

CUDA

Dec, 7

A Real-Time Multigrid Finite Hexahedra Method for Elasticity Simulation using CUDA

We present a multigrid approach for simulating elastic deformable objects in real time on recent NVIDIA GPU architectures. To accurately simulate large deformations we consider the co-rotated strain formulation. Our method is based on a finite element discretization of the deformable object using hexahedra. It draws upon recent work on multigrid schemes for the efficient […]

CUDA

Dec, 7

GPU-based Monte Carlo simulation in neutron transport and finite differences heat equation evaluation

Graphics Processing Units (GPUs) are high performance co-processors originally intended to improve the use and quality of computer graphics applications. Since researchers and practitioners realized the potential of using GPUs for general-purpose computing, their application has been extended to fields outside the scope of computer graphics. The main objective of this work is to evaluate […]

Dec, 7

Simulation of Coarse-Grained Protein-Protein Interactions with Graphics Processing Units

We report a hybrid parallel central and graphics processing unit (CPU-GPU) implementation of a coarse-grained model for replica exchange Monte Carlo (REMC) simulations of protein assemblies. We describe the design, optimization, validation, and benchmarking of our algorithms, particularly the parallelization strategy, which is specific to the requirements of GPU hardware. Performance evaluation of our hybrid […]

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?