high performance computing on graphics processing units: hgpu.org

Posts

Nov, 5

CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs

If you need to learn CUDA but don’t have experience with parallel computing, CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs offers a detailed guide to CUDA with a grounding in parallel fundamentals. It starts by introducing CUDA and bringing you up to speed on GPU parallelism and hardware, before delving into CUDA installation. Chapters on core concepts […]

CUDA

Nov, 5

A survey of GPU-based medical image computing techniques

Medical imaging currently plays a crucial role throughout clinical practice, from medical research to diagnosis and treatment planning. However, medical imaging procedures are often computationally demanding because of the large three-dimensional (3D) datasets that must be processed in practical clinical applications. With the rapidly increasing performance of graphics processors, improved programming support, and […]

Nov, 5

Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures

Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly […]

CUDA
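The sequential dependency described above can be seen in a minimal Metropolis sampler. This is a generic sketch, not the authors' hybrid CPU/GPU scheme. The function name `metropolis_chain`, the one-dimensional state, and the uniform proposal are illustrative assumptions.

```python
import math
import random
from typing import Callable, List


def metropolis_chain(
    energy: Callable[[float], float],
    x0: float,
    n_steps: int,
    temperature: float = 1.0,
    step_size: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Run a 1-D Metropolis chain and return the state after every step.

    A move is accepted with probability min(1, exp(-(E_new - E_old) / T)),
    so the chain samples the Boltzmann distribution exp(-E(x) / T)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    chain: List[float] = []
    for _ in range(n_steps):
        # The proposal is built from the current state, so step k + 1
        # cannot start until step k has made its accept/reject decision.
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        delta = e_new - e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, e = x_new, e_new
        chain.append(x)
    return chain


# Harmonic energy E(x) = x^2 / 2 at T = 1: the samples approximate N(0, 1).
samples = metropolis_chain(lambda x: 0.5 * x * x, x0=0.0, n_steps=50000,
                           step_size=2.5, seed=42)
```

Each proposal starts from the state accepted in the previous step, so the loop cannot simply be split across GPU threads. Parallel speedups must come from elsewhere, for example from evaluating the energy of a single move in parallel.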

Nov, 5

Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering

Ambient occlusion techniques were introduced to improve data comprehension by adding soft, fading shadows to the visualization of 3D datasets. They attenuate light according to the occlusion caused by neighboring structures. However, they often come with a significant precomputation cost, which prevents their use in interactive applications based on transfer […]

CUDA
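Below is a minimal sketch of the occlusion idea, assuming a voxel grid of opacities in [0, 1] and a simple average over a cubic neighborhood. It is a simplified illustration, not the paper's load-balanced multi-GPU algorithm. The function `ambient_occlusion` and its `radius` parameter are hypothetical.

```python
from itertools import product
from typing import List

Grid = List[List[List[float]]]  # indexed as grid[z][y][x]


def ambient_occlusion(opacity: Grid, radius: int = 1) -> Grid:
    """Return a per-voxel ambient light factor in [0, 1].

    Occlusion at a voxel is the mean opacity of its neighbors inside a cube
    of half-width `radius` (the voxel itself excluded). Neighbors outside
    the volume count as empty space (opacity 0)."""
    nz, ny, nx = len(opacity), len(opacity[0]), len(opacity[0][0])
    offsets = [d for d in product(range(-radius, radius + 1), repeat=3)
               if d != (0, 0, 0)]
    light = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    # Each voxel is computed independently of the others, so this outer loop
    # is the part a GPU (or several GPUs) would parallelize.
    for z, y, x in product(range(nz), range(ny), range(nx)):
        total = 0.0
        for dz, dy, dx in offsets:
            zz, yy, xx = z + dz, y + dy, x + dx
            if 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx:
                total += opacity[zz][yy][xx]
        light[z][y][x] = 1.0 - total / len(offsets)
    return light
```

The work grows as (number of voxels) × (2·radius + 1)³. This rapid growth is where the precomputation cost comes from, and it explains why distributing the volume across several GPUs is attractive.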

Nov, 5

cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Modern processor architectures, in addition to having ever more cores, also demand ever more attention to memory layout in order to run at full capacity. The usefulness of most languages is declining because their abstractions, structures, or objects are hard to map efficiently onto modern processor architectures. The work in this paper introduces a new abstract […]

OpenCL

Nov, 5

Kite: Braided Parallelism for Heterogeneous Systems

Modern processors are evolving into hybrid, heterogeneous processors with both CPU and GPU cores used for general purpose computation. Several languages, such as BrookGPU, CUDA, and more recently OpenCL, have been developed to harness the potential of these processors. These languages typically involve control code running on a host CPU, while performance-critical, massively data-parallel kernel […]

OpenCL

Nov, 1

Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest-generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the […]

CUDA

Nov, 1

Quantum.Ligand.Dock: protein-ligand docking with quantum entanglement refinement on a GPU system

Quantum.Ligand.Dock (protein-ligand docking with graphics processing unit (GPU) quantum entanglement refinement on a GPU system) is an original, modern method for in silico prediction of protein-ligand interactions using high-performance docking code. The main feature of our approach is that it combines fast search with special treatment of overlooked physical interactions. On the one hand, […]

OpenCL

Nov, 1

DL: A data layout transformation system for heterogeneous computing

For many-core architectures such as GPUs, efficient off-chip memory access is crucial to high performance, because applications are often limited by off-chip memory bandwidth. Transforming the data layout is an effective way to reshape access patterns and improve off-chip memory access behavior, but several challenges have limited the use of automated data layout transformation systems […]

OpenCL
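The classic example of a layout transformation is converting an array of structures (AoS) into a structure of arrays (SoA). This is a minimal sketch of that one transformation. DL's own transformations are not described in the abstract and may be more general. The helper names `aos_index`, `soa_index`, and `aos_to_soa` are hypothetical.

```python
from typing import List, Sequence


def aos_index(i: int, field: int, n_fields: int) -> int:
    """Flat offset of `field` of record `i` in array-of-structures layout."""
    return i * n_fields + field


def soa_index(i: int, field: int, n_records: int) -> int:
    """Flat offset of `field` of record `i` in structure-of-arrays layout."""
    return field * n_records + i


def aos_to_soa(buf: Sequence[float], n_fields: int) -> List[float]:
    """Transpose a flat AoS buffer into a flat SoA buffer."""
    if len(buf) % n_fields:
        raise ValueError("buffer length must be a multiple of n_fields")
    n_records = len(buf) // n_fields
    out = [0.0] * len(buf)
    for i in range(n_records):
        for f in range(n_fields):
            out[soa_index(i, f, n_records)] = buf[aos_index(i, f, n_fields)]
    return out
```

With three fields per record, four consecutive threads reading field 0 touch offsets 0, 3, 6, 9 in the AoS buffer, but 0, 1, 2, 3 in the SoA buffer. The contiguous pattern is what lets a GPU coalesce those reads into fewer off-chip memory transactions.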

Nov, 1

Numerical Simulation of the Frank-Kamenetskii PDE: GPU vs. CPU Computing

Efficiently solving the Frank-Kamenetskii partial differential equation by implementing parallelized numerical algorithms on GPUs (Graphics Processing Units) in MATLAB is a natural progression of earlier work in an area of practical importance. There is ongoing interest in the mathematics describing thermal explosions due to the significance […]

CUDA
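In dimensionless form, the Frank-Kamenetskii model is commonly written as θ_t = ∇²θ + δ·exp(θ), where θ is a scaled temperature and δ is the Frank-Kamenetskii parameter. The sketch below integrates the one-dimensional slab version with an explicit finite-difference scheme in plain Python. It assumes this standard scaling and is not the paper's MATLAB/GPU code. The function `fk_explicit_1d` and its parameters are hypothetical.

```python
import math
from typing import List


def fk_explicit_1d(delta: float, n: int = 41, t_end: float = 2.0) -> List[float]:
    """Integrate theta_t = theta_xx + delta * exp(theta) on x in [-1, 1].

    Boundary and initial conditions are theta = 0. Forward Euler in time,
    central differences in space. Intended for subcritical delta: above the
    critical value theta grows without bound and math.exp eventually
    overflows."""
    dx = 2.0 / (n - 1)
    dt = 0.4 * dx * dx  # below the explicit diffusion stability limit dx^2 / 2
    theta = [0.0] * n
    t = 0.0
    while t < t_end:
        new = theta[:]  # end points stay at the boundary value 0
        for i in range(1, n - 1):
            # Each interior update reads only the previous time level,
            # so this loop is data-parallel.
            lap = (theta[i - 1] - 2.0 * theta[i] + theta[i + 1]) / (dx * dx)
            new[i] = theta[i] + dt * (lap + delta * math.exp(theta[i]))
        theta = new
        t += dt
    return theta
```

For the slab on [-1, 1] with θ = 0 at the walls, a steady state exists only for δ below about 0.878. Below that value the profile settles (for δ = 0.5 the centre temperature approaches roughly 0.33). Above it, θ keeps growing, which is the thermal-explosion behaviour the equation is used to study.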

Nov, 1

An Intermediate Library for Multi-GPUs Computing Skeletons

This paper introduces a library that helps programmers write parallel programs for GPU architectures, especially systems with multiple GPUs. The library is built on the idea of skeletons, which let programmers write parallel programs as easily and quickly as sequential ones. Skeletons are usually described in a functional language, which supports […]

CUDA
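The skeleton idea can be illustrated with two classic skeletons, map and reduce, whose callers never see how the data is divided among devices. In this sketch, the devices are only simulated by splitting the input into chunks on the CPU. The names `split`, `map_skeleton`, and `reduce_skeleton` are hypothetical and are not the library's API, which the abstract does not show.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def split(data: Sequence[T], n_devices: int) -> List[Sequence[T]]:
    """Cut `data` into n_devices contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_devices)
    chunks, start = [], 0
    for d in range(n_devices):
        end = start + size + (1 if d < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def map_skeleton(f: Callable[[T], U], data: Sequence[T], n_devices: int) -> List[U]:
    """Apply f to every element; each chunk stands in for one GPU's share."""
    out: List[U] = []
    for chunk in split(data, n_devices):  # on real hardware: one launch per GPU
        out.extend(f(x) for x in chunk)
    return out


def reduce_skeleton(op: Callable[[T, T], T], data: Sequence[T], n_devices: int) -> T:
    """Reduce each chunk locally, then combine the per-device partial results.

    `op` must be associative for the chunked result to match a sequential
    reduce; `data` must be non-empty."""
    partials = [reduce(op, chunk) for chunk in split(data, n_devices) if chunk]
    return reduce(op, partials)
```

The user writes only `f` or `op`, as in a sequential program. Distributing the chunks across devices and combining the results is handled by the skeleton.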

Nov, 1

Speeding up the evaluation of evolutionary learning systems using GPGPUs

In this paper we introduce a method for computing fitness in evolutionary learning systems based on NVIDIA’s massively parallel technology using CUDA. Both the matching of a population of classifiers against a training set and the computation of each classifier’s fitness from its matches have been parallelized. This method has […]
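The two stages named in the abstract can be sketched sequentially. The abstract does not define the classifier representation or the fitness measure. This sketch therefore assumes ternary conditions over {0, 1, #}, where # means "don't care", and takes fitness to be accuracy on matched examples. The names `matches` and `fitness` are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[str, int]     # (binary input string, class label)
Classifier = Tuple[str, int]  # (condition over '0', '1', '#', predicted class)


def matches(condition: str, x: str) -> bool:
    """A condition matches when every non-'#' position equals the input bit."""
    return len(condition) == len(x) and all(
        c == "#" or c == b for c, b in zip(condition, x))


def fitness(population: Sequence[Classifier],
            training_set: Sequence[Example]) -> List[float]:
    """Fitness of each classifier = fraction of its matched examples that it
    classifies correctly (0.0 if it matches none)."""
    scores: List[float] = []
    for condition, predicted in population:  # classifiers are independent
        n_match = n_correct = 0
        for x, label in training_set:         # so is each (classifier, example) test
            if matches(condition, x):
                n_match += 1
                n_correct += int(label == predicted)
        scores.append(n_correct / n_match if n_match else 0.0)
    return scores
```

Every (classifier, example) match test is independent, so the first stage maps naturally onto one GPU thread per pair. The second stage then becomes a per-classifier reduction over those match results.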

