high performance computing on graphics processing units: hgpu.org

Posts

Sep, 27

A GPU approach to parallel replica-exchange polymer simulations

We investigate new programming techniques for parallel tempering Monte Carlo simulations of an elementary bead-spring homopolymer model using graphics processing units (GPUs). For a precise estimation of statistical quantities, like the peak structure of the specific heat, a large number of conformations with substantial statistical data is needed. Therefore the advantage of gathering this data […]

CUDA

Sep, 27

A framework to implement a multifrontal scheme on GPU architectures with OpenCL

In this work we analyze an open-source multifrontal solver implementation (UMFPACK) and modify it to transfer the computation load to an OpenCL device, typically a GPU. To achieve this result the dbOpenCL library has been created, which allows a neat integration of OpenCL code into existing C or C++ code. An analysis and profiling […]

OpenCL

Sep, 27

GPU Accelerated Computation of the ICON Model

The main objective of this work is to explore the capacity of modern GPUs to accelerate the ICON (ICOsahedral Non-hydrostatic) model [4] developed by the Max-Planck-Institut für Meteorologie (MPI-M) in Hamburg in collaboration with the Deutscher Wetterdienst (DWD). The ICON model is an atmospheric general circulation model suited for both global and regional scale simulation.

OpenCL

Sep, 27

PIPS Is not (just) Polyhedral Software

Parallel and heterogeneous computing are growing in audience thanks to the increased performance brought by ubiquitous manycores and GPUs. However, available programming models, like OpenCL or CUDA, are far from being straightforward to use. As a consequence, several automated or semi-automated approaches have been proposed to generate hardware-level codes from high-level sequential sources. Polyhedral […]

CUDA • OpenCL

Sep, 27

E(A+M)PEC – An OpenCL Atomic and Molecular Plasma Emission Code For Interstellar Medium Simulations

E(A+M)PEC traces the ionization structure, cooling and emission spectra of plasmas. It is written in OpenCL, runs on NVIDIA graphics processing units and can be coupled to any HD or MHD code to follow the dynamical and thermal evolution of any plasma in, e.g., the interstellar medium (ISM).

OpenCL

Sep, 26

PGEM: Preemptive GPGPU Execution Model for Runtime Engines

General-purpose computing on graphics processing units, also known as GPGPU, is a burgeoning technique to enhance the computation of parallel programs. Applying this technique to real-time applications, however, requires additional support for timeliness of execution. In particular, the non-preemptive nature of GPGPU, associated with copying data to/from the device memory and launching code onto the […]

CUDA

Sep, 26

Manycore high-performance computing in bioinformatics

Mining the increasing amount of genomic data requires very efficient tools. Efficiency can be increased with better algorithms, but one could also take advantage of the hardware itself to reduce application runtimes. For several years now, heat dissipation issues have prevented processors from reaching higher frequencies. One of the answers […]

CUDA • OpenCL

Sep, 26

Generating GPU Code from a High-level Representation for Image Processing Kernels

We present a framework for representing image processing kernels based on decoupled access/execute metadata, which allow the programmer to specify both execution constraints and memory access pattern of a kernel. The framework performs source-to-source translation of kernels expressed in high-level framework-specific C++ classes into low-level CUDA or OpenCL code with effective device-dependent optimizations such as […]

CUDA • OpenCL

Sep, 26

LLVM to PTX Backend

The low-level virtual machine (LLVM) compiler infrastructure is a mature and stable framework to implement optimization and compiler passes. H. Rhodin presented an LLVM backend to generate Parallel Thread Execution (PTX) instructions from LLVM bitcode. PTX is used as intermediate representation for parallel programming. This paper discusses Rhodin’s PTX generator. Due to the similarity between […]

CUDA

Sep, 26

A Uniform Platform to Support Multigenerational GPUs for High Performance Stream-based Computing

GPU-based computing has become one of the popular high performance computing fields. The field is called GPGPU. This paper focuses on the design and implementation of a uniform GPGPU application that is optimized for both legacy and recent GPU architectures. As a typical example of such a GPGPU application, this paper will discuss […]

CUDA • OpenCL • OpenGL

Sep, 26

High-level GPU computing with jacket for MATLAB and C/C++

We describe a software platform for the rapid development of general purpose GPU (GPGPU) computing applications within the MATLAB computing environment, C, and C++: Jacket. Jacket provides thousands of GPU-tuned function syntaxes within MATLAB, C, and C++, including linear algebra, convolutions, reductions, and FFTs as well as signal, image, statistics, and graphics libraries. Additionally, Jacket […]

CUDA • OpenGL

Sep, 26

From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming

In this work, we evaluate OpenCL as a programming tool for developing performance-portable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL has required performance-impacting initializations that do not exist in other languages such as CUDA. Understanding these implications allows us to provide a […]

CUDA • OpenCL
