high performance computing on graphics processing units: hgpu.org

Posts

Aug, 22

Top ten ways to make formal methods for HPC practical

Almost all fundamental advances in science and engineering crucially depend on the availability of extremely capable high performance computing (HPC) systems. Future HPC systems will increasingly be based on heterogeneous multi-core CPUs, and their programming will involve multiple concurrency models, with the message passing interface (MPI) serving as the dominant model for many years. These […]

CUDA • OpenCL

Aug, 22

The VRE volume rendering engine

We present the extendable volume rendering engine VRE, which provides an open and flexible environment for both experimental and production-level implementation of a wide range of volume visualisation algorithms, including various CPU- and GPU-based ones. We identify parts of renderer functionality suitable for isolation in logical units and propose various types of plugins. […]

OpenGL

Aug, 22

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations

A trend that has materialized, and has attracted much attention, is that of increasingly heterogeneous computing platforms. It has become very common for a desktop or a notebook computer to come equipped with both a multi-core CPU and a GPU. Capitalizing on the maximum computational power of such architectures (i.e., by simultaneously […]

CUDA

Aug, 22

Reusable software components for accelerator-based clusters

The emerging accelerator-based heterogeneous clusters, comprising specialized processors such as the IBM Cell and GPUs, have exhibited an excellent price-to-performance ratio as well as high energy efficiency. However, developing and maintaining software for such systems is fraught with challenges, especially for modern high-performance computing (HPC) applications that can benefit the most from leveraging accelerators. If […]

CUDA

Aug, 22

Improving programmability of heterogeneous many-core systems via explicit platform descriptions

In this paper we present ongoing work towards a programming framework for heterogeneous hardware and software environments. Our framework aims at improving programmability and portability for heterogeneous many-core systems via a Platform Description Language (PDL) for expressing architectural patterns and platform information. We developed a prototypical code generator that takes as input an annotated serial […]

CUDA • OpenCL

Aug, 22

Accelerating Haskell array codes with multicore GPUs

Current GPUs are massively parallel multicore processors optimised for workloads with a large degree of SIMD parallelism. Good performance requires highly idiomatic programs, whose development is work intensive and requires expert knowledge. To raise the level of abstraction, we propose a domain-specific high-level language of array computations that captures appropriate idioms in the form of […]

CUDA

Aug, 22

A programming model for GPU-based parallel computing with scalability and abstraction

In this paper, we present a multi-level programming model for recent GPU-based high performance computing systems. Involving cooperative stream threads and symmetric multiprocessing threads, our model gives a computational framework that scales through multi-GPU environments to GPU-cluster systems. Instead of hiding the execution environment from the programmer using compiler extensions or metaprogramming techniques, we aim […]

CUDA

Aug, 21

Compiling Python to a hybrid execution environment

A new compilation framework enables the execution of numerically intensive applications, written in Python, on a hybrid execution environment formed by a CPU and a GPU. This compiler automatically computes the set of memory locations that need to be transferred to the GPU, and produces the correct mapping between the CPU and the GPU address spaces. […]

OpenCL

Aug, 21

A declarative API for particle systems

Recent trends in computer-graphics APIs and hardware have made it practical to use high-level functional languages for real-time graphics applications. Thus we have the opportunity to develop new approaches to computer graphics that take advantage of the high-level features of functional languages. This paper describes one such project that uses the techniques of functional programming […]

OpenCL

Aug, 21

Software architecture and system validation of an open, unified model for accelerated multicore computing

For systems that use hardware accelerators to combine multicore and multiprocess technology with libraries and computational kernels, the drawbacks are the complexity of the programming model and the corresponding verification of the software and validation of the system performance capabilities. In this paper, we describe a software approach to utilizing the compute power of the […]

OpenCL

Aug, 21

Mind the gap!: bridging the dichotomy of design and implementation

This paper presents a revamping of a sparse linear algebra design pattern, targeting parallelization within scientific and engineering applications. A proof of concept implementation is developed to compare actual software practices and optimizations with those described in the original design pattern. The case study reveals that the design pattern did not tightly coincide with the […]

OpenCL

Aug, 21

EpiGPU

MOTIVATION: Hundreds of genome-wide association studies have been performed over the last decade, but as single nucleotide polymorphism (SNP) chip density has increased, so has the computational burden of searching for epistasis [for n SNPs the computational time resource is O(n(n-1)/2)]. While the theoretical contribution of epistasis toward phenotypes of medical and economic importance is […]

OpenCL
