high performance computing on graphics processing units: hgpu.org

Posts

Oct, 13

Input Sensitivity of GPU Program Optimizations

Graphics Processing Units (GPUs) have been increasingly adopted to improve computing throughput. However, developing a high-quality GPU application is challenging because of the large optimization space and the complex, unpredictable effects of optimizations on GPU program performance. Many recent efforts have employed empirical search-based auto-tuners to tackle the problem, but few […]

CUDA • OpenCL

Oct, 13

gpustats: GPU Library for Statistical Computing in Python

In this talk we will discuss gpustats, a new Python library for assisting in "big data" statistical computing applications, particularly Monte Carlo-based inference algorithms. The library provides a general code generation / metaprogramming framework for easily implementing discrete and continuous probability density functions and random variable samplers. These functions can be utilized to achieve more […]

CUDA • OpenCL

Oct, 13

Seamless Dynamic Runtime Reconfiguration in a Software-Defined Radio

We discuss implementation aspects of a software-defined radio system that allows for dynamic waveform reconfiguration during runtime without interrupting dataflow processing. Traditional software-defined radio systems execute a waveform statically, exactly as it is programmed. Reconfiguration is provided by executing a different waveform, which requires the system to stop processing data while reconfiguration occurs, and also […]

OpenCL

Oct, 13

Developing a High Performance GPGPU Compiler Using Cetus

In this paper we present our experience in developing an optimizing compiler for general purpose computation on graphics processing units (GPGPU) based on the Cetus compiler framework. The input to our compiler is a naive GPU kernel procedure, which is functionally correct but without any consideration for performance optimization. Our compiler applies a set of […]

CUDA

Oct, 13

PlinkGPU: A Framework for GPU Acceleration of Whole Genome Data Analysis

Genome-wide association studies (GWAS) are performed in order to detect the genetic variations associated with physical traits (e.g. diseases), and Plink is a popular software system for analyzing the data of GWAS. Due to the large datasets involved, the task of data processing can be very time-consuming. Although GPUs (graphics processing units) are not generally […]

CUDA • OpenCL

Oct, 13

State of The Art Report on GPU

This report aims to provide a beginner’s introduction to GPUs, from both a hardware and a software angle. We look at the evolution of specialist graphics hardware from the early days of PC graphics cards to the present day. We describe the currently available hardware from NVIDIA and AMD/ATI, and the current software from both […]

CUDA • OpenGL

Oct, 12

Optimizing a High Energy Physics (HEP) Toolkit on Heterogeneous Architectures

A desired trend within high energy physics is to increase particle accelerator luminosities, leading to production of more collision data and higher probabilities of finding interesting physics results. A central data analysis technique used to determine whether results are interesting or not is the maximum likelihood method, and the corresponding evaluation of the negative log-likelihood, […]

OpenCL

Oct, 12

GPU-Based Translation-Invariant 2D Discrete Wavelet Transform for Image Processing

The Discrete Wavelet Transform (DWT) is applied to various signal and image processing applications. However, the computation is expensive. Therefore, plenty of approaches have been proposed to accelerate it. Graphics processing units (GPUs) can be used as stream processors to speed up the calculation of the DWT. In this paper, we present a […]

OpenGL

Oct, 12

High Performance Computing with Accelerators

High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. HPC has come to be applied to business uses of cluster-based supercomputers, such as data warehouses, line-of-business (LOB) applications, and transaction processing. In the past few years, a new class of HPC systems has emerged. These systems employ unconventional processor architectures, such as […]

Oct, 12

Ray Tracing on Graphics Hardware

Ray tracing is one of the important elements in photo-realistic image synthesis. Since ray tracing is computationally expensive, a large body of research has been devoted to improving its performance. One of the recent developments in efficient ray tracing is its implementation on graphics hardware. Similar to general purpose CPUs, recent graphics […]

CUDA

Oct, 12

Implementing modular arithmetic using OpenCL

Problem description: Most public key algorithms are based on modular arithmetic. The simplest, and original, implementation of the protocol uses the multiplicative group of integers modulo p, where p is prime and g is a primitive root mod p. This is the way Diffie-Hellman is implemented. RSA is implemented in a similar way: c = m^e mod p*q. […]

OpenCL
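The modular exponentiation that the abstract above refers to can be sketched as follows. This is a minimal CPU-side illustration in Python, not the OpenCL implementation from the post. The helper name `mod_pow` and all numeric parameters (p = 23, g = 5, the private exponents, and the small RSA key) are toy values assumed here for illustration and are far too small for real use.

```python
def mod_pow(base, exponent, modulus):
    """Square-and-multiply modular exponentiation: base**exponent % modulus."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                  # current low bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus    # square for the next bit
        exponent >>= 1
    return result


# Toy Diffie-Hellman exchange (hypothetical parameters).
p, g = 23, 5           # small prime p; g is a primitive root mod p
a, b = 6, 15           # private exponents of the two parties
A = mod_pow(g, a, p)   # public value sent by the first party
B = mod_pow(g, b, p)   # public value sent by the second party
shared_a = mod_pow(B, a, p)
shared_b = mod_pow(A, b, p)

# Toy RSA round trip, c = m^e mod p*q (hypothetical key).
rsa_p, rsa_q = 61, 53
n = rsa_p * rsa_q
e, d = 17, 2753        # e*d = 1 mod (rsa_p - 1)*(rsa_q - 1)
m = 65
c = mod_pow(m, e, n)
recovered = mod_pow(c, d, n)
```

Both parties arrive at the same shared value, and decrypting `c` with `d` recovers `m`. Python's built-in `pow(base, exponent, modulus)` computes the same result and is preferable in practice.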

Oct, 12

General purpose computing on graphics processing units using OpenCL

General-Purpose computing using Graphics Processing Units (GPGPU) has been an area of active research for many years. During 2009 and 2010 much has happened in the GPGPU research field with the release of the Open Computing Language (OpenCL) programming framework and the new NVIDIA Fermi Graphics Processing Unit (GPU) architecture. This thesis explores the hardware […]

CUDA • OpenCL
