high performance computing on graphics processing units: hgpu.org

Posts

May, 8

GPU acceleration of the iterative physical optics (IPO) method

In this paper, we employ the programmable graphics processing unit (GPU) to accelerate the IPO computation for analyzing the scattering of open cavities. Since the iterative strategy accounts for multiple reflections on the inner wall, the IPO method provides a more accurate solution than the other high frequency asymptotic methods. However, it suffers from a […]

May, 8

GPU Acceleration of 2D-DWT Image Compression in MATLAB with CUDA

This article presents the details of accelerating 2D wavelet-based medical data (image) compression in MATLAB with CUDA. Diagnostic materials (mostly as a certain type of image) are increasingly acquired in a digital format. Therefore, the common need to manipulate huge amounts of data daily has brought about the issue of compression […]

CUDA

May, 8

Experiments with Single Core, Multi-core, and GPU Based Computation of Cellular Automata

Cellular automata are a well-known modeling formalism exploited in a wide range of application areas. In many of those, the complexity of models hampers a thorough analysis of the system under study. Therefore, efficient simulation algorithms are required. We present here a comparison of seven different simulation algorithms for cellular automata: the classical "full" simulator, […]

May, 8

A GPU-based architecture for real-time data assessment at synchrotron experiments

Current imaging experiments at synchrotron beam lines often lack a real-time data assessment. X-ray imaging cameras installed at synchrotron facilities like ANKA provide millions of pixels, each with a resolution of 12 bits or more, and take up to several thousand frames per second. A given experiment can produce data sets of multiple gigabytes in […]

May, 7

Parallel ID Shadow-Map Decompression on GPU

ID shadow-maps are used for robust real-time rendering of shadows. The primary disadvantage of using shadow-maps is their excessive size for large scenes in case high quality shadows are needed. To eliminate large memory requirements and texture-size limitations of the current generation GPUs, texture compression is an important tool. We present a framework where compressed […]

OpenGL

May, 7

Cluster versus GPU implementation of an Orthogonal Target Detection Algorithm for Remotely Sensed Hyperspectral Images

Remotely sensed hyperspectral imaging instruments provide high-dimensional data containing rich information in both the spatial and the spectral domain. In many surveillance applications, detecting objects (targets) is a very important task. In particular, algorithms for detecting (moving or static) targets, or targets that could expand their size (such as propagating fires) often require timely responses […]

CUDA

May, 7

Visual cortex on the GPU: Biologically inspired classifier and feature descriptor for rapid recognition

We present a biologically motivated classifier and feature descriptors that are designed for execution on single instruction multi data hardware and are applied to high speed multiclass object recognition. Our feature extractor uses a cellular tuning approach to select the optimal Gabor filters to process a given input, followed by the computation of scale and […]

OpenGL

May, 7

GPU Based Parallel Computing on Blast Program

Sequence alignment is one of the most fundamental and important operations in bioinformatics. Among the many sequence alignment tools, BLAST is one of the most popular. In this paper, we describe the primary strategy of GPU-based parallel computing for the BLAST program.

May, 7

An Improved Parallel Implementation of 3D DRIE Simulation on GPU

The deep reactive ion etching (DRIE) technique is a new and powerful tool in Micro-Electro-Mechanical Systems (MEMS) fabrication. A 3D DRIE simulation can help researchers understand the time evolution of the Bosch process used in DRIE. Due to the high complexity of the algorithm used in the simulation, it is necessary to develop an algorithm that can accelerate […]

CUDA

May, 7

Survey of GPU water simulation in game engine

In this paper, we give a general survey of water simulation techniques suitable for real-time applications such as 3D game engines. We then use the GPU together with the CPU to tackle several key points in water simulation, ranging from heightmaps and normal maps to water geometry modelling. In the last part of this paper, we present our experiment to […]

May, 7

Computation of Troposphere Slant Delays on a GPU

The computation of ray-traced troposphere delays which can be utilized for space geodetic applications is a time-consuming effort when a large number of rays has to be calculated. On the other hand, computation time can be tremendously reduced when algorithms are capable of supporting parallel processing architectures. Thus, by the use of an off-the-shelf graphics […]

May, 7

A Parallel Immune Algorithm Based on Fine-Grained Model with GPU-Acceleration

The fine-grained parallel immune algorithm (FGIA), though a popular and robust strategy for solving complicated optimization problems, is sometimes inconvenient to use: its population size is restricted by heavy data communication, and parallel computers are relatively difficult to use, manage, and maintain and may not be accessible to most researchers. In this paper, we propose […]

CUDA

Recent source codes

Agentic Code Optimization via Compiler-LLM Cooperation

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

True 4-Bit Quantized CNN Training on CPU

cuFuzz: A GPU-oriented coverage-guided fuzzer for userland CUDA application

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
