high performance computing on graphics processing units: hgpu.org

Posts

May, 7

Stereo Matching Algorithm Using Population-Based Incremental Learning on GPU

To address the general problems of genetic algorithms applied to stereo matching, two measures are proposed. First, a simplified population-based incremental learning (PBIL) strategy is adopted to reduce memory consumption and search inefficiency, and a scheme that controls the distance between neighbors is added for disparity smoothness to obtain […]
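As a rough illustration of why PBIL can reduce memory use compared with a conventional genetic algorithm, here is a minimal sketch of generic PBIL on the OneMax toy problem (maximize the number of 1 bits). It is not the paper's simplified variant and has no stereo-matching or GPU component; the function name `pbil_onemax` and every parameter value are assumptions chosen for illustration. The point it shows is that only a probability vector persists between generations, not a stored population.

```python
import random


def pbil_onemax(n_bits=16, pop_size=30, lr=0.1, generations=200, seed=0):
    """Generic PBIL on OneMax (hypothetical toy setup, not the paper's method)."""
    rng = random.Random(seed)
    # Probability vector: p[i] is the chance that bit i is sampled as 1.
    p = [0.5] * n_bits
    best, best_fit = None, -1
    for _ in range(generations):
        # Sample a temporary population from the current probability vector.
        pop = [[1 if rng.random() < pi else 0 for pi in p]
               for _ in range(pop_size)]
        # The fitness of a bit string is simply its number of 1 bits.
        winner = max(pop, key=sum)
        if sum(winner) > best_fit:
            best, best_fit = winner, sum(winner)
        # Shift each probability toward the corresponding bit of the winner.
        p = [(1 - lr) * pi + lr * bit for pi, bit in zip(p, winner)]
    return best, best_fit, p


if __name__ == "__main__":
    best, fit, p = pbil_onemax()
    print("best fitness:", fit)
```

The sampled population is discarded after each generation, so the state carried forward is one float per bit, regardless of `pop_size`.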

May, 6

A GPU-based computing framework for CSCW

Graphics processing units (GPUs) have evolved from fixed graphics pipeline processors into more flexible and powerful data-parallel processors. Their ever-increasing computing power makes them an attractive platform for high performance computing at a low cost. To date, most efforts to exploit GPUs have targeted graphics and scientific applications. Nevertheless, little attention has been paid […]

May, 6

Accelerating InSAR raw data simulation on GPU using CUDA

This paper describes a scalable parallel method for interferometric synthetic aperture radar (InSAR) raw data simulation on a graphics processing unit (GPU) with Compute Unified Device Architecture (CUDA). The advantages of the new method rest on three contributions: GPU hardware provides many stream processors for thread computation, the CUDA software environment runs thousands of threads […]

CUDA

May, 6

GPU-Accelerated Evaluation Platform for High Fidelity Network Modeling

High-fidelity simulations of mixed wired and wireless network systems depend on detailed simulation models, especially in the lower layers of the network stack. However, detailed modeling can result in prohibitive computation cost. In recent years, commercial graphics cards (GPUs) have drawn attention from the general computing community because of their superior computational capability. In […]

OpenGL

May, 6

High speed 3-D registration using GPU

This paper describes high-speed 3D object recognition based on DAI (depth aspect image) matching and M-ICP (modified iterative closest point). We treat the GPU (graphics processing unit) as a coprocessor capable of general-purpose computation. We propose a 3D object recognition method that consists of two-step pose estimation and positioning, i.e. the DAI matching […]

May, 6

Real-time 3D registration of stereo-vision based range images using GPU

3D registration is a computer vision technique for aligning multi-view range images with respect to a reference coordinate system. Aligning range images is an important and time-consuming step in complete 3D reconstruction. In this paper, we propose a real-time 3D registration technique that employs the computing power of the GPU (graphics processing unit). In the […]

CUDA

May, 6

Performance Evaluation of Optimized Implementations of Finite Difference Method for Wave Propagation Problems on GPU Architecture

The scattering of acoustic waves in non-homogeneous media has been of practical interest for the petroleum industry, mainly in the determination of new oil deposits. A family of computational models that represent this phenomenon is based on finite difference methods. Simulating these phenomena is computationally expensive. In this work we employ […]
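For readers unfamiliar with the method class, a minimal sketch of an explicit second-order finite difference update for the 1D acoustic wave equation with a spatially varying velocity is given here. It is a plain CPU illustration, not the optimized GPU implementation the paper evaluates; the function names `wave_step` and `simulate`, the grid size, the two-layer velocity profile, and the fixed zero boundaries are all assumptions.

```python
def wave_step(u_prev, u_curr, c, dt, dx):
    """Advance u_tt = c(x)^2 * u_xx by one time step (central differences)."""
    n = len(u_curr)
    u_next = [0.0] * n  # both end points stay fixed at zero
    for i in range(1, n - 1):
        r2 = (c[i] * dt / dx) ** 2
        laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + r2 * laplacian
    return u_next


def simulate(n=101, steps=50, dx=1.0, dt=0.5):
    """Propagate an initial pulse through a hypothetical two-layer medium."""
    # Non-homogeneous medium: slower left half, faster right half.
    c = [1.0 if i < n // 2 else 1.5 for i in range(n)]
    # The explicit scheme is only stable if the CFL number is at most 1.
    if max(c) * dt / dx > 1.0:
        raise ValueError("time step too large for stability (CFL > 1)")
    u_prev = [0.0] * n
    u_prev[n // 4] = 1.0    # initial displacement: a single-point pulse
    u_curr = list(u_prev)   # equal first two levels, i.e. zero initial velocity
    for _ in range(steps):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, c, dt, dx)
    return u_curr


if __name__ == "__main__":
    field = simulate()
    print(max(abs(v) for v in field))
```

Each interior point depends only on its two neighbors and its own previous values, which is why updates of this kind map naturally onto one GPU thread per grid point.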

May, 6

Efficient nearest-neighbor computation for GPU-based motion planning

We present a novel k-nearest neighbor search algorithm (KNNS) for proximity computation in motion planning algorithms that exploits the computational capabilities of many-core GPUs. Our approach uses locality sensitive hashing and cuckoo hashing to construct an efficient KNNS algorithm that has linear space and time complexity and exploits the multiple cores and data parallelism effectively. […]

CUDA
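To make the hashing idea in the nearest-neighbor entry above concrete, here is a minimal CPU sketch of approximate k-nearest-neighbor search with locality sensitive hashing. It is not the paper's algorithm: it uses random-hyperplane signatures (which group points by direction from the origin) and an ordinary Python dictionary in place of cuckoo hashing, and the names `make_hasher`, `build_index`, and `knn` are hypothetical.

```python
import random
from collections import defaultdict


def make_hasher(dim, n_planes, seed=0):
    """Return a function mapping a point to a tuple of sign bits."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(n_planes)]

    def signature(point):
        # One bit per hyperplane: which side of the plane the point lies on.
        return tuple(sum(w * x for w, x in zip(plane, point)) >= 0.0
                     for plane in planes)

    return signature


def build_index(points, signature):
    """Group point indices into buckets keyed by their signature."""
    buckets = defaultdict(list)
    for idx, pt in enumerate(points):
        buckets[signature(pt)].append(idx)
    return buckets


def knn(query, points, buckets, signature, k):
    """Approximate k-NN: rank only the points sharing the query's bucket."""
    candidates = buckets.get(signature(query), [])

    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], query))

    return sorted(candidates, key=sqdist)[:k]


if __name__ == "__main__":
    pts = [(1.0, 2.0), (1.1, 2.1), (-3.0, 0.5), (-2.9, -4.0)]
    sig = make_hasher(dim=2, n_planes=4)
    index = build_index(pts, sig)
    print(knn((1.05, 2.05), pts, index, sig, k=2))
```

The search is approximate by design: a true neighbor that falls in a different bucket is missed, which is the usual trade-off for examining only a small candidate set per query.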

May, 6

Robust Adaptive 3-D Segmentation of Vessel Laminae From Fluorescence Confocal Microscope Images and Parallel GPU Implementation

This paper presents robust 3-D algorithms to segment vasculature that is imaged by labeling laminae, rather than the lumenal volume. The signal is weak, sparse, noisy, nonuniform, low-contrast, and exhibits gaps and spectral artifacts, so adaptive thresholding and Hessian filtering based methods are not effective. The structure deviates from a tubular geometry, so tracing algorithms […]

CUDA

May, 6

Mechanical Characterization and Performance Optimization for GPU Fan-Sink Cooling Module Assembly

Three GPU fan-sink cooling module assembly mounting mechanisms are mechanically characterized to determine the relationships between the clamping forces and screw torques. The first-order screw torque solutions are determined from the statistical regressions according to current industry recommendations. The screw tension force theoretical solution is derived for application to the finite-element model to assess the […]

May, 6

Large-scale multi-dimensional document clustering on GPU clusters

Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial […]

CUDA

May, 5

Towards accelerating molecular modeling via multi-scale approximation on a GPU

Research efforts to analyze biomolecular properties contribute towards our understanding of biomolecular function. Calculating non-bonded forces (or in our case, electrostatic surface potential) is often a large portion of the computational complexity in analyzing biomolecular properties. Therefore, reducing the computational complexity of these force calculations, either by improving the computational algorithm or by improving the […]

