high performance computing on graphics processing units: hgpu.org

Posts

Dec, 24

Simbuca, using a graphics card to simulate Coulomb interactions in a Penning trap

In almost all cases, N-body simulations are limited by the computation time available. Coulomb interaction calculations scale as O(N^2), where N is the number of particles. Approximation methods already exist that reduce the computation time to O(N log N), although calculating the interaction still dominates the total simulation time. We present Simbuca, a simulation package for thousands of […]

CUDA

Dec, 23

GPU-based parallel computing for the simulation of complex multibody systems with unilateral and bilateral constraints: an overview

This work reports on advances in large-scale multibody dynamics simulation facilitated by the use of the Graphics Processing Unit (GPU). A description of the GPU execution model along with its memory spaces is provided to illustrate its potential for parallel scientific computing. The equations of motion associated with the dynamics of large systems of rigid bodies […]

CUDA

Dec, 23

SIMD Floating Point Extension for Ray Tracing

In the last decade, graphics capabilities have become very important in the mobile market. As a result, low-power embedded solutions for mobile devices have been developed to run computationally intensive graphics applications, which make extensive use of floating point calculations. The work proposed in this thesis targets the extension of the Silicon Hive […]

CUDA

Dec, 23

Reducing the Size of Nurbs Controls Nets Using Genetic Algorithms and CUDA

The typical goal when defining a control net for a Non-Uniform Rational B-Spline (NURBS) based metamodel from a given set of data is the smallest set of control points in the least possible time while minimizing local and/or global error. Current metamodel fitting algorithms iteratively find and eliminate the largest sources of […]

CUDA

Dec, 23

Numerical solution of PDEs with hybrid and heterogeneous computing models

This study is the first part of a longer project investigating hybrid and heterogeneous computing models in computational science. This is joint work with Fujitsu Laboratories Europe (FLE) with the purpose of developing numerical software libraries under the Open Petascale Libraries Project. We overview some current work and trends relating to petascale algorithms and their […]

CUDA

Dec, 23

Survey on Benchmarks for a GPU Based Multi Camera Stereo Matching Algorithm

Stereo matching algorithms and multi camera reconstruction algorithms are usually compared using benchmarks. These benchmarks compare the quality of the resulting depth map or reconstructed surface mesh. We describe the differences between several known stereo and multi-view stereo benchmarks and their various datasets. Also the modifications that are necessary to use our own GPU based […]

OpenGL

Dec, 23

GPU Pathfinding Optimization

In recent years, graphics processing units (GPUs) have shown a significant advance in the computational resources available to non-graphical applications. The ability to solve problems involving parallel computing, as well as the development of new architectures that support this new paradigm, such as CUDA, has encouraged the use of GPUs for general-purpose […]

CUDA

Dec, 23

Using Image Morphing for Memory-Efficient Impostor Rendering on GPU

Real-time rendering of large animated crowds consisting of thousands of virtual humans is important for several applications including simulations, games and interactive walkthroughs, but cannot be performed using complex polygonal models at interactive frame rates. For that reason, several methods using large numbers of pre-computed image-based representations, called impostors, have been proposed. These […]

OpenGL

Dec, 23

Online rapid prototyping of 3D objects using GPU-based 3D cloud computing: Application to 3D face modelling

An on-line web application that interacts with an Internet user’s 3D webcam (e.g. Minoru stereo webcamera) is described. The application instantly captures and processes stereo images to retrieve 3D object coordinates for further 3D modelling tasks. It offers on-demand semi-automatic camera calibration and automatic image pair rectification for further stereo matching operations. The reconstructed 3D […]

CUDA

Dec, 23

Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU

In this paper we present a new pseudorandom number generator (PRNG) on graphics processing units (GPU). This PRNG is based on the so-called chaotic iterations. It is first proven to be chaotic according to Devaney’s formulation. We thus propose an efficient implementation for GPU that successfully passes the BigCrush tests, deemed to be the […]

CUDA

Dec, 23

LatticeQCD using OpenCL

We report on our implementation of LatticeQCD applications using OpenCL. We focus on the general concept and on distributing different parts on hybrid systems, consisting of both CPUs (Central Processing Units) and GPUs (Graphics Processing Units).

OpenCL

Dec, 22

Improving the usability of hierarchical representations for interactively labeling large image data sets

Image recognition systems require large image data sets for the training process. Annotation of such data sets by users requires a lot of time and effort, and is therefore the bottleneck in the development of recognition systems. In order to simplify the creation of image recognition systems, it is necessary to develop interaction concepts […]


Recent source codes

CL4SE: A Context Learning Benchmark For Software Engineering Tasks

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

KernelGYM & Dr. Kernel: A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Vortex-Optimized Light-weight Toolchain (VOLT)

SciDef: Automated Definition Extraction from Scientific Literature

bioagent-bench: Benchmark for evaluating LLM agents in bioinformatics

Benchmark suite for LLM inference on NVIDIA consumer GPUs

Theorizer: from the paper Generating Literature-Driven Scientific Discoveries at Scale
