high performance computing on graphics processing units: hgpu.org

Posts

Dec, 15

GPU Algorithms for the Estimation of Environmental Models Based on Large Datasets

Statistical environmental models are computationally intensive due to the high dimension of the data, both in space and time, and due to the inferential techniques required for parameter estimation and spatial prediction. In particular, the complexity of these procedures is related to matrix operations (inversion, solution of linear systems, factorization) involving large matrices. Recently, much […]

CUDA

Dec, 15

GPU Collision Detection in Conformal Geometric Space

We derive a conformal algebra treatment unifying all types of collisions among points, vectors, areas (defined by bivectors and trivectors) and 3D solid objects (defined by trivectors and quadvectors), based on a reformulation of collision queries from R^3 to conformal R^4,1 space. The algebraic formulation in this 5D space is then implemented on the GPU to […]

CUDA

Dec, 15

Performance in GPU Architectures: Potentials and Distances

GPUs can execute up to one TFLOPS at peak performance. This peak, however, is rarely reached because of resource underutilization. Three factors contribute to this inefficiency: branch divergence, memory access delays and limited workload parallelism. To this end, we suggest machine models to estimate performance gain potentials obtainable by eliminating each […]

CUDA

Dec, 15

Minimising Testing in Genetic Programming

The cost of optimisation can be reduced by evaluating candidate designs on only a fraction of all possible use cases. We show how genetic programming (GP) can avoid overfitting and evolve general solutions from fitness test suites as small as just one dynamic training case. Search effort can be greatly reduced.

CUDA

Dec, 15

Free surface flow simulations on GPGPUs using the LBM

In this paper, we present the implementation of a volume-of-fluid-(VOF)-based algorithm for the simulation of free-surface flow problems on general purpose graphical processing units (GPGPUs). For the solution of the flow field and the additional advection equation for the VOF fill level, the lattice Boltzmann method on the basis of an MRT collision operator is […]

CUDA

Dec, 15

Speed sign detection and recognition by convolutional neural networks

From the desire to update the maximum road speed data for navigation devices, a speed sign recognition and detection system is proposed. This system should prevent accidental speeding on roads where the map data is incorrect, for example due to construction work. Multiple examples of road sign classification systems already exist but none uses a […]

CUDA

Dec, 15

On the numerical solution of chaotic dynamical systems using extend precision floating point arithmetic and very high order numerical methods

Multiple results in the literature indicate that all computed solutions to chaotic dynamical systems are time-step dependent. That is, solutions with small but different time steps will decouple from each other after a certain (small) finite amount of simulation time. When using double precision floating point arithmetic, time-step independent solutions have been […]

CUDA
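The time-step dependence described in the entry above can be illustrated with a minimal sketch. Everything here is an assumption chosen for illustration and is not taken from the paper: the Lorenz system with its classic parameters, a fourth-order Runge-Kutta integrator in ordinary double precision, and the hypothetical helper names `lorenz`, `rk4_step` and `separation_history`. Two runs that differ only in step size are compared at the same simulation times.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (illustrative choice of chaotic system)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(dt=0.01, t_end=40.0, start=(1.0, 1.0, 1.0)):
    """Integrate with steps dt and dt/2 and record their largest
    component-wise difference after every coarse step."""
    coarse = fine = start
    history = []
    for _ in range(round(t_end / dt)):
        coarse = rk4_step(coarse, dt)
        fine = rk4_step(rk4_step(fine, dt / 2), dt / 2)
        history.append(max(abs(a - b) for a, b in zip(coarse, fine)))
    return history


if __name__ == "__main__":
    history = separation_history()
    for step in (9, 999, 1999, 3999):
        print(f"t = {(step + 1) * 0.01:5.2f}  separation = {history[step]:.3e}")
```

The two runs agree closely at first and then drift apart until their difference is comparable to the size of the attractor, which is the decoupling the abstract refers to. A smaller step or a higher-order method postpones that point but, in fixed precision, does not remove it.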

Dec, 14

Graph Generation on GPUs using Dynamic Memory Allocation

Complex networks are often studied using statistical measurements over many independently generated samples. Irregular data structures such as graphs that involve dynamical memory management and "pointer chasing" are an important class of application and have attracted recent interest in the form of the Graph500 benchmark formulation. The generation of simulated sample network graphs and measurement […]

CUDA

Dec, 14

A Novel Multi-GPU Neural Simulator

Between the biophysical and behavioral studies of the brain lies computational neuroscience, whose goal, among other things, is to help bridge the gap in our knowledge and provide alternative or complementary theories to other neurological studies. As more information is provided and more complex theories are developed, the size and computational cost of […]

CUDA

Dec, 14

Fast thermal simulation of 2D/3D integrated circuits exploiting neural networks and GPUs

Heat removal is one of the major challenges faced in developing the new generation of high density integrated circuits. Future design technologies strongly depend on the availability of efficient means for thermal modeling and analysis. These thermal models must also be accurate and provide the most efficient level of abstraction enabling fast execution. We propose […]

CUDA

Dec, 14

Power consumption of mixed precision in the iterative solution of sparse linear systems

This paper presents a detailed analysis of a mixed precision iterative refinement solver applied to a linear system obtained from the 2D discretization of a fluid flow problem. The total execution time and energy need of different soft- and hardware implementations are measured and compared with those of a plain GMRES-based solver in double precision. […]

CUDA
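The idea behind mixed precision iterative refinement, named in the entry above, can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the solver from the paper: the inner solve is a small dense Gaussian elimination rather than GMRES, single precision is emulated by rounding through `struct`, and the helper names `f32`, `solve_low` and `refine` are hypothetical. The cheap solve runs in low precision, while the residual and the solution update are kept in double precision.

```python
import struct


def f32(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def solve_low(A, b):
    """Gaussian elimination with partial pivoting, with every intermediate
    result rounded to single precision (the 'cheap' solver)."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(b[i])] for i, row in enumerate(A)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for i in range(k + 1, n):
            factor = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(factor * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x


def refine(A, b, iterations=5):
    """Iterative refinement: low-precision solves, double-precision residuals."""
    x = solve_low(A, b)
    for _ in range(iterations):
        # Residual r = b - A x, computed in double precision.
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        # Correction from the low-precision solver, accumulated in double.
        d = solve_low(A, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]  # chosen so that the exact solution is [1, 2, 3]
    print("single precision only:", solve_low(A, b))
    print("after refinement:     ", refine(A, b))
```

For a well-conditioned system like this one, each refinement pass shrinks the error by roughly the accuracy of the low-precision solve, so a handful of passes reaches double-precision accuracy while most of the arithmetic stays in the cheaper format.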

Dec, 14

Rethinking Runtime Verification on Hundreds of Cores: Challenges and Opportunities

We propose a novel approach for runtime monitoring and verification on computers with a large number of computation cores. The goal of the approach is to minimize the impact of runtime verification on the performance of the application being monitored. We distinguish between two kinds of computational overhead: (i) overhead caused by instrumentation and/or logging, […]

CUDA

* * *

Recent source codes

CL4SE: A Context Learning Benchmark For Software Engineering Tasks

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

KernelGYM & Dr. Kernel: A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Vortex-Optimized Light-weight Toolchain (VOLT)

SciDef: Automated Definition Extraction from Scientific Literature

bioagent-bench: Benchmark for evaluating LLM agents in bioinformatics

Benchmark suite for LLM inference on NVIDIA consumer GPUs

Theorizer: from the paper Generating Literature-Driven Scientific Discoveries at Scale

Most viewed papers (last 30 days)