high performance computing on graphics processing units: hgpu.org

Posts

Feb, 11

Comparing CUDA and OpenGL implementations for a Jacobi iteration

The use of the GPU as a general purpose processor is becoming more popular and there are different approaches for this kind of programming. In this paper we present a comparison between different implementations of the OpenGL and CUDA approaches for solving our test case, a weighted Jacobi iteration with a structured matrix originating from […]

CUDA • OpenGL

Feb, 11

Comparison of several parallel API for cloth modelling on modern GPUs

The paper compares three APIs for the implementation of cloth modelling on modern graphics processor units (GPU): OpenGL plus GLSL, NVIDIA CUDA and OpenCL. They are compared by programming features, platform and device portability, and performance for the purpose of dynamic cloth simulation. Results about performance are given and conclusions are drawn about use cases.

CUDA • OpenCL • OpenGL

Feb, 11

GPU-based fast pencil beam algorithm for proton therapy

Performance of a treatment planning system is an essential factor in making sophisticated plans. The dose calculation is a major time-consuming process in planning operations. The standard algorithm for proton dose calculations is the pencil beam algorithm which produces relatively accurate results, but is time consuming. In order to shorten the computational time, we have […]

Feb, 11

Energy-efficient algorithms

Algorithmic solutions can help reduce energy consumption in computing environs. Energy conservation is a major concern today. Federal programs provide incentives to save energy and promote the use of renewable energy resources. Individuals, companies, and organizations seek energy-efficient products as the energy cost to run equipment has grown to be a major factor.

Feb, 11

GPU-accelerated indirect boundary element method for voxel model analyses with fast multipole method

An indirect boundary element method (BEM) that uses the fast multipole method (FMM) was accelerated using graphics processing units (GPUs) to reduce the time required to calculate a three-dimensional electrostatic field. The BEM is designed to handle cubic voxel models and is specialized to consider square voxel walls as boundary surface elements. The FMM handles […]

CUDA

Feb, 11

EigenCFA: accelerating flow analysis with GPUs

We describe, implement and benchmark EigenCFA, an algorithm for accelerating higher-order control-flow analysis (specifically, 0CFA) with a GPU. Ultimately, our program transformations, reductions and optimizations achieve a factor of 72 speedup over an optimized CPU implementation. We began our investigation with the view that GPUs accelerate high-arithmetic, data-parallel computations with a poor tolerance for branching. […]

Feb, 11

Fast computing of scattering maps of nanostructures using graphical processing units

Scattering maps from strained or disordered nano-structures around a Bragg reflection can either be computed quickly using approximations and a (Fast) Fourier transform, or using individual atomic positions. In this article we show that it is possible to compute up to 4×10^10 reflections·atoms/s using a single graphic card, and we evaluate how this speed depends […]

CUDA

Feb, 11

Accelerating the solution of families of shifted linear systems with CUDA

We describe the GPU implementation of shifted or multimass iterative solvers for sparse linear systems of the sort encountered in lattice gauge theory. We provide a generic tool that can be used by those without GPU programming experience to accelerate the simulation of a wide array of theories. We stress genericity, which is important to […]

CUDA

Feb, 10

Interpretive OpenGL for computer graphics

OpenGL is the industry-leading, cross-platform graphics application programming interface (API), and the only major API with support for virtually all operating systems. Many languages, such as Fortran, Java, Tcl/Tk, and Python, have OpenGL bindings to take advantage of OpenGL visualization power. In this article, we present Ch OpenGL Toolkit, a truly platform-independent Ch binding to […]

OpenGL

Feb, 10

Interactive Computer Graphics: A Top-Down Approach Using OpenGL (5th Edition)

Interactive Computer Graphics fourth edition presents introductory computer graphics concepts using a proven top-down, programming-oriented approach and careful integration of OpenGL to teach core concepts. The fourth edition has been revised to more closely follow the OpenGL pipeline architecture and includes a new chapter on programmable hardware topics (vertex shaders). As with previous editions, readers […]

OpenGL

Feb, 10

OpenGL(R) Programming Guide: The Official Guide to Learning OpenGL(R), Version 2 (5th Edition)

The “OpenGL Programming Guide”, now in its third edition, is the definitive volume for programmers using this evolving graphics interface standard. Written by members of the OpenGL Architecture Review Board, this book offers understandable tutorials and lessons on getting up to speed and getting the most out of the latest version of OpenGL, version 1.2. […]

OpenGL

Feb, 10

OpenGL(R) Shading Language (2nd Edition)

OpenGL(R) Shading Language, Second Edition, extensively updated for OpenGL 2.0, is the experienced application programmer’s guide to writing shaders. Part reference, part tutorial, this book thoroughly explains the shift from fixed-functionality graphics hardware to the new era of programmable graphics hardware and the additions to the OpenGL API that support this programmability. With OpenGL and […]

OpenGL

Recent source codes

DITRON: Distributed Compiler based on Triton for Parallel Systems

IntelliKit: Agent-first tooling for AMD hardware

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

Agentic Code Optimization via Compiler-LLM Cooperation

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
