high performance computing on graphics processing units: hgpu.org

Posts

Feb, 12

Spatial splits in bounding volume hierarchies

Bounding volume hierarchies (BVH) have become a widely used alternative to kD-trees as the acceleration structure of choice in modern ray tracing systems. However, BVHs adapt poorly to non-uniformly tessellated scenes, which leads to increased ray shooting costs. This paper presents a novel and practical BVH construction algorithm, which addresses the issue by utilizing spatial […]

CUDA

Feb, 12

Understanding the efficiency of ray traversal on GPUs

We discuss the mapping of elementary ray tracing operations—acceleration structure traversal and primitive intersection—onto wide SIMD/SIMT machines. Our focus is on NVIDIA GPUs, but some of the observations should be valid for other wide machines as well. While several fast GPU tracing methods have been published, very little is actually understood about their performance. Nobody […]

CUDA
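The excerpt names acceleration structure traversal and primitive intersection as the elementary operations. As a minimal sketch of one common traversal building block, the following shows a ray versus axis-aligned bounding box "slab" test. It is an illustration only, not the kernel studied in the paper; the function name `ray_hits_aabb` and its argument layout are hypothetical.

```python
import math


def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: True if origin + t * direction (t >= 0) intersects the box.

    All arguments are 3-element sequences of floats. Hypothetical helper,
    written for illustration; not taken from the paper.
    """
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it can only hit the
            # box if the origin already lies between them.
            if o < lo or o > hi:
                return False
            continue
        # Parametric distances to the two planes of this slab.
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval in which the ray is inside all slabs so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

On a wide SIMD/SIMT machine each lane would typically run a test like this for its own ray, which is why divergent early exits such as the `return False` branches matter for efficiency.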

Feb, 12

A meshless hierarchical representation for light transport

We introduce a meshless hierarchical representation for solving light transport problems. Precomputed radiance transfer (PRT) and finite elements require a discrete representation of illumination over the scene. Non-hierarchical approaches such as per-vertex values are simple to implement, but lead to long precomputation. Hierarchical bases like wavelets lead to dramatic acceleration, but in their basic form […]

Feb, 12

Loop Transformation Recipes for Code Generation and Auto-Tuning

In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tuning framework that explores a set of different implementations of the same computation […]

CUDA
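The excerpt describes exploring different implementations of the same computation. As a rough sketch of what one such transformation can look like, the following compares a plain triple loop for matrix multiplication with a tiled version of the same loop nest. The example is an assumption chosen for illustration; the function names and the `tile` parameter are hypothetical and do not come from the paper's recipe interface.

```python
def matmul_naive(a, b, n):
    """Reference n x n matrix product using the original loop order."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation after tiling all three loops by `tile`.

    `tile` is the kind of parameter an auto-tuner could search over.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # Inner loops cover one tile; min() handles ragged edges
                # when n is not a multiple of the tile size.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c
```

For any fixed `(i, j)` the tiled version still accumulates over `k` in ascending order, so both functions produce identical results; only the memory access pattern changes.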

Feb, 12

Automated Dynamic Analysis of CUDA Programs

Recent increases in the programmability and performance of GPUs have led to a surge of interest in utilizing them for general-purpose computations. Tools such as NVIDIA's CUDA allow programmers to use a C-like language to code algorithms for execution on the GPU. Unfortunately, parallel programs are prone to subtle correctness and performance bugs, and CUDA […]

CUDA

Feb, 12

GPU-powered tools boost molecular visualization

Recent advances in experimental structure determination provide a wealth of structural data on huge macromolecular assemblies such as the ribosome or viral capsids, available in public databases. Further structural models arise from reconstructions using symmetry orders or fitting crystal structures into low-resolution maps obtained by electron microscopy or small-angle X-ray scattering experiments. Visual inspection of […]

Feb, 11

Compiling an Array Language to a Graphics Processor

Graphics processors are significantly faster than traditional processors, particularly for numerical code, and in recent years have become flexible enough to permit general-purpose use, rather than just graphics use. NVIDIA’s CUDA makes general-purpose graphics processor computing feasible, but it still requires significant programmer effort. My thesis is that array programming can be an effective way […]

CUDA

Feb, 11

The fast multipole method on parallel clusters, multicore processors, and graphics processing units

In this article, we discuss how the fast multipole method (FMM) can be implemented on modern parallel computers, ranging from computer clusters to multicore processors and graphics cards (GPU). The FMM is a somewhat difficult application for parallel computing because of its tree structure and the fact that it requires many complex operations which are […]

CUDA

Feb, 11

Comparing CUDA and OpenGL implementations for a Jacobi iteration

The use of the GPU as a general purpose processor is becoming more popular and there are different approaches for this kind of programming. In this paper we present a comparison between different implementations of the OpenGL and CUDA approaches for solving our test case, a weighted Jacobi iteration with a structured matrix originating from […]

CUDA • OpenGL
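The test case named in the excerpt is a weighted Jacobi iteration with a structured matrix. The excerpt is truncated before it says where the matrix comes from, so the sketch below assumes, purely for illustration, the tridiagonal matrix with 2 on the diagonal and -1 on the off-diagonals. The function name and the default weight `omega = 2/3` are also assumptions, not values taken from the paper.

```python
def weighted_jacobi_1d(b, omega=2.0 / 3.0, iterations=100):
    """Approximate the solution of A x = b with weighted Jacobi.

    A is assumed to be tridiagonal with 2 on the diagonal and -1 on the
    two off-diagonals (an illustrative choice, not the paper's matrix).
    Update rule: x_new = x + omega * D^{-1} * (b - A x), with D = diag(A).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            # Neighbours outside the domain are treated as zero.
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            residual = b[i] - (2.0 * x[i] - left - right)
            x_new[i] = x[i] + omega * residual / 2.0
        x = x_new
    return x
```

Every entry of `x_new` depends only on the previous iterate `x`, so all entries can be updated independently, which is what makes this iteration a natural fit for GPU implementations.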

Feb, 11

Comparison of several parallel API for cloth modelling on modern GPUs

The paper compares three APIs for the implementation of cloth modelling on modern graphics processor units (GPU): OpenGL plus GLSL, NVIDIA CUDA and OpenCL. They are compared by programming features, platform and device portability, and performance for the purpose of dynamic cloth simulation. Results about performance are given and conclusions are drawn about use cases.

CUDA • OpenCL • OpenGL

Feb, 11

GPU-based fast pencil beam algorithm for proton therapy

Performance of a treatment planning system is an essential factor in making sophisticated plans. The dose calculation is a major time-consuming process in planning operations. The standard algorithm for proton dose calculations is the pencil beam algorithm which produces relatively accurate results, but is time consuming. In order to shorten the computational time, we have […]

Feb, 11

Energy-efficient algorithms

Algorithmic solutions can help reduce energy consumption in computing environs. Energy conservation is a major concern today. Federal programs provide incentives to save energy and promote the use of renewable energy resources. Individuals, companies, and organizations seek energy-efficient products as the energy cost to run equipment has grown to be a major factor.
