high performance computing on graphics processing units: hgpu.org

Posts

Sep, 13

AnySL: efficient and portable shading for ray tracing

While a number of different shading languages have been developed, their efficient integration into an existing renderer is notoriously difficult, often boiling down to implementing an entire compiler toolchain for each language. Furthermore, no shading language is broadly supported across the variety of rendering systems. AnySL attacks this issue from multiple directions: We compile shaders […]

Sep, 13

A directionally adaptive edge anti-aliasing filter

The latest generation of graphics hardware provides direct access to multisample anti-aliasing (MSAA) rendering data. By taking advantage of these existing pixel subsample values, an intelligent reconstruction filter can be computed using programmable GPU shader units. This paper describes an adaptive anti-aliasing (AA) filter for real-time rendering on the GPU. Improved quality is achieved by […]

Sep, 13

Real-Time Volumetric Shadows using 1D Min-Max Mipmaps

Light scattering in a participating medium is responsible for several important effects we see in the natural world. In the presence of occluders, computing single scattering requires integrating the illumination scattered towards the eye along the camera ray, modulated by the visibility towards the light at each point. Unfortunately, incorporating volumetric shadows into this integral, […]

Sep, 13

Exploring the use of glossy light volumes for interactive global illumination

From the literature, it is known that backward polygon beam tracing and other light volume methods are well suited to gather path coherency from specular scattering surfaces. This is of course useful for modelling and efficiently simulating caustics (LS+DE paths). This paper generalises backward polygon beam tracing to also model glossy scattering surfaces. To this […]

OpenGL

Sep, 13

Computational stereo camera system with programmable control loop

Stereoscopic 3D has gained significant importance in the entertainment industry. However, production of high quality stereoscopic content is still a challenging art that requires mastering the complex interplay of human perception, 3D display properties, and artistic intent. In this paper, we present a computational stereo camera system that closes the control loop from capture and […]

CUDA

Sep, 13

Data-intensive document clustering on graphics processing unit (GPU) clusters

Document clustering is a central method to mine massive amounts of data. Due to the explosion of raw documents generated on the Internet and the necessity to analyze them efficiently in various intelligent information systems, clustering techniques have reached their limitations on single processors. Instead of single processors, general-purpose multi-core chips are increasingly deployed in […]

CUDA

Sep, 13

Lossless data compression on GPGPU architectures

Modern graphics processors provide exceptional computational power, but only for certain computational models. While they have revolutionized computation in many fields, compression has been largely unaffected. This paper aims to explain the current issues and possibilities in GPGPU compression. This is done by a high level overview of the GPGPU computational model in […]

Sep, 13

Parallel volume rendering implementation on graphics cards using CUDA

The ever-increasing amounts of volume data require high-end parallel visualization methods to process this data interactively. To meet the demands, programming on graphics cards offers an effective and fast approach to compute volume rendering methods due to the parallel architecture of today’s graphics cards. In this paper, we introduce a volume ray casting method working […]

CUDA • OpenGL

Sep, 13

Gate-Level Simulation with GPU Computing

Functional verification of modern digital designs is a crucial, time-consuming task impacting not only the correctness of the final product, but also its time to market. At the heart of most of today’s verification efforts is logic simulation, used heavily to verify the functional correctness of a design for a broad range of abstraction levels. […]

CUDA

Sep, 13

Software Challenges for Extreme Scale Computing: Going From Petascale to Exascale Systems

Preparing applications for a transition from petascale to exascale systems will require a very large investment in several areas of software research and development. The introduction of manycore nodes, the abundance of parallelism, an increase in system faults (including soft errors) and a complicated, multi-component software environment are some of the most challenging issues we […]

Sep, 12

Energy-efficient computing for extreme-scale science

A many-core processor design for high-performance systems draws from embedded computing’s low-power architectures and design processes, providing a radical alternative to cluster solutions. The computational power required to accurately model extreme problem spaces, such as climate change, requires more than a business-as-usual approach. Building ever-larger clusters of commercial off-the-shelf (COTS) hardware will be increasingly constrained […]

Sep, 12

Intermediate fabrics: virtual architectures for circuit portability and fast placement and routing

Although hardware/software partitioning of embedded applications onto FPGAs is widely known to have performance and power advantages, FPGA usage has been typically limited to hardware experts, due largely to several problems: 1) difficulty of integrating hardware design tools into well-established software tool flows, 2) increasingly lengthy FPGA design iterations due to placement and routing, and […]
