high performance computing on graphics processing units: hgpu.org

Posts

Jan, 31

Isosurface Extraction and View-Dependent Filtering from Time-Varying Fields Using Persistent Time-Octree (PTOT)

We develop a new algorithm for isosurface extraction and view-dependent filtering from large time-varying fields, by using a novel persistent time-octree (PTOT) indexing structure. Previously, the persistent octree (POT) was proposed to perform isosurface extraction and view-dependent filtering, which combines the advantages of the interval tree (for optimal searches of active cells) and of the […]

CUDA

Jan, 31

Scientific Computing on Heterogeneous Architectures

The CPU has traditionally been the computational workhorse in scientific computing, but we have seen a tremendous increase in the use of accelerators, such as Graphics Processing Units (GPUs), in the last decade. These architectures are used because they consume less power and offer higher performance than equivalent CPU solutions. They are typically also […]

CUDA

Jan, 31

OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices

This work presents the Open Reconfigurable Computing Language (OpenRCL) system designed to enable low-power high-performance reconfigurable computing with imperative programming languages such as C/C++. The key idea is to expose the FPGA platform as a compiler target for applications expressed in the OpenCL paradigm. To this end, we present a combination of low-level virtual machine […]

OpenCL

Jan, 31

Simulation and visualization of the Saint-Venant system using GPUs

We consider three high-resolution schemes for computing shallow-water waves as described by the Saint-Venant system and discuss how to develop highly efficient implementations using graphical processing units (GPUs). The schemes are well-balanced for lake-at-rest problems, handle dry states, and support linear friction models. The first two schemes handle dry states by switching variables in the […]

CUDA

Jan, 31

Highly interactive computational steering for coupled 3D flow problems utilizing multiple GPUs

Most computational fluid dynamics (CFD) simulations require massive computational power, which is usually provided by traditional High Performance Computing (HPC) environments. Although interactivity of the simulation process is highly appreciated by scientists and engineers, due to limitations of typical HPC environments, present CFD simulations are usually executed non-interactively. A recent trend is to harness […]

Jan, 31

Top-Performance Tokenization and Small-Ruleset Regular Expression Matching: A Quantitative Performance Analysis and Optimization Study on the Cell/B.E. Processor

In the last decade, the volume of unstructured data that Internet and enterprise applications create and consume has been growing at impressive rates. The tools we use to process these data are search engines, business analytics suites, natural-language processors and XML processors. These tools rely on tokenization, a form of regular expression matching aimed at […]

Jan, 31

Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization

We explore the intersection between an emerging class of architectures and a prominent workload: GPGPUs (General-Purpose Graphics Processing Units) and regular expression matching, respectively. It is a challenging task because this workload — with its irregular, non-coalesceable memory access patterns — is very different from the regular, numerical workloads that run efficiently on GPGPUs. Small-ruleset […]

CUDA

Jan, 30

A Novel Monte Carlo Noise Reduction Operator

We propose a novel Monte Carlo noise reduction operator in this article. We apply and extend the standard bilateral filtering method and build a new local adaptive noise reduction kernel. It first computes an initial estimate for the value of each pixel, and then applies bilateral filtering using this initial estimate in its range filter […]

Jan, 30

Geometry Textures and Applications

Geometry textures are a novel geometric representation for surfaces based on height maps. The visualization is done through a graphics processing unit (GPU) ray casting algorithm applied to the whole object. At rendering time, the fine-scale details (mesostructures) are reconstructed preserving original quality. Visualizing surfaces with geometry textures allows a natural level-of-detail (LOD) behaviour. There […]

OpenGL

Jan, 30

Real-Time Depth-of-Field Rendering Using Point Splatting on Per-Pixel Layers

We present a real-time method for rendering a depth-of-field effect based on per-pixel layered splatting, where source pixels are scattered on one of the three layers of a destination pixel. In addition, the missing information behind foreground objects is filled with an additional image of the areas occluded by nearer objects. The method creates […]

OpenGL

Jan, 30

Efficient image reconstruction for point-based and line-based rendering

We address the problem of an efficient image-space reconstruction of adaptively sampled scenes in the context of point-based and line-based graphics. The image-space reconstruction offers an advantageous time complexity compared to surface splatting techniques and, in fact, our improved GPU implementation performs significantly better than splatting implementations for large point-based models. We discuss the integration […]

OpenGL

Jan, 30

Fast and Scalable CPU/GPU Collision Detection for Rigid and Deformable Surfaces

We present a new hybrid CPU/GPU collision detection technique for rigid and deformable objects based on spatial subdivision. Our approach efficiently exploits the massive computational capabilities of modern CPUs and GPUs commonly found in off-the-shelf computer systems. The algorithm is specifically tailored to be highly scalable on both the CPU and the GPU sides. We […]

CUDA
