high performance computing on graphics processing units: hgpu.org

Posts

Jun, 4

Monte Carlo Radiative Transport on the GPU

This paper presents a fast parallel Monte Carlo method to solve the radiative transport equation in inhomogeneous participating media. The implementation is based on CUDA and runs on the GPU. In order to meet the requirements of the parallel GPU architecture and to reuse shooting paths, we follow a photon mapping approach where during gathering […]

CUDA

Jun, 4

High performance stream computing for particle beam transport simulations

Understanding modern particle accelerators requires simulating charged particle transport through the machine elements. These simulations can be very time consuming due to the large number of particles and the need to consider many turns of a circular machine. Stream computing offers an attractive way to dramatically improve the performance of such simulations by calculating the […]

Jun, 3

A streaming narrow-band algorithm: interactive computation and visualization of level sets

Deformable isosurfaces, implemented with level-set methods, have demonstrated great potential in visualization and computer graphics for applications such as segmentation, surface processing, and physically-based modeling. Their usefulness has been limited, however, by their high computational cost and reliance on significant parameter tuning. We present a solution to these challenges by describing graphics processor (GPU) […]

OpenGL

Jun, 3

Synthesizing Subdivision Meshes Using Real Time Tessellation

We propose a new GPU method for synthesizing subdivision meshes with exact adaptive geometry in real time. Our GPU kernel builds upon precomputed tables of basis functions for subdivision surfaces and therefore supports all subdivision schemes, either interpolating or approximating, for triangle or quad meshes. We designed our kernel so that it can be […]

OpenGL

Jun, 3

High Performance Stereo Vision Designed for Massively Data Parallel Platforms

Real-time stereo vision is attractive in many applications, such as robot navigation and 3-D scene reconstruction. Data-parallel platforms, such as graphics processing units (GPUs), are often used for real-time stereo because most stereo algorithms involve a large portion of data-parallel computation. In this paper, we propose a stereo system on the GPU which pushes the Pareto-efficiency […]

CUDA

Jun, 3

Multiple-GPUs Algorithm for Lattice Boltzmann Method

This paper studies a parallel algorithm for the lattice Boltzmann method. The data arrangement, communication, and computational process are redesigned to combine the message passing interface with general-purpose graphics processing units. On a single GPU, novel techniques available in Shader Model 3.0, such as frame buffer objects (FBO), multiple-channel rendering, and rendering to textures, are used to improve […]

OpenGL

Jun, 3

Interactive Approximate Rendering of Reflections, Refractions, and Caustics

Reflections, refractions, and caustics are very important for rendering global illumination images. Although many methods can be applied to generate these effects, the rendering performance is not satisfactory for interactive applications. In this paper, complex ray-object intersections are simplified so that the intersections can be computed on a GPU, and an iterative computing scheme based […]

Jun, 3

Mixed-Tool Performance Analysis on Hybrid Multicore Architectures

This paper proposes a triangular solve algorithm with variable block size for graphics processing units (GPUs). By using diagonal block inversion with recursion, this algorithm works with a tunable block size to achieve the best performance. Various methods are shown on how to make use of existing profiling tools to successfully measure and analyze performance of […]

CUDA

Jun, 3

Toward efficient GPU-accelerated N-body simulations

N-body algorithms are applicable to a number of common problems in computational physics including gravitation, electrostatics, and fluid dynamics. Fast algorithms (those with better than O(N^2) performance) exist, but have not been successfully implemented on GPU hardware for practical problems. In the present work, we introduce not only best-in-class performance for a multipole-accelerated treecode method, […]

CUDA

Jun, 3

Modeling Rotor Wakes with a Hybrid OVERFLOW-Vortex Method on a GPU Cluster

The vortex core shed from rotorcraft blades maintains coherency (and thus dynamic relevance) many blade turns after its creation. This presents a challenge to traditional Eulerian computational methods, as fine grids are required to suppress numerical diffusion which would weaken the vortex cores after a small number of revolutions. Vortex methods have been used in the past […]

Jun, 3

A GPU-accelerated Boundary Element Method and Vortex Particle Method

Vortex particle methods, when combined with multipole-accelerated boundary element methods (BEM), become a complete tool for direct numerical simulation (DNS) of internal or external vortex-dominated flows. In previous work, we presented a method to accelerate the vorticity-velocity inversion at the heart of vortex particle methods by performing a multipole treecode N-body method on parallel graphics […]

Jun, 3

Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors

We explore the use of generalized t priors on regression coefficients to help understand the nature of association signal within "hit regions" of genome-wide association studies. The particular generalized t distribution we adopt is a Student distribution on the absolute value of its argument. For low degrees of freedom we show that the generalized t […]

CUDA
