high performance computing on graphics processing units: hgpu.org

Posts

Dec, 31

Performance Considerations When Using a Dedicated Ray Traversal Engine

In recent years we have witnessed a massive boost in hardware graphics accelerators (graphics cards), not only in raw performance but also in programmability, introducing the concept of GPGPU. Despite this, current architectures still favor feed-forward algorithms over recursive ones. While shading is, in this sense, a feed-forward algorithm, ray tracing, […]

Dec, 31

GPU-Based Research of Highly Efficient Ray Tracing

Building on further study of GPU architecture and the GPU stream programming model, this paper implements a uniform grid acceleration structure for ray tracing on the GPU stream programming model. The rendering process involves a large number of ray intersection calculations, which reduce the efficiency of rendering the whole scene. Rendering without compromising the quality […]

Dec, 31

Fast Computing Adaptively Sampled Distance Field on GPU

In this paper we present an efficient method to compute the signed distance field for a large triangle mesh, which runs interactively with GPU acceleration. Restricted by the absence of flexible pointer addressing on the GPU, we design a novel multi-layer hash table to organize the voxel/triangle overlap pairs as two-tuples; this strategy provides an efficient […]

CUDA

Dec, 31

Efficient Triangle and Quadrilateral Clipping within Shaders

Clipping a triangle or a convex quadrilateral to a plane is a common operation in computer graphics. This clipping is implemented by fixed-function units within the graphics pipeline under most rasterization APIs. It is increasingly interesting to perform clipping in programmable stages as well. For example, to clip bounding volumes generated in the Geometry unit […]

OpenGL
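
As a rough illustration of the operation the clipping entry above describes, here is a minimal CPU-side sketch of clipping a convex polygon against a single plane, using the textbook Sutherland-Hodgman step. It is not the method from the listed paper, and it is written in Python rather than shader code. The function name `clip_polygon_to_plane` and the plane convention (keep points where `dot(normal, p) + offset >= 0`) are assumptions made for this example.

```python
def clip_polygon_to_plane(vertices, normal, offset):
    """Clip a convex polygon to the half-space dot(normal, p) + offset >= 0.

    vertices: list of 3D points (tuples) in winding order.
    Returns the clipped polygon as a list of 3D points (possibly empty).
    Hypothetical helper for illustration only.
    """
    def signed_distance(point):
        # Positive or zero means the point is on the kept side of the plane.
        return sum(n * c for n, c in zip(normal, point)) + offset

    clipped = []
    count = len(vertices)
    for i in range(count):
        current = vertices[i]
        following = vertices[(i + 1) % count]
        d_current = signed_distance(current)
        d_following = signed_distance(following)

        # Keep vertices that lie on the inside (or exactly on the plane).
        if d_current >= 0:
            clipped.append(current)

        # Emit an intersection point only when the edge strictly crosses
        # the plane, which avoids duplicating vertices that lie on it.
        if d_current * d_following < 0:
            t = d_current / (d_current - d_following)
            clipped.append(
                tuple(a + t * (b - a) for a, b in zip(current, following))
            )
    return clipped


if __name__ == "__main__":
    triangle = [(0, 0, 0), (2, 0, 0), (0, 2, 0)]
    # Keep the part of the triangle with x <= 1.
    print(clip_polygon_to_plane(triangle, (-1, 0, 0), 1))
```

Clipping a triangle against one plane yields at most four vertices, and a convex quadrilateral at most five, so a fixed-size output buffer would suffice in a shader port of this idea.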

Dec, 31

Fast Speaker Diarization Using a Specialization Framework for Gaussian Mixture Model Training

Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine "who spoke when" in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly due to the GMM training, and thus limits the performance of current approaches to be roughly real-time. Increased sizes of current datasets require […]

CUDA

Dec, 31

A GPU Accelerated Volumetric Ray Tracer for Incandescent Gas

The initial goal of this project was to create a physically accurate GPU-accelerated simulation of fire. Due to the limited time available in the semester (combined with the inherent difficulty of debugging CUDA code) we ended up reducing the scope somewhat and focusing on a realistic GPU-accelerated technique for rendering incandescent gas, as in flames, without […]

CUDA

Dec, 31

Boosting quantum evolutions using Trotter-Suzuki algorithms on GPUs

The evolution calculation of quantum systems represents a great challenge nowadays. Numerical implementations typically scale exponentially with the size of the system, demanding high amounts of resources. General Purpose Graphics Processor Units (GPGPUs) enable a new range of possibilities for numerical simulations of quantum systems. In this work we implemented, optimized and compared the quantum […]

CUDA

Dec, 31

Fast K-selection Algorithms for Graphics Processing Units

Finding the kth largest value in a list of n values is a well-studied problem for which many algorithms have been proposed. A naive approach is to sort the list and then simply select the kth term in the sorted list. However, when the sorted list is not needed, this method has done quite a […]

CUDA
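
The naive approach mentioned in the K-selection entry above (sort, then pick the kth term) can be sketched on the CPU as follows. The second function is a standard heap-based alternative from the Python standard library, included only to show what avoiding the full sort can look like; it is not one of the algorithms from the listed paper, and both function names are made up for this example.

```python
import heapq


def kth_largest_by_sorting(values, k):
    """Naive approach: sort the whole list in descending order, then index."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    return sorted(values, reverse=True)[k - 1]


def kth_largest_by_heap(values, k):
    """Alternative that avoids a full sort by tracking only the k largest values."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # heapq.nlargest returns the k largest items in descending order,
    # so the last one is the kth largest.
    return heapq.nlargest(k, values)[-1]


if __name__ == "__main__":
    data = [5, 1, 9, 3, 7]
    print(kth_largest_by_sorting(data, 2))
    print(kth_largest_by_heap(data, 2))
```

The sort-based version does O(n log n) work even though only one element of the sorted order is needed, which is the inefficiency the abstract points at.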

Dec, 31

Mapping the SBR and TW-ILDCs to Heterogeneous CPU-GPU Architecture for Fast Computation of Electromagnetic Scattering

In this paper, the shooting and bouncing ray (SBR) method in combination with the truncated wedge incremental length diffraction coefficients (TW-ILDCs) is implemented on a heterogeneous CPU-GPU architecture to solve electromagnetic scattering problems efficiently. The SBR is mapped to the GPU because numerous independent ray tubes can make full use of the massively parallel […]

CUDA

Dec, 31

Hierarchical Stochastic Motion Blur Rasterization

We present a hierarchical traversal algorithm for stochastic rasterization of motion blur, which efficiently reduces the number of inside tests needed to resolve spatio-temporal visibility. Our method is based on novel tile against moving primitive tests that also provide temporal bounds for the overlap. The algorithm works entirely in homogeneous coordinates, supports MSAA, facilitates efficient […]

CUDA

Dec, 31

Comparison of Fragmentation/Dispersion Models for Asteroid Nuclear Disruption Mission Design

This paper considers the problem of developing statistical orbit predictions of near-Earth object (NEO) fragmentation for nuclear disruption mission design and analysis. The critical component of NEO fragmentation modeling is developed for a momentum-preserving hypervelocity impact of a spacecraft carrying a nuclear payload. The results of the fragmentation process are compared to static models and results […]

CUDA

Dec, 31

Optimising the DBCSR GPU Implementation

The DBCSR library performs the sparse matrix multiplication required for atomistic simulations using the CP2K software. The GPU implementation of DBCSR was targeted for optimisation and had its scope increased to allow it to function with larger block sizes. It was found that the main kernel could be sped up by 16% by augmenting […]

CUDA