high performance computing on graphics processing units: hgpu.org

Posts

Nov, 2

General purpose molecular dynamics simulations fully implemented on graphics processing units

Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides a performance equivalent to that of fast 30 […]

CUDA

Nov, 2

Thread-Scalable Evaluation of Multi-Jet Observables

A leading-order, leading-color parton-level event generator is developed for use on a multi-threaded GPU. Speed-up factors between 150 and 300 are obtained compared to an unoptimized CPU-based implementation of the event generator. In this first paper we study the feasibility of a GPU-based event generator with an emphasis on the constraints imposed by the hardware. […]

CUDA

Nov, 2

Graphics processing unit implementation of lattice Boltzmann models for flowing soft systems

A graphic processing unit (GPU) implementation of the multicomponent lattice Boltzmann equation with multirange interactions for soft-glassy materials [“glassy” lattice Boltzmann (LB)] is presented. Performance measurements for flows under shear indicate a GPU/CPU speed up in excess of 10 for 1024² grids. Such significant speed up permits to carry out multimillion time-steps simulations of […]

Nov, 2

GPU-accelerated deep shadow maps for direct volume rendering

Deep shadow maps unify the computation of volumetric and geometric shadows. For each pixel in the shadow map, a fractional visibility function is sampled, pre-filtered, and compressed as a piecewise linear function. However, the original implementation targets software-based off-line rendering. Similar previous algorithms on GPUs focus on geometric shadows and lose many important benefits of […]

Nov, 2

GPU-Based Interactive Visualization of Billion Point Cosmological Simulations

Despite the recent advances in graphics hardware capabilities, a brute force approach is incapable of interactively displaying terabytes of data. We have implemented a system that uses hierarchical level-of-detailing for the results of cosmological simulations, in order to display visually accurate results without loading in the full dataset (containing over 10 billion points). The guiding […]

Nov, 2

GPU powered CNN simulator (SIMCNN) with graphical flow based programmability

In this paper, we introduce an innovative CNN algorithm development environment that significantly assists algorithmic design. The introduced graphical user interface uses Matlab Simulink with UMF-like program description, where direct functionality accompanies better accessibility. The new generation of graphical cards incorporates many general purpose graphics processing units, giving the power of parallel computing to a […]

Nov, 2

3D finite difference computation on GPUs using CUDA

In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric to determine the optimal implementation for both the stencil-only computation, as well as the discretization of the wave equation, which is currently of great interest in seismic computing. For the larger […]

CUDA
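
The stencil-only computation mentioned in the abstract above can be illustrated with a minimal CPU sketch. This is not the paper's CUDA implementation: the function name `laplacian_3d`, the choice of a 7-point second-order stencil, and the nested-list grid layout are all assumptions made for illustration. It shows only the arithmetic that a GPU kernel would evaluate once per grid point.

```python
def laplacian_3d(u, h=1.0):
    """Apply a 7-point central-difference Laplacian to the interior of a 3D grid.

    `u` is a nested list indexed as u[x][y][z]; `h` is the grid spacing.
    Boundary points are left at 0.0 (an assumption of this sketch).
    """
    nx, ny, nz = len(u), len(u[0]), len(u[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    inv_h2 = 1.0 / (h * h)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                # Each output point reads its six axis neighbours and itself.
                out[i][j][k] = (
                    u[i - 1][j][k] + u[i + 1][j][k]
                    + u[i][j - 1][k] + u[i][j + 1][k]
                    + u[i][j][k - 1] + u[i][j][k + 1]
                    - 6.0 * u[i][j][k]
                ) * inv_h2
    return out


# Hypothetical test field f = x^2 + y^2 + z^2, whose Laplacian is 6 everywhere.
n = 5
u = [[[float(x * x + y * y + z * z) for z in range(n)]
      for y in range(n)] for x in range(n)]
lap = laplacian_3d(u)
```

Every input value is read by up to seven neighbouring output points, which is the kind of repeated access that a redundancy metric such as the one named in the abstract would count.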

Nov, 2

Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead

Sorting is a well-investigated topic in Computer Science in general and by now many efficient sorting algorithms for CPUs and GPUs have been developed. There is no swapping, paging, etc. available on GPUs to provide more virtual memory than physically available, thus if one wants to sort sequences that exceed GPU memory using the GPU […]

CUDA
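
The abstract above is truncated before it describes the method, so the following is only a generic sketch of external sorting, not the paper's algorithm. It assumes a hypothetical `chunk_size` that stands in for the number of elements fitting in device memory: each chunk is sorted independently (the step a GPU could perform), and the sorted runs are then merged on the host.

```python
import heapq


def external_sort(values, chunk_size):
    """Sort `values` by sorting fixed-size chunks and merging the sorted runs.

    `chunk_size` is a stand-in for the capacity of device memory (assumption).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Phase 1: sort each chunk that fits in the limited memory.
    runs = [sorted(values[i:i + chunk_size])
            for i in range(0, len(values), chunk_size)]
    # Phase 2: k-way merge of the sorted runs.
    return list(heapq.merge(*runs))


result = external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)
```

Load balancing and transfer overhead, both named in the title, are not modelled here; the sketch only shows why data larger than the available memory can still be sorted in two phases.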

Nov, 2

Linear algebra operators for GPU implementation of numerical algorithms

In this work, the emphasis is on the development of strategies to realize techniques of numerical computing on the graphics chip. In particular, the focus is on the acceleration of techniques for solving sets of algebraic equations as they occur in numerical simulation. We introduce a framework for the implementation of linear algebra operators on […]

OpenGL

Nov, 2

Improving Performance of Matrix Multiplication and FFT on GPU

In this paper we discuss our experiences in improving the performance of two key algorithms: the single-precision matrix-matrix multiplication subprogram (SGEMM of BLAS) and single-precision FFT using CUDA. The former is computation-intensive, while the latter is memory bandwidth or communication-intensive. A peak performance of 393 Gflops is achieved on NVIDIA GeForce GTX280 for the […]

CUDA

Nov, 2

Acceleration of finite-difference time-domain (FDTD) using graphics processor units (GPU)

The Finite-Difference Time-Domain (FDTD) method is used extensively in areas of microwave engineering and optics. However, FDTD runs too slowly for some simulations to be practical, especially when run on standard desktop computers. The suitability of dedicated hardware for the acceleration of FDTD computations has been investigated. It is demonstrated that standard consumer Graphics Processor […]

OpenGL

Nov, 1

A control-structure splitting optimization for GPGPU

Control statements in a GPU program such as loops and branches pose serious challenges for the efficient usage of GPU resources because those control statements will lead to the serialization of threads and consequently ruin the occupancy of GPU, that is, the number of threads running concurrently. Unlike traditional vector processing units that are inside […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
