high performance computing on graphics processing units: hgpu.org

Posts

Feb, 13

Speed and Portability issues for Random Number Generation on Graphical Processing Units with CUDA and other Processing Accelerators

Generating quality random numbers is a performance-critical application for many scientific simulations. Modern processing acceleration techniques, such as graphical co-processing units (GPUs), multi-core conventional CPUs, special-purpose multicore CPUs, and parallel computing approaches such as multi-threading on shared memory or message passing on clusters, all offer ways to speed up random number generation (RNG). Providing fast […]

Feb, 13

Cluster and Fast-Update Simulations of Regular and Rewired Lattice Ising Models Using CUDA and Graphical Processing Units

Models such as the Ising system in computational physics are still important tools for analysing phase transitions and universal behaviours for new irregular and distorted lattice networks. Data-parallelism can be exploited to speed up such simulations as well as their analysis using general purpose graphical processing units (GPU) and other accelerating devices. We report on […]

CUDA

Feb, 13

Automated and parallel code generation for finite-differencing stencils with arbitrary data types

Finite-differencing and other regular and direct approaches to solving partial differential equations (PDEs) are methods that fit well on data-parallel computer systems. These problems continue to arise in many application areas of computational science and engineering but still offer some programming challenges as they are not readily incorporated into a general standard software library that […]

CUDA

Feb, 13

Visualising spins and clusters in regular and small-world Ising models with GPUs

Visualising computational simulation models of solid state physical systems is a hard problem for dense lattice models. Fly-throughs and cutaways can aid viewer understanding of a simulated system. Interactive time model parameter updates and overlaying of measurements and graticules, cluster colour labelling and other visual highlighting cues can also enhance user intuition of the model’s […]

CUDA • OpenGL

Feb, 13

Data-Parallelism and GPUs for Lattice Gas Fluid Simulations

Lattice gas cellular automata (LGCA) models provide a relatively fast means of simulating fluid flow and can give both quantitative and qualitative insights into flow patterns around complex obstacles. Symmetry requirements inherent in the Navier-Stokes equation mandate that lattice-gas approximations to the full field equations be run on triangular lattices in two dimensions and on […]

CUDA

Feb, 13

GPU-based Multi-Volume Rendering of Complex Data in Neuroscience and Neurosurgery

Recent advances in image acquisition technology and its availability in the medical and bio-medical fields have led to an unprecedented amount of high-resolution imaging data. However, the inherent complexity of this data, caused by its tremendous size, complex structure or multi-modality, poses several challenges for current visualization tools. Recent developments in graphics hardware architecture have […]

CUDA

Feb, 13

Comparison of GPU Architectures for Asynchronous Communication with Finite-Differencing Applications

Graphical Processing Units (GPUs) are good data-parallel performance accelerators for solving regular-mesh partial differential equations (PDEs), where low-latency communications and high compute-to-communications ratios can yield very high levels of computational efficiency. Finite-difference time-domain methods still play an important role for many PDE applications. Iterative multi-grid and multilevel algorithms can converge faster than […]

CUDA

Feb, 13

Interactive visualisation of spins and clusters in regular and small-world Ising models with CUDA on GPUs

Three-dimensional simulation models are hard to visualise for dense lattice systems, even with cutaways and flythrough techniques. We use multiple Graphics Processing Units (GPUs), CUDA and OpenGL to increase our understanding of computational simulation models such as the 2-D and 3-D Ising systems with small-world link rewiring by accelerating both the simulation and visualisation into […]

CUDA • OpenGL

Feb, 13

Auto-Generation of Parallel Finite-Differencing Code for MPI, TBB and CUDA

Finite-difference methods can be useful for solving certain partial differential equations (PDEs) in the time domain. Compiler technologies can be used to parse an application domain specific representation of these PDEs and build an abstract representation of both the equation and the desired solver. This abstract representation can be used to generate a language-specific implementation. […]

CUDA

Feb, 13

Copperhead: Compiling an embedded data parallel language

Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level […]

CUDA

Feb, 12

Efficient Sparse Voxel Octrees – Analysis, Extensions, and Implementation

This technical report extends our previous paper on sparse voxel octrees. We first discuss the benefits and drawbacks of voxel representations and how the storage space requirements behave for different kinds of content. Then, we explain in detail our compact data structure for storing voxels and an efficient ray cast algorithm that utilizes this structure, […]

CUDA

Feb, 12

Efficient sparse voxel octrees

In this paper we examine the possibilities of using voxel representations as a generic way for expressing complex and feature-rich geometry on current and future GPUs. We present in detail a compact data structure for storing voxels and an efficient algorithm for performing ray casts using this structure. We augment the voxel data with novel […]

CUDA
