high performance computing on graphics processing units: hgpu.org

Posts

Nov, 4

GAMER with out-of-core computation

GAMER is a GPU-accelerated Adaptive-MEsh-Refinement code for astrophysical simulations. In this work, two further extensions of the code are reported. First, we have implemented the MUSCL-Hancock method with Roe’s Riemann solver for the hydrodynamic evolution, which improves the accuracy, the overall performance and the GPU versus CPU speed-up factor. Second, we have implemented […]

CUDA

Nov, 4

Relational joins on graphics processors

We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for writing to random memory locations, efficient inter-processor communication, and a programming model for general-purpose computing. Taking advantage of these new features, we design a set of data-parallel primitives such as […]

CUDA

Nov, 4

Approximate Dynamic Programming and Neural Networks on Game Hardware

Modern graphics processing units (GPUs) and game consoles are used for much more than simply 3D graphics applications and video games. From machine vision to finite element analysis, GPUs are being used in diverse applications, collectively called General Purpose computation on graphics processor units (GPGPU). Additionally, game consoles are entering the market of high performance […]

Nov, 4

GPU processing of particle system animation

An approach to particle system processing on a GPU is discussed. Balancing of CPU and GPU loads is described in detail. Original approaches aimed at reducing the data flow from the system memory to the video memory are proposed. A comparison between the proposed GPU-based approach and the classical CPU-based particle system animation is given.

Nov, 4

Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1-39)

To date, the slowest-folding proteins folded ab initio by all-atom molecular dynamics simulations have had folding times in the range of nanoseconds to microseconds. We report simulations of several folding trajectories of NTL9(1-39), a protein which has a folding time of ~1.5 ms. Distributed molecular dynamics simulations in implicit solvent on GPU processors were used […]

Nov, 4

High-Level programming of graphics hardware to increase performance of electromagnetics simulation

Modern graphics processing units (GPUs) utilize a programmable parallel pipeline architecture to render complex scenes onto a two-dimensional screen. Rendering these scenes requires rasterization, texturing operations, and multiple stages of lighting operations. These processes are computationally intensive and must be performed in near real time in today’s gaming and workstation applications. These industries have driven the performance […]

Nov, 4

Artificial neural network computation on graphic process unit

Artificial neural networks (ANNs) are widely used in pattern recognition and related areas. In some cases the computational load is very heavy; in others, real-time processing is required. A parallel algorithm is therefore needed, and the computation for an ANN is usually inherently parallel. In this paper, graphic hardware […]

OpenGL

Nov, 4

Implementing sparse matrix-vector multiplication on throughput-oriented processors

Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism […]

CUDA

Nov, 4

CuPP – A framework for easy CUDA integration

This paper reports on CuPP, our newly developed C++ framework designed to ease integration of NVIDIA’s GPGPU system CUDA into existing C++ applications. CuPP provides interfaces to recurring tasks that are easier to use than the standard CUDA interfaces. In this paper we concentrate on memory management and related data structures. CuPP offers both a […]

CUDA

Nov, 4

Interactive 3D distance field computation using linear factorization

We present an interactive algorithm to compute discretized 3D Euclidean distance fields. Given a set of piecewise linear geometric primitives, our algorithm computes the distance field for each slice of a uniform spatial grid. We express the non-linear distance function of each primitive as a dot product of linear factors. The linear terms are efficiently […]

OpenGL

Nov, 4

NBSymple, a double parallel, symplectic N-body code running on Graphic Processing Units

We present and discuss the characteristics and performance, in terms of both computational speed and precision, of a code which numerically integrates the equations of motion of N ‘particles’ that interact via Newtonian gravitation and move in a smooth external galactic field. The force evaluation on every particle is done by means of direct summation […]

CUDA

Nov, 3

Brook for GPUs: Stream Computing on Graphics Hardware

In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present […]

OpenGL
