high performance computing on graphics processing units: hgpu.org

Posts

Dec, 31

Comparison of Fragmentation/Dispersion Models for Asteroid Nuclear Disruption Mission Design

This paper considers the problem of developing statistical orbit predictions of near-Earth object (NEO) fragmentation for nuclear disruption mission design and analysis. The critical component of NEO fragmentation modeling is developed for a momentum-preserving hypervelocity impact of a spacecraft carrying a nuclear payload. The results of the fragmentation process are compared to static models and results […]

CUDA

Dec, 31

Optimising the DBCSR GPU Implementation

The DBCSR library solves the sparse matrix multiplication required to perform atomistic simulations using the CP2K software. The GPU implementation of DBCSR was targeted for optimisation, and its scope was increased to allow it to function with larger block sizes. It was found that the main kernel could be sped up by 16% by augmenting […]

CUDA

Dec, 31

Torch7: A Matlab-like Environment for Machine Learning

Torch7 is a versatile numeric computing framework and machine learning library that extends Lua. Its goal is to provide a flexible environment to design and train learning machines. Flexibility is obtained via Lua, an extremely lightweight scripting language. High performance is obtained via efficient OpenMP/SSE and CUDA implementations of low-level numeric routines. Torch7 can easily […]

CUDA

Dec, 31

Hardware-Assisted High-Efficiency Ray Casting of Unstructured Time-Varying Flows Using Temporal Coherence

Advances in computational power are enabling high-precision numerical simulations of unsteady flows using unstructured grids. The dynamic ray casting technique with the aid of texture hardware can achieve high-accuracy volume rendering of unstructured time-varying data from these simulations. However, the existing approach does not pay enough attention to temporal coherence, which depresses the rendering rate. […]

Dec, 31

A 3D Convex Hull Algorithm for Graphics Hardware

This report presents a novel approach, termed gHull, to compute the convex hull for a given point set in R3 using the graphics processing units (GPUs). While the 2D problem can easily and efficiently be solved in the GPU, there is no known obvious, classical parallel solution that works well in the GPU for the […]

CUDA • OpenGL

Dec, 31

Construction and Rendering of Trimmed Blending Surfaces with Sharp Features on a GPU

We construct surfaces with darts, creases, and corners by blending different types of local geometries. We also render these surfaces efficiently using programmable graphics hardware. Points on the blending surface are evaluated using simplified computation which can easily be performed on a graphics processing unit. Results show an eighteen-fold to twenty-fold increase in rendering speed […]

Dec, 30

Faster Dark Matter Calculations Using the GPU

We have investigated the use of the graphics processing unit to accelerate the software package DarkSUSY. DarkSUSY is, among other things, used for calculating the dark matter relic density, a measurable quantity, given the supersymmetric neutralino, χ̃, as a dark matter candidate. Supersymmetric theories have many free parameters and we want to calculate […]

CUDA

Dec, 30

Building Human Brain Network in 3D Coefficient Map Determined by X-ray Microtomography

X-ray microtomography can visualize 3D structures of biological soft tissues at cellular to subcellular resolution. Such 3D structures are composed of a great number of cells and extracellular matrices that should be assigned separately as tissue constituents. Here, we report a method for building a skeletonized model of the human brain network in a 3D […]

CUDA

Dec, 30

Deep Shadow Maps from Volumetric Data on the GPU

A method of generating Deep Shadow Maps from a 3D data set is presented. This method uses ray tracing on the GPU to accumulate opacity and store it in a deep shadow map. The deep shadow map is then sampled based on view direction to determine how much light reaches a particular fragment. The […]
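The opacity-accumulation step mentioned in this entry can be illustrated with a minimal CPU-side sketch. It assumes each light ray is sampled at increasing depths with a per-sample opacity in [0, 1], and that the deep shadow map stores, per ray, a step-wise curve of (depth, transmittance) pairs. The function names `accumulate_ray` and `transmittance_at` are hypothetical; the paper's actual GPU implementation, interpolation scheme, and any curve compression are not reproduced here.

```python
from bisect import bisect_right


def accumulate_ray(samples):
    """Build a visibility curve for one light ray (hypothetical helper).

    samples: list of (depth, alpha) pairs sorted by increasing depth,
    where alpha is the opacity of that sample.
    Returns a list of (depth, transmittance) pairs, where transmittance
    is the fraction of light still remaining just past that depth.
    """
    transmittance = 1.0
    curve = []
    for depth, alpha in samples:
        # Each sample absorbs a fraction alpha of the light that reaches it.
        transmittance *= (1.0 - alpha)
        curve.append((depth, transmittance))
    return curve


def transmittance_at(curve, depth):
    """Look up how much light reaches a point at the given depth.

    Simplification (an assumption of this sketch): step-wise lookup, i.e.
    the value of the last stored sample at or before `depth`.
    """
    depths = [d for d, _ in curve]
    i = bisect_right(depths, depth)
    return 1.0 if i == 0 else curve[i - 1][1]


if __name__ == "__main__":
    curve = accumulate_ray([(1.0, 0.5), (2.0, 0.5)])
    print(curve)                          # [(1.0, 0.5), (2.0, 0.25)]
    print(transmittance_at(curve, 1.5))   # 0.5
```

Storing transmittance as a function of depth, rather than a single depth value as in an ordinary shadow map, is what lets semi-transparent volumetric data cast partial shadows.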

Dec, 29

Resource-Aware Compiler Prefetching for Fine-Grained Many-Cores

Super-scalar, out-of-order processors that can have tens of read and write requests in the execution window place significant demands on Memory Level Parallelism (MLP). Multi- and many-cores with shared parallel caches further increase MLP demand. Current cache hierarchies however have been unable to keep up with this trend, with modern designs allowing only 4-16 concurrent […]

Dec, 29

Acceleration of PIC Simulation with GPU

Particle-in-cell (PIC) is a simulation technique for plasma physics. The large number of particles in high-resolution plasma simulations increases the volume of computation required, making it vital to increase computation speed. In this study, we attempt to accelerate computation on graphics processing units (GPUs) using KEMPO, a PIC simulation code package [H. Matsumoto and Y. […]

CUDA

Dec, 29

Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures

We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and […]

CUDA
