high performance computing on graphics processing units: hgpu.org

Posts

Nov, 18

Feature-preserving triangular geometry images for level-of-detail representation of static and skinned meshes

Geometry images resample meshes to represent them as textures for efficient GPU processing by forcing a regular parameterization that often incurs a large amount of distortion. Previous approaches broke the geometry image into multiple rectangular or irregular charts to reduce distortion, but complicated the automatic level of detail one gets from MIP-maps of the geometry […]

CUDA

Nov, 18

Graphics processing unit–accelerated holography by simulated annealing

Computer-generated holography is a computationally intensive process particularly well suited to the architecture of graphics processing units (GPUs). This work investigates the performance improvements achievable by using a GPU to optimize holograms via simulated annealing. Two examples are given: accelerated training of an optical correlator to accept or reject inputs over sets of […]
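As a rough illustration of the simulated annealing loop this entry refers to, here is a minimal sketch in plain Python. It is not the authors' method: the binary pixel state, the `anneal` and `mismatch` names, the geometric cooling schedule, and the toy cost are all assumptions made for illustration. A real hologram cost would compare the simulated optical output against a target, and that cost evaluation is the part a GPU would accelerate.

```python
import math
import random


def anneal(cost, state, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Minimise cost(state) over a list of 0/1 pixels by simulated annealing."""
    rng = random.Random(seed)
    current = list(state)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(current))
        current[i] ^= 1  # propose a move: flip one pixel
        new_cost = cost(current)
        delta = new_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost = new_cost  # accept the move
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        else:
            current[i] ^= 1  # reject the move: undo the flip
    return best, best_cost


# Hypothetical toy cost: number of pixels that differ from a target pattern.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def mismatch(pixels):
    return sum(p != t for p, t in zip(pixels, target))


start = [0] * len(target)
best, best_cost = anneal(mismatch, start)
```

The loop keeps the best state seen so far, so the returned cost can never exceed the starting cost, even though individual uphill moves are accepted along the way.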

Nov, 18

A flexible high-performance Lattice Boltzmann GPU code for the simulations of fluid flows in complex geometries

We describe the porting of the Lattice Boltzmann component of MUPHY, a multi-physics/scale simulation software, to multiple graphics processing units using the Compute Unified Device Architecture. The novelty of this work is the development of ad hoc techniques for optimizing the indirect addressing that MUPHY uses for efficient simulations of irregular domains.

CUDA
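The indirect addressing mentioned in the Lattice Boltzmann entry above can be sketched as follows. This is only an illustration under stated assumptions, not MUPHY's code: the function names, the 2D five-direction stencil, and the `-1` sentinel for missing neighbours are all hypothetical. The idea is to store only the fluid cells of an irregular domain in a compact array and to reach neighbours through a precomputed index table instead of grid arithmetic.

```python
def build_indirect_tables(mask, directions):
    """Map the fluid cells of a 2D mask to compact indices plus a neighbour table.

    mask[y][x] is True for fluid and False for solid.
    Returns (cells, neighbours), where cells[k] == (x, y) and
    neighbours[k][d] is the compact index of the cell reached from cell k
    along directions[d], or -1 if that cell is solid or outside the grid.
    """
    height, width = len(mask), len(mask[0])
    cells = [(x, y) for y in range(height) for x in range(width) if mask[y][x]]
    index_of = {cell: k for k, cell in enumerate(cells)}
    neighbours = [
        [index_of.get((x + dx, y + dy), -1) for dx, dy in directions]
        for x, y in cells
    ]
    return cells, neighbours


def stream(f, neighbours):
    """Move each population to its neighbour through the index table.

    Entries with no incoming fluid neighbour keep their old value. That is a
    placeholder only; a real code would apply a boundary condition there.
    """
    out = [row[:] for row in f]
    for k, row in enumerate(neighbours):
        for d, dest in enumerate(row):
            if dest >= 0:
                out[dest][d] = f[k][d]
    return out


# Hypothetical stencil: rest, +x, -x, +y, -y.
directions = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
mask = [
    [True, True, False],
    [False, True, True],
]
cells, neighbours = build_indirect_tables(mask, directions)
f = [[k * 10 + d for d in range(len(directions))] for k in range(len(cells))]
f_next = stream(f, neighbours)
```

Memory then scales with the number of fluid cells rather than with the bounding box, at the price of one extra table lookup per neighbour access. Making that lookup efficient on the GPU is the kind of optimization the abstract describes.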

Nov, 18

GPU-enabled FREALIGN: Accelerating single particle 3D reconstruction and refinement in Fourier space on graphic processors

Among all the factors that determine the resolution of a 3D reconstruction by single particle electron cryo-microscopy (cryoEM), the number of particle images used in the dataset plays a major role. More images generally yield better resolution, assuming the imaged protein complex is conformationally and compositionally homogeneous. To facilitate processing of very large datasets, we […]

CUDA

Nov, 18

Low cost, high performance GPU computing solution for atomic resolution cryoEM single-particle reconstruction

Recent advancements in cryo-electron microscopy (cryoEM) have made it technically possible to determine the three-dimensional (3D) structures of macromolecular complexes at atomic resolution. However, processing the large amount of data needed for atomic resolution reconstructions requires either access to very expensive computer clusters or weeks of continuous computation on a personal computer (PC). […]

CUDA

Nov, 18

An adaptive Expectation-Maximization algorithm with GPU implementation for electron cryomicroscopy

Maximum-likelihood (ML) estimation has very desirable properties for reconstructing 3D volumes from noisy cryo-EM images of single macromolecular particles. Current implementations of ML estimation make use of the Expectation-Maximization (EM) algorithm or its variants. However, the EM algorithm is notoriously computation-intensive, as it involves integrals over all orientations and positions for each particle image. We […]

CUDA

Nov, 17

Correlation analysis on GPU systems using NVIDIA’s CUDA

Functional magnetic resonance imaging allows non-invasive measurements of brain dynamics and has already been used for neurofeedback experiments, which rely on real-time data processing. The limited computational resources that are typically available for this have hindered the use of connectivity analysis in this context. A basic, but already computationally demanding analysis method of neural […]

CUDA
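To make the computational load of the correlation analysis in the entry above concrete, here is a minimal all-pairs sketch in plain Python. It assumes Pearson correlation between time series, which the truncated abstract does not confirm, and the function names and sample data are hypothetical. Every matrix entry is computed independently of the others, which is why this kind of analysis maps naturally onto GPU threads.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    cov = sum(x * y for x, y in zip(da, db))
    norm = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A constant series has zero variance; report 0.0 rather than divide by zero.
    return cov / norm if norm else 0.0


def correlation_matrix(series):
    """All-pairs correlation; each (i, j) entry is independent of the others."""
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]


# Hypothetical time series, one list per measured location.
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
matrix = correlation_matrix(series)
```

For N series the matrix has N² entries, so the cost grows quadratically with the number of measured locations.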

Nov, 17

A Survey of Medical Image Registration on Multicore and the GPU

In this article, we look at early, recent, and state-of-the-art methods for registration of medical images using a range of high-performance computing (HPC) architectures including symmetric multiprocessing (SMP), massively multiprocessing (MMP), and architectures with distributed memory (DM), and nonuniform memory access (NUMA). The article is designed to be self-sufficient. We will take the time to […]

Nov, 17

TeraFLOP computing on a desktop PC with GPUs for 3D CFD

A very efficient implementation of a lattice Boltzmann (LB) kernel in 3D on a graphical processing unit using the compute unified device architecture interface developed by nVIDIA is presented. By exploiting the explicit parallelism offered by the graphics hardware, we obtain an efficiency gain of up to two orders of magnitude with respect to the […]

Nov, 17

FAST: fast architecture sensitive tree search on modern CPUs and GPUs

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join and aggregation. However, unlike other primitives, tree search presents significant challenges due to […]

Nov, 17

Scalable parallel programming with CUDA

Is CUDA the parallel programming model that application developers have been waiting for?

CUDA

Nov, 17

Fast free-form deformation using graphics processing units

A large number of algorithms have been developed to perform non-rigid registration, a tool commonly used in medical image analysis. The free-form deformation algorithm is a well-established technique, but is extremely time consuming. In this paper we present a parallel-friendly formulation of the algorithm suitable for graphics processing unit execution. Using our […]

CUDA
