high performance computing on graphics processing units: hgpu.org

Posts

Nov, 10

A GPU implementation of a track-repeating algorithm for proton radiotherapy dose calculations

An essential component in proton radiotherapy is the algorithm to calculate the radiation dose to be delivered to the patient. The most common dose algorithms are fast but they are approximate analytical approaches. However, their level of accuracy is not always satisfactory, especially for heterogeneous anatomic areas, like the thorax. Monte Carlo techniques provide superior […]

CUDA

Nov, 10

GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization

X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical concern in most image guided radiation therapy procedures. It is the goal of this paper to develop a fast GPU-based algorithm to reconstruct high quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. For this purpose, […]

CUDA

Nov, 10

Faster Radix Sort via Virtual Memory and Write-Combining

Sorting algorithms are the deciding factor for the performance of common operations such as removal of duplicates or database sort-merge joins. This work focuses on 32-bit integer keys, optionally paired with a 32-bit value. We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of counting sort. Taking advantage of virtual memory […]

Nov, 10

Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to […]

CUDA • OpenCL

Nov, 10

GPU-based Low-dose 4DCT Reconstruction via Temporal Non-local Means

Four-dimensional computed tomography (4DCT) has been widely used in cancer radiotherapy for accurate target delineation and motion measurement for tumors in thorax and upper abdomen areas. However, 4DCT simulation is associated with much higher imaging dose than conventional CT simulation, which is a major concern in its clinical application. Conventionally, each phase of 4DCT is […]

CUDA

Nov, 10

Compressive Phase Contrast Tomography

When x-rays penetrate soft matter, their phase changes more rapidly than their amplitude. Interference effects visible with high brightness sources create higher contrast, edge enhanced images. When the object is piecewise smooth (made of big blocks of a few components), such higher contrast datasets have a sparse solution. We apply basis pursuit solvers to improve […]

CUDA

Nov, 9

Sop-GPU: Accelerating biomolecular simulations in the centisecond timescale using graphics processors

Theoretical exploration of fundamental biological processes involving the forced unraveling of multimeric proteins, the sliding motion in protein fibers and the mechanical deformation of biomolecular assemblies under physiological force loads is challenging even for distributed computing systems. Using a Cα-based coarse-grained self organized polymer (SOP) model, we implemented the Langevin simulations of proteins on graphics […]

CUDA

Nov, 9

GPU-based Low Dose CT Reconstruction via Edge-preserving Total Variation Regularization

High radiation dose in CT scans increases a lifetime risk of cancer and has become a major clinical concern. Recently, iterative reconstruction algorithms with Total Variation (TV) regularization have been developed to reconstruct CT images from highly undersampled data acquired at low mAs levels in order to reduce the imaging dose. Nonetheless, TV regularization may […]

CUDA

Nov, 9

How to obtain efficient GPU kernels: an illustration using FMM & FGT algorithms

Computing on graphics processors is maybe one of the most important developments in computational science to happen in decades. Not since the arrival of the Beowulf cluster, which combined open source software with commodity hardware to truly democratize high-performance computing, has the community been so electrified. Like then, the opportunity comes with challenges. The formulation […]

CUDA

Nov, 9

General purpose Molecular Dynamics Simulations on GPUs: Issues of Pair Forces and Scaling to large Clusters

We present an implementation of a general purpose GPU-Molecular Dynamics code named LAMMPScuda which is based on LAMMPS. It exhibits excellent scaling behavior, allowing for the efficient usage of hundreds of GPUs for a single simulation. At the same time each GPU provides the equivalent performance of approximately 5 modern Quad Core CPUs. By supporting […]

CUDA

Nov, 9

Improving many flavor QCD simulations using multiple GPUs

We accelerate many-flavor lattice QCD simulations using multiple GPUs. Multiple pseudo-fermion fields are introduced additively and independently for each flavor in the many-flavor HMC algorithm. Using the independence of each pseudo-fermion field and the blocking technique for the quark solver, we can assign the solver task to each GPU card. In this report we present […]

CUDA

Nov, 9

Direct N-body simulations of globular clusters: (I) Palomar 14

We present the first ever direct $N$-body computations of an old Milky Way globular cluster over its entire life time on a star-by-star basis. Using recent GPU hardware at Bonn University, we have performed a comprehensive set of $N$-body calculations to model the distant outer halo globular cluster Palomar 14 (Pal 14). By varying the […]

* * *

Recent source codes

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management
