high performance computing on graphics processing units: hgpu.org

Posts

Nov, 12

Real-time digital holographic microscopy observable in multi-view and multi-resolution

We propose a real-time digital holographic microscopy system that simultaneously reconstructs multiple images with arbitrary resolutions, depths, and positions by using Shifted-Fresnel diffraction instead of Fresnel diffraction. In this system, we used four graphics processing units (GPUs) for multiple reconstructions in real time. We demonstrate four images reconstructed from one hologram at arbitrary depths, […]

CUDA

Nov, 12

Fast calculation of computer-generated-hologram on AMD HD5000 series GPU and OpenCL

In this paper, we report fast calculation of a computer-generated hologram (CGH) using the new architecture of AMD's HD5000 series GPU (RV870) and its new software development environment, OpenCL. Using an RV870 GPU and OpenCL, we can calculate a 1,920 × 1,024 CGH from a 3D object consisting of 1,024 points in […]

OpenCL
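A common way to compute a CGH from a point cloud is to sum, at every hologram pixel, a Fresnel-approximated interference term from each object point: I(x_h, y_h) = Σ_j A_j cos[π((x_h - x_j)² + (y_h - y_j)²) / (λ z_j)]. The sketch below assumes that formulation; it does not reproduce the paper's OpenCL kernel, and the function and parameter names are hypothetical. Because each pixel depends only on the point list, every pixel can be computed independently, which is what lets the calculation map onto one GPU work-item per pixel.

```python
import numpy as np

def point_cloud_cgh(points, amplitudes, width, height, pitch, wavelength):
    """Hypothetical helper: real-valued CGH from a 3D point cloud.

    Assumes the Fresnel point-source formula
        I(xh, yh) = sum_j A_j * cos(pi / (wavelength * z_j) * ((xh - x_j)^2 + (yh - y_j)^2)).
    points: (N, 3) array of (x, y, z) in metres; z is the distance to the hologram plane.
    pitch: pixel pitch of the hologram in metres.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    # Physical pixel coordinates, centred on the middle of the hologram.
    xh = (xs - width / 2) * pitch
    yh = (ys - height / 2) * pitch
    cgh = np.zeros((height, width))
    # Loop over object points; each pixel update is independent of the others.
    for (x, y, z), a in zip(points, amplitudes):
        r2 = (xh - x) ** 2 + (yh - y) ** 2
        cgh += a * np.cos(np.pi / (wavelength * z) * r2)
    return cgh
```

For a 1,920 × 1,024 hologram and 1,024 points, this loop evaluates about two billion cosine terms, which is why the per-pixel parallelism matters.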

Nov, 12

GPU-based Fast Cone Beam CT Reconstruction from Undersampled and Noisy Projection Data via Total Variation

Purpose: Cone-beam CT (CBCT) plays an important role in image guided radiation therapy (IGRT). However, the large radiation dose from serial CBCT scans in most IGRT procedures raises a clinical concern, especially for pediatric patients who are essentially excluded from receiving IGRT for this reason. The goal of this work is to develop a fast […]

CUDA
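Total-variation regularization penalizes the summed gradient magnitude of the reconstructed image, which tends to suppress noise and streaks from undersampled projections while preserving edges. The sketch below only evaluates the isotropic discrete TV term that such methods minimize; the forward-difference discretization and the helper name are assumptions, and the paper's data-fidelity term and GPU solver are not shown.

```python
import numpy as np

def total_variation(img):
    """Isotropic discrete total variation of a 2D image (hypothetical helper).

    Uses forward differences with the last row/column replicated, so the
    gradient is zero at the far border. Other discretizations exist.
    """
    img = np.asarray(img, dtype=float)
    dx = np.diff(img, axis=1, append=img[:, -1:])  # horizontal forward differences
    dy = np.diff(img, axis=0, append=img[-1:, :])  # vertical forward differences
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))
```

A constant image has zero TV, while a sharp vertical step of height 1 across a 4 × 4 image contributes 1 per row, for a total of 4.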

Nov, 12

Exploring the Limits of GPUs With Parallel Graph Algorithms

In this paper, we explore the limits of graphics processors (GPUs) for general-purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected components. Such graph problems represent a worst-case scenario for coalescing parallel memory accesses on GPUs, which is critical for good […]

CUDA
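List ranking asks for each node's distance from the end of a linked list. One classic parallel approach is pointer jumping (Wyllie's algorithm): every node repeatedly adds its successor's rank to its own and then skips to its successor's successor, finishing in about log2(n) rounds. The NumPy sketch below emulates those synchronous rounds on the CPU; it is not necessarily the algorithm the authors implemented, and the function name is hypothetical. The gathers `rank[succ]` and `succ[succ]` are exactly the scattered, data-dependent reads that defeat memory coalescing on a GPU.

```python
import numpy as np

def list_rank(succ):
    """Distance of each node to the list tail via pointer jumping (hypothetical helper).

    succ[i] is the successor of node i; the tail points to itself.
    """
    succ = np.asarray(succ).copy()
    n = len(succ)
    # The tail has rank 0; every other node starts one step from its successor.
    rank = np.where(succ == np.arange(n), 0, 1)
    steps = max(1, int(np.ceil(np.log2(max(n, 2)))))
    for _ in range(steps):
        # Right-hand sides read the old arrays, mimicking one synchronous parallel round.
        rank = rank + rank[succ]
        succ = succ[succ]
    return rank
```

For the list 2 → 0 → 3 → 1 (encoded as `succ = [3, 1, 0, 1]`), the ranks come out as `[2, 0, 3, 1]`.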

Nov, 12

Langevin dynamics simulations of biomolecules on graphics processors

Due to the very long timescales involved (µs to s), theoretical modeling of fundamental biological processes, including folding, misfolding, and mechanical unraveling of biomolecules, under physiologically relevant conditions is challenging even for distributed computing systems. Graphics Processing Units (GPUs) are emerging as an alternative programming platform to the more traditional CPUs as they provide high raw computational […]

CUDA
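In the overdamped (high-friction) limit, Langevin dynamics reduces to dx = (F(x)/γ) dt + sqrt(2 k_B T / γ) dW, which can be integrated with the Euler-Maruyama scheme. The sketch below assumes that limit and a user-supplied force function; the paper's integrator, force field, and GPU kernels may differ, and all names are hypothetical. Given the forces, each particle's update is independent, which is the property GPU implementations exploit.

```python
import numpy as np

def overdamped_langevin(x0, force, gamma, kT, dt, n_steps, rng):
    """Euler-Maruyama integration of overdamped Langevin dynamics (hypothetical helper).

    dx = F(x) / gamma * dt + sqrt(2 * kT / gamma) * dW
    force: callable returning F(x) with the same shape as x.
    rng: a numpy.random.Generator supplying Gaussian noise.
    """
    x = np.array(x0, dtype=float)
    noise_scale = np.sqrt(2.0 * kT * dt / gamma)
    for _ in range(n_steps):
        # Deterministic drift plus an independent Gaussian kick for every coordinate.
        x = x + force(x) * dt / gamma + noise_scale * rng.standard_normal(x.shape)
    return x
```

With a harmonic force F(x) = -x and kT = 1, the long-time variance of many independent particles should approach kT / k = 1, up to a small time-step bias.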

Nov, 12

Deterministic Sample Sort For GPUs

We present and evaluate GPU Bucket Sort, a parallel deterministic sample sort algorithm for many-core GPUs. Our method is considerably faster than Thrust Merge (Satish et al., Proc. IPDPS 2009), the best comparison-based sorting algorithm for GPUs, and it is as fast as the new randomized sample sort for GPUs by Leischner et al. (to appear in […]
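Deterministic sample sort replaces random sampling with regular sampling: sort small tiles locally, take evenly spaced samples from each sorted tile, pick evenly spaced splitters from the sorted samples, scatter elements into buckets by binary search, and sort each bucket. The sequential Python sketch below follows that general scheme only to show the data flow; it is not the authors' GPU Bucket Sort, and the function name, bucket count, and sample count are illustrative assumptions.

```python
import bisect

def deterministic_sample_sort(data, num_buckets=4, samples_per_bucket=4):
    """Sequential sketch of deterministic (regular-sampling) sample sort."""
    n = len(data)
    if n == 0:
        return []
    # 1. Split the input into equal tiles and sort each tile locally.
    tile = -(-n // num_buckets)  # ceiling division
    tiles = [sorted(data[i:i + tile]) for i in range(0, n, tile)]
    # 2. Take evenly spaced samples from each sorted tile (no randomness).
    samples = []
    for t in tiles:
        step = max(1, len(t) // samples_per_bucket)
        samples.extend(t[step - 1::step][:samples_per_bucket])
    samples.sort()
    # 3. Choose up to num_buckets - 1 evenly spaced splitters from the sorted samples.
    stride = max(1, len(samples) // num_buckets)
    splitters = samples[stride::stride][:num_buckets - 1]
    # 4. Scatter each element into its bucket by binary search over the splitters.
    buckets = [[] for _ in range(len(splitters) + 1)]
    for t in tiles:
        for v in t:
            buckets[bisect.bisect_right(splitters, v)].append(v)
    # 5. Sort the buckets independently and concatenate them in order.
    out = []
    for b in buckets:
        out.extend(sorted(b))
    return out
```

Because the splitters come from regular samples of already sorted tiles, bucket sizes stay bounded without random choices, which is the property that makes the method deterministic.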

Nov, 12

GPGPU based simulations for one and two dimensional quantum walks

Simulations of standard 1D and 2D quantum walks have been performed within the Quantum Computer Simulator (QCS) environment and on GPUs using CUDA. In particular, simulations of quantum walks may be seen as appropriate benchmarks for testing the computational power of the processors used. It was demonstrated by a series […]

CUDA
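A standard 1D discrete-time quantum walk alternates a coin operation (here the Hadamard gate) on a two-state coin with a coin-conditioned shift to the left or right. The NumPy sketch below assumes that textbook Hadamard walk, starting at the origin in coin state |0>; it does not use the QCS environment or CUDA, and the function name is hypothetical. The coin update at each site is independent of the others, which is why the walk parallelizes naturally across GPU threads.

```python
import numpy as np

def hadamard_walk(steps):
    """Position distribution of a 1D Hadamard walk over x = -steps..steps."""
    size = 2 * steps + 1
    # psi[x, c]: amplitude at position index x with coin c (c=0 moves left, c=1 moves right).
    psi = np.zeros((size, 2), dtype=complex)
    psi[steps, 0] = 1.0  # start at x = 0 in coin state |0>
    hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    for _ in range(steps):
        psi = psi @ hadamard.T          # apply the coin at every site
        shifted = np.zeros_like(psi)
        shifted[:-1, 0] = psi[1:, 0]    # coin |0> component moves x -> x - 1
        shifted[1:, 1] = psi[:-1, 1]    # coin |1> component moves x -> x + 1
        psi = shifted
    return np.sum(np.abs(psi) ** 2, axis=1)
```

After two steps the distribution is 1/4, 1/2, 1/4 at x = -2, 0, 2; after three steps it is 1/8, 5/8, 1/8, 1/8 at x = -3, -1, 1, 3, already skewed to the left.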

Nov, 12

Measuring Bandwidth for Super Computer Workloads

Parallel computing plays a major role in almost every field, from research to solving major practical problems, and much research still focuses on parallel processing. Its use now extends to end-user applications, such as GPUs, and to multi-core processor development. Bandwidth measurement is essential […]

Nov, 12

Enabling a High Throughput Real Time Data Pipeline for a Large Radio Telescope Array with GPUs

The Murchison Widefield Array (MWA) is a next-generation radio telescope currently under construction in the remote Western Australian Outback. Raw data will be generated continuously at 5 GiB/s, grouped into 8 s cadences. This high throughput motivates the development of on-site, real-time processing and reduction in preference to archiving, transport, and off-line processing. Each batch of […]

CUDA
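The quoted figures already make the case for on-site reduction. A back-of-the-envelope calculation, using only the 5 GiB/s rate and 8 s cadence given above and assuming continuous operation over a full day:

$$
\begin{aligned}
5~\text{GiB/s} \times 8~\text{s} &= 40~\text{GiB per cadence},\\
5~\text{GiB/s} \times 86\,400~\text{s/day} &= 432\,000~\text{GiB/day} \approx 421.9~\text{TiB/day}.
\end{aligned}
$$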

Nov, 12

GPU-based ultra-fast direct aperture optimization for online adaptive radiation therapy

Online adaptive radiation therapy (ART) holds great promise for significantly reducing normal tissue toxicity and/or improving tumor control through real-time treatment adaptations based on the current patient anatomy. However, the major technical obstacle to the clinical realization of online ART, namely the inability to achieve real-time efficiency in treatment re-planning, has yet to be overcome. To […]

CUDA

Nov, 12

Real-time volumetric image reconstruction and 3D tumor localization based on a single x-ray projection image for lung cancer radiotherapy

Purpose: To develop an algorithm for real-time volumetric image reconstruction and 3D tumor localization based on a single x-ray projection image for lung cancer radiotherapy. Methods: Given a set of volumetric images of a patient at N breathing phases as the training data, we perform deformable image registration between a reference phase and the other […]

CUDA

Nov, 12

Large-Scale DNS of Gas-Solid Flow on Mole-8.5

Direct numerical simulation (DNS) of gas-solid flow is implemented on Mole-8.5, a multi-scale supercomputing system featuring massively parallel GPU-CPU hybrid computing, using the lattice Boltzmann method (LBM) together with the immersed moving boundary (IMB) method and the discrete element method (DEM). A two-dimensional suspension with about 1,166,400 solid particles (75 microns) distributed in an […]

CUDA
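The fluid part of such a simulation is a lattice Boltzmann update, in which each lattice node relaxes its particle distribution functions toward a local equilibrium and then streams them to neighboring nodes. The sketch below shows only the single-node BGK collision with the standard D2Q9 equilibrium; choosing D2Q9 and BGK for this two-dimensional case is an assumption, the names are hypothetical, and the IMB and DEM coupling used in the paper is not shown. Because collision is purely local, each lattice node can be assigned to its own GPU thread.

```python
import numpy as np

# Standard D2Q9 lattice velocities e_i and weights w_i (lattice units, c_s^2 = 1/3).
E = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, u):
    """D2Q9 equilibrium populations for density rho and velocity u (shape (2,))."""
    eu = E @ u   # e_i . u for each of the 9 directions
    uu = u @ u
    return W * rho * (1 + 3 * eu + 4.5 * eu ** 2 - 1.5 * uu)

def bgk_collide(f, tau):
    """Relax the 9 populations of one node toward equilibrium with relaxation time tau."""
    rho = f.sum()
    u = (f @ E) / rho
    return f - (f - equilibrium(rho, u)) / tau
```

The equilibrium reproduces the node's density and momentum exactly, so the collision conserves both; only the higher moments relax.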
