high performance computing on graphics processing units: hgpu.org

Posts

Jun, 24

An algorithm for efficient computation of spatial impulse response on the GPU with application in ultrasound simulation

Computation of the spatial impulse response (SIR) is a time-consuming but fundamental step in computing linear ultrasonic fields in homogeneous media and scattered fields in the presence of inhomogeneities. In this paper, we present a new algorithm for computing the SIR that is suitable for parallelization on massively multiprocessing […]

CUDA

Jun, 24

Design and implementation of a time-division multiplexing scan architecture using serializer and deserializer in GPU chips

We present the design and implementation details of a scan architecture based on time-division demultiplexing/multiplexing with a serializer/deserializer. This is one of the key design-for-test (DFT) features implemented on NVIDIA’s Fermi family of GPU (Graphics Processing Unit) chips. We provide a comprehensive description of the architecture and its specifications. We also describe a compact serializer/deserializer module design, test timing considerations, […]
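The core idea of time-division multiplexing scan can be illustrated in a few lines: M tester pins, each clocked `ratio` times faster than the internal shift clock, carry the data of N = M × ratio internal scan chains. The sketch below assumes a simple round-robin assignment (chain m·ratio + s travels on pin m in time slot s); the function names and this mapping are hypothetical and not taken from the Fermi design.

```python
from typing import List

Bits = List[int]

def serialize(chains: List[Bits], ratio: int) -> List[Bits]:
    """Pack N = M * ratio slow internal streams onto M fast pin streams.

    Hypothetical round-robin mapping: chain c = m * ratio + s is carried on
    pin m in time slot s of every group of `ratio` fast cycles.
    """
    if len(chains) % ratio:
        raise ValueError("number of chains must be a multiple of ratio")
    length = len(chains[0])
    pins: List[Bits] = []
    for m in range(len(chains) // ratio):
        stream: Bits = []
        for i in range(length):          # one slow internal shift cycle
            for s in range(ratio):       # `ratio` fast pin cycles per shift
                stream.append(chains[m * ratio + s][i])
        pins.append(stream)
    return pins

def deserialize(pins: List[Bits], ratio: int) -> List[Bits]:
    """Inverse of serialize: fan pin data back out to the internal chains."""
    chains: List[Bits] = [[] for _ in range(len(pins) * ratio)]
    for m, stream in enumerate(pins):
        for t, bit in enumerate(stream):
            chains[m * ratio + t % ratio].append(bit)
    return chains
```

In this picture, a deserializer on the scan-in side plays the role of `deserialize`, and a serializer on the scan-out side packs chain responses back onto the pins as `serialize` does.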

Jun, 24

GPU-Based Parallel Signature Scanning and Hash Generation

Today, nearly every user of electronic devices is exposed to security threats. Computer viruses infect harmless programs and alter their behavior. One defense against these threats is a virus scanner, which searches code and/or data for the signatures of known viruses. In this work, we present a novel approach to on-line virus scanning and […]

CUDA
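Signature scanning parallelizes naturally because every byte offset of the input can be tested independently. The sketch below is a naive multi-pattern matcher written in Python for clarity; `match_at` stands for the work a single GPU thread might do, and the loop in `scan` stands in for a kernel launch with one thread per offset. The names and the one-thread-per-offset mapping are illustrative assumptions, not the paper's method, and hash generation is not covered.

```python
from typing import Dict, List, Tuple

def match_at(data: bytes, offset: int, signatures: Dict[str, bytes]) -> List[str]:
    """Work of one hypothetical GPU thread: test every signature at one offset."""
    return [name for name, sig in signatures.items()
            if data.startswith(sig, offset)]

def scan(data: bytes, signatures: Dict[str, bytes]) -> List[Tuple[int, str]]:
    """Serial stand-in for a kernel launch with one thread per byte offset.

    Each offset is checked independently of all others, so the loop body
    could run fully in parallel.
    """
    hits: List[Tuple[int, str]] = []
    for offset in range(len(data)):
        for name in match_at(data, offset, signatures):
            hits.append((offset, name))
    return hits
```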

Jun, 23

Efficiently GPU-accelerating long kernel convolutions in 3-D DIRECT TOF PET reconstruction via a kernel decomposition scheme

The DIRECT approach for 3-D Time-of-Flight (TOF) PET reconstruction performs all iterative predictor-corrector operations directly in image space. A computational bottleneck here is the convolution with the long TOF (resolution) kernels. Accelerating this convolution on GPUs is especially important for spatially variant resolution kernels, which cannot be implemented efficiently in the Fourier domain. […]

Jun, 23

Real-time arbitrary view rendering on GPU from stereo video and time-of-flight camera

Generating in-between images from multiple views of a scene is a crucial task in both computer vision and computer graphics. Photorealistic rendering, 3DTV and robot navigation are among the many applications that benefit from arbitrary view synthesis, provided it runs in real time. GPUs excel at delivering high computational throughput by processing arrays of […]

Jun, 23

Rapid star map simulation based on GPU

We propose and implement a rapid GPU-based star map simulation method. The method uses vertex shaders written in the OpenGL Shading Language to compute star positions in real time. Combined with display lists, the results are used to simulate the initial star maps, then the initial star maps […]

OpenGL
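The per-star work that the abstract assigns to vertex shaders is essentially a coordinate transform followed by a projection. Below is a Python sketch of what such a shader might compute for each star vertex, assuming an ideal pinhole camera with focal length `focal`, pointed at right ascension `ra0` and declination `dec0` with zero roll. The function names and this camera model are hypothetical; lens distortion, aberration and star magnitude are ignored.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def radec_to_unit(ra: float, dec: float) -> Vec3:
    """Unit direction vector for right ascension/declination in radians."""
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def project_star(ra: float, dec: float, ra0: float, dec0: float,
                 focal: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of one star for a camera aimed at (ra0, dec0), no roll.

    Returns image-plane (u, v) or None when the star lies behind the camera.
    """
    s = radec_to_unit(ra, dec)
    boresight = radec_to_unit(ra0, dec0)
    east = (-math.sin(ra0), math.cos(ra0), 0.0)          # image u axis
    north = (-math.sin(dec0) * math.cos(ra0),             # image v axis
             -math.sin(dec0) * math.sin(ra0),
             math.cos(dec0))
    z = dot(s, boresight)                                  # depth along the boresight
    if z <= 0.0:
        return None
    return (focal * dot(s, east) / z, focal * dot(s, north) / z)
```

In a shader, `ra` and `dec` would typically be per-vertex attributes and the camera terms uniforms, so every star is transformed independently and in parallel.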

Jun, 23

Fast implementation of fully iterative scatter corrected OSEM for HRRT using GPU

Accurate scatter correction is especially important for high-resolution 3D PET scanners because they lack inter-slice septa. To address this problem, a fully 3D iterative scatter-corrected OSEM, in which a 3D single scatter simulation (SSS) alternates with a 3D OSEM reconstruction until convergence, was recently proposed. However, due to the computational complexity of […]

CUDA
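For reference, the standard OSEM update with an additive scatter estimate has the form below, where $\lambda_j$ is the activity in voxel $j$, $a_{ij}$ the system matrix, $y_i$ the measured counts on line of response $i$, $S_b$ the $b$-th subset, $n$ the iteration index, and $s_i$ the scatter estimate produced by the single scatter simulation. Placing $s_i$ additively in the forward projection is the usual convention; the paper's exact formulation (randoms, normalization, attenuation) may differ.

```latex
\lambda_j^{(n,b+1)} = \frac{\lambda_j^{(n,b)}}{\sum_{i \in S_b} a_{ij}}
  \sum_{i \in S_b} a_{ij}\,
  \frac{y_i}{\sum_{k} a_{ik}\,\lambda_k^{(n,b)} + s_i}
```

In the alternating scheme, $s_i$ is recomputed by SSS from the current image between OSEM passes. The forward projection in the denominator and the back projection over $i \in S_b$ are the natural candidates for GPU parallelization.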

Jun, 23

LOD Terrain Rendering by Local Parallel Processing on GPU

In this paper, we present a new technique for highly efficient terrain rendering using continuous view-dependent Level-of-Detail based on the hardware tessellation unit found in modern GPUs. Our technique relies on local parallel processing, in the sense that the results for each terrain patch do not depend on results already obtained for other patches. This […]
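Because each patch's level of detail depends only on that patch and the camera, the computation maps directly onto per-patch shader invocations. The Python sketch below assumes a simple inverse-distance rule (level proportional to `ref_distance / distance`, clamped to [`min_level`, `max_level`]) and derives each edge's level from the edge midpoint; the rule, constants and names are hypothetical, not the paper's error metric.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def tess_level(point: Vec3, camera: Vec3, max_level: int = 64,
               min_level: int = 1, ref_distance: float = 10.0) -> int:
    """Hypothetical LOD rule: the level halves each time the distance doubles."""
    d = max(math.dist(point, camera), 1e-6)   # avoid division by zero
    level = max_level * ref_distance / d
    return int(min(max(level, min_level), max_level))

def edge_levels(corners: List[Vec3], camera: Vec3) -> List[int]:
    """Tessellation level for each edge of a patch, computed from edge midpoints.

    Uses only this patch's corners and the camera, so every patch can be
    processed independently (e.g. one tessellation-control invocation each).
    """
    levels = []
    for a, b in zip(corners, corners[1:] + corners[:1]):
        mid = tuple((p + q) / 2 for p, q in zip(a, b))
        levels.append(tess_level(mid, camera))
    return levels
```

Using edge midpoints is one common way to keep such a scheme both local and crack-free: two neighbouring patches compute the same midpoint for a shared edge, so they agree on its tessellation level without exchanging data.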

Jun, 23

Case study: Runtime reduction of a buffer insertion algorithm using GPU parallel programming

In this paper, we present a case study on runtime reduction of VLSI CAD programs using parallel computing. Specifically, we parallelize a buffer insertion algorithm that minimizes power dissipation. We choose Graphics Processing Units (GPUs) as low-cost hardware that supports parallel computing. We redesign the algorithm's data structures to accommodate GPU-based computing. As […]

Jun, 23

GCS: High-Performance Gate-Level Simulation with GP-GPUs

In recent years, verification of digital designs has become one of the most challenging, time-consuming and critical tasks in the entire hardware development process. Within this area, the vast majority of industrial verification effort relies on logic simulation tools. However, logic simulators deliver limited performance when faced with vastly complex modern […]

CUDA
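A common way to expose parallelism in gate-level simulation is levelization: gates are grouped so that each gate depends only on gates in earlier levels, and all gates in one level can then be evaluated concurrently. The Python sketch below shows the idea on a full adder. The netlist format, gate set and names are illustrative assumptions; GCS's actual partitioning and data layout are not described in this excerpt.

```python
from typing import Callable, Dict, List, Tuple

Gate = Tuple[str, str, List[str]]  # (output net, gate type, input nets)

GATE_FUNCS: Dict[str, Callable[[List[int]], int]] = {
    "AND": lambda xs: int(all(xs)),
    "OR": lambda xs: int(any(xs)),
    "XOR": lambda xs: sum(xs) % 2,
    "NOT": lambda xs: 1 - xs[0],
}

def levelize(gates: List[Gate], primary_inputs: List[str]) -> List[List[Gate]]:
    """Group gates into levels so that every gate's inputs come from earlier levels."""
    known = set(primary_inputs)
    pending = list(gates)
    levels: List[List[Gate]] = []
    while pending:
        ready = [g for g in pending if all(n in known for n in g[2])]
        if not ready:
            raise ValueError("combinational loop or undriven net")
        known.update(out for out, _, _ in ready)
        pending = [g for g in pending if g[0] not in known]
        levels.append(ready)
    return levels

def simulate(levels: List[List[Gate]], inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate level by level; gates within one level are mutually independent."""
    values = dict(inputs)
    for level in levels:
        # This per-level step could map to one kernel launch, one thread per gate.
        results = {out: GATE_FUNCS[kind]([values[n] for n in ins])
                   for out, kind, ins in level}
        values.update(results)
    return values

# Full adder, deliberately listed out of order; levelize() recovers a valid order.
FULL_ADDER: List[Gate] = [
    ("cout", "OR", ["g", "p"]),
    ("s", "XOR", ["t", "cin"]),
    ("p", "AND", ["t", "cin"]),
    ("t", "XOR", ["a", "b"]),
    ("g", "AND", ["a", "b"]),
]
```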

Jun, 23

GPU-accelerated fault simulation and its new applications

GPUs have recently been explored as a new general-purpose computing platform that is well suited to accelerating compute-intensive EDA applications. In this paper, we describe a GPU-based one- to n-detection fault simulator for both stuck-at and transition faults, which demonstrates a 20X speedup over a commercial CPU-based fault simulator. We further show new fault-simulation-based […]
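The n-detection criterion is easy to state in code: a fault is n-detected when at least n test patterns produce a value at some primary output that differs from the fault-free circuit. The sketch below is a serial-fault, pattern-parallel stuck-at simulator (bit k of every integer holds pattern k), a standard textbook technique rather than the paper's GPU algorithm. Transition faults are not modeled, and the netlist format and names are hypothetical.

```python
import operator
from functools import reduce
from typing import Dict, List, Optional, Tuple

Gate = Tuple[str, str, List[str]]  # (output net, gate type, input nets), topological order
Fault = Tuple[str, int]            # (net, stuck-at value 0 or 1)
OPS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}

def evaluate(netlist: List[Gate], inputs: Dict[str, int], mask: int,
             fault: Optional[Fault] = None) -> Dict[str, int]:
    """Simulate all packed patterns at once; an optional stuck-at fault overrides one net."""
    values: Dict[str, int] = {}

    def assign(net: str, value: int) -> None:
        if fault is not None and fault[0] == net:
            value = mask if fault[1] else 0   # stuck-at-1: all ones; stuck-at-0: all zeros
        values[net] = value

    for net, value in inputs.items():
        assign(net, value)
    for out, kind, ins in netlist:
        assign(out, reduce(OPS[kind], (values[n] for n in ins)))
    return values

def detection_counts(netlist: List[Gate], inputs: Dict[str, int],
                     outputs: List[str], width: int) -> Dict[Fault, int]:
    """Number of patterns (out of `width`) that detect each single stuck-at fault."""
    mask = (1 << width) - 1
    good = evaluate(netlist, inputs, mask)
    nets = list(inputs) + [out for out, _, _ in netlist]
    counts: Dict[Fault, int] = {}
    for net in nets:
        for stuck in (0, 1):
            bad = evaluate(netlist, inputs, mask, (net, stuck))
            diff = 0
            for o in outputs:
                diff |= good[o] ^ bad[o]       # bit k set: pattern k exposes the fault
            counts[(net, stuck)] = bin(diff).count("1")
    return counts

# Example: y = (a AND b) OR c, all 8 input combinations packed into 8-bit words.
EXAMPLE_NETLIST: List[Gate] = [("n1", "AND", ["a", "b"]), ("y", "OR", ["n1", "c"])]
EXAMPLE_PATTERNS = {"a": 0b11110000, "b": 0b11001100, "c": 0b10101010}
```

For a pattern set packed this way, a fault counts as n-detected when its entry in `detection_counts(...)` is at least n.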

Jun, 23

GPU Implementation of the LFT Shape Matching Algorithm

Registration of partial scan data sets is still a challenge for today’s CAD systems and their users. Many of the known methods rely on user interaction or feature recognition. For occasional users, this is too time-consuming and error-prone. The paper describes a method to register partial scan data by fitting a large […]

CUDA
