high performance computing on graphics processing units: hgpu.org

Posts

Nov, 22

Interventional 4-D Motion Estimation and Reconstruction of Cardiac Vasculature without Motion Periodicity Assumption

Anatomical and functional information of cardiac vasculature is a key component in the field of interventional cardiology. With C-arm CT technology, it is possible to reconstruct static intraprocedural 3-D images from angiographic projection data. Current approaches attempt to add the temporal dimension (4-D). Under the assumption of periodic heart motion, ECG-gating techniques can […]

CUDA

Nov, 22

A dynamically configurable coprocessor for convolutional neural networks

Convolutional neural network (CNN) applications range from recognition and reasoning (such as handwriting recognition, facial expression recognition and video surveillance) to intelligent text applications such as semantic text analysis and natural language processing. Two key observations drive the design of a new architecture for CNNs. First, CNN workloads exhibit a widely varying mix of […]

Nov, 22

A fast stereo matching algorithm suitable for embedded real-time systems

This paper tackles the challenge of fast stereo matching on embedded systems. Limited resources, e.g. memory and processing power, and, most importantly, the real-time requirements of embedded systems for robotic applications rule out most sophisticated stereo matching approaches. The strengths and weaknesses of different matching approaches have been analyzed and […]
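For orientation, the simplest kind of matching approach such comparisons start from is local block matching with a sum-of-absolute-differences (SAD) cost. The sketch below computes winner-take-all disparities along one rectified scanline. It is a generic textbook technique, not the algorithm proposed in the paper; the function name `sad_disparity`, the window radius and the disparity range are illustrative assumptions.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Winner-take-all disparity for one rectified scanline (SAD cost).

    left, right: equal-length lists of pixel intensities from the same row.
    For each pixel x, returns the shift d that minimises
    sum |left[x+k] - right[x+k-d]| over the window k in [-radius, radius].
    Border pixels whose window leaves the row keep disparity 0.
    """
    n = len(left)
    disparities = [0] * n
    for x in range(radius, n - radius):
        best_cost, best_d = None, 0
        # Only consider shifts that keep the right-image window inside the row.
        for d in range(0, min(max_disp, x - radius) + 1):
            cost = sum(abs(left[x + k] - right[x + k - d])
                       for k in range(-radius, radius + 1))
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities
```

This version recomputes every window sum for clarity; implementations aimed at embedded real-time use typically reuse partial sums between neighbouring windows and disparities.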

Nov, 22

Optimising GPR modelling: A practical, multi-threaded approach to 3D FDTD numerical modelling

The demand for advanced interpretational tools has led to the development of highly sophisticated, computationally demanding 3D GPR processing and modelling techniques. Many of these methods solve very large problems with stepwise methods that utilise numerically similar functions within iterative computational loops. Problems of this nature are readily parallelised by splitting the computational domain into […]
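To illustrate why FDTD loops split cleanly across threads, here is a minimal sketch of a 1-D Yee scheme in normalised units whose domain is divided into contiguous subdomains. The H update reads only E and the E update reads only H, so each phase can run on all subdomains independently, with a barrier between phases. The Courant number, the perfect-conductor walls and the `run` helper are assumptions for illustration, not the paper's 3D GPR setup; Python threads show the decomposition only and, because of the GIL, will not speed this loop up.

```python
from concurrent.futures import ThreadPoolExecutor

S = 0.5  # Courant number (assumed); the 1-D Yee scheme is stable for S <= 1


def update_h(E, H, lo, hi):
    # H[i] sits between E[i] and E[i+1]; this phase only reads E.
    for i in range(lo, min(hi, len(H) - 1)):
        H[i] += S * (E[i + 1] - E[i])


def update_e(E, H, lo, hi):
    # Interior E nodes only; E[0] and E[-1] stay 0 (perfect electric conductor walls).
    for i in range(max(lo, 1), min(hi, len(E) - 1)):
        E[i] += S * (H[i] - H[i - 1])


def run(n, steps, workers):
    E = [0.0] * n
    H = [0.0] * n
    E[n // 2] = 1.0  # initial pulse in the middle of the domain
    # Contiguous subdomains covering [0, n), one per worker.
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # Consuming each map's results acts as a barrier between the phases.
            list(pool.map(lambda b: update_h(E, H, *b), bounds))
            list(pool.map(lambda b: update_e(E, H, *b), bounds))
    return E, H
```

Because every cell performs the same floating-point operations regardless of how the domain is split, `run(64, 50, 1)` and `run(64, 50, 4)` produce identical fields.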

Nov, 22

Fast reduction of undersampling artifacts in radial MR angiography with 3D total variation on graphics hardware

OBJECTIVE: Subsampling of radially encoded MRI acquisitions, combined with sparsity-promoting methods, has opened the door to significantly faster imaging, which is crucial for many important clinical applications. In particular, it has recently been shown that total variation (TV) regularization efficiently reduces undersampling artifacts. The drawback of the method is the long reconstruction time […]

CUDA
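The quantity that TV regularization penalises can be shown in a few lines. The sketch below computes the isotropic discrete total variation of a 3-D volume with forward differences; it is small for piecewise-constant volumes and grows with streaks and noise, which is why minimising it alongside a data-fidelity term suppresses undersampling artifacts. The boundary convention (differences leaving the volume count as 0) and the function name are assumptions, and the GPU reconstruction itself is not shown.

```python
import math


def total_variation_3d(u):
    """Isotropic discrete total variation of a volume indexed u[z][y][x].

    Forward differences along x, y and z; a difference that would step
    outside the volume is taken as 0 (Neumann boundary, assumed here).
    """
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    tv = 0.0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                v = u[z][y][x]
                dx = u[z][y][x + 1] - v if x + 1 < nx else 0.0
                dy = u[z][y + 1][x] - v if y + 1 < ny else 0.0
                dz = u[z + 1][y][x] - v if z + 1 < nz else 0.0
                tv += math.sqrt(dx * dx + dy * dy + dz * dz)
    return tv
```

A constant volume has TV 0, while a 2×2×2 volume with a unit step along x has TV 4 (one unit jump at each of the four voxels in front of the step).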

Nov, 22

Real-time ambient occlusion and halos with summed area tables

Volume models often show high depth complexity, which makes it difficult for the observer to judge spatial relationships accurately. Illustrators therefore use techniques such as improved shading through shadows, halos, or edge darkening to enhance the depth perception of certain structures. Effects such as shadows and halos are difficult to generate in real time for volumetric models. […]

CUDA
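How the paper applies summed area tables to volume rendering is not visible in this excerpt. The 2-D sketch below only shows the data structure itself: after one pass over the image, the sum (and hence the mean) over any axis-aligned rectangle costs four lookups, independent of the rectangle's size, which is what makes large neighbourhood averages affordable in real time. Function names and the half-open box convention are assumptions.

```python
def summed_area_table(img):
    """Return S with S[y][x] = sum of img[j][i] over all j < y, i < x.

    S has an extra leading row and column of zeros, so box queries
    need no bounds checks.
    """
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of rows above (up to column x) plus the current row's prefix.
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S


def box_sum(S, x0, y0, x1, y1):
    """Sum of img over the half-open box [x0, x1) x [y0, y1) in O(1)."""
    return S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
```

Dividing `box_sum` by the box area gives a local average, the kind of neighbourhood query an occlusion or halo estimate can be built from.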

Nov, 22

Reionization Simulations Powered by Graphics Processing Units. I. On the Structure of the Ultraviolet Radiation Field

We present a set of cosmological simulations with radiative transfer in order to model the reionization history of the universe from z = 18 down to z = 6. Galaxy formation and the associated star formation are followed self-consistently with gas and dark matter dynamics using the RAMSES code, while radiative transfer is performed as […]

CUDA

Nov, 22

Scale-dependent and example-based grayscale stippling

We present an example-based approach to synthesizing stipple illustrations for static 2D images that produces scale-dependent results appropriate for an intended spatial output size and resolution. We show how treating stippling as a grayscale process allows us both to produce on-screen output and to achieve stipple merging at medium tonal ranges. At the same time […]

Nov, 22

Field modelling acceleration on ultrasonic systems using graphic hardware

Field modelling is common practice in ultrasonic non-destructive evaluation (NDE) because it is a useful tool for assessing NDE imaging. However, it is very time-consuming because of its complexity and data volume, which makes it difficult to use in systems that demand real-time responses. Recently, graphics processing units (GPUs) have […]

Nov, 22

Dense photometric stereo reconstruction on many core GPUs

Photometric stereo algorithms are used in many applications for the 3D reconstruction of scenes from a number of 2D images, illuminated by calibrated light sources of different directions. However, the widely used assumption that the direction of the light remains constant across all pixels of the image usually induces reconstruction errors. We propose here a […]

CUDA
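For context, the sketch below is the classical baseline that makes the assumption the abstract criticises: three distant lights whose directions are the same for every pixel, and Lambertian reflectance (also an assumption here). Per pixel, the intensities satisfy I = L g with g = albedo · normal, so solving the 3×3 system gives both albedo and surface normal. This shows what the paper improves on, not its proposed method; the function names are illustrative.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def normal_from_three_lights(lights, intensities):
    """Lambertian photometric stereo for one pixel.

    lights: three unit light directions (rows of L), assumed identical for
    every pixel. intensities: the pixel's brightness under each light.
    Solves L g = I by Cramer's rule; albedo = |g|, normal = g / |g|.
    """
    d = det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    g = []
    for col in range(3):
        # Replace column `col` of L with the intensity vector.
        m = [list(row) for row in lights]
        for r in range(3):
            m[r][col] = intensities[r]
        g.append(det3(m) / d)
    albedo = math.sqrt(sum(c * c for c in g))
    return [c / albedo for c in g], albedo
```

When the true light direction varies across the image, as with nearby sources, the per-pixel L used here is wrong, which is the source of the reconstruction errors the abstract mentions.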

Nov, 22

Parallel Position Weight Matrices Algorithms

Position Weight Matrices (PWMs) are broadly used in computational biology. The basic problems, Scan and Multiscan, aim to find all the occurrences of a given PWM or a set of PWMs in long sequences. Some other PWM tasks share a common NP-hard subproblem, ScoreDistribution. The existing algorithms rely on the enumeration of a large set […]

CUDA
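The Scan problem described above has a direct brute-force formulation: slide the PWM along the sequence, sum the per-position weights of the letters in each window, and report windows whose score reaches a threshold. The sketch below is that naive baseline, not the paper's parallel algorithm; the weight representation (one dict per matrix column) and the names are assumptions. Each window is scored independently, which is what makes the problem a natural fit for data-parallel hardware.

```python
def pwm_scan(sequence, pwm, threshold):
    """Naive Scan: report every window whose PWM score reaches threshold.

    pwm: list of per-position dicts mapping each letter to a weight
    (for example, log-odds). The score of a window is the sum of the
    weights of its letters at each position.
    Returns (start, score) pairs with 0-based window starts.
    Runs in O(len(sequence) * len(pwm)) time.
    """
    m = len(pwm)
    hits = []
    for start in range(len(sequence) - m + 1):
        score = sum(pwm[j][sequence[start + j]] for j in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

For example, a three-column matrix giving +1 to the letters of "ACG" and -1 to everything else, with threshold 3, reports only exact occurrences of "ACG": in "TTACGAACGT" these start at positions 2 and 6.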

Nov, 22

Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU

Software-based decoding of low-density parity-check (LDPC) codes often takes a very long time, so general-purpose graphics processing units (GPGPUs), which support massively parallel processing, can be very useful for speeding up the simulation. In LDPC decoding, the parity-check matrix H must be accessed at every node update, and the size of […]
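The excerpt stops before describing the paper's memory layout. As a generic illustration of why quasi-cyclic structure helps with memory access, the sketch below stores H as a small table of circulant shifts, a standard way to describe QC-LDPC codes, and evaluates the syndrome H·c (mod 2) directly from that table, reading only mb×nb integers instead of the full (mb·Z)×(nb·Z) matrix. An all-zero syndrome means c is a codeword, which decoders commonly use as a stopping test. The shift convention and all function names are assumptions, and the cyclic (non-QC) case is not covered.

```python
def expand_h(shifts, Z):
    """Expand a QC base matrix of circulant shifts into a dense 0/1 matrix.

    shifts[i][j] == -1 marks an all-zero Z x Z block; s >= 0 marks the
    identity cyclically shifted by s (row r has its 1 in column (r + s) % Z).
    """
    mb, nb = len(shifts), len(shifts[0])
    H = [[0] * (nb * Z) for _ in range(mb * Z)]
    for i in range(mb):
        for j in range(nb):
            s = shifts[i][j]
            if s < 0:
                continue
            for r in range(Z):
                H[i * Z + r][j * Z + (r + s) % Z] = 1
    return H


def syndrome_dense(H, c):
    """Reference syndrome H c (mod 2) from the expanded matrix."""
    return [sum(h & b for h, b in zip(row, c)) % 2 for row in H]


def syndrome_qc(shifts, Z, c):
    """Same syndrome, computed from the shift table without expanding H."""
    syn = []
    for row_shifts in shifts:
        for r in range(Z):
            bit = 0
            for j, s in enumerate(row_shifts):
                if s >= 0:
                    bit ^= c[j * Z + (r + s) % Z]
            syn.append(bit)
    return syn
```

With `shifts = [[0, 1, -1], [2, -1, 0]]` and `Z = 4`, every row of H has weight 2, so the all-ones word of length 12 has a zero syndrome.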

