high performance computing on graphics processing units: hgpu.org

Posts

Jul, 26

Homunculus Warping: Conveying importance using self-intersection-free non-homogeneous mesh deformation

Size matters. Human perception most naturally relates relative extent, area or volume to importance, nearness and weight. Conversely, conveying the importance of something by depicting it at a different size is a classic artistic principle, in particular when importance varies across a domain. One striking example is the neuronal homunculus: a human figure where the size […]

OpenCL

Jul, 25

Fast End-to-End Multi-Conjugate AO Simulations Using Graphical Processing Units and the MAOS Simulation Code

The Multi-threaded Adaptive Optics Simulator (MAOS) was developed at TMT to efficiently simulate various kinds of AO systems. In particular, it can finish a time step of a full end-to-end simulation of an ELT-size multi-conjugate AO system in 1 second on 8 contemporary CPU cores. We recently ported it to run on graphical processing units […]

CUDA

Jul, 25

Accelerating Noninvasive Transmural Electrophysiological Imaging with CUDA

The human heart is a vital muscle of the body. Abnormalities in the heart can disrupt its normal operation. One such abnormality, which affects the middle layer of the heart wall (myocardium), is myocardial scarring. As with any tissue in the body, damage to healthy tissue triggers the formation of scar tissue. Normally this […]

CUDA

Jul, 25

Remote GPU-Accelerated Online Pre-processing of Raster Maps for Terrain Rendering

We present a distributed architecture for accelerated pre-processing of remote sensing data for immediate terrain visualization. Interactive 3D visualization approaches for large terrain datasets employ level of detail techniques that require a multi-resolution data representation. The high computational cost of constructing these representations is often not viewed as a major drawback, as it is considered […]

CUDA

Jul, 25

Optimising Cosmological N-body Simulations in GPU Clusters

Cosmological simulations play an important role in understanding the evolution of our universe. Since experiments on the formation of galaxies cannot be performed in a laboratory, simulation is the only way to understand this phenomenon. Cosmological simulations are usually modelled as N-body problems. The Barnes-Hut (BH) tree code algorithm is one of the popular […]

CUDA

Jul, 25

Ice Simulation Using GPGPU

Simulation of the behaviour of a ship operating in pack ice is a computationally intensive process to which General Purpose Computing on Graphical Processing Units (GPGPU) can be applied. In this paper we present an efficient parallel implementation of such a simulator developed using the NVIDIA Compute Unified Device Architecture (CUDA). We have conducted an […]

CUDA

Jul, 24

Source-to-source transformations for irregular and multithreaded code optimization

Source-to-source optimization is an efficient method to generate, from a basic implementation, a high-performance program that addresses two main challenges: irregular codes and heterogeneous implementation. In the last decade, general-purpose CPUs moved towards multi-core architectures, and the end of the increase in processor frequency marked a turning point obtaining the best […]

CUDA

Jul, 24

Evaluation of state-of-the-art polyhedral tools for automatic code generation on GPUs

At present, multi-core and many-core platforms lead the computer industry, forcing software developers to adopt new programming paradigms in order to fully exploit their computing capabilities. Nowadays, Graphics Processing Units (GPUs) are one of the representatives of many-core architectures, and certainly the most widespread. This paper evaluates and compares tool frameworks that automatically generate code for […]

CUDA

Jul, 24

Scheduling processing of real-time data streams on heterogeneous multi-GPU systems

Processing vast numbers of data streams is a common problem in modern computer systems and is known as the "online big data problem." Adding hard real-time constraints to the processing makes the scheduling problem a very challenging task that this paper aims to address. In such an environment, each data stream is manipulated by a […]

CUDA

Jul, 24

A Splitting Algorithm for Directional Regularization and Sparsification

We present a new split-type algorithm for the minimization of a p-harmonic energy with an added data fidelity term. The half-quadratic splitting reduces the original problem to two straightforward problems that can be minimized efficiently. The minimizers of the two sub-problems can typically be computed pointwise and are easily implemented on massively parallel processors. Furthermore the […]

CUDA

Jul, 24

A Reconfigurable GPU Implementation for Tomlinson-Harashima Precoding

The fast parallel processing capability of general-purpose Graphics Processing Units (GPUs) can be exploited to accelerate the precoding calculation needed in spatially multiplexed wireless communication systems. In this paper, a GPU-based implementation of the well-known multiuser Tomlinson-Harashima precoding (THP) scheme combined with a lattice-reduction (LR) stage is presented. The proposed approach allows the LR stage […]

CUDA

Jul, 23

LBCL: multi-device automatic load balancing

This paper presents the Load Balancing for OpenCL (lbcl) library, devoted to automatically solving load-balancing issues in both multi-platform and heterogeneous environments. Using this library, a single kernel can be executed on a set of heterogeneous devices, giving each device an amount of work proportional to its computing power. A wrapper has been developed […]

OpenCL

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
