high performance computing on graphics processing units: hgpu.org

Posts

Dec, 12

Inter-block synchronization on a GPGPU

With the advent of multi-core processor technology, the graphics processing unit has evolved from a single-core graphics processor into a multi-core programmable one. Because of its architecture, the GPU has proved not only good at processing graphics-related data but also suitable for general-purpose parallel computation. However, since […]

OpenCL

Dec, 6

Applications of Many-Core Technologies to On-line Event Reconstruction in High Energy Physics Experiments

Interest in many-core architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of many-core devices when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. We use as a benchmark a scaled-up version of the […]

CUDA • OpenCL

Dec, 4

Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters

SkePU is a C++ template library with a simple and unified interface for expressing data parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. […]

CUDA
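The lazy memory copying idea mentioned in the Cluster-SkePU abstract can be illustrated with a toy model. The sketch below is not SkePU's API (SkePU is a C++ template library); `LazyVector`, `map_on_device` and the `transfers` counter are hypothetical names, and the "device" is simulated with an ordinary Python list. It only shows the general principle, assuming one validity flag per copy: data moves between host and device only when the side being accessed holds a stale copy.

```python
class LazyVector:
    """Toy container holding a host copy and a simulated device copy.

    Hypothetical illustration only; this is not SkePU's interface.
    """

    def __init__(self, data):
        self._host = list(data)
        self._device = None
        self._host_valid = True
        self._device_valid = False
        self.transfers = 0  # number of simulated host<->device copies

    def _to_device(self):
        # Copy host -> device only if the device copy is stale.
        if not self._device_valid:
            self._device = list(self._host)
            self._device_valid = True
            self.transfers += 1

    def _to_host(self):
        # Copy device -> host only if the host copy is stale.
        if not self._host_valid:
            self._host = list(self._device)
            self._host_valid = True
            self.transfers += 1

    def map_on_device(self, fn):
        # Simulated kernel: runs on the device copy, invalidates the host copy.
        self._to_device()
        self._device = [fn(x) for x in self._device]
        self._host_valid = False

    def __getitem__(self, i):
        # Host read: triggers a copy back only when needed.
        self._to_host()
        return self._host[i]

    def __setitem__(self, i, value):
        # Host write: invalidates the device copy.
        self._to_host()
        self._host[i] = value
        self._device_valid = False


v = LazyVector([1, 2, 3])
v.map_on_device(lambda x: x * 2)   # one host -> device copy
v.map_on_device(lambda x: x + 1)   # no copy: device data is still valid
first = v[0]                       # one device -> host copy
second = v[1]                      # no copy: host data is now valid
```

In this model two consecutive device operations cost a single upload, and repeated host reads cost a single download, which is the kind of redundant communication a lazy scheme is meant to avoid.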

Dec, 3

Real-time High Resolution Fusion of Depth Maps on GPU

A system for live, high-quality surface reconstruction using a single moving depth camera on commodity hardware is presented. High accuracy and a real-time frame rate are achieved by utilizing graphics hardware computing capabilities via OpenCL and by using a sparse data structure for volumetric surface representation. Depth sensor pose is estimated by combining serial texture […]

OpenCL

Nov, 28

The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?

Recently, parallel programming has become necessary in order to obtain performance gains, primarily due to power limitations. However, parallel architectures differ substantially from each other, often require specialized knowledge, and typically necessitate reimplementation and fine-tuning of application code. These slow tasks frequently result in situations where most of the time is spent reimplementing old […]

OpenCL

Nov, 27

Regression Modelling of Power Consumption for Heterogeneous Processors

This thesis is composed of two parts that relate to both parallel and heterogeneous processing. The first describes DistCL, a distributed OpenCL framework that allows a cluster of GPUs to be programmed like a single device. It uses programmer-supplied meta-functions that associate work-items with memory. DistCL achieves speedups of up to 29x using 32 peers. […]

OpenCL

Nov, 23

Dynamic Partitioning-based JPEG Decompression on Heterogeneous Multicore Architectures

With the emergence of social networks and improvements in computational photography, billions of JPEG images are shared and viewed on a daily basis. Desktops, tablets and smartphones constitute the vast majority of hardware platforms used for displaying JPEG images. Despite the fact that these platforms are heterogeneous multicores, no approach exists yet that is capable […]

OpenCL

Nov, 22

Accelerating Sequential Computer Vision Algorithms Using Commodity Parallel Hardware

Since 2004, the clock frequency of CPUs has not increased significantly. Computer Vision applications have an increasing demand for more processing power and are limited by the performance capabilities of sequential processor architectures. The only way to get better performance using commodity hardware is to adopt parallel programming. Many other related research projects have considered […]

OpenCL

Nov, 20

Multi-GPU Support on the Marrow Algorithmic Skeleton Framework

With the proliferation of general-purpose GPUs, workload parallelization and data-transfer optimization have become an increasing concern. The natural evolution from using a single GPU is to multiply the number of available processors, which presents new challenges, such as tuning workload decomposition and load balancing, when dealing with heterogeneous systems. Higher-level programming is a very important asset in […]

CUDA • OpenCL

Nov, 19

Adaptive implementation selection in the SkePU skeleton programming library

In earlier work, we have developed the SkePU skeleton programming library for modern multicore systems equipped with one or more programmable GPUs. The library internally provides four types of implementations (implementation variants) for each skeleton: serial C++, OpenMP, CUDA and OpenCL targeting either CPU or GPU execution respectively. Deciding which implementation would run faster for […]

CUDA • OpenCL

Nov, 18

Specification and verification of GPGPU programs

Graphics Processing Units (GPUs) are increasingly used for general-purpose applications because of their low price, energy efficiency and enormous computing power. Considering the importance of GPU applications, it is vital that the behaviour of GPU programs can be specified and proven correct formally. This paper presents a logic to verify GPU kernels written in OpenCL, […]

OpenCL

Nov, 12

Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method

We present Sailfish, an open-source fluid simulation package implementing the lattice Boltzmann method (LBM) on modern Graphics Processing Units (GPUs) using CUDA/OpenCL. We take a novel approach to GPU code implementation and use run-time code generation techniques and a high-level programming language (Python) to achieve state-of-the-art performance, while allowing easy […]

CUDA • OpenCL

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
