high performance computing on graphics processing units: hgpu.org

Posts

Jan, 19

GPU based Implementation of Film Flicker Reduction Algorithms

In this work we propose an algorithm for film restoration aimed at reducing the flicker effect while preserving the original overall illumination of the film. We also present a comparative study of the performance of this algorithm implemented following a sequential approach on a CPU and following a parallel approach on a GPU using OpenCL.

OpenCL

Jan, 14

Adaptation of an acoustic propagation model to the parallel architecture of a graphics processor

High performance underwater acoustic models are of great importance for enabling real-time acoustic source tracking, geoacoustic inversion, environmental monitoring and high-frequency underwater communications. Given the parallelizable nature of ray tracing in general, and of the ray superposition algorithm in particular, use of multiple computing units for the development of real-time efficient applications based on ray tracing […]

OpenCL

Jan, 14

High Performance Code Generation for Stencil Computation on Heterogeneous Multi-device Architectures

Heterogeneous architectures have been widely used in the domain of high performance computing. On one hand, they allow a designer to use multiple types of computing units, each executing the tasks it is best suited for, to increase performance; on the other hand, they bring many challenges in programming for novice […]

OpenCL

Jan, 14

Towards Portable Performance for Explicit Hydrodynamics Codes

Significantly increasing intra-node parallelism is widely recognised as being a key prerequisite for reaching exascale levels of computational performance. In future exascale systems it is likely that this performance improvement will be realised by increasing the parallelism available in traditional CPU devices and using massively-parallel hardware accelerators. The MPI programming model is starting to reach […]

OpenCL

Jan, 14

Parallelization and Optimization of Feature Detection Algorithms on Embedded GPU

In this paper, we parallelize and optimize the popular feature detection algorithms SIFT and SURF on the latest embedded GPU. Using the conventional OpenGL shading language and the recently developed OpenCL as the GPGPU software platforms, we compare their implementation efficiency and speed against each other as well as between GPU and CPU. Experimental result […]

OpenCL • OpenGL

Jan, 6

PySPH: A Python framework for SPH

We present an open source, object-oriented framework for Smoothed Particle Hydrodynamics called PySPH. The framework is written in the high-level Python programming language and is designed to be user friendly, flexible and application agnostic. PySPH supports distributed memory computing using the message passing paradigm and (limited) shared-memory-like parallel processing on hybrid […]

OpenCL

Jan, 5

DEF-G: Declarative Framework for GPU Environment

DEF-G is a declarative language and framework for the efficient generation of OpenCL GPU applications. Using our proof-of-concept DEF-G implementation, run-time and lines-of-code comparisons are provided for three well-known algorithms (Sobel image filtering, breadth-first search and all-pairs shortest path), each evaluated on three different platforms. The DEF-G declarative language and corresponding OpenCL kernels generated complete […]

OpenCL

Jan, 2

Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer’s Disease

Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e. for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: […]

OpenCL

Dec, 29

Multi-GPU numerical simulation of electromagnetic waves

In this paper we present three-dimensional numerical simulations of electromagnetic waves. The Maxwell equations are solved by the Discontinuous Galerkin (DG) method. For achieving high performance, we exploit two levels of parallelism. The coarse grain parallelism is managed through MPI and a classical domain decomposition. The fine grain parallelism is managed with OpenCL in order […]

OpenCL

Dec, 22

Speed-Up Improvement Using Parallel Approach in Image Steganography

This paper presents a parallel approach to improve the time complexity problem associated with sequential algorithms. An image steganography algorithm in the transform domain is considered for implementation. Image steganography is a technique to hide a secret message in an image. With the parallel implementation, a large message can be hidden in a large image since it does not […]

OpenCL

Dec, 20

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other […]

CUDA • OpenCL

Dec, 20

Pannotia: Understanding Irregular GPGPU Graph Applications

GPUs have become popular recently to accelerate general-purpose data-parallel applications. However, most existing work has focused on GPU-friendly applications with regular data structures and access patterns. While a few prior studies have shown that some irregular workloads can also achieve speedups on GPUs, this domain has not been investigated thoroughly. Graph applications are one such […]

OpenCL

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)