high performance computing on graphics processing units: hgpu.org

Posts

Aug, 20

Parallel 3D multigrid methods on the STI cell BE architecture

The STI Cell Broadband Engine (BE) is a highly capable heterogeneous multicore processor with large bandwidth and computing power perfectly suited for numerical simulation. However, all performance benefits come at the price of productivity, since more responsibility is placed on the programmer. In particular, programming with the IBM Cell SDK is hampered by not only […]

OpenCL

Aug, 19

A framework for dynamically instrumenting GPU compute applications within GPU Ocelot

In this paper we present the design and implementation of a dynamic instrumentation infrastructure for PTX programs that procedurally transforms kernels and manages related data structures. We show how performing instrumentation within the GPU Ocelot dynamic compiler infrastructure provides unique capabilities not available to other profiling and instrumentation toolchains for GPU computing. We demonstrate the […]

CUDA • OpenCL

Aug, 19

A balanced programming model for emerging heterogeneous multicore systems

Computer systems are moving towards a heterogeneous architecture with a combination of one or more CPUs and one or more accelerator processors. Such heterogeneous systems pose a new challenge to the parallel programming community. Languages such as OpenCL and CUDA provide a program environment for such systems. However, they focus on data parallel programming where […]

OpenCL

Aug, 19

Extending abstract GPU APIs to shared memory

Parallel programming is used extensively for general-purpose computations. However, the performance of parallel APIs varies for a given problem and a given architecture. This gives rise to the need for an abstract way to express parallel problems. This poster presents a new approach through which programmers can access these APIs without having to focus […]

CUDA • OpenCL

Aug, 19

A framework for lab-based real-time video analysis on distributed camera networks

In the field of video analytics for surveillance, the trend towards the use of multi-camera and high-definition video is increasing. This poses significant technical challenges in terms of video transmission and real-time processing for surveillance analytics, such as people recognition and tracking. Currently available solutions are typically proprietary commercial systems which are costly to […]

OpenCL

Aug, 19

A cluster for CS education in the manycore era

Traditional Beowulf clusters have been homogeneous platforms for distributed-memory MIMD parallelism. However, the shift to multicore architectures has made shared-memory MIMD parallelism increasingly important, and inexpensive manycore GPGPUs have revived SIMD parallelism. This paper presents a case study in designing and building a heterogeneous cluster as a learning platform for tera-scale distributed- and shared-memory MIMD […]

CUDA • OpenCL

Aug, 19

Benchmarking and modelling of POWER7, Westmere, BG/P, and GPUs: an industry case study

This paper introduces an industry-strength, multi-purpose benchmark: Shamrock. Developed at the Atomic Weapons Establishment (AWE), Shamrock is a two-dimensional (2D) structured hydrocode; one of its aims is to assess the impacts of a change in hardware, and (in conjunction with a larger HPC Benchmark Suite) to provide guidance in procurement of future systems. […]

OpenCL

Aug, 19

Real-time rendering and dynamic updating of 3-d volumetric data

A dense 3-d terrain model obtained using reconstruction methods from aerial images is represented in a probabilistic volumetric framework. The probabilistic representation is chosen to capture the inherent ambiguity in reconstructing surfaces from images. Such a representation handles the ambiguity very well but leads to expensive dense volumetric storage. The area coverage required for […]

OpenCL

Aug, 19

Caracal: dynamic translation of runtime environments for GPUs

Graphics Processing Units (GPUs) have become the platform of choice for accelerating a large range of data parallel and task parallel applications. Both AMD and NVIDIA have developed GPU implementations targeted at the high performance computing market. The rapid adoption of GPU computing has been greatly aided by the introduction of high-level programming environments such […]

CUDA • OpenCL

Aug, 19

Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems

SkePU is a C++ template library that provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP backend. It also supports multi-GPU systems. […]

CUDA • OpenCL

Aug, 19

Frameworks for multi-core architectures: a comprehensive evaluation using 2D/3D image registration

The development of standard processors has changed in recent years, moving from bigger, more complex, and faster cores to putting several simpler cores onto one chip. This has also changed the way programs are written in order to leverage the processing power of multiple cores of the same processor. In the beginning, programmers had to […]

OpenCL

Aug, 18

SkePU: a multi-backend skeleton programming library for multi-GPU systems

We present SkePU, a C++ template library which provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP backend. It also supports multi-GPU […]

CUDA • OpenCL

* * *

Recent source codes

QArray

Celerity: High-level C++ for Accelerator Clusters

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Optical flow algorithms for SYCL

OpenMP5-Offload-OpenMC-Intel-PVC
