high performance computing on graphics processing units: hgpu.org

Posts

Jun, 10

Crane – Fast and Migratable GPU Passthrough for OpenCL applications

General purpose GPU (GPGPU) computing in virtualized environments leverages PCI passthrough to achieve GPU performance comparable to bare-metal execution. However, GPU passthrough prevents service administrators from performing virtual machine migration between physical hosts. Crane is a new technique for virtualizing OpenCL-based GPGPU computing that achieves within 5.25% of passthrough GPU performance while supporting VM migration. […]

OpenCL

Jun, 5

Program Acceleration in a Heterogeneous Computing Environment Using OpenCL, FPGA, and CPU

Reaching the so-called "performance wall" in 2004 inspired innovative approaches to performance improvement. Parallel programming, distributed computing, and System on a Chip (SoC) design drove change. Hardware acceleration in mainstream computing systems brought significant improvement in the performance of applications targeted directly to a specific hardware platform. Targeting a single hardware platform, however, typically requires […]

OpenCL

Jun, 5

UT-OCL: An OpenCL Framework for Embedded Systems Using Xilinx FPGAs

The number of heterogeneous components on a System-on-Chip (SoC) has continued to increase. Software developers leverage these heterogeneous systems by using high-level languages to enable the execution of applications. For the application to execute correctly, hardware support for features and constructs of the programming model needs to be incorporated into the system. OpenCL is a […]

OpenCL

May, 18

CLBlast: A Tuned OpenCL BLAS Library

This work demonstrates how to accelerate dense linear algebra computations using CLBlast, an open-source OpenCL BLAS library providing optimized routines for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the core of many applications (e.g. deep learning, iterative solvers, […]

OpenCL
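For readers unfamiliar with the GEMM routine named in the CLBlast abstract above, the operation it computes is C = alpha * A * B + beta * C. The following is a minimal pure-Python reference sketch of that definition only. It does not call CLBlast or OpenCL, and the function name `gemm` and its argument order are assumptions made for illustration, not the library's API.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a @ b) + beta * c.

    Matrices are row-major lists of lists. This is a hypothetical helper
    that illustrates the math; a tuned library would tile and vectorize
    this triple loop for the target device.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")

    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            # Dot product of row i of a with column j of b.
            for p in range(k):
                acc += a[i][p] * b[p][j]
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out
```

A reference version like this is mainly useful for checking the output of an optimized routine on small inputs.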

May, 11

Resource-Aware Just-in-Time OpenCL Compiler for Coarse-Grained FPGA Overlays

FPGA vendors have recently started focusing on OpenCL for FPGAs because of its ability to leverage the parallelism inherent to heterogeneous computing platforms. OpenCL allows programs running on a host computer to launch accelerator kernels which can be compiled at run-time for a specific architecture, thus enabling portability. However, the prohibitive compilation times (specifically the […]

OpenCL

Apr, 30

Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems

Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in today’s embedded systems. These architectures offer potential for energy-efficient computing if the application task is mapped to the right core. Realizing such potential is challenging due to the complex and evolving nature of hardware and applications. This paper presents an automatic approach to […]

OpenCL

Apr, 26

OpenCL-Based FPGA Accelerator for 3D FDTD with Periodic and Absorbing Boundary Conditions

The finite difference time domain (FDTD) method is a very popular way of numerically solving partial differential equations. FDTD has a low operational intensity, so its performance on CPUs and GPUs is often restricted by the memory bandwidth. Recently, deeply pipelined FPGA accelerators have shown a lot of success by exploiting streaming data flows in […]

OpenCL
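To illustrate why the FDTD abstract above describes the method as having low operational intensity, here is a minimal 1D sketch with periodic boundaries in normalized units. It is a generic textbook-style leapfrog update, not the paper's 3D FPGA design. The function name `fdtd_1d_step` and the `courant` parameter are assumptions introduced for this example.

```python
def fdtd_1d_step(e, h, courant=1.0):
    """Advance 1D fields by one leapfrog time step, in place.

    e and h are equal-length lists of floats on a periodic grid, with
    h[i] located half a cell to the right of e[i]. Units are normalized
    so that courant = c * dt / dx (assumed <= 1 for stability).
    """
    n = len(e)
    if len(h) != n:
        raise ValueError("e and h must have the same length")

    # Update H from the spatial difference of E (wraps at the right edge).
    for i in range(n):
        h[i] += courant * (e[(i + 1) % n] - e[i])

    # Update E from the spatial difference of H.
    # h[i - 1] with i == 0 reads h[-1], which gives the periodic wrap.
    for i in range(n):
        e[i] += courant * (h[i] - h[i - 1])

    return e, h
```

Each cell update performs only a couple of arithmetic operations per value it loads, which is the pattern that tends to make stencil codes of this kind memory-bound.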

Apr, 26

OpenCL JIT Compilation for Dynamic Programming Languages

Graphics Processing Units (GPUs) are powerful hardware for parallelizing and speeding up applications. However, programming these devices is too complex for most users, and the existing standards for GPU programming are available only for low-level languages such as C. Dynamic programming languages offer higher abstractions and functionality for many users. GPU programming is possible for dynamic […]

OpenCL

Apr, 20

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general purpose CPUs and accelerators (such as GPUs or Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, […]

CUDA • OpenCL

Apr, 17

Random Finite Set Based Bayesian Filtering with OpenCL in a Heterogeneous Platform

While most filtering approaches based on random finite sets have focused on improving performance, in this paper, we argue that computation times are very important in order to enable real-time applications such as pedestrian detection. Towards this goal, this paper investigates the use of OpenCL to accelerate the computation of random finite set-based Bayesian filtering […]

OpenCL

Apr, 11

A modular GPU raytracer using OpenCL for non-interactive graphics

We describe the development of a modular, plugin-based raytracing renderer called RenderGirl, suitable for running inside the OpenCL framework. We aim to take advantage of heterogeneous computing devices such as GPUs and many-core CPUs, focusing on parallelism. We implemented the traditional partitioning scheme called bounding volume hierarchies, where each scene is hierarchically subdivided into […]

OpenCL
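The bounding volume hierarchy mentioned in the raytracer abstract above can be sketched in a few lines. This is a generic illustration that assumes axis-aligned boxes and a median split along the longest axis. It is not RenderGirl's implementation, and every name in it (`Node`, `build`, `hits_box`, `candidates`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def union(a, b):
    """Smallest axis-aligned box (min corner, max corner) enclosing a and b."""
    lo = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi)


@dataclass
class Node:
    box: tuple
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    item: Optional[int] = None  # primitive index, set only on leaves


def build(boxes, ids=None):
    """Build a BVH over a non-empty list of boxes by median split."""
    ids = list(range(len(boxes))) if ids is None else ids
    box = boxes[ids[0]]
    for i in ids[1:]:
        box = union(box, boxes[i])
    if len(ids) == 1:
        return Node(box, item=ids[0])
    # Split along the longest axis, ordering primitives by box center.
    axis = max(range(3), key=lambda k: box[1][k] - box[0][k])
    ids = sorted(ids, key=lambda i: boxes[i][0][axis] + boxes[i][1][axis])
    mid = len(ids) // 2
    return Node(box, build(boxes, ids[:mid]), build(boxes, ids[mid:]))


def hits_box(origin, direction, box):
    """Slab test: does the ray origin + t * direction (t >= 0) hit the box?"""
    t_near, t_far = 0.0, float("inf")
    for k in range(3):
        if direction[k] == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if not box[0][k] <= origin[k] <= box[1][k]:
                return False
            continue
        t0 = (box[0][k] - origin[k]) / direction[k]
        t1 = (box[1][k] - origin[k]) / direction[k]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far


def candidates(node, origin, direction):
    """Indices of leaf boxes the ray touches, skipping subtrees it misses."""
    if not hits_box(origin, direction, node.box):
        return []
    if node.item is not None:
        return [node.item]
    return (candidates(node.left, origin, direction)
            + candidates(node.right, origin, direction))
```

The point of the hierarchy is that a ray missing a parent box never visits the primitives beneath it, so most of the scene is skipped for each ray.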

Apr, 3

Merge or Separate? Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms

Computer systems are increasingly heterogeneous with nodes consisting of CPUs and GPU accelerators. As such systems become mainstream, they move away from specialized high-performance single-application platforms to a more general setting with multiple, concurrent, application jobs. Determining how jobs should be dynamically best scheduled to heterogeneous devices is non-trivial. In certain cases, performance is […]

OpenCL
