high performance computing on graphics processing units: hgpu.org

Posts

Jul, 10

FPGA Implementation of Bluetooth Low Energy Physical Layer with OpenCL

This dissertation is primarily presenting the design of Digital Signal Processing (DSP) between the transmission in Bluetooth Low Energy Physical Layer (BLE PHY), and its implementation in a Field Programmable Gate Array (FPGA) device with Open Computing Language (OpenCL). During the design of DSP, it bases on the In-Phase/Quadrature-Phase (IQ) architecture to construct the modulation […]

OpenCL

Apr, 17

Performance Comparison of Different OpenCL Implementations of LBM Simulation on Commodity Computer Hardware

Parallel programming is increasingly used to improve the performance of solving numerical methods used for scientific purposes. Numerical methods in the field of fluid dynamics require the calculation of a large number of operations per second. One of the methods that is easily parallelized and often used is the Lattice Boltzmann method (LBM). Today, it […]

OpenCL

Feb, 20

A ML-based resource utilization OpenCL GPU-kernel fusion model

Massive data parallelism can be achieved by using general-purpose graphics processing units (GPGPU) with the help of the OpenCL framework. When smaller data with higher GPU memory is executed, it results in a low resource utilization ratio and energy inefficiencies. Up until now, there is no existing model to share GPU for further execution. In […]

OpenCL

Jan, 16

Fancier: A Unified Framework for Java, C, and OpenCL Integration

Graphics Processing Units (GPUs) have evolved from very specialized designs geared towards computer graphics to accommodate general-purpose highly-parallel workloads. Harnessing the performance that these accelerators provide requires the use of specialized native programming interfaces, such as CUDA or OpenCL, or higher-level programming models like OpenMP or OpenACC. However, on managed programming languages, offloading execution into […]

OpenCL

Jan, 16

Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL

High Level Synthesis (HLS) tools, like the Intel FPGA SDK for OpenCL, improve design productivity and enable efficient design space exploration guided by simple program directives (pragmas), but may sometimes miss important optimizations necessary for high performance. In this paper, we present a study of the tradeoffs in HLS optimizations, and the potential of a […]

OpenCL

Dec, 26

FSpGEMM: An OpenCL-based HPC Framework for Accelerating General Sparse Matrix-Matrix Multiplication on FPGAs

General sparse matrix-matrix multiplication (SpGEMM) is an integral part of many scientific computing, high-performance computing (HPC), and graph analytic applications. This paper presents a new compressed sparse vector (CSV) format for representing sparse matrices and FSpGEMM, an OpenCL-based HPC framework for accelerating general sparse matrix-matrix multiplication on FPGAs. The proposed FSpGEMM framework includes an FPGA […]

OpenCL

Dec, 26

OpenCL-HPX Integration

Distributed applications combine the computational capabilities of heterogeneous nodes. As such, they offer challenges regarding data transfer and synchronization. HPX is a library for concurrent, parallel applications. It strives not only to address challenges regarding distributed systems, but also to conform to current and upcoming C++ standards. One of the solutions found in heterogeneous systems […]

OpenCL

Dec, 19

Optimization of Compiler-generated OpenCL CNN Kernels and Runtime for FPGAs

This work explores the viability of end-to-end convolutional neural network inference using OpenCL HLS kernels generated from TVM on Intel FPGAs. We explore layer-pipelined execution for small networks and time-multiplexed kernels for larger CNNs. Naively generated kernels do not produce efficient hardware. We propose a set of optimizations to increase parallelism, resource utilization, and more […]

OpenCL

Nov, 28

Concurrency Mapping to FPGAs with OpenCL: A Case Study with a Shallow Water Kernel

FPGAs have been around for over 30 years and are a viable accelerator for compute-intensive workloads on HPC systems. The adoption of FPGAs for scientific applications has been stimulated recently by the emergence of better programming environments such as High-Level Synthesis (HLS) and OpenCL available through the Xilinx SDSoC design tool. The mapping of the […]

OpenCL

Aug, 8

PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime

Offloading the most demanding parts of applications to an edge GPU server cluster to save power or improve the result quality is a solution that becomes increasingly realistic with new networking technologies. In order to make such a computing scheme feasible, an application programming layer that can provide both low latency and scalable utilization of […]

OpenCL

Jul, 25

A method for decompilation of AMD GCN kernels to OpenCL

Introduction: Decompilers are useful tools for software analysis and support in the absence of source code. They are available for many hardware architectures and programming languages. However, none of the existing decompilers support modern AMD GPU architectures such as AMD GCN and RDNA. Purpose: We aim at developing the first assembly decompiler tool for a […]

OpenCL

Jul, 18

OpenCL FPGA Optimization guided by memory accesses and roofline model analysis applied to tomography acceleration

Backward projection is one of the most time-consuming steps in method-based iterative reconstruction computed tomography. The 3D backprojection memory access pattern is potentially enough regular to exploit efficiently the computation power of acceleration boards based on GPU or FPGA. The highlevel tools like HLS or OpenCL ease consider such particular memory accesses during the design […]

OpenCL

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

high performance computing on graphics processing units: hgpu.org

Posts

FPGA Implementation of Bluetooth Low Energy Physical Layer with OpenCL

Performance Comparison of Different OpenCL Implementations of LBM Simulation on Commodity Computer Hardware

A ML-based resource utilization OpenCL GPU-kernel fusion model

Fancier: A Unified Framework for Java, C, and OpenCL Integration

Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL

FSpGEMM: An OpenCL-based HPC Framework for Accelerating General Sparse Matrix-Matrix Multiplication on FPGAs

OpenCL-HPX Integration

Optimization of Compiler-generated OpenCL CNN Kernels and Runtime for FPGAs

Concurrency Mapping to FPGAs with OpenCL: A Case Study with a Shallow Water Kernel

PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime

A method for decompilation of AMD GCN kernels to OpenCL

OpenCL FPGA Optimization guided by memory accesses and roofline model analysis applied to tomography acceleration

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)