high performance computing on graphics processing units: hgpu.org

Posts

Oct, 25

OpenCL Performance on the Intel Heterogeneous Architecture Research Platform

The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and OpenCL's availability through the Intel High Level Synthesis compiler allows users to architect […]

OpenCL

Oct, 4

An OpenCL 3D FFT for Molecular Dynamics Simulations on Multiple FPGAs

3D FFTs are used to accelerate the computation of electrostatic forces in MD but are difficult to parallelize due to their communication requirements. We present a distributed OpenCL 3D FFT implementation on Intel Stratix 10 FPGAs for grids up to 128^3. We use FPGA hardware features such as HBM2 memory and multiple 100 Gbps links to provide scalable memory […]

OpenCL

Sep, 27

Adaptation of High Performance and High Capacity Reconfigurable Systems to OpenCL Programming Environments

In this work, we adapt a reconfigurable computer system based on FPGA technologies to OpenCL programming environments. The reconfigurable system is part of a compute prototype of the MANGO European project that includes 96 FPGAs. To optimize its use and obtain its maximum performance, it is essential to adapt it to heterogeneous systems programming […]

OpenCL

Jul, 26

Darknet on OpenCL: a multi-platform tool for object detection and classification

The article’s goal is to give an overview of the challenges and problems on the way from state-of-the-art CUDA-accelerated neural network code to multi-GPU code. For this purpose, the authors describe the journey of porting the fully featured, CUDA-accelerated Darknet engine available on GitHub to OpenCL. The article presents lessons learned and the […]

CUDA • OpenCL

Jul, 26

EDSSA: An Encoder-Decoder Semantic Segmentation Networks Accelerator on OpenCL-Based FPGA Platform

Visual semantic segmentation, which is represented by the semantic segmentation network, has been widely used in many fields, such as intelligent robots, security, and autonomous driving. However, these Convolutional Neural Network (CNN)-based networks have high requirements for computing resources and programmability for hardware platforms. For embedded platforms and terminal devices in particular, Graphics Processing Unit […]

OpenCL

Jul, 19

A load balance multi-scheduling model for OpenCL kernel tasks in an integrated cluster

Nowadays, embedded systems are comprised of heterogeneous multi-core architectures, i.e., CPUs and GPUs. If the application is mapped to an appropriate processing core, then these architectures provide many performance benefits to applications. Typically, programmers map sequential applications to CPU and parallel applications to GPU. The task mapping becomes challenging because of the usage of evolving […]

OpenCL

Jul, 12

Bayesian inference for artificial perception using OpenCL on FPGAs and GPUs

This dissertation project addresses the implementation of Bayesian inference on FPGAs and GPUs, following a top-down approach and using OpenCL. The target application of these Bayesian inference algorithms is artificial perception in robotics. The aim is to improve the power efficiency of Bayesian inference computations. Previous work at our university in the scope of an […]

OpenCL

Jun, 7

SOFF: An OpenCL High-Level Synthesis Framework for FPGAs

Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA accelerators. However, the state-of-the-art OpenCL frameworks for FPGAs suffer from poor performance and usability. This paper proposes a high-level synthesis framework of OpenCL for FPGAs, called SOFF. It automatically synthesizes a datapath to execute many OpenCL kernel threads in a pipelined manner. It […]

OpenCL

May, 17

Employing OpenCL as a Standard Hardware Abstraction in a Distributed Embedded System: A Case Study

The open computing language (OpenCL) is a standard open source specification for parallel computing on heterogeneous architectures. OpenCL offers a set of abstract models for substantial acceleration in parallel computing and is supported by most of the leading hardware vendors. In this paper, we present a systematic approach for employing OpenCL as a hardware abstraction […]

OpenCL

Apr, 26

Evaluating FPGA Accelerator Performance with a Parameterized OpenCL Adaptation of the HPCChallenge Benchmark Suite

FPGAs have found increasing adoption in data center applications since a new generation of high-level tools has become available that noticeably reduces development time for FPGA accelerators while still providing high-quality results. There is, however, no high-level benchmark suite available which specifically enables a comparison of FPGA architectures, programming tools and libraries for […]

OpenCL

Apr, 19

OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework

Object detection is a technology that deals with recognizing classes of objects and their location. It is used in many different areas, such as in face-detecting systems [16, 34, 37], surveillance tools [9], human-machine interfaces [17], and self-driving cars [18, 23, 25, 26, 30]. These days, deep learning object detection approaches have achieved significantly better […]

CUDA • OpenCL

Apr, 19

Design Space Exploration of an OpenCL Based SAXPY Kernel Implementation on FPGAs

High-performance computing researchers are looking for new options and tools to satisfy the performance criteria of a hardware design. The FPGA (Field Programmable Gate Array) is an accelerator widely used for power-efficient applications due to its reconfigurability and high performance. Traditionally, FPGAs are programmed using a Hardware Description Language (HDL). Using HDL, […]

OpenCL
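
For readers unfamiliar with the kernel named in the title above: SAXPY is the BLAS level-1 operation y ← a·x + y on single-precision vectors. The following is a minimal plain-Python reference sketch of that formula only. It is not the paper's OpenCL or FPGA implementation, and the function name `saxpy` and the sample values are illustrative assumptions.

```python
from array import array


def saxpy(a, x, y):
    """Return a*x + y element-wise for two equal-length float sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # array('f') stores single-precision floats, matching the "S" in SAXPY.
    return array("f", (a * xi + yi for xi, yi in zip(x, y)))


# Illustrative inputs (hypothetical values, chosen to be exactly representable).
x = array("f", [1.0, 2.0, 3.0])
y = array("f", [10.0, 20.0, 30.0])
result = saxpy(2.0, x, y)
print(list(result))
```

Each output element depends only on the matching elements of x and y, which is why the operation is commonly used as a simple data-parallel benchmark kernel.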


Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
