high performance computing on graphics processing units: hgpu.org

Tags Results

Authors Results

Posts

Dec, 31

Adding fault tolerance to OpenCL: Through redundant heterogeneous computing

The ever-increasing demand for computing has led to the need for specialized heterogeneous hardware, and the frameworks required to utilize them. Besides the traditional central processing units, more and more programs will make use of specialized hardware to accelerate computations. However, the increase in computing also leads to shorter mean time between failures. In this […]

OpenCL

Nov, 19

AFOCL: Portable OpenCL Programming of FPGAs via Automated Built-in Kernel Management

OpenCL provides a consistent programming model across CPUs, GPUs, and FPGAs. However, to get reasonable performance out of FPGAs, OpenCL programs created for other platforms need to be modified. These modifications are often vendor-specific, limiting the portability of OpenCL programs between devices from different vendors. In this paper, we propose AFOCL: a cross-vendor portable programming […]

OpenCL

Oct, 1

Experience Migrating OpenCL to SYCL: A Case Study on Searches for Potential Off-Target Sites of Cas9 RNA-Guided Endonucleases on AMD GPUs

Cas-OFFinder is a popular application written in OpenCL for searching potential off-target sites in parallel on a GPU. In this work, we describe our experience of migrating the application from OpenCL to SYCL. Evaluating the performance of the OpenCL and SYCL application using human genome sequences shows that the SYCL program could achieve performance portability […]

OpenCL

Sep, 17

Improving the Efficiency of OpenCL Kernels through Pipes

Over the past few years, there has been an increased interest in using FPGAs alongside CPUs and GPUs in high-performance computing systems and data centers. This trend has led to a push toward the use of high-level programming models and libraries, such as OpenCL, both to lower the barriers to the adoption of FPGAs by […]

OpenCL

Aug, 20

Generating Parallel OpenCL and OpenMP Programs from Dataflow Graphs

This thesis describes and analyzes the automatic generation of threads from a sequential MiniC program by translating the program to an equivalent dataflow graph and partitioning this dataflow graph. These threads are generated through different graph partitionings, including splitting the graph into its single nodes and calculating a minimum vertex-disjoint cover. The threads can be […]

OpenCL

Jan, 29

Pulsar search acceleration using FPGAs and OpenCL templates

The Square Kilometre Array (SKA) is the world’s largest radio telescope currently under construction, and will employ elaborate signal processing to detect new pulsars, i.e. highly magnetised rotating neutron stars. This paper addresses the acceleration of demanding computations for this pulsar search on Field-Programmable Gate Arrays (FPGAs) using a new high-level design process based on […]

OpenCL

Jan, 29

Implementation of a motion estimation algorithm for Intel FPGAs using OpenCL

Motion Estimation is one of the main tasks behind any video encoder. It is a computationally costly task; therefore, it is usually delegated to specific or reconfigurable hardware, such as FPGAs. Over the years, multiple FPGA implementations have been developed, mainly using hardware description languages such as Verilog or VHDL. Since programming using hardware description […]

OpenCL

Jan, 22

Efficient OpenCL system integration of non-blocking FPGA accelerators

OpenCL functions as a portability layer for diverse heterogeneous hardware platforms including CPUs, GPUs, FPGAs, and hardware accelerators. However, OpenCL programs utilizing multiple of these devices in the same computing platform suffer from poor coordination between OpenCL implementations of different hardware vendors. This paper proposes a vendor-independent open source method for integrating custom FPGA accelerators […]

OpenCL

Nov, 27

Design Space Exploration of Concurrency Mapping to FPGAs in Weather and Climate Applications with Xilinx SDSoC OpenCL, SDSoC C++ and Vivad

Recent years have seen increased interest from the HPC community in Field Programmable Gate Arrays (FPGAs) as an alternative/additional accelerator. This has been largely due to the slowdown in the transistor scaling and the difficulty of gaining performance improvement and energy efficiency from the current processing solutions. General (scientific) software programmers have shied away from […]

OpenCL

Oct, 23

Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA

Exchanging halo data is a common task in modern scientific computing applications and efficient handling of this operation is critical for the performance of the overall simulation. Tausch is a novel header-only library that provides a simple API for efficiently handling these types of data movements. Tausch supports both simple CPU-only systems, but also more […]

CUDA • OpenCL

Oct, 2

An OpenCL-Based FPGA Accelerator for Faster R-CNN

In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the research corresponds to hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as Yolo, there are still few accelerator designs for faster regions with CNN features (Faster R-CNN) algorithms. Moreover, CNN’s […]

OpenCL

Sep, 4

Towards making the most of NLP-based device mapping optimization for OpenCL kernels

Nowadays, we are living in an era of extreme device heterogeneity. Despite the high variety of conventional CPU architectures, accelerator devices, such as GPUs and FPGAs, also appear in the foreground exploding the pool of available solutions to execute applications. However, choosing the appropriate device per application needs is an extremely challenging task due to […]

OpenCL

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
