28926

Tags Results

Authors Results

Posts

Dec, 31

Adding fault tolerance to OpenCL: Through redundant heterogeneous computing

The ever-increasing demand for computing has led to the need for specialized heterogeneous hardware, and the frameworks required to utilize them. Besides the traditional central processing units, more and more programs will make use of specialized hardware to accelerate computations. However, the increase in computing also leads to shorter mean time between failures. In this […]
Nov, 19

AFOCL: Portable OpenCL Programming of FPGAs via Automated Built-in Kernel Management

OpenCL provides a consistent programming model across CPUs, GPUs, and FPGAs. However, to get reasonable performance out of FPGAs, OpenCL programs created for other platforms need to be modified. These modifications are often vendor-specific, limiting the portability of OpenCL programs between devices from different vendors. In this paper, we propose AFOCL: a cross-vendor portable programming […]
Oct, 1

Experience Migrating OpenCL to SYCL: A Case Study on Searches for Potential Off-Target Sites of Cas9 RNA-Guided Endonucleases on AMD GPUs

Cas-OFFinder is a popular application written in OpenCL for searching potential off-target sites in parallel on a GPU. In this work, we describe our experience of migrating the application from OpenCL to SYCL. Evaluating the performance of the OpenCL and SYCL application using human genome sequences shows that the SYCL program could achieve performance portability […]
Sep, 17

Improving the Efficiency of OpenCL Kernels through Pipes

Over the past few years, there has been an increased interest in using FPGAs alongside CPUs and GPUs in high-performance computing systems and data centers. This trend has led to a push toward the use of high-level programming models and libraries, such as OpenCL, both to lower the barriers to the adoption of FPGAs by […]
Aug, 20

Generating Parallel OpenCL and OpenMP Programs from Dataflow Graphs

This thesis describes and analyzes the automatic generation of threads from a sequential MiniC program by translating the program to an equivalent dataflow graph and partitioning this dataflow graph. These threads are generated through different graph partitionings, including splitting the graph into its single nodes and calculating a minimum vertex-disjoint cover. The threads can be […]
Jan, 29

Pulsar search acceleration using FPGAs and OpenCL templates

The Square Kilometre Array (SKA) is the world’s largest radio telescope currently under construction, and will employ elaborate signal processing to detect new pulsars, i.e. highly magnetised rotating neutron stars. This paper addresses the acceleration of demanding computations for this pulsar search on Field-Programmable Gate Arrays (FPGAs) using a new high-level design process based on […]
Jan, 29

Implementation of a motion estimation algorithm for Intel FPGAs using OpenCL

Motion Estimation is one of the main tasks behind any video encoder. It is a computationally costly task; therefore, it is usually delegated to specific or reconfigurable hardware, such as FPGAs. Over the years, multiple FPGA implementations have been developed, mainly using hardware description languages such as Verilog or VHDL. Since programming using hardware description […]
Jan, 22

Efficient OpenCL system integration of non-blocking FPGA accelerators

OpenCL functions as a portability layer for diverse heterogeneous hardware platforms including CPUs, GPUs, FPGAs, and hardware accelerators. However, OpenCL programs utilizing multiple of these devices in the same computing platform suffer from poor coordination between OpenCL implementations of different hardware vendors. This paper proposes a vendor-independent open source method for integrating custom FPGA accelerators […]
Nov, 27

Design Space Exploration of Concurrency Mapping to FPGAs in Weather and Climate Applications with Xilinx SDSoC OpenCL, SDSoC C++ and Vivad

Recent years have seen increased interest from the HPC community in Field Programmable Gate Arrays (FPGAs) as an alternative/additional accelerator. This has been largely due to the slowdown in the transistor scaling and the difficulty of gaining performance improvement and energy efficiency from the current processing solutions. General (scientific) software programmers have shied away from […]
Oct, 23

Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA

Exchanging halo data is a common task in modern scientific computing applications and efficient handling of this operation is critical for the performance of the overall simulation. Tausch is a novel header-only library that provides a simple API for efficiently handling these types of data movements. Tausch supports both simple CPU-only systems, but also more […]
Oct, 2

An OpenCL-Based FPGA Accelerator for Faster R-CNN

In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the research corresponds to hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as Yolo, there are still few accelerator designs for faster regions with CNN features (Faster R-CNN) algorithms. Moreover, CNN’s […]
Sep, 4

Towards making the most of NLP-based device mapping optimization for OpenCL kernels

Nowadays, we are living in an era of extreme device heterogeneity. Despite the high variety of conventional CPU architectures, accelerator devices, such as GPUs and FPGAs, also appear in the foreground exploding the pool of available solutions to execute applications. However, choosing the appropriate device per application needs is an extremely challenging task due to […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: