high performance computing on graphics processing units: hgpu.org

Posts

Aug, 11

A Case Study in Using OpenCL on FPGAs: Creating an Open-Source Accelerator of the AutoDock Molecular Docking Software

In recent years, OpenCL has been increasingly adopted because it enables software programmers to harness the performance and power efficiency of FPGAs. Although it simplifies FPGA programming, achieving high performance and energy efficiency with OpenCL is still a difficult task. In order to further contribute to advancing OpenCL usage on FPGAs, […]

OpenCL

Aug, 5

OpenCLIPER: an OpenCL-based C++ Framework for Overhead-Reduced Medical Image Processing and Reconstruction on Heterogeneous Devices

Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in terms of housekeeping tasks (device selection and initialization, data streaming, synchronization with the CPU and others), which may hinder […]

OpenCL

Jul, 28

Optimization of OpenCL applications on FPGA

Since Moore’s Law is over, specialized accelerators have become more and more popular over the years. FPGAs are one such accelerator, and their "reconfigurable hardware" capabilities make them really promising. FPGAs are programmed in HDLs, which is hard and time-consuming, so many high-level alternatives (such as HLS, OpenCL, SystemC, …) have emerged to provide […]

OpenCL

Jul, 7

An Efficient Dispatcher for Large Scale Graph Processing on OpenCL-based FPGAs

Highly parallel frameworks have proved to be very suitable for graph processing. Various works optimize implementations on FPGAs, which are pipeline-parallel devices. The key to exploiting the parallel performance of FPGAs is to process graph data in a pipelined model and take advantage of on-chip memory to realize necessary […]

OpenCL

Jul, 5

Evaluating the Efficiency of CPUs, GPUs and FPGAs on a Near-Duplicate Document Detection Via OpenCL

Discovering identical or near-identical items is important in many applications, such as Web crawling, since it drastically reduces text processing costs. Simhash is a widely used technique that assigns a bit-string identity to a text, such that similar texts have similar identities. In this study, a real-time solution for a simhash calculation […]

OpenCL

Jun, 28

Improving tasks throughput on accelerators using OpenCL command concurrency

A heterogeneous architecture composed of a host and an accelerator must frequently deal with situations where several independent tasks are available to be offloaded onto the accelerator. These tasks can be generated by concurrent applications executing on the host or, in case the host is a node of a computer cluster, by applications running on […]

OpenCL

Jun, 17

Combining Multiple Optimised FPGA-based Pulsar Search Modules Using OpenCL

Field-Programmable Gate Arrays (FPGAs) are widely used in the central signal processing design of the Square Kilometre Array (SKA) as acceleration hardware. The frequency domain acceleration search (FDAS) module is an important part of the SKA1-MID pulsar search engine. To develop for a yet to be finalised hardware, for cross-discipline interoperability and to achieve fast […]

OpenCL

Jun, 17

Acceleration of k-Nearest Neighbor and SRAD Algorithms Using Intel FPGA SDK for OpenCL

Field Programmable Gate Arrays (FPGAs) have been widely used for accelerating machine learning algorithms. However, the high design cost and time of implementing FPGA-based accelerators using traditional HDL-based design methodologies have discouraged users from designing FPGA-based accelerators. In recent years, a new CAD tool called Intel FPGA SDK for OpenCL (IFSO) has allowed fast and efficient […]

OpenCL

Jun, 13

Efficient Large-scale Approximate Nearest Neighbor Search on OpenCL FPGA

We present a new method for Product Quantization (PQ) based approximate nearest neighbor (ANN) search in high dimensional spaces. Specifically, we first propose a quantization scheme for the codebooks of the coarse quantizer, product quantizer, and rotation matrix, to reduce the cost of accessing these codebooks. Our approach also combines a highly parallel k-selection method, which […]

OpenCL

Jun, 2

Design of FPGA-Based Accelerator for Convolutional Neural Network under Heterogeneous Computing Framework with OpenCL

CPUs have insufficient resources for efficient computation of Convolutional Neural Networks (CNNs), especially in embedded applications. Therefore, heterogeneous computing platforms such as GPUs, FPGAs and ASICs are widely used to accelerate CNN tasks. Among these, FPGAs can accelerate the computation by mapping the algorithm to parallel hardware instead of the CPU, which […]

OpenCL

Jun, 2

FPGA-based Acceleration of FT Convolution for Pulsar Search Using OpenCL

The Square Kilometre Array (SKA) project will be the world's largest radio telescope array. With its large number of antennas, the number of signals that need to be processed is enormous. One important element of the SKA’s Central Signal Processor package is pulsar search. This paper focuses on the FPGA-based acceleration of the Frequency-Domain Acceleration […]

OpenCL

May, 26

OpenCL 2.2 API Specification

Modern processor architectures have embraced parallelism as an important pathway to increased performance. Facing technical challenges with higher clock speeds in a fixed power envelope, Central Processing Units (CPUs) now improve performance by adding multiple cores. Graphics Processing Units (GPUs) have also evolved from fixed function rendering devices into programmable parallel processors. As today's computer […]

OpenCL

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA