high performance computing on graphics processing units: hgpu.org

Posts

Nov, 25

SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences

BACKGROUND: The Smith-Waterman (SW) algorithm is the best choice for searching similar regions between two DNA or protein sequences. However, it may become impracticable in some contexts due to its high computational demands. Consequently, the computer science community has focused on the use of modern parallel architectures such as Graphics Processing Units (GPUs), Xeon Phi […]

OpenCL

Nov, 3

Power analysis of sorting algorithms on FPGA using OpenCL

With the advent of big data and cloud computing, there is tremendous interest in optimised algorithms and architectures for sorting either using software or hardware. Field Programmable Gate Arrays (FPGAs) are being increasingly used in high end data servers providing a bridge between the flexibility of software and performance benefits of hardware. In this paper […]

OpenCL

Nov, 3

OpenCL Performance Prediction using Architecture-Independent Features

OpenCL is an attractive model for heterogeneous high-performance computing systems, with wide support from hardware vendors and significant performance portability. To support efficient scheduling on HPC systems it is necessary to perform accurate performance predictions for OpenCL workloads on varied compute devices, which is challenging due to diverse computation, communication and memory access characteristics which […]

OpenCL

Oct, 28

High Performance Computing with FPGAs and OpenCL

In this work we evaluate the potential of FPGAs for accelerating HPC workloads as a more power-efficient alternative to GPUs. Using High-Level Synthesis and a large set of optimization techniques, we show that FPGAs can achieve better performance than CPUs, and better power efficiency than both CPUs and GPUs for typical HPC workloads. Furthermore, we […]

OpenCL

Oct, 28

Automatic Mapping for OpenCL-Programs on CPU/GPU Heterogeneous Platforms

Heterogeneous computing systems with multiple CPUs and GPUs are increasingly popular. Today, heterogeneous platforms are deployed in many setups, ranging from low-power mobile systems to high performance computing systems. Such platforms are usually programmed using OpenCL which allows to execute the same program on different types of device. Nevertheless, programming such platforms is a challenging […]

OpenCL

Oct, 28

Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering

Automatic compiler phase selection/ordering has traditionally been focused on CPUs and, to a lesser extent, FPGAs. We present experiments regarding compiler phase ordering specialization of OpenCL kernels targeting a GPU. We use iterative exploration to specialize LLVM phase orders on 15 OpenCL benchmarks to an NVIDIA GPU. We analyze the generated NVIDIA PTX code for […]

OpenCL

Oct, 21

Exploiting Task Parallelism with OpenCL: A Case Study

While data parallelism aspects of OpenCL have been of primary interest due to the massively data parallel GPUs being on focus, OpenCL also provides powerful capabilities to describe task parallelism. In this article we study the task parallel concepts available in OpenCL and find out how well the different vendor-specific implementations can exploit task parallelism […]

OpenCL

Oct, 13

Resource Elastic Virtualization for FPGAs using OpenCL

FPGAs are rising in popularity for acceleration in all kinds of systems. However, even in cloud environments, FPGA devices are typically still used exclusively by one application only. To overcome this, and as an approach to manage FPGA resources with OS functionality, this paper introduces the concept of resource elastic virtualization which allows shrinking and […]

OpenCL

Oct, 13

Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels

OpenCL defines a common parallel programming language for all devices, although writing tasks adapted to the devices, managing communication and load-balancing issues are left to the programmer. We propose in this paper a static/dynamic approach for the execution of an iterated sequence of data-dependent kernels on a multi-device heterogeneous architecture. The method allows to automatically […]

OpenCL

Oct, 6

Live Migration for OpenCL FPGA Accelerators

FPGAs are currently being deployed at a large scale across data-centres for various applications because of their performance and power benefits. In particular, the cloud operators have started providing FPGAs as a Service. However, to completely integrate FPGAs in a data-centre environment like standard software systems, support for fault tolerance and task migration is essential. […]

OpenCL

Sep, 16

ZUCL: A ZYNQ UltraScale+ Framework for OpenCL HLS Applications

In this work, we are proposing the ZUCL framework for implementing and running OpenCL applications for the latest Xilinx ZYNQ UltraScale+ platform. ZUCL is a holistic framework addressing the FPGA OS infrastructure, high level synthesis (HLS) module implementation as well as the runtime management. ZUCL enables partial reconfiguration (PR) on this platform by providing an […]

OpenCL

Sep, 2

Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Matrix multiplication is one of the fundamental building blocks of numerical linear algebra. It requires computer systems have huge computing capability and consumes much more power as problem size is increased. In this research, an OpenCL-based matrix multiplier is presented. When data are single precision floating-points, compared with the software simulations based on the Intel […]

OpenCL

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

high performance computing on graphics processing units: hgpu.org

Posts

SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences

Power analysis of sorting algorithms on FPGA using OpenCL

OpenCL Performance Prediction using Architecture-Independent Features

High Performance Computing with FPGAs and OpenCL

Automatic Mapping for OpenCL-Programs on CPU/GPU Heterogeneous Platforms

Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering

Exploiting Task Parallelism with OpenCL: A Case Study

Resource Elastic Virtualization for FPGAs using OpenCL

Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels

Live Migration for OpenCL FPGA Accelerators

ZUCL: A ZYNQ UltraScale+ Framework for OpenCL HLS Applications

Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Recent source codes

QArray

Celerity: High-level C++ for Accelerator Clusters

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Optical flow algorithms for SYCL

OpenMP5-Offload-OpenMC-Intel-PVC

Most viewed papers (last 30 days)