high performance computing on graphics processing units: hgpu.org

Posts

Apr, 12

Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based SpMV Implementation on an FPGA

Hardware designers use High-Level Synthesis (HLS) tools in order to reduce the design time and design complexity. OpenCL is a framework that uses HLS tools and permits the programmer to write standardized C-like code for the host as well as for the hardware accelerators. Using OpenCL, a program can be written using different memory access […]

OpenCL

Mar, 15

Abstracting OpenCL for Multi-Application Workloads on CPU-FPGA Clusters

Field-programmable gate arrays (FPGAs) continue to see integration in data centres, where customized hardware accelerators provide improved performance for cloud workloads. However, existing programming models for such environments typically require a manual assignment of application tasks between CPUs and FPGA-based accelerators. Furthermore, coordinating the execution of tasks from multiple applications necessitates the use of a […]

OpenCL

Mar, 15

Automated test generation for OpenCL kernels using fuzzing and constraint solving

Graphics Processing Units (GPUs) are massively parallel processors offering performance acceleration and energy efficiency unmatched by current processors (CPUs) in computers. These advantages along with recent advances in the programmability of GPUs have made them attractive for general-purpose computations. Despite the advances in programmability, GPU kernels are hard to code and analyse due to the […]

OpenCL

Mar, 15

Towards Green Computing: A Survey of Performance and Energy Efficiency of Different Platforms using OpenCL

When considering different hardware platforms, not just the time-to-solution can be of importance but also the energy necessary to reach it. This is not only the case with battery powered and mobile devices but also with high-performance parallel cluster systems due to financial and practical limits on power consumption and cooling. Recent developments in hard- […]

OpenCL

Mar, 8

Solving convex optimization problems on FPGA using OpenCL

The application of accelerators in HPC applications has seen enormous growth in the last decade. In the field of HPC, demands on throughput are steadily growing. Not all of the algorithms used have a single HW architecture that clearly performs best. Our work explores the performance of different HW architectures in solving a convex optimization […]

OpenCL

Mar, 1

Evaluating the Energy Efficiency of OpenCL-accelerated AutoDock Molecular Docking

AUTODOCK is a molecular docking application that consists of a genetic algorithm coupled with the Solis-Wets local-search method. Despite its wide usage, its power consumption on heterogeneous systems has not been evaluated extensively. In this work, we evaluate the energy efficiency of an OpenCL-accelerated version of AUTODOCK that, along with the traditional Solis-Wets method, newly […]

OpenCL

Feb, 23

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

In this paper we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective. This allows us to reach similar, or even higher, […]

OpenCL

Feb, 9

MKPipe: A Compiler Framework for Optimizing Multi-Kernel Workloads in OpenCL for FPGA

OpenCL for FPGA enables developers to design FPGAs using a programming model similar to that for processors. Recent works have shown that code optimization at the OpenCL level is important to achieve high computational efficiency. However, existing works either focus primarily on optimizing single kernels or solely depend on channels to design multi-kernel pipelines. In this paper, […]

OpenCL

Feb, 2

Optimization of a discontinuous Galerkin solver with OpenCL and StarPU

With recent advances in microprocessor design, the optimization of computing software has become more and more technical. One of the difficulties is transforming sequential algorithms into parallel ones. A possible solution is task-based design. In this approach, it is possible to describe the parallelization possibilities of the algorithm automatically. The task-based design is […]

OpenCL

Feb, 2

Noise Removal from Remote Sensed Images by NonLocal Means with OpenCL Algorithm

We introduce a multi-platform portable implementation of the NonLocal Means methodology aimed at noise removal from remotely sensed images. It is particularly suited for hyperspectral sensors, for which real-time applications are not possible with CPU-based algorithms alone. In the last decades computational devices have usually been a compound of cross-vendor sets of specifications (heterogeneous […]

OpenCL

Jan, 19

Hardware Implementation and Quantization of Tiny-Yolo-v2 using OpenCL

The trend of increasing model size in Deep Neural Network (DNN) algorithms boosts the performance of visual recognition tasks. These gains in performance have come at the cost of increased computational complexity and memory bandwidth. Recent studies have explored the fixed-point implementation of DNN algorithms such as AlexNet and VGG on Field Programmable Gate […]

OpenCL

Jan, 5

LLVM-based automation of memory decoupling for OpenCL applications on FPGAs

The availability of OpenCL High-Level Synthesis (OpenCL-HLS) has made FPGAs an attractive platform for power-efficient high-performance execution of massively parallel applications. At the same time, new design challenges emerge for massive thread-level parallelism on FPGAs. One major execution bottleneck is the high number of memory stalls exposed to the data-path, which overshadows the benefits of data-path […]

OpenCL

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
