
Posts

Jul, 14

On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation

Over the past decade, accelerator-based supercomputers have grown from 0% to 42% performance share on the TOP500. Ideally, GPU-accelerated code on such systems should be "write once, run anywhere," regardless of the GPU device (or, for that matter, any parallel device, e.g., CPU or FPGA). In practice, however, portability can be significantly more limited due […]
Jul, 10

GPU-based Parallel Computation Support for Stan

This paper details an extensible OpenCL framework that allows Stan to utilize heterogeneous compute devices. It includes GPU-optimized routines for the Cholesky decomposition, its derivative, other matrix algebra primitives and some commonly used likelihoods, with more additions planned for the near future. Stan users can now benefit from speedups offered by GPUs with little effort […]
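
For context, the Cholesky decomposition that the framework offloads factors a symmetric positive-definite matrix A into L·Lᵀ with L lower triangular. The following plain C++ reference sketch is illustrative only (it is not the paper's OpenCL code); it shows the arithmetic that the GPU routines parallelize:

    #include <vector>
    #include <cmath>

    // Illustrative Cholesky-Crout factorization: A = L * L^T for a symmetric
    // positive-definite n x n matrix stored row-major. Stan's OpenCL routines
    // parallelize this column-by-column update on the GPU; this CPU sketch
    // only shows the computation being offloaded.
    std::vector<double> cholesky(const std::vector<double>& A, int n) {
        std::vector<double> L(n * n, 0.0);
        for (int j = 0; j < n; ++j) {
            double diag = A[j * n + j];
            for (int k = 0; k < j; ++k) diag -= L[j * n + k] * L[j * n + k];
            L[j * n + j] = std::sqrt(diag);
            for (int i = j + 1; i < n; ++i) {          // rows below the diagonal
                double s = A[i * n + j];
                for (int k = 0; k < j; ++k) s -= L[i * n + k] * L[j * n + k];
                L[i * n + j] = s / L[j * n + j];
            }
        }
        return L;
    }
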
Jun, 27

ReSYCLator: Transforming CUDA C++ source code into SYCL

CUDA, while very popular, is not as flexible with respect to target devices as OpenCL. While parallel algorithm research might address problems first with a CUDA C++ solution, those results are not easily portable to a target not directly supported by CUDA. In contrast, a SYCL C++ solution can operate on the larger variety of […]
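
To illustrate the kind of rewrite such a source-to-source tool performs (a hypothetical example, not taken from the paper and not the tool's exact output), a trivial CUDA vector-add kernel and a hand-written SYCL counterpart might look like:

    #include <CL/sycl.hpp>

    // CUDA original (for reference):
    //   __global__ void add(const float* a, const float* b, float* c, int n) {
    //       int i = blockIdx.x * blockDim.x + threadIdx.x;
    //       if (i < n) c[i] = a[i] + b[i];
    //   }
    //
    // A SYCL equivalent using buffers and accessors; a translator such as
    // ReSYCLator automates this mapping.
    void add(cl::sycl::queue& q, const float* a, const float* b, float* c, int n) {
        cl::sycl::buffer<float, 1> ba(a, cl::sycl::range<1>(n));
        cl::sycl::buffer<float, 1> bb(b, cl::sycl::range<1>(n));
        cl::sycl::buffer<float, 1> bc(c, cl::sycl::range<1>(n));
        q.submit([&](cl::sycl::handler& h) {
            auto A = ba.get_access<cl::sycl::access::mode::read>(h);
            auto B = bb.get_access<cl::sycl::access::mode::read>(h);
            auto C = bc.get_access<cl::sycl::access::mode::write>(h);
            h.parallel_for<class vec_add>(cl::sycl::range<1>(n),
                [=](cl::sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
        q.wait();
    }
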
Jun, 9

Temporospatial Epidemic Simulations Using Heterogeneous Computing

Discrete Event Simulation (DES) is widely used for analysis of complex temporospatial epidemic models. In such simulations, a sizeable fraction (50%-90%) of simulation runtime is typically spent solving the equations used to model epidemic progression. General Purpose Graphics Processing Units (GPGPUs) hold considerable potential to reduce the time spent solving epidemic equations. However, the significant differences […]
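
For a concrete sense of the equations involved, one explicit-Euler step of a standard SIR compartmental model is sketched below; this is purely an illustration of the per-region update that a DES-based epidemic simulation evaluates many times, not the models used in the paper:

    // One explicit-Euler step of a basic SIR model:
    //   dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I.
    // In a DES-based epidemic simulation, updates like this are evaluated for
    // many regions per event, which is the equation-solving work targeted for
    // GPU offload.
    struct SIR { double S, I, R; };

    SIR sir_step(SIR x, double beta, double gamma, double dt) {
        const double N = x.S + x.I + x.R;
        const double newInfections = beta * x.S * x.I / N;
        const double newRecoveries = gamma * x.I;
        return { x.S - dt * newInfections,
                 x.I + dt * (newInfections - newRecoveries),
                 x.R + dt * newRecoveries };
    }
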
Jun, 2

Heterogeneous Resource-Elastic Scheduling for CPU+FPGA Architectures

Heterogeneous computing is a key strategy to meet the requirements of many compute-intensive applications. However, CPU+FPGA platforms are currently underutilized, as scheduling is often constrained to a run-to-completion model or to accelerating a single application at a time. To address this, this paper proposes heterogeneous resource-elastic scheduling for maximizing the utilization of both CPU […]
May, 26

A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation

This paper proposes a hybrid framework for fast and accurate performance estimation of OpenCL kernels running on GPUs. The kernel execution flow is statically analyzed, and an execution trace is then generated via a loop-based bidirectional branch search. The trace is then dynamically simulated to perform a dummy execution of the kernel to obtain the […]
May, 19

Automatic Virtualization of Accelerators

Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore’s Law. These technological trends are incompatible. Cloud applications run on virtual platforms, but traditional I/O virtualization techniques have not provided production-ready solutions for accelerators. As a result, cloud providers expose accelerators by using […]
May, 8

FPGA-based acceleration of a particle simulation High Performance Computing application

This thesis studies the possibility of introducing FPGAs into the world of High Performance Computing (HPC) systems. Such systems are hybrid platforms that exploit the massively parallel computation of GPUs in order to reach very high performance. Nevertheless, GPU-based systems are power-hungry, and their power consumption is so large that […]
Mar, 17

Novel Data-Partitioning Algorithms for Performance and Energy Optimization of Data-Parallel Applications on Modern Heterogeneous HPC Platforms

Heterogeneity has turned into one of the most profound and challenging characteristics of today’s HPC environments. Modern HPC platforms have become highly heterogeneous owing to the tight integration of multicore CPUs and accelerators (such as Graphics Processing Units, Intel Xeon Phis, or Field-Programmable Gate Arrays) empowering them to maximize the dominant objectives of performance and […]
Mar, 17

CLTestCheck: Measuring Test Effectiveness for GPU Kernels

The massive parallelism and energy efficiency of GPUs, along with advances in their programmability through the OpenCL and CUDA programming models, have made them attractive for general-purpose computations across many application domains. Techniques for testing GPU kernels have emerged recently to aid the construction of correct GPU software. However, there exists no means of measuring quality and […]
Mar, 17

Performance Optimization of Memory Intensive Applications on FPGA Accelerator

Hardware accelerators are a fundamental part of modern high performance computing (HPC) systems due to their performance capabilities. The two most commonly used accelerators are GPUs and FPGAs. Despite the easier programmability and better memory performance of GPUs, FPGAs generally perform equally well for computationally challenging applications while dramatically reducing energy consumption. Furthermore, with […]
Mar, 10

Energy Efficient Parallel K-Means Clustering for an Intel Hybrid Multi-Chip Package

FPGA devices have proven to be good candidates for accelerating applications from different research areas. For instance, machine learning applications such as K-Means clustering usually rely on large amounts of data being processed, and, despite the performance offered by other architectures, FPGAs can offer better energy efficiency. With that in mind, Intel® […]
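
To make the data-parallel structure of the workload concrete, one Lloyd-style K-Means iteration (point assignment plus centroid update) is sketched below; this is a generic CPU reference, not the Intel hybrid CPU+FPGA implementation discussed in the paper:

    #include <vector>
    #include <limits>

    // One Lloyd iteration of K-Means for n points of dimension d (row-major).
    // The per-point nearest-centroid search is the embarrassingly parallel part
    // that FPGA/GPU implementations accelerate.
    void kmeans_step(const std::vector<double>& pts, int n, int d,
                     std::vector<double>& centroids, int k,
                     std::vector<int>& assign) {
        // Assignment: each point picks its nearest centroid (squared distance).
        for (int i = 0; i < n; ++i) {
            double best = std::numeric_limits<double>::max();
            for (int c = 0; c < k; ++c) {
                double dist = 0.0;
                for (int j = 0; j < d; ++j) {
                    double diff = pts[i * d + j] - centroids[c * d + j];
                    dist += diff * diff;
                }
                if (dist < best) { best = dist; assign[i] = c; }
            }
        }
        // Update: recompute each centroid as the mean of its assigned points.
        std::vector<double> sum(k * d, 0.0);
        std::vector<int> count(k, 0);
        for (int i = 0; i < n; ++i) {
            ++count[assign[i]];
            for (int j = 0; j < d; ++j) sum[assign[i] * d + j] += pts[i * d + j];
        }
        for (int c = 0; c < k; ++c)
            if (count[c] > 0)
                for (int j = 0; j < d; ++j)
                    centroids[c * d + j] = sum[c * d + j] / count[c];
    }
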

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
