high performance computing on graphics processing units: hgpu.org

Posts

Sep, 27

RoadRunner: a fast and flexible exoplanet transit model

I present RoadRunner, a fast exoplanet transit model that can use any radially symmetric function to model stellar limb darkening while still being faster to evaluate than the analytical transit model for quadratic limb darkening by Mandel & Agol (2002). CPU and GPU implementations of the model are available in the PyTransit transit modelling package, […]

OpenCL

Sep, 20

PySchedCL: Leveraging Concurrency in Heterogeneous Data-Parallel Systems

In the past decade, the high-performance compute capabilities of heterogeneous GPGPU platforms have led to the popularity of data-parallel programming languages such as CUDA and OpenCL. Such languages, however, have a steep learning curve and require an extensive understanding of the underlying architecture of the compute devices in heterogeneous platforms. This […]

OpenCL

Aug, 23

Compiler-Based Tools to Aid in Data Transfer Optimization and On-Chip Debug of Heterogeneous Compute Systems

First, we present techniques to efficiently schedule data transfers through compiler analyses. Compared to transferring data immediately before and after the kernel executes, our scheduling yields order-of-magnitude improvements in execution time, number of data transfers, and number of bytes transferred. Second, we demonstrate techniques to provide on-chip debugging for heterogeneous systems through […]

OpenCL

Aug, 23

Modular FPGA Systems with Support for Dynamic Workloads and Virtualisation

This thesis shows that it is feasible to build modular FPGA systems that can dynamically change their hardware resources in both the spatial and temporal domains, using existing tools and accelerators, to improve the maintainability, adaptability, and accessibility of FPGA systems. To achieve this, first, a modular FPGA development flow is proposed to build an FPGA […]

OpenCL

Jul, 19

Compyle: a Python package for parallel computing

Compyle allows users to execute a restricted subset of Python on a variety of HPC platforms. It is an embedded domain-specific language (eDSL) for parallel computing. It currently supports multi-core execution using Cython, and OpenCL and CUDA for GPU devices. Users write code in a restricted subset of Python that is automatically transpiled to high-performance […]

CUDA • OpenCL

Jul, 5

Studies on CUDA Offloading for Real-Time Simulation and Visualization

The Graphics Processing Unit (GPU) is a co-processor designed to assist the Central Processing Unit (CPU) in rendering 3D graphics. The rapid development of these graphics chips, driven by the popularity of games and media design, allowed the GPU to evolve its ubiquitous parallel architecture. The programmability of these devices increased with the introduction of […]

CUDA

Jun, 30

The Fifth International Workshop on GPU Computing and AI (GCA), 2020

The Fifth International Workshop on GPU Computing and AI (GCA’20), to be held in conjunction with the Eighth International Symposium on Computing and Networking (CANDAR’20), Naha, Okinawa, Japan, November 24-27, 2020. Special announcement regarding the COVID-19 situation: Although we are still working with the possibility of having physical meetings for CANDAR 2020 as planned, the […]

Jun, 21

FPGA Based Satisfiability Checking

The Boolean satisfiability problem, abbreviated as SAT, is the backbone of many applications in VLSI design automation and verification. Over the years, many SAT solvers, both complete and incomplete, have been developed. Complete solvers are usually based on the DPLL (Davis–Putnam–Logemann–Loveland) algorithm, which is a backtracking algorithm. Industrial-strength problems are very large and make […]

OpenCL

Jun, 14

The Rodinia Benchmark Suite in SYCL

We apply the SYCL programming model to the Rodinia benchmark suite, describe the transformations from the OpenCL implementations to the SYCL implementations, and evaluate the benchmarks on microprocessors with a CPU and an integrated GPU. The publicly available implementations of the benchmark suite will track the development of the SYCL compilers, and provide programs for […]

OpenCL

Jun, 7

OpenABLext: An automatic code generation framework for agent-based simulations on CPU-GPU-FPGA heterogeneous platforms

The execution of agent-based simulations (ABSs) on hardware accelerator devices such as graphics processing units (GPUs) has been shown to offer great performance potential. However, in heterogeneous hardware environments, it can become increasingly difficult to find viable partitions of the simulation and provide implementations for different hardware devices. To automate this process, we present OpenABLext, […]

OpenCL

Jun, 7

Investigating Single Precision Floating General Matrix Multiply in Heterogeneous

The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and OpenCL's availability through the Intel High Level Synthesis compiler allows users to architect […]

OpenCL

May, 31

Evaluating the performance of HPC-style SYCL applications

SYCL is a parallel programming model for developing single-source programs that run on heterogeneous platforms. To this end, it allows a single code base to be written that can run on different architectures. For this study, we develop applications in SYCL which are representative of those often used in High-Performance Computing. Their performance is benchmarked […]

OpenCL


* * *


Recent source codes

Code examples for paper on SYCL backend of Kokkos - IWOCL 2024

ROCm's implementation of Gromacs

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code
