high performance computing on graphics processing units: hgpu.org

Posts

Dec, 19

A Framework to Generate High-Performance Time-stepped Agent-based Simulations on Heterogeneous Hardware

Agent-Based Simulation (ABS) is a modelling approach where simulated entities i.e., agents, perform actions autonomously and interact with other agents based on a set of rules. ABSs have demonstrated their usefulness in various domains such as transportation, social science, or biology. Agent-based simulators commonly rely vastly on Central Processing Unit (CPU)-based sequential execution. As a […]

OpenCL

Dec, 19

FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing

Accelerators, such as GPUs (Graphics Processing Unit) that is suitable for handling highly parallel data, and FPGA (Field Programmable Gate Array) with algorithms customized architectures, are widely adopted. The motivation is that algorithms with various parallel characteristics can efficiently map to the heterogeneous computing architecture by collaborated GPU and FPGA. However, current applications always utilize […]

OpenCL

Dec, 11

Towards energy efficiency and productivity for decision making in mobile robot navigation

Our goal in this work is to make it easy and feasible to implement solutions for autonomous decision-making and planning under uncertainty on low-power mobile platforms. We focus on practical applications, such as autonomous driving and service robotics, that must run on mobile SoC platforms. These applications often have real-time execution constraints. The main challenge […]

OpenCL

Nov, 20

Challenges and Techniques for Transparent Acceleration of Unmodified Big Data Applications

The ever-increasing demand for high-performance Big Data analytics and data processing has paved the way for heterogeneous hardware accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to be integrated into modern Big Data platforms. Currently, this integration comes at the cost of programmability, as the end-user Application Programming Interface (API) […]

CUDA

•

OpenCL

Nov, 6

An Open-source FPGA Library for Data Sorting

Field-programmable gate arrays (FPGAs) have garnered significant interest in research on high-performance computing because their flexibility enables the building of application-specific computation pipelines and data supply systems. In addition to the flexibility, toolchains for the development of FPGAs in OpenCL have been developed and offered by FPGA vendors that reduce the programming effort required. However, […]

OpenCL

Nov, 6

Apple Silicon Performance in Scientific Computing

With the release of the Apple Silicon System-on-a-Chip processors, and the impressive performance shown in general use by both the M1 and M1 Ultra, the potential use for Apple Silicon processors in scientific computing is explored. Both the M1 and M1 Ultra are compared to current state-of-the-art data-center GPUs, including an NVIDIA V100 with PCIe, […]

OpenCL

Sep, 4

Lina: a fast design optimisation tool for software-based FPGA programming

The continuous technology push on the semiconductor industry has led to the development of several alternate architectures for efficient computing. Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) are examples of devices used to accelerate applications. FPGAs are able to provide massive parallelism for suitable tasks when properly programmed. However, designing for FPGA is […]

OpenCL

Aug, 28

Exploring Thread Coarsening on FPGA

Over the past few years, there has been an increased interest in including FPGAs in data centers and high-performance computing clusters along with GPUs and other accelerators. As a result, it has become increasingly important to have a unified, high-level programming interface for CPUs, GPUs and FPGAs. This has led to the development of compiler […]

OpenCL

Aug, 21

BenchPress: A Deep Active Benchmark Generator

We develop BenchPress, the first ML benchmark generator for compilers that is steerable within feature space representations of source code. BenchPress synthesizes compiling functions by adding new code in any part of an empty or existing sequence by jointly observing its left and right context, achieving excellent compilation rate. BenchPress steers benchmark generation towards desired […]

OpenCL

Aug, 21

Optimization of GPU workloads using natural language processing based on deep learning techniques

Setting program parameters is challenging due to the abstract relationship between hardware and software. Automatic optimization algorithms that are accurate are required to cope with the complexity and variety of current hardware and software. Autotuning has always relied on time-consuming trial and error approaches. Machine learning (ML) and Natural Language Processing (NLP) has flourished over […]

CUDA

•

OpenCL

Aug, 7

Design and Implementation of ShenWei Universal C/C++

The ShenWei many-core series processors powering multiple cutting-edge supercomputers are equipped with their unique on-chip heterogeneous architecture. They have long required programmers to write separate codes for the control part on Management Processing Element (MPE) and accelerated part on Compute Processing Element (CPE), which is similar to open standards like OpenCL. Such a programming model […]

CUDA

•

OpenCL

Jul, 17

Heterogeneous Energy-aware Load Balancing for Industry 4.0 and IoT Environments

With the improvement of global infrastructure, Cyber-Physical Systems (CPS) have become an important component of Industry 4.0. Both the application as well as the machine work together to improve the task of interdependencies. Machine learning methods in CPS require the monitoring of computational algorithms, including adopting optimizations, fine-tuning cyber systems, improving resource utilization, as well […]

OpenCL

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

See all packages

* * *

high performance computing on graphics processing units: hgpu.org

Posts

A Framework to Generate High-Performance Time-stepped Agent-based Simulations on Heterogeneous Hardware

FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing

Towards energy efficiency and productivity for decision making in mobile robot navigation

Challenges and Techniques for Transparent Acceleration of Unmodified Big Data Applications

An Open-source FPGA Library for Data Sorting

Apple Silicon Performance in Scientific Computing

Lina: a fast design optimisation tool for software-based FPGA programming

Exploring Thread Coarsening on FPGA

BenchPress: A Deep Active Benchmark Generator

Optimization of GPU workloads using natural language processing based on deep learning techniques

Design and Implementation of ShenWei Universal C/C++

Heterogeneous Energy-aware Load Balancing for Industry 4.0 and IoT Environments

Recent source codes

CuPBoP-AMD: Extending CUDA to AMD Platforms

Adopter: Automated Deep Learning Optimization via DSL-based Source Code Transformation

ROCm's implementation of Gromacs

Code examples for paper on SYCL backend of Kokkos - IWOCL 2024

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

Most viewed papers (last 30 days)