high performance computing on graphics processing units: hgpu.org

Posts

May, 14

Descend: A Safe GPU Systems Programming Language

Graphics Processing Units (GPUs) offer tremendous computational power by following a throughput-oriented computing paradigm in which many thousands of computational units operate in parallel. Programming this massively parallel hardware is challenging: programmers must correctly and efficiently coordinate thousands of threads and their accesses to various shared memory spaces. Existing mainstream GPU programming languages, such as CUDA […]

CUDA • OpenCL
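A common coordination hazard is two threads writing the same element without synchronisation. The Python sketch below is only an illustration. It is not Descend syntax, and it does not show how Descend checks programs. It emulates a CUDA-style grid with the hypothetical helpers `global_index` and `buggy_index` and reports, at runtime, every output index written by more than one thread. A safe GPU language aims to rule out this kind of conflict before the program runs.

```python
from collections import Counter
from typing import Callable, List

def global_index(block_id: int, thread_id: int, block_size: int) -> int:
    """Hypothetical CUDA-style mapping: one distinct output element per thread."""
    return block_id * block_size + thread_id

def buggy_index(block_id: int, thread_id: int, block_size: int) -> int:
    """Hypothetical bug: the block offset is forgotten, so blocks overlap."""
    return thread_id

def find_write_conflicts(num_blocks: int, block_size: int,
                         index_fn: Callable[[int, int, int], int]) -> List[int]:
    """Return the output indices written by more than one thread (data races)."""
    writes = Counter(
        index_fn(b, t, block_size)
        for b in range(num_blocks)
        for t in range(block_size)
    )
    return sorted(i for i, count in writes.items() if count > 1)

print(find_write_conflicts(4, 8, global_index))  # []
print(find_write_conflicts(2, 4, buggy_index))   # [0, 1, 2, 3]
```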

Apr, 23

Simple and efficient GPU accelerated topology optimisation: Codes and applications

This work presents topology optimisation implementations for linear elastic compliance minimisation in three dimensions, accelerated using Graphics Processing Units (GPUs). Three different open-source implementations are presented for linear problems. Two of them are GPU-accelerated, using either OpenMP 4.5 or the Futhark language. Both GPU implementations are based on high […]

CUDA • OpenCL
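For reference, linear elastic compliance minimisation is commonly stated in density-based form as shown below. Here $x_e$ are element densities, $v_e$ element volumes, $V^{*}$ the allowed material volume, $\mathbf{K}$ the global stiffness matrix, $\mathbf{u}$ the displacements and $\mathbf{f}$ the applied loads. The SIMP interpolation on the right, with penalty exponent $p$ (often 3), is a common choice but is an assumption here. The excerpt does not say which interpolation these codes use. Each optimisation iteration requires solving $\mathbf{K}(\mathbf{x})\,\mathbf{u} = \mathbf{f}$, which is typically the most expensive step and therefore the natural target for GPU acceleration.

```latex
\begin{aligned}
\min_{\mathbf{x}} \quad & c(\mathbf{x}) = \mathbf{f}^{\mathsf{T}}\mathbf{u}(\mathbf{x}) \\
\text{s.t.} \quad & \mathbf{K}(\mathbf{x})\,\mathbf{u}(\mathbf{x}) = \mathbf{f}, \\
& \textstyle\sum_{e} v_e x_e \le V^{*}, \qquad 0 < x_{\min} \le x_e \le 1,
\end{aligned}
\qquad
E_e(x_e) = E_{\min} + x_e^{p}\,(E_0 - E_{\min})
```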

Apr, 16

Kernel Tuning Toolkit

Kernel Tuning Toolkit (KTT) is an autotuning framework for CUDA, OpenCL and Vulkan kernels. KTT provides advanced autotuning features such as support for both dynamic (online) and offline tuning, and an ability to tune multiple kernels together with shared tuning parameters. Furthermore, it offers customization features that make integration into larger software suites possible. The […]

CUDA • OpenCL
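KTT's own API is C++ and is not reproduced here. The Python sketch below only illustrates the core loop of offline autotuning: enumerate a space of tuning parameters, skip invalid combinations, time each remaining configuration and keep the fastest. `tune`, `fake_kernel` and the parameters `BLOCK_SIZE` and `UNROLL` are hypothetical. A real tuner would launch a CUDA, OpenCL or Vulkan kernel for each configuration, and would usually offer search strategies other than brute force.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]

def time_once(run: Callable[[Config], None], cfg: Config) -> float:
    """Wall-clock time of one run of the kernel with configuration cfg."""
    start = time.perf_counter()
    run(cfg)
    return time.perf_counter() - start

def tune(run: Callable[[Config], None],
         space: Dict[str, List[int]],
         is_valid: Callable[[Config], bool] = lambda cfg: True,
         repeats: int = 3) -> Tuple[Optional[Config], float]:
    """Exhaustively time every valid configuration; return the fastest one."""
    best_cfg: Optional[Config] = None
    best_time = float("inf")
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        if not is_valid(cfg):
            continue  # skip combinations the (hypothetical) device rejects
        # Keep the fastest of several runs to damp timing noise.
        elapsed = min(time_once(run, cfg) for _ in range(repeats))
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time

def fake_kernel(cfg: Config) -> None:
    """Hypothetical stand-in for a kernel launch; least work at 128 / 2."""
    distance = abs(cfg["BLOCK_SIZE"] - 128) // 32 + abs(cfg["UNROLL"] - 2)
    sum(range(distance * 200_000))

space = {"BLOCK_SIZE": [32, 64, 128, 256], "UNROLL": [1, 2, 4]}
best, seconds = tune(fake_kernel, space,
                     is_valid=lambda c: c["BLOCK_SIZE"] * c["UNROLL"] <= 512)
print(best, f"{seconds * 1e3:.3f} ms")
```

The stand-in kernel is built so that its work is smallest at `BLOCK_SIZE=128, UNROLL=2`, so the search should select that configuration. The `is_valid` predicate imitates a hypothetical resource limit on work-group size times unroll factor.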

Apr, 2

Task parallelism-based architectures on FPGA to optimize the energy efficiency of AI at the edge

For artificial intelligence (AI) at the edge, the energy efficiency of deep neural network (DNN) applications must be a primary concern. In some applications the speed of obtaining an inference can be critical, but many others easily meet their timing requirements, and the energy needed to calculate the […]

OpenCL
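The trade-off described in the abstract can be made concrete. Energy per inference is average power multiplied by latency, so once a deadline is met, the fastest design is not necessarily the most efficient one. The sketch below assumes each candidate design has a known average power and latency. The design names and numbers are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Design:
    name: str
    latency_ms: float  # time for one inference
    power_w: float     # average power while running it

    @property
    def energy_mj(self) -> float:
        # 1 W sustained for 1 ms is 1 mJ.
        return self.power_w * self.latency_ms

def pick_design(designs: List[Design], deadline_ms: float) -> Optional[Design]:
    """Lowest-energy design that still meets the latency deadline, or None."""
    feasible = [d for d in designs if d.latency_ms <= deadline_ms]
    return min(feasible, key=lambda d: d.energy_mj, default=None)

# Illustrative numbers only, not measurements.
designs = [
    Design("fast", latency_ms=2.0, power_w=10.0),       # 20 mJ
    Design("balanced", latency_ms=5.0, power_w=3.0),    # 15 mJ
    Design("low-power", latency_ms=40.0, power_w=1.0),  # 40 mJ
]
print(pick_design(designs, deadline_ms=10.0).name)  # balanced
print(pick_design(designs, deadline_ms=3.0).name)   # fast
```

The "low-power" design draws the least power but uses the most energy per inference, because it runs so much longer.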

Dec, 19

A Framework to Generate High-Performance Time-stepped Agent-based Simulations on Heterogeneous Hardware

Agent-Based Simulation (ABS) is a modelling approach in which simulated entities, i.e., agents, act autonomously and interact with other agents according to a set of rules. ABSs have demonstrated their usefulness in various domains such as transportation, social science and biology. Agent-based simulators commonly rely heavily on sequential execution on the Central Processing Unit (CPU). As a […]

OpenCL
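What makes a time-stepped simulation amenable to parallel hardware is that, within one step, every agent can read the state at time $t$ and write its own state for $t + \Delta t$ into a separate buffer. The Python sketch below shows that double-buffered step. The `Agent` fields and the movement rule in `apply_rule` are hypothetical, and the list comprehension stands in for one GPU thread per agent. This is not the framework's actual code generation.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Agent:
    position: float
    velocity: float

def apply_rule(agent: Agent, population: List[Agent], dt: float) -> Agent:
    """Hypothetical rule: accelerate toward the population's mean position."""
    # Recomputed per agent for simplicity (O(N^2) overall).
    centre = sum(a.position for a in population) / len(population)
    velocity = agent.velocity + 0.1 * (centre - agent.position)
    return replace(agent, position=agent.position + velocity * dt,
                   velocity=velocity)

def step(current: List[Agent], dt: float) -> List[Agent]:
    # Double buffering: every update reads only the state at time t and the
    # results go into a new list for t + dt, so no agent sees a half-updated
    # neighbour and the per-agent updates are independent of each other.
    return [apply_rule(agent, current, dt) for agent in current]

def simulate(agents: List[Agent], steps: int, dt: float = 1.0) -> List[Agent]:
    for _ in range(steps):
        agents = step(agents, dt)
    return agents

final = simulate([Agent(0.0, 0.0), Agent(10.0, 0.0)], steps=1)
print([a.position for a in final])  # [0.5, 9.5]
```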

Dec, 19

FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing

Accelerators such as GPUs (Graphics Processing Units), which are suited to highly parallel data, and FPGAs (Field Programmable Gate Arrays), whose architectures can be customized to specific algorithms, are widely adopted. The motivation is that algorithms with various parallel characteristics can be mapped efficiently onto a heterogeneous computing architecture in which a GPU and an FPGA collaborate. However, current applications always utilize […]

OpenCL

Dec, 11

Towards energy efficiency and productivity for decision making in mobile robot navigation

Our goal in this work is to make it easy and feasible to implement solutions for autonomous decision-making and planning under uncertainty on low-power mobile platforms. We focus on practical applications, such as autonomous driving and service robotics, that must run on mobile SoC platforms. These applications often have real-time execution constraints. The main challenge […]

OpenCL

Nov, 20

Challenges and Techniques for Transparent Acceleration of Unmodified Big Data Applications

The ever-increasing demand for high-performance Big Data analytics and data processing has paved the way for heterogeneous hardware accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to be integrated into modern Big Data platforms. Currently, this integration comes at the cost of programmability, as the end-user Application Programming Interface (API) […]

CUDA • OpenCL

Nov, 6

An Open-source FPGA Library for Data Sorting

Field-programmable gate arrays (FPGAs) have garnered significant interest in high-performance computing research because their flexibility enables the building of application-specific computation pipelines and data supply systems. Beyond this flexibility, FPGA vendors have developed and now offer OpenCL-based toolchains for FPGA development that reduce the programming effort required. However, […]

OpenCL
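The excerpt does not say which sorting algorithm the library implements. To illustrate why sorting can map well onto FPGA pipelines, the Python sketch below implements a bitonic sorting network, a common hardware-friendly choice. Its sequence of compare-exchange operations depends only on the input length, not on the data, so each stage could in principle become a fixed bank of comparators. The function name `bitonic_sort` is ours, and the sketch requires a power-of-two input length.

```python
from typing import List

def bitonic_sort(values: List[int]) -> List[int]:
    """Sort a power-of-two-length list with a bitonic sorting network.

    The compare-exchange pattern is fixed by the length alone, which is
    what makes networks like this attractive for hardware pipelines.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # compare distance within this merge stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for this direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```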

Nov, 6

Apple Silicon Performance in Scientific Computing

Given the release of the Apple Silicon System-on-a-Chip processors and the impressive general-purpose performance of both the M1 and M1 Ultra, this work explores the potential of Apple Silicon processors for scientific computing. Both the M1 and M1 Ultra are compared to current state-of-the-art data-center GPUs, including an NVIDIA V100 with PCIe, […]

OpenCL

Sep, 4

Lina: a fast design optimisation tool for software-based FPGA programming

The continuous technology push in the semiconductor industry has led to the development of several alternative architectures for efficient computing. Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) are examples of devices used to accelerate applications. When properly programmed, FPGAs can provide massive parallelism for suitable tasks. However, designing for FPGAs is […]

OpenCL

Aug, 28

Exploring Thread Coarsening on FPGA

Over the past few years, there has been increased interest in including FPGAs in data centers and high-performance computing clusters along with GPUs and other accelerators. As a result, it has become increasingly important to have a unified, high-level programming interface for CPUs, GPUs and FPGAs. This has led to the development of compiler […]

OpenCL
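Thread coarsening makes each work-item process several elements, so the launch needs proportionally fewer work-items. The Python sketch below emulates an OpenCL-style launch to show the index remapping. It is not OpenCL code, and `run_kernel`, `vector_add` and the `coarsening` factor are hypothetical names. It uses a strided assignment, in which work-item `gid` handles `gid`, `gid + W`, `gid + 2W`, and so on. Whether a strided or a consecutive assignment is better depends on the target device, and the excerpt does not specify which the paper uses.

```python
from typing import Callable, List

def run_kernel(global_size: int, kernel: Callable[[int], None]) -> None:
    """Emulate an NDRange launch: call the kernel once per work-item id."""
    for gid in range(global_size):
        kernel(gid)

def vector_add(a: List[float], b: List[float], coarsening: int = 1) -> List[float]:
    n = len(a)
    # The sketch assumes n is a multiple of the coarsening factor.
    assert n % coarsening == 0
    out = [0.0] * n
    work_items = n // coarsening  # C times fewer work-items than elements

    def kernel(gid: int) -> None:
        # Strided coarsening: in every iteration, adjacent work-items still
        # touch adjacent elements.
        for k in range(coarsening):
            i = gid + k * work_items
            out[i] = a[i] + b[i]

    run_kernel(work_items, kernel)
    return out

print(vector_add([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], coarsening=2))
# [11.0, 22.0, 33.0, 44.0]
```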

* * *


Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
