high performance computing on graphics processing units: hgpu.org

Posts

Sep, 3

A Comparison of High-Level Design Tools for SoC-FPGA on Disparity Map Calculation Example

Modern SoC-FPGAs, which combine an FPGA with embedded ARM cores, are becoming popular as embedded vision system platforms. However, SoC-FPGA application design still follows the traditional workflow of separate hardware and software development, which is a barrier to rapid product design and iteration on SoC-FPGA. High-Level Synthesis (HLS) and OpenCL-based system-level design approaches provide programmers the […]

OpenCL

Aug, 24

First International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC’15), 2015

With Exascale systems on the horizon at the same time that conventional von-Neumann architectures are suffering from rising power densities, we are facing an era with power, energy-efficiency, and cooling as first-class constraints for scalable HPC. FPGAs can tailor the hardware to the application, avoiding overheads of general-purpose architectures–for example, through customized datapaths and memory […]

Aug, 18

Runtime Code Generation and Data Management for Heterogeneous Computing in Java

GPUs (Graphics Processing Units) and other accelerators are nowadays commonly found in desktop machines, mobile devices and even data centres. While these highly parallel processors offer high raw performance, they also dramatically increase program complexity, requiring extra effort from programmers. This results in difficult-to-maintain and non-portable code due to the low-level nature of the languages […]

OpenCL

Aug, 18

RubiCL, a Library Providing Automatic Parallelisation on CPU and GPU devices

This project presents a library that automates the parallelisation of several higher-order functions originally provided within the Ruby standard library. The library distributes computation across many compute units, following an annotation specifying that primitives operate solely on numerical data. RubiCL harnesses the OpenCL framework to allow execution on CPU or GPU devices. […]

OpenCL

Aug, 12

GPU Pro 6: Advanced Rendering Techniques

The latest edition of this bestselling game development reference offers proven tips and techniques for the real-time rendering of special effects and visualization data that are useful for beginners and seasoned game and graphics programmers alike. Exploring recent developments in the rapidly evolving field of real-time rendering, GPU Pro6: Advanced Rendering Techniques assembles a high-quality […]

CUDA • OpenCL • OpenGL

Aug, 10

CRINK: Automatic CUDA code generation for affine C programs

Parallel programming has evolved into an efficient solution for a large number of compute-intensive applications. Graphics Processing Units (GPUs), traditionally designed to process computer graphics, are now widely applied to process large chunks of data in parallel in many computationally expensive applications. While developing parallel programs to run on parallel computing platforms, such as […]

CUDA

Aug, 7

Behavioral Spherical Harmonics for Long-Range Agents’ Interaction

We introduce behavioral spherical harmonics (BSH), a novel approach to efficiently and compactly represent the direction-dependent behavior of agents. BSH uses spherical harmonics to project the directional information of a group of multiple agents onto a vector of a few coefficients; thus, BSH drastically reduces the complexity of the directional evaluation, as it requires […]

OpenCL • OpenGL

Aug, 1

A University-Industry Collaboration Case Study: Intel Real-Time Multi-View Face Detection Capstone Design Projects

Since 2011, University of Michigan-Shanghai Jiao Tong University Joint Institute (JI) has established 122 corporate-sponsored Capstone Design Projects (CDPs) with world leading companies such as Covidien, General Electric, Hewlett Packard, Intel, and Siemens. Of these corporations, Intel was the first sponsor, having funded 21 projects and mentored 105 students over four consecutive years. This paper […]

OpenCL

Jul, 29

Sound Synthesis Using Physical Modeling on Heterogeneous Computing Platforms

The paper presents a comparison of central processing unit (CPU) and graphics processing unit (GPU) performance in sound synthesis based on physical modeling. The goal was to achieve real-time performance with two- and three-dimensional finite difference (FD) instrument models. Two abstract instruments, a membrane and a block, were modeled and tested using a CPU and […]

OpenCL

Jul, 29

OKL: A Unified Language for Parallel Architectures

Rapid evolution of computer processor architectures has spawned multiple programming languages and standards. This thesis strives to address the challenges caused by fast and cyclical changes in programming models. The novel contribution of this thesis is the introduction of an abstract unified framework which addresses portability and performance for programming manycore devices. To test this […]

CUDA • OpenCL

Jul, 28

Experiences in Speeding Up Computer Vision Applications on Mobile Computing Platforms

Computer vision (CV) is widely expected to be the next big thing in mobile computing. The availability of a camera and a large number of sensors in mobile devices will enable CV applications that understand the environment and enhance people’s lives through augmented reality. One of the problems yet to be solved is how to transfer […]

OpenCL

Jul, 15

Automatic Optimization of Thread Mapping for a GPGPU Programming Framework

Although General Purpose computation on Graphics Processing Units (GPGPU) is widely used for high-performance computing, standard programming frameworks such as CUDA and OpenCL are still difficult to use. They require low-level specifications, and hand-optimization is a large burden. Therefore we are developing an easier framework named MESI-CUDA. Based on a virtual shared memory model, […]

CUDA

* * *

Recent source codes

Code examples for paper on SYCL backend of Kokkos - IWOCL 2024

ROCm's implementation of Gromacs

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Most viewed papers (last 30 days)