high performance computing on graphics processing units: hgpu.org

Posts

Apr, 23

A High-resolution approach for Tsunami impact simulation on graphics processing units

Having learned a great deal about the problem and also the solutions over the course of this project, it is the opinion of the author that the method undertaken within this report is unsatisfactory for delivering performance enhancement over alternative approaches. Firstly, the domain transfers result in reduced performance. For larger simulations these prove to […]

OpenCL

Apr, 12

Robust real time face recognition and tracking on gpu using fusion of rgb and depth image

This paper presents a real-time face recognition system using a Kinect sensor. The algorithm is implemented on a GPU using OpenCL, and significant speed improvements are observed. We use the Kinect depth image to increase the robustness and reduce the computational cost of conventional LBP-based face recognition. The main objective of this paper was to perform robust, high […]

OpenCL

Apr, 12

Model Coupling between the Weather Research and Forecasting Model and the DPRI Large Eddy Simulator for Urban Flows on GPU-accelerated Multicore Systems

In this report we present a novel approach to model coupling for shared-memory multicore systems hosting OpenCL-compliant accelerators, which we call The Glasgow Model Coupling Framework (GMCF). We discuss the implementation of a prototype of GMCF and its application to coupling the Weather Research and Forecasting Model and an OpenCL-accelerated version of the Large Eddy […]

OpenCL

Apr, 8

Finite element numerical integration for first order approximations on multi-core architectures

The paper presents investigations on the implementation and performance of the finite element numerical integration algorithm for first order approximations on three processor architectures popular in scientific computing: classical CPU, Intel Xeon Phi and NVIDIA Kepler GPU. A unifying programming model and a portable OpenCL implementation are considered for all architectures. Variations of the algorithm due […]

OpenCL

Apr, 4

The Design and Implementation of a Verification Technique for GPU Kernels

We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed synchronous, delayed visibility (SDV) semantics, which captures the execution of a GPU kernel by multiple groups of threads. The […]

CUDA • OpenCL

Mar, 30

Massively Parallel Analysis of Similarity Matrices on Heterogeneous Hardware

We conduct a study that investigates the performance characteristics of a set of parallel implementations of the recurrence quantification analysis (RQA) using OpenCL. Being an important tool in climate impact and medical research, a central aspect of RQA is the construction of a binary matrix that captures the similarities of multi-dimensional vectors. Based on this […]

OpenCL

Mar, 23

GPU Kernels for High-Speed 4-Bit Astrophysical Data Processing

Interferometric radio telescopes often rely on computationally expensive O(N^2) correlation calculations; fortunately, these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU-based X-engine of a new hybrid FX correlator. Channelized data from the F-engine is supplied to the GPUs as 4-bit, offset-encoded […]

OpenCL

Mar, 22

Speeding Up Computer Vision Applications on Mobile Computing Platforms

Computer vision (CV) is widely expected to be the next "Big Thing" in mobile computing. For example, Google has recently announced their project "Tango", a 5-inch Android phone containing highly customized hardware and software designed to track the full 3-dimensional motion of the device as you hold it while simultaneously creating a map of the […]

OpenCL

Mar, 22

Raising the Bar for Using GPUs in Software Packet Processing

Numerous recent research efforts have explored the use of Graphics Processing Units (GPUs) as accelerators for software-based routing and packet handling applications, typically demonstrating throughput several times higher than using legacy code on the CPU alone. In this paper, we explore a new hypothesis about such designs: For many such applications, the benefits arise less […]

CUDA

Mar, 18

Portable GPU-Based Artificial Neural Networks for Accelerated Data-Driven Modeling

Artificial neural networks (ANNs) are widely applied as a data-driven modeling tool in hydroinformatics due to their broad applicability in handling implicit and nonlinear relationships between the input and output data. To obtain a reliable ANN model, training the ANN on the data is essential, but training usually takes many hours for a large […]

CUDA • OpenCL

Mar, 14

Heterogeneous Acceleration of Volumetric JPEG 2000

We present the implementation of a volumetric JPEG 2000 codec as a real-world use case of software acceleration with GPUs and multi-core CPUs. We present a generic methodology to accelerate existing code written in C with OpenCL. Furthermore, we account for the volumetric nature of the processed data and formulate associated optimization guidelines. The resulting […]

OpenCL

Mar, 8

HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms

Heterogeneous computing has emerged as one of the major computing platforms in many domains. Although there have been several proposals to aid programming for heterogeneous computing platforms, optimizing applications on heterogeneous computing platforms is not an easy task. Identifying which parallel regions (or tasks) should run on GPUs or CPUs is one of the critical […]

OpenCL



Recent source codes

CuPBoP-AMD: Extending CUDA to AMD Platforms

Adopter: Automated Deep Learning Optimization via DSL-based Source Code Transformation

ROCm's implementation of Gromacs

Code examples for paper on SYCL backend of Kokkos - IWOCL 2024

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

Most viewed papers (last 30 days)