high performance computing on graphics processing units: hgpu.org

Posts

Oct, 1

Parallel Application Library for Object Recognition

Computer vision research enables machines to understand the world. Humans usually interpret and analyze the world through what they see – the objects they capture with their eyes. Similarly, machines can better understand the world by recognizing objects in images. Object recognition is therefore a major branch of computer vision. To achieve the highest accuracy, […]

OpenCL

Oct, 1

GPGPU Accelerated Texture-Based Radiosity Calculation

Radiosity is a popular global illumination algorithm capable of achieving photorealistic rendering results. However, its use in interactive environments is limited by its computational complexity. This paper presents a GPGPU-based implementation of the gathering radiosity approach using texture-based discretisation and the OpenCL framework. Hemicubes are rendered to a texture array and are processed by OpenCL […]

OpenCL • OpenGL

Sep, 25

GPF: a framework for general packet classification on GPU co-processors

This thesis explores the design and experimental implementation of GPF, a novel protocol-independent, multi-match packet classification framework. This framework is targeted and optimised for flexible, efficient execution on NVIDIA GPU platforms through the CUDA API, but should not be difficult to port to other platforms, such as OpenCL, in the future. GPF was conceived and […]

CUDA

Sep, 22

Exploration of Parallelization Frameworks for Computational Finance

This paper presents a comparison of parallelization frameworks for efficient execution of computational finance workloads. We use a Value-at-Risk (VaR) workload to evaluate OpenCL and OpenMP parallelization frameworks on multi-core CPUs as opposed to GPUs. In addition, we study the impact of SMT on performance using GCC (4.4) and IBM XLC (11.01) compilers for both […]

OpenCL

Sep, 14

An Optimized Parallel IDCT on Graphics Processing Units

In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. Because most of the IDCT input data for real videos are zero-valued coefficients, we create a new compacted data representation that enables several optimizations. Experimental evaluations […]

OpenCL
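The key observation in this abstract is that zero-valued coefficients can be skipped. The small sketch below illustrates it using the standard H.264 4x4 integer inverse transform and a hypothetical sparse format. The names `compact` and `decode` and the `(position, value)` layout are illustrative assumptions, not the paper's data representation, and the code runs sequentially on the CPU rather than in OpenCL. An all-zero block transforms to an all-zero residual, so blocks missing from the compacted list can be skipped without changing the output.

```python
def idct4x4(block):
    """H.264 4x4 integer inverse transform with final (x + 32) >> 6 rounding.
    block: 4x4 list of ints (already dequantized coefficients)."""
    def butterfly(d0, d1, d2, d3):
        # 1-D core transform used for both the row and the column pass
        e0 = d0 + d2
        e1 = d0 - d2
        e2 = (d1 >> 1) - d3
        e3 = d1 + (d3 >> 1)
        return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

    rows = [butterfly(*row) for row in block]  # horizontal pass
    # vertical pass: cols[c][r] is the result for row r, column c
    cols = [butterfly(*(rows[r][c] for r in range(4))) for c in range(4)]
    # transpose back to row-major order and apply the rounding shift
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


def compact(blocks):
    """Hypothetical sparse format: keep only blocks that have nonzero
    coefficients, as (block_index, [(position, value), ...])."""
    out = []
    for b, block in enumerate(blocks):
        nz = [(r * 4 + c, v)
              for r, row in enumerate(block)
              for c, v in enumerate(row) if v != 0]
        if nz:
            out.append((b, nz))
    return out


def decode(compacted, num_blocks):
    """Transform only the stored blocks; skipped blocks stay all zero."""
    residuals = [[[0] * 4 for _ in range(4)] for _ in range(num_blocks)]
    for b, nz in compacted:
        block = [[0] * 4 for _ in range(4)]
        for pos, v in nz:
            block[pos // 4][pos % 4] = v
        residuals[b] = idct4x4(block)
    return residuals
```

On a GPU, the compacted list would let work-items handle only the nonzero blocks instead of launching work for every block.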

Sep, 13

GPU Fluid Simulation using Smoothed Particle Hydrodynamics

In this paper we present an overview of our implementation of a fluid simulation technique called "Smoothed Particle Hydrodynamics". Our implementation uses a hybrid CPU+GPU hash-based data structure to provide quick lookups of each particle's nearest neighbors and to improve memory access patterns. In our discussion we begin with a brief overview of the Navier-Stokes equations […]

OpenCL • OpenGL
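A uniform-grid spatial hash is one common way to get fast nearest-neighbour lookups in SPH. The Python sketch below shows that idea on the CPU only. It rests on an assumption not taken from the paper: the cell size equals the smoothing length `h`, so every neighbour within `h` lies either in the particle's own cell or in one of the 26 adjacent cells. This excerpt does not describe the paper's hybrid CPU+GPU structure, and `build_grid` and `neighbors` are hypothetical names.

```python
import math
from collections import defaultdict


def build_grid(positions, h):
    """Bucket particle indices by integer cell coordinates (cell size h)."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        grid[(math.floor(x / h), math.floor(y / h), math.floor(z / h))].append(i)
    return grid


def neighbors(i, positions, grid, h):
    """Indices of particles within distance h of particle i (excluding i).
    Only the 3x3x3 block of cells around particle i is searched."""
    x, y, z = positions[i]
    cx, cy, cz = math.floor(x / h), math.floor(y / h), math.floor(z / h)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                # .get avoids inserting empty buckets into the defaultdict
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if j != i and math.dist(positions[i], positions[j]) <= h:
                        result.append(j)
    return result
```

Each query inspects only nearby cells instead of every particle, which is what makes per-step neighbour searches affordable for large particle counts.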

Sep, 8

Mastering Software Variant Explosion for GPU Accelerators

Mapping algorithms efficiently to the target hardware poses a challenge for algorithm designers. This is particularly true for heterogeneous systems hosting accelerators like graphics cards. While algorithm developers have profound knowledge of the application domain, they often lack detailed insight into the underlying hardware of accelerators in order to exploit the provided […]

CUDA • OpenCL

Sep, 8

OpenACC Implementations Comparison

Using GPUs for general-purpose programming is much easier nowadays than in previous years. In the very beginning, Brook-GPU and Close To Metal were the approaches used to explore the new possibilities of hardware accelerators. After that, CUDA and OpenCL were released. They were adopted by many programmers due to their advantages; however, […]

CUDA

Sep, 4

Accelerating distance matrix calculations utilizing GPU

When modeling pedestrian movement, it is necessary to find a path to the target point. A distance matrix, or a gradient map derived from it, can be used for this purpose. Calculating distance matrices for large areas and multiple targets is very time-consuming. Therefore, this article focuses on accelerating these calculations utilizing Graphics Processing […]

OpenCL
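A distance matrix of the kind this abstract describes can be computed on a grid with a multi-source breadth-first search that starts from all targets at once. The sequential sketch below is only an illustration. The 4-connected grid, unit step cost, and `'#'` obstacle encoding are assumptions, and this excerpt does not give the article's GPU algorithm. A gradient map could then be derived by pointing each cell toward the neighbour with the smallest distance.

```python
from collections import deque


def distance_matrix(grid, targets):
    """grid: list of equal-length strings, '#' marks an obstacle (assumed encoding).
    targets: list of (row, col) cells.
    Returns the number of 4-connected steps to the nearest target,
    or None for obstacles and unreachable cells."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # seed the search with every target at distance 0
    for r, c in targets:
        dist[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist
```

For example, `distance_matrix(["...", ".#.", "..."], [(0, 0)])` gives `[[0, 1, 2], [1, None, 3], [2, 3, 4]]`.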

Sep, 1

A Portable High-Productivity Approach to Program Heterogeneous Systems

The exploitation of heterogeneous resources is becoming increasingly important for general-purpose computing. Unfortunately, heterogeneous systems require much more programming effort than the traditional single- or even multi-core computers most programmers are familiar with. Not only new concepts but also new tools with different restrictions must be learned and applied. Additionally, many of […]

OpenCL

Sep, 1

Parallel GPU-accelerated Recursion-based Generators of Pseudorandom Numbers

The aim of the paper is to show how to design fast parallel algorithms for linear congruential and lagged Fibonacci pseudorandom number generators. The new algorithms employ the divide-and-conquer approach for solving linear recurrence systems and can be easily implemented on GPU-accelerated hybrid systems using CUDA or OpenCL. Numerical experiments performed on a computer system […]

CUDA • OpenCL
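Jump-ahead is one standard way to split a linear congruential generator x_{n+1} = (a·x_n + c) mod m across parallel workers. Any k steps of the recurrence compose into a single affine map x → (A_k·x + C_k) mod m, and that map can be computed with O(log k) compositions by repeated squaring. The Python sketch below shows this idea. It is not necessarily the divide-and-conquer recurrence solver the paper proposes, it does not cover lagged Fibonacci generators, and the function names and block partitioning are hypothetical. Each iteration of the worker loop in `lcg_blocked` stands in for what would be an independent GPU thread.

```python
def compose(f, g, m):
    """Affine map x -> f(g(x)) mod m, where each map is a (mult, inc) pair."""
    a1, c1 = f
    a2, c2 = g
    return (a1 * a2 % m, (a1 * c2 + c1) % m)


def jump(a, c, m, k):
    """Affine map that advances x -> (a*x + c) mod m by k steps,
    built by repeated squaring (powers of one map commute)."""
    result = (1, 0)  # identity map
    base = (a % m, c % m)
    while k > 0:
        if k & 1:
            result = compose(base, result, m)
        base = compose(base, base, m)
        k >>= 1
    return result


def lcg_sequential(seed, a, c, m, n):
    """Reference generator: returns x_1 .. x_n starting from x_0 = seed."""
    out, x = [], seed
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out


def lcg_blocked(seed, a, c, m, n, workers):
    """Same sequence split into contiguous blocks; each worker jumps to the
    state at the start of its block, then generates its block on its own."""
    block = -(-n // workers)  # ceiling division
    out = []
    for w in range(workers):  # each iteration could run on its own thread
        start = w * block
        if start >= n:
            break
        A, C = jump(a, c, m, start)
        x_start = (A * seed + C) % m  # state after `start` steps
        out.extend(lcg_sequential(x_start, a, c, m, min(block, n - start)))
    return out
```

Because every worker needs only the seed and its block offset, the blocks can be generated with no communication between workers, and the concatenated output matches the sequential stream exactly.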

Aug, 28

GPUVerify: A Verifier for GPU Kernels

We present a technique for verifying race- and divergence-freedom of GPU kernels written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in […]

CUDA • OpenCL

