high performance computing on graphics processing units: hgpu.org

Posts

Apr, 30

KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters

GPUVerify is a static analysis tool for verifying that GPU kernels are free from data races and barrier divergence. It is intended as an automatic tool, but its usability is impaired by the fact that the user must explicitly supply the kernel source code, the number of threads, and some kernel arguments. Extracting this information […]

OpenCL

Apr, 27

Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures

Based on the premise that preconditioners needed for scientific computing are not only required to be robust in the numerical sense, but also scalable for up to thousands of light-weight cores, we argue that this two-fold goal is achieved for the recently developed self-adaptive multi-elimination preconditioner. For this purpose, we revise the underlying idea and […]

CUDA • OpenCL

Apr, 21

GPU Encrypt: AES Encryption on Mobile Devices

In this report, we have taken the first steps in investigating the feasibility of using the GPU as a cryptographic accelerator for the AES algorithm on mobile devices. In particular, our focus was on exploring the use of OpenCL as a framework for implementing the algorithm. Using modifications of an existing implementation [11], we first […]

OpenCL

Apr, 12

Analysis and Review of Sorting Algorithms

One of the fundamental problems in computer science is ordering a list of items. Although a huge number of sorting algorithms exist, the sorting problem continues to attract a great deal of research, because efficient sorting is important for optimizing the use of other algorithms. Sorting algorithms have been studied extensively for the past three decades. Their […]

OpenCL

Apr, 11

High performance in silico virtual drug screening on many-core processors

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their […]

OpenCL

Apr, 9

3D Hydrodynamic Simulation of Classical Nova Explosions

The purpose of this project is to develop a computer model to investigate the formation and life cycle of classical novae. A nova is an orbiting system consisting of a white dwarf and a star. Over time, the white dwarf pulls hydrogen gas from the star, which gathers on the surface of the white dwarf (the […]

OpenCL

Apr, 1

GPU Based Performance Acceleration of Radar Imaging Algorithms

We consider the performance acceleration of the conventional Time Domain Backprojection and Kirchhoff Migration algorithms for imaging concealed targets. The Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) are used here for accelerating these algorithms on Graphics Processing Units (GPUs). Data generated by means of analytical methods, simulation and experiment are used for […]

CUDA • OpenCL

Apr, 1

Enhanced Parallel NegaMax Tree Search Algorithm on GPU

Parallel performance of GPUs today surpasses that of traditional multi-core CPUs. Many AI algorithms are now being tested on GPUs rather than CPUs, especially since the release of frameworks such as CUDA and OpenCL, which allow general-purpose algorithms to be implemented on the GPU. One of the most famous game tree search algorithms is Negamax, […]

CUDA • OpenCL

Apr, 1

Code Generation for Embedded Heterogeneous Architectures on Android

The success of Android is based on its unified Java programming model, which allows developers to write platform-independent programs for a variety of target platforms. However, this comes at the cost of performance. As a consequence, Google introduced APIs that allow developers to write native applications and to exploit multiple cores as well as embedded GPUs […]

Mar, 28

Improving Cache Locality for GPU-based Volume Rendering

We present a cache-aware method for accelerating texture-based volume rendering on a graphics processing unit (GPU). Because a GPU has a hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize performance for memory-intensive applications. Our method localizes texture memory references according to the location of the viewpoint and dynamically selects […]

CUDA • OpenCL

Mar, 28

GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

We demonstrate acceleration on graphics processing units (GPUs) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a […]

OpenCL

Mar, 28

Implementation of Just In Time Value Specialization for the Optimization of Data Parallel Kernels

This dissertation explores just-in-time (JIT) specialization as an optimization for OpenCL data-parallel compute kernels. It describes the implementation and performance of two extensions to OpenCL, Bacon and Specialization Annotated OpenCL (SOCL). Bacon is a replacement interface for OpenCL that provides improved usability and has JIT specialization built in. SOCL is a simple extension to OpenCL […]

OpenCL

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

Recent source codes

Code examples for paper on SYCL backend of Kokkos - IWOCL 2024

ROCm's implementation of Gromacs

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code
