Posts
Jan, 29
Consolidating Applications for Energy Efficiency in Heterogeneous Computing Systems
By scheduling multiple applications with complementary resource requirements on a smaller number of compute nodes, we aim to improve performance, resource utilization, and energy efficiency while reducing energy consumption. In addition to our naive consolidation approach, which already achieves the aforementioned goals, we propose a new energy efficiency-aware (EEA) scheduling policy and compare its performance with […]
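As an illustrative sketch only (not the authors' naive or EEA policy), the idea of consolidating applications with complementary resource demands can be shown as greedy pairing of CPU-bound and memory-bound workloads onto shared nodes; the profiles below are assumed for the example:

```python
# Toy consolidation sketch: put a CPU-bound app and a memory-bound app
# on the same node so their dominant resource demands do not overlap.
# The (cpu_share, mem_bw_share) profiles are illustrative assumptions,
# not the scheduling policy from the post.

def consolidate(apps):
    """apps: list of (name, cpu_share, mem_bw_share) tuples in [0, 1].
    Returns a list of per-node tuples of application names."""
    cpu_bound = sorted((a for a in apps if a[1] >= a[2]), key=lambda a: -a[1])
    mem_bound = sorted((a for a in apps if a[1] < a[2]), key=lambda a: -a[2])
    nodes = []
    # Pair the most CPU-hungry app with the most bandwidth-hungry one.
    while cpu_bound and mem_bound:
        nodes.append((cpu_bound.pop(0)[0], mem_bound.pop(0)[0]))
    # Any leftover apps each get their own node.
    nodes.extend((a[0],) for a in cpu_bound + mem_bound)
    return nodes
```

With three applications, two complementary ones share a node and the remainder runs alone, so three apps need only two nodes instead of three.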
Jan, 29
Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors
Wideband channelization is a computationally intensive task within software-defined radio (SDR). To support this task, the underlying hardware should provide high performance and allow flexible implementations. Traditional solutions use field-programmable gate arrays (FPGAs) to satisfy these requirements. While FPGAs allow for flexible implementations, realizing an FPGA implementation is a difficult and time-consuming process. On the […]
Jan, 29
On the Programmability and Performance of Heterogeneous Platforms
General-purpose computing on an ever-broadening array of parallel devices has led to an increasingly complex and multi-dimensional landscape with respect to programmability and performance optimization. The growing diversity of parallel architectures presents many challenges to the domain scientist, including device selection, programming model, and level of investment in optimization. All of these choices influence the […]
Jan, 29
A Performance Criteria for parallel Computation on basis of block size using CUDA Architecture
A GPU based on the CUDA architecture developed by NVIDIA is a high-performance computing device. Multiplication of large matrices can be computed in a few seconds on such a GPU. A modern GPU consists of 16 highly threaded streaming multiprocessors (SMs); the Fermi GPU consists of 32 SMs. These are compute-intensive devices. […]
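As a hedged sketch of the kind of block-size choice the abstract alludes to (the limits below match common NVIDIA hardware but are assumptions, not the paper's criteria), a CUDA launch configuration for an N x N matrix multiply picks a square tile so that each block stays within the per-block thread budget and is a multiple of the warp size:

```python
# Illustrative grid/block sizing for an n x n matrix multiply on a GPU.
# 1024 threads per block and a warp size of 32 are common NVIDIA limits,
# used here as assumptions; the paper's actual performance criteria may differ.

def launch_config(n, tile=16, max_threads_per_block=1024, warp=32):
    """Return ((grid_x, grid_y), (block_x, block_y)) for an n x n multiply."""
    threads = tile * tile
    assert threads <= max_threads_per_block, "tile too large for one block"
    assert threads % warp == 0, "block size should be a multiple of the warp"
    blocks = (n + tile - 1) // tile  # ceil(n / tile) blocks per dimension
    return (blocks, blocks), (tile, tile)
```

For a 1000 x 1000 multiply with a 16 x 16 tile this yields a 63 x 63 grid of 256-thread blocks, the edge blocks handling the ragged boundary.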
Jan, 29
Impact of communication times on mixed CPU/GPU applications scheduling using KAAPI
High-performance computing machines increasingly use graphics processing units, as they are very efficient for homogeneous computation such as matrix operations. However, before using these accelerators, one has to transfer data from the processor to them, and such a transfer can be slow. In this report, our aim is to study the impact of […]
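As a back-of-the-envelope sketch of why transfer times matter for scheduling (the bandwidth figure is an assumption, and this is not the KAAPI model from the report), offloading a task only pays off when the transfer cost plus GPU compute time beats staying on the CPU:

```python
# Simple offload model: the GPU wins only if moving the data plus
# computing on the GPU is faster than computing on the CPU.
# The default 8 GB/s link bandwidth is an illustrative assumption.

def offload_wins(bytes_moved, cpu_time_s, gpu_time_s, bandwidth_gbps=8.0):
    """True if cpu->gpu transfer + GPU compute beats CPU-only execution."""
    transfer_s = bytes_moved / (bandwidth_gbps * 1e9)
    return transfer_s + gpu_time_s < cpu_time_s
```

Under this toy model, moving 8 GB over an 8 GB/s link adds a full second, so a 4x GPU speedup on a 2-second CPU task barely wins, and a smaller speedup loses outright.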
Jan, 28
Scheduling on Manycore and Heterogeneous Graphics Processors
Through custom software schedulers that distribute work differently than built-in hardware schedulers, data-parallel and heterogeneous architectures can be retargeted towards irregular task-parallel graphics workloads. This dissertation examines the role of a GPU scheduler and how it may schedule complicated workloads onto the GPU for efficient parallel processing. This dissertation examines the scheduler through three different […]
Jan, 28
Automatic Resource-Constrained Static Task Parallelization
This thesis intends to show how to efficiently exploit the parallelism present in applications in order to enjoy the performance benefits that multiprocessors can provide, using a new automatic task parallelization methodology for compilers. The key characteristics we focus on are resource constraints and static scheduling. This methodology includes the techniques required to decompose applications […]
Jan, 28
GPU-Qin: A Methodology for Evaluating the Error Resilience of GPGPU Applications
While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, which […]
Jan, 28
Performance-Correctness Challenges in Emerging Heterogeneous Multicore Processors
We are witnessing a tremendous amount of change in the design of the modern microprocessor. With dozens of CPU cores on-chip in recent multicore processors, the search for thread-level parallelism (TLP) is more significant than ever. In parallel, a very different processor architecture has emerged that aims to extract parallelism at an entirely different scale. Originally […]
Jan, 28
Autotuning Programs with Algorithmic Choice
The process of optimizing programs and libraries, both for performance and quality of service, can be viewed as a search problem over the space of implementation choices. This search is traditionally manually conducted by the programmer and often must be repeated when systems, tools, or requirements change. The overriding goal of this work is to […]
Jan, 26
gem5-gpu: A Heterogeneous CPU-GPU Simulator
gem5-gpu is a new simulator that models tightly integrated CPU-GPU systems. It builds on gem5, a modular full-system CPU simulator, and GPGPU-Sim, a detailed GPGPU simulator. gem5-gpu routes most memory accesses through Ruby, which is a highly configurable memory system in gem5. By doing this, it is able to simulate many system configurations, ranging from […]
Jan, 26
A Dynamic Offload Scheduler for spatial multitasking on Intel Xeon Phi Coprocessor
The Intel Xeon Phi coprocessor fully supports multitasking, but multitasking alone does not ensure high performance. A conventional task-level resource allocation scheduler could be used, but processor utilization of the Xeon Phi remains low because of idle time on the coprocessor. In this paper, we propose a […]