high performance computing on graphics processing units: hgpu.org

Posts

Sep, 27

E(A+M)PEC – An OpenCL Atomic and Molecular Plasma Emission Code For Interstellar Medium Simulations

E(A+M)PEC traces the ionization structure, cooling and emission spectra of plasmas. It is written in OpenCL, runs on NVIDIA graphics processing units (GPUs), and can be coupled to any hydrodynamic (HD) or magnetohydrodynamic (MHD) code to follow the dynamical and thermal evolution of any plasma in, e.g., the interstellar medium (ISM).

OpenCL

Sep, 26

From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming

In this work, we evaluate OpenCL as a programming tool for developing performance-portable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL requires performance-impacting initialization steps that do not exist in other languages such as CUDA. Understanding these implications allows us to provide a […]

CUDA • OpenCL

Sep, 25

Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs

We present a study of three important kernels that occur frequently in iterative statistical applications: K-Means, Multi-Dimensional Scaling (MDS), and PageRank. We implemented each kernel using OpenCL and evaluated their performance on an NVIDIA Tesla GPGPU card. By examining the underlying algorithms and empirically measuring the performance of various components of the kernel we explored […]

OpenCL
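The first of these kernels, K-Means, alternates between an assignment step and an update step. Below is a minimal sequential sketch of that Lloyd-style iteration, assuming plain Python lists of points. The per-point assignment loop is the data-parallel part that a GPU kernel would typically map to one work-item per point. This is an illustration only and not the authors' OpenCL code.

```python
import math

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point.
    Points are independent of each other, which makes this step data-parallel."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points.
    A centroid with no assigned points keeps its previous position."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, k in zip(points, labels):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]

def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assignment/update iterations (illustrative sketch)."""
    labels = []
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` settles on the centroids `[[0.0, 0.5], [10.0, 10.5]]`.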

Sep, 24

Utilising OpenCL Framework for Ray-Tracing Acceleration

Modern graphics accelerators are no longer used only to accelerate graphics computations in classic computer games. Their highly parallel architectures enable their use in a broad spectrum of calculations. Motivated by the release of the OpenCL library and by our interest in ray-tracing, we decided to show that ray-tracing is feasible not only on a multi-core […]

OpenCL
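Ray-tracing suits highly parallel hardware because each pixel's ray can be traced independently. The sketch below shows one common building block, a ray-sphere intersection test. It assumes a normalized ray direction and a simple orthographic "camera" that casts one ray per pixel along +z. The helper names are hypothetical and not taken from the paper. On a GPU, each work-item would typically handle one pixel.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Distance t >= 0 to the nearest hit of origin + t * direction with a sphere,
    or None on a miss. Assumes `direction` is normalized (quadratic a-term = 1)."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    sqrt_disc = math.sqrt(disc)
    t_near = (-b - sqrt_disc) / 2.0
    t_far = (-b + sqrt_disc) / 2.0
    if t_near >= 0.0:
        return t_near
    if t_far >= 0.0:
        return t_far  # the ray starts inside the sphere
    return None  # the sphere lies entirely behind the ray origin

def render_mask(width, height, center, radius):
    """Hypothetical orthographic render: one ray per pixel along +z.
    True where the sphere is hit; every pixel is computed independently."""
    return [
        [ray_sphere_hit((x, y, 0.0), (0.0, 0.0, 1.0), center, radius) is not None
         for x in range(width)]
        for y in range(height)
    ]
```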

Sep, 24

A portable implementation of the radix sort algorithm in OpenCL

We present a portable OpenCL implementation of the radix sort algorithm. We test it on several GPUs and CPUs to assess how well it performs on different hardware. We also apply our implementation to Particle-In-Cell (PIC) sorting, which is useful in plasma physics simulations.

OpenCL
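GPU radix sorts are typically organized as repeated passes over one digit at a time. Each pass has three phases: histogram, prefix sum (scan), and scatter. The sketch below shows that least-significant-digit structure sequentially. It assumes non-negative integer keys below 2**32 and 4-bit digits, both chosen here for illustration. It is not the paper's OpenCL kernels, which would also split each phase across work-groups.

```python
def radix_sort(keys, key_bits=32, radix_bits=4):
    """LSD radix sort of non-negative integers below 2**key_bits.
    Each pass handles one radix_bits-wide digit: histogram -> scan -> scatter."""
    radix = 1 << radix_bits
    mask = radix - 1
    keys = list(keys)
    for shift in range(0, key_bits, radix_bits):
        # 1. Histogram: count how many keys have each digit value.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: starting output offset of each digit bucket.
        offsets = [0] * radix
        running = 0
        for d in range(radix):
            offsets[d] = running
            running += counts[d]
        # 3. Stable scatter: keys keep their relative order within a bucket,
        #    which is what keeps the ordering from earlier passes intact.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

The scatter step must be stable. LSD radix sort depends on each pass preserving the order established by the less significant digits before it.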

Sep, 24

Analyzing Use of OpenCL on the Cell Broadband Engine and a Proposal for OpenCL Extensions

Current processor architectures are diverse and heterogeneous. Examples include multicore chips, GPUs and the Cell Broadband Engine (CBE). The recent Open Computing Language (OpenCL) standard aims at both efficiency and portability. This paper explores its efficiency when implemented on the CBE, without using CBE-specific features such as explicit asynchronous memory transfers. We based our experiments on […]

OpenCL

Sep, 24

Automatic Translation of CUDA to OpenCL and Comparison of Performance Optimizations on GPUs

As an open, royalty-free framework for writing programs that execute across heterogeneous platforms, OpenCL gives programmers access to a variety of data parallel processors including CPUs, GPUs, the Cell and DSPs. All OpenCL-compliant implementations support a core specification, thus ensuring robust functional portability of any OpenCL program. This thesis presents the CUDAtoOpenCL source-to-source tool that […]

CUDA • OpenCL
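Source-to-source translation from CUDA to OpenCL relies on correspondences between kernel-side constructs. For example, `threadIdx.x` corresponds to `get_local_id(0)`, `__shared__` to `__local`, and `__syncthreads()` to `barrier(CLK_LOCAL_MEM_FENCE)`. The sketch below applies a few of these correspondences with regular expressions. It is a deliberately naive illustration and not the method of the tools described here. A real translator would work on a parsed program, handle host code, and add qualifiers such as `__global` to pointer parameters, none of which this sketch does.

```python
import re

# CUDA kernel-side keywords with direct OpenCL C counterparts.
KEYWORDS = {
    r"__global__": "__kernel",
    r"__shared__": "__local",
    r"__syncthreads\(\)": "barrier(CLK_LOCAL_MEM_FENCE)",
}

# CUDA built-in index variables -> OpenCL work-item functions.
BUILTINS = {
    "threadIdx": "get_local_id",
    "blockIdx": "get_group_id",
    "blockDim": "get_local_size",
    "gridDim": "get_num_groups",
}
DIMS = {"x": 0, "y": 1, "z": 2}
BUILTIN_RE = re.compile(r"\b(threadIdx|blockIdx|blockDim|gridDim)\.([xyz])\b")

def translate_kernel_text(src):
    """Naive textual CUDA -> OpenCL rewrite of kernel keywords and built-ins.
    Illustration only: no parsing, no host code, no address-space inference."""
    for pattern, repl in KEYWORDS.items():
        src = re.sub(pattern, repl, src)
    return BUILTIN_RE.sub(
        lambda m: f"{BUILTINS[m.group(1)]}({DIMS[m.group(2)]})", src)
```

Applied to `__global__ void scale(float *a, float s) { int i = blockIdx.x * blockDim.x + threadIdx.x; a[i] *= s; }`, the function yields `__kernel void scale(float *a, float s) { int i = get_group_id(0) * get_local_size(0) + get_local_id(0); a[i] *= s; }`. Valid OpenCL would still need `__global float *a`.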

Sep, 24

OpenCL: a viable solution for high-performance medical image reconstruction?

Reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. For interventional image reconstruction, hardware optimization is mandatory. Manufacturers of medical equipment use a variety of high-performance computing (HPC) platforms, like FPGAs, graphics cards, or multi-core CPUs. A problem of this diversity is that many different frameworks and (vendor-specific) programming languages […]

OpenCL

Sep, 24

Single Scattering of Aspherical Particles in DDA Calculations on GPUs Using OpenCL

The global distribution and climatology of ice clouds are among the main uncertainties in climate modelling and prediction. In order to retrieve ice cloud properties from remote sensing measurements, the scattering properties of all cloud ice particle types must be known. The Discrete Dipole Approximation (DDA) simulates scattering of radiation by arbitrarily shaped particles and […]

OpenCL

Sep, 24

Dynamic Data Structures for Taskgraph Scheduling Policies with Applications in OpenCL Accelerators

OpenCL is an emerging open framework for parallel programming in heterogeneous systems. OpenCL accelerators need to schedule the execution of submitted jobs with no (or only very imprecise) estimates of execution times, while respecting the dependencies among them, which are given in the form of a directed acyclic graph. This problem is known as stochastic taskgraph scheduling, […]

OpenCL
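A task in a directed acyclic graph may start only after all of its predecessors have finished. The sketch below illustrates this dependency rule with a simple greedy list scheduler on a fixed number of identical devices. Unlike the stochastic setting the paper studies, it assumes every task's duration is known in advance. The function name and the tie-breaking by task name are hypothetical, and this is not the paper's data structure or policy.

```python
import heapq

def list_schedule(costs, edges, workers):
    """Simulate greedy list scheduling of a DAG on `workers` identical devices.
    costs: {task: duration} (assumed known here); edges: [(before, after), ...].
    Returns (makespan, {task: start_time}). Ready tasks are taken in name order."""
    if workers < 1:
        raise ValueError("need at least one worker")
    succ = {t: [] for t in costs}
    indeg = {t: 0 for t in costs}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [t for t in costs if indeg[t] == 0]
    heapq.heapify(ready)
    running = []  # min-heap of (finish_time, task)
    start, now, idle = {}, 0, workers
    while ready or running:
        # Launch ready tasks while a device is free.
        while ready and idle > 0:
            t = heapq.heappop(ready)
            start[t] = now
            heapq.heappush(running, (now + costs[t], t))
            idle -= 1
        # Advance time to the next completion and release its successors.
        now, done = heapq.heappop(running)
        idle += 1
        for v in succ[done]:
            indeg[v] -= 1
            if indeg[v] == 0:
                heapq.heappush(ready, v)
    if len(start) != len(costs):
        raise ValueError("task graph contains a cycle")
    return now, start
```

With `costs = {"a": 2, "b": 3, "c": 1, "d": 2}`, edges `a→c`, `b→c`, `c→d` and two workers, `a` and `b` start at time 0. Task `c` waits for `b` to finish at time 3, so `d` starts at 4 and the makespan is 6.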

Sep, 24

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation […]

CUDA • OpenCL

Sep, 24

An MDE Approach for Automatic Code Generation from MARTE to OpenCL

Advanced engineering and scientific communities have used parallel programming to solve their large-scale, complex problems, with high performance as the main motivation for this choice. However, because parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Thus, in order to reduce design complexity, we […]

OpenCL



