high performance computing on graphics processing units: hgpu.org

A portable C++ library for memory and compute abstraction on multi-core CPUs and GPUs

Pietro Incardona, Aryaman Gupta, Serhii Yaskovets, Ivo F. Sbalzarini

Tags: AMD RX Vega 64, ATI, Computer science, CUDA, nVidia, nVidia A100, nVidia GeForce RTX 3090, OpenACC, OpenCL, OpenMP, Package, Performance, performance portability, SYCL

July 30, 2023 by hgpu

Improving the Performance, Portability, and Productivity of Hardware Accelerators

Pablo Antonio Martínez Sánchez

Tags: Computer science, CUDA, Heterogeneous systems, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia GeForce RTX 2080, nVidia GeForce RTX 2080 Ti, Performance, performance portability, Thesis

July 16, 2023 by hgpu

Implementation Techniques for SPMD Kernels on CPUs

Joachim Meyer, Aksel Alpay, Sebastian Hack, Holger Fröning, Vincent Heuveline

Tags: Compilers, Computer science, CUDA, Heterogeneous systems, nVidia, OpenCL, Package, performance portability

June 4, 2023 by hgpu

Understanding Performance Portability of Bioinformatics Applications in SYCL on an NVIDIA GPU

Zheming Jin, Jeffrey S. Vetter

Tags: Benchmarking, Bioinformatics, Computer science, CUDA, Heterogeneous systems, nVidia, Package, performance portability, SYCL, Tesla V100

April 16, 2023 by hgpu

Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

Yehonatan Fridman, Guy Tamir, Gal Oren

Tags: Benchmarking, Computer science, Intel, Intel Ponte Vecchio Max 1100, nVidia, nVidia A100, oneAPI, OpenMP, Package, performance portability

April 16, 2023 by hgpu

Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications

Stijn Heldens, Ben van Werkhoven

Tags: Computer science, CUDA, Fluid dynamics, nVidia, nVidia A100, Package, Performance, performance portability

March 26, 2023 by hgpu

Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems

Giorgis Georgakoudis, Konstantinos Parasyris, Chunhua Liao, David Beckingsale, Todd Gamblin, Bronis de Supinski

Tags: AMD Radeon Instinct Mi50, ATI, Benchmarking, Code generation, Compilers, Computer science, CUDA, Heterogeneous systems, Machine learning, nVidia, OpenMP, performance portability, Tesla P100, Tesla V100

March 19, 2023 by hgpu

Towards a Benchmarking Suite for Kernel Tuners

Jacob O. Tørring, Ben van Werkhoven, Filip Petrovic, Floris-Jan Willemsen, Jirí Filipovic, Anne C. Elster

Tags: Auto-Tuning, Benchmarking, Computer science, CUDA, nVidia, nVidia GeForce RTX 2080 Ti, nVidia GeForce RTX 3060, nVidia GeForce RTX 3090, nVidia Titan RTX, Package, performance portability

March 19, 2023 by hgpu

Runtime Support for Performance Portability on Heterogeneous Distributed Platforms

Polykarpos Thomadakis, Nikos Chrisochoides

Tags: Computer science, CUDA, Heterogeneous systems, MPI, nVidia, performance portability, Tesla V100

March 12, 2023 by hgpu

Extending MAGMA Portability with OneAPI

Anna Fortenberry, Stanimire Tomov

Tags: Computer science, CUDA, Heterogeneous systems, Linear Algebra, Matrix multiplication, nVidia, nVidia GeForce RTX 3060, oneAPI, Package, performance portability

December 25, 2022 by hgpu

Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems

August Ernstsson, Dalvan Griebler, Christoph Kessler

Tags: Benchmarking, Computer science, CUDA, Heterogeneous systems, nVidia, OpenCL, OpenMP, Package, performance portability, Tesla K20

December 11, 2022 by hgpu

Providing performance portable numerics for Intel GPUs

Yu-Hsiang M. Tsai, Terry Cojean, Hartwig Anzt

Tags: AMD Radeon Instinct MI100, ATI, Computer science, CUDA, Linear Algebra, nVidia, nVidia A100, OpenCL, Package, performance portability, Sparse, SYCL

October 30, 2022 by hgpu

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

OpenMC Monte Carlo Code

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Polygeist: C/C++ frontend for MLIR

Retargeting and Respecializing GPU Workloads for Performance Portability

Parallel Gaussian process with kernel approximation in CUDA

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
