high performance computing on graphics processing units: hgpu.org

Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems

August Ernstsson, Dalvan Griebler, Christoph Kessler

Tags: Benchmarking, Computer science, CUDA, Heterogeneous systems, nVidia, OpenCL, OpenMP, Package, performance portability, Tesla K20

December 11, 2022 by hgpu

Providing performance portable numerics for Intel GPUs

Yu-Hsiang M. Tsai, Terry Cojean, Hartwig Anzt

Tags: AMD Radeon Instinct MI100, ATI, Computer science, CUDA, Linear Algebra, nVidia, nVidia A100, OpenCL, Package, performance portability, Sparse, SYCL

October 30, 2022 by hgpu

From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels

Gregor Daiß, Patrick Diehl, Dominic Marcello, Alireza Kheirkhahan, Hartmut Kaiser, Dirk Pflüger

Tags: AMD Radeon Instinct MI100, Astrophysics, ATI, CUDA, HIP, nVidia, nVidia A100, Package, performance portability, Physics

October 23, 2022 by hgpu

Towards Performance Portable Programming for Distributed Heterogeneous Systems

Polykarpos Thomadakis, Nikos Chrisochoides

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, OpenCL, Package, Performance, performance portability, Tesla V100

October 9, 2022 by hgpu

Performance portability study of epistasis detection using SYCL on NVIDIA GPU

Zheming Jin, Jeffrey S. Vetter

Tags: Computer science, CUDA, Genomics, nVidia, performance portability, SYCL, Tesla V100

October 9, 2022 by hgpu

Enhancing the Performance Portability of Heterogeneous Circuit Analysis Programs

Tsung-Wei Huang

Tags: AMD Radeon RX 6900 XT, ATI, Computer science, Deep learning, nVidia, nVidia GeForce RTX 3090, OpenCL, performance portability, PyTorch, SYCL

September 11, 2022 by hgpu

Understanding the Power of Evolutionary Computation for GPU Code Optimization

Jhe-Yu Liou, Muaaz Awan, Steven Hofmeyr, Stephanie Forrest, Carole-Jean Wu

Tags: Bioinformatics, Biology, Computer science, CUDA, Evolutionary Computations, Neural and Evolutionary Computing, nVidia, nVidia GeForce GTX 1080 Ti, Package, performance portability, Sequence alignment, Tesla P100, Tesla V100

September 4, 2022 by hgpu

Exploring Thread Coarsening on FPGA

Mostafa Eghbali Zarch, Reece Neff, Michela Becchi

Tags: Compilers, Computer science, FPGA, OpenCL, Performance, performance portability

August 28, 2022 by hgpu

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko

Tags: Computer science, CUDA, Deep learning, nVidia, nVidia GeForce RTX 2080 Ti, Package, performance portability, Programming Languages

July 10, 2022 by hgpu

Portable, Scalable Approaches for Improving Asynchronous Many-Task Runtime Node Use

John Keith Holmen

Tags: Computer science, CUDA, Heterogeneous systems, MPI, nVidia, nVidia GeForce GTX Titan X, performance portability, Task scheduling

June 5, 2022 by hgpu

Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow

Pietro Ghiglio, Uwe Dolinsky, Mehdi Goli, Kumudha Narasimhan

Tags: Computer science, CUDA, nVidia, OpenCL, Performance, performance portability, SYCL

May 1, 2022 by hgpu

Benchmarking a Proof-of-Concept Performance Portable SYCL-based Fast Fourier Transformation Library

Vincent R. Pascuzzi, Mehdi Goli

Tags: AMD Radeon Instinct MI100, ATI, Benchmarking, Computer science, FFT, Heterogeneous systems, HIP, HPC, nVidia, nVidia A100, OpenCL, performance portability, SYCL

March 20, 2022 by hgpu

Recent source codes

GPUODEBenchmarks: Comparsion of Julia's GPU Kernel based ODE solvers with other open-source GPU ODE solvers

Automating Heterogeneous Parallelism in Numerical Differential Equations

Dat3M: Memory Model Aware Verification

Towards Unified Analysis of GPU Consistency

NVIDIA Federated Learning Application Runtime Environment

Supercharging Federated Learning with Flower and NVIDIA FLARE

Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services

PSCToolkit: solving sparse linear systems with a large number of GPUs

CATBench: Benchmarking Framework

CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization

General-purpose Polyhedral Compilers

A Survey of General-purpose Polyhedral Compilers

Optimal Kernel Orchestration for Tensor Programs with Korch

Astaroth: A Scalable Multi-GPU Library for Stencil Computations

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
