high performance computing on graphics processing units: hgpu.org

Performance portability evaluation of blocked stencil computations on GPUs

Oscar Antepara, Hans Johansen, Samuel Williams, Tuowen Zhao, Samantha Hirsch, Priya Goyal, Mary Hall

Tags: AMD Radeon Instinct MI250X, ATI, Code generation, Computer science, CUDA, HIP, nVidia, nVidia A100, Package, performance portability, Stencil computation, SYCL

October 29, 2023 by hgpu

A Performance-Portable SYCL Implementation of CRK-HACC for Exascale

Esteban M. Rangel, S. John Pennycook, Adrian Pope, Nicholas Frontiere, Zhiqiang Ma, Varsha Madananth

Tags: AMD Radeon Instinct MI250X, ATI, Computer science, Cosmology, CUDA, HIP, HPC, nVidia, nVidia A100, Performance, performance portability, Physics, SYCL

October 29, 2023 by hgpu

Performance portability analysis of SYCL with a classical CG on CPU, GPU, and FPGA

Julian Franquinet

Tags: Benchmarking, Computer science, CUDA, FPGA, nVidia, nVidia Quadro GP100, Optimization, Performance, performance portability, SYCL

October 22, 2023 by hgpu

Open SYCL on heterogeneous GPU systems: A case of study

Rocío Carratalá-Sáez, Francisco J. Andújar, Yuri Torres, Arturo Gonzalez-Escribano, Diego R. Llanos

Tags: Computer science, CUDA, Heterogeneous systems, HIP, nVidia, Package, performance portability, SYCL

October 15, 2023 by hgpu

Experience Migrating OpenCL to SYCL: A Case Study on Searches for Potential Off-Target Sites of Cas9 RNA-Guided Endonucleases on AMD GPUs

Zheming Jin, Jeffrey S. Vetter

Tags: AMD Radeon Instinct MI100, AMD Radeon Instinct MI60, AMD Radeon VII, ATI, ATI Radeon HD 7870, Bioinformatics, Biology, OpenCL, performance portability, RNA, SYCL

October 1, 2023 by hgpu

OpenMP Kernel Language Extensions for Performance Portable GPU Codes

Shilei Tian, Tom Scogland, Barbara Chapman, Johannes Doerfert

Tags: AMD Radeon Instinct MI250, ATI, Benchmarking, Compilers, Computer science, CUDA, HIP, nVidia, nVidia A100, OpenMP, performance portability

October 1, 2023 by hgpu

Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications

Istvan Z Reguly

Tags: AMD Radeon Instinct MI250X, ATI, Benchmarking, cfd, Computer science, CUDA, Fluid dynamics, HIP, Intel, Intel Ponte Vecchio Max 1100, nVidia, nVidia A100, OpenCL, performance portability, SYCL

September 24, 2023 by hgpu

Improving the Efficiency of OpenCL Kernels through Pipes

Mostafa Eghbali Zarch, Michela Becchi

Tags: Computer science, FPGA, OpenCL, Performance, performance portability

September 17, 2023 by hgpu

Mashing load balancing algorithm to boost hybrid kernels in molecular dynamics simulations

Raúl Nozal, Jose Luis Bosque

Tags: AMD Radeon RX 5700 XT, ATI, Computer science, Heterogeneous systems, HPC, Molecular dynamics, OpenCL, OpenMP, Package, performance portability

August 28, 2023 by hgpu

Porting Batched Iterative Solvers onto Intel GPUs with SYCL

Phuong Nguyen, Pratik Nayak, Hartwig Anzt

Tags: Computer science, CUDA, Intel, Intel Data Center GPU Max 1550, nVidia, nVidia A100, nVidia H100, Package, performance portability, Physics, SYCL

August 20, 2023 by hgpu

A portable C++ library for memory and compute abstraction on multi-core CPUs and GPUs

Pietro Incardona, Aryaman Gupta, Serhii Yaskovets, Ivo F. Sbalzarini

Tags: AMD RX Vega 64, ATI, Computer science, CUDA, nVidia, nVidia A100, nVidia GeForce RTX 3090, OpenACC, OpenCL, OpenMP, Package, Performance, performance portability, SYCL

July 30, 2023 by hgpu

Improving the Performance, Portability, and Productivity of Hardware Accelerators

Pablo Antonio Martínez Sánchez

Tags: Computer science, CUDA, Heterogeneous systems, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia GeForce RTX 2080, nVidia GeForce RTX 2080 Ti, Performance, performance portability, Thesis

July 16, 2023 by hgpu

