high performance computing on graphics processing units: hgpu.org

hgpu.org » AMD Radeon Instinct MI100

Reducing Synchronous GPU Memory Transfers: Design and implementation of a Futhark compiler optimisation

Philip Jon Børgesen

Tags: AMD Radeon Instinct MI100, ATI, Computer science, CUDA, nVidia, nVidia A100, OpenCL, Performance, Thesis

July 17, 2022 by hgpu

Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading

Shilei Tian, Joseph Huber, Barbara Chapman, Johannes Doerfert

Tags: AMD Radeon Instinct MI100, ATI, Compilers, Computer science, CUDA, nVidia, nVidia A100, OpenCL, OpenMP

July 17, 2022 by hgpu

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

André Müller, Bertil Schmidt, Richard Membarth, Roland Leißa, Sebastian Hack

Tags: AMD Radeon Instinct MI100, ATI, Bioinformatics, Biology, CUDA, Next-Generation sequencing, nVidia, nVidia GeForce RTX 3090, OpenCL, Package, Sequence alignment, Tesla A100

May 22, 2022 by hgpu

Benchmarking a Proof-of-Concept Performance Portable SYCL-based Fast Fourier Transformation Library

Vincent R. Pascuzzi, Mehdi Goli

Tags: AMD Radeon Instinct MI100, ATI, Benchmarking, Computer science, FFT, Heterogeneous systems, HIP, HPC, nVidia, nVidia A100, OpenCL, performance portability, SYCL

March 20, 2022 by hgpu

HipBone: A performance-portable GPU-accelerated C++ version of the NekBone benchmark

Noel Chalmers, Abhishek Mishra, Damon McDougall, Tim Warburton

Tags: AMD Radeon Instinct MI100, AMD Radeon Instinct MI250X, ATI, Benchmarking, Computer science, CUDA, HIP, HPC, MPI, nVidia, Package, Performance, Tesla V100

March 11, 2022 by hgpu

On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats

Moritz Lehmann, Mathias J. Krause, Giorgio Amati, Marcello Sega, Jens Harting, Stephan Gekle

Tags: AMD Radeon Instinct MI100, AMD Radeon VI, ATI, Fluid dynamics, lattice Boltzmann, Mixed precision, nVidia, OpenCL, Tesla K20, Tesla K40, Tesla K80, Tesla P100, Tesla V100

December 19, 2021 by hgpu

GPU Algorithms for Efficient Exascale Discretizations

Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Ryan Bleile, Jed Brown, Jean-Sylvain Camier, Robert Carson, Noel Chalmers, Veselin Dobrev, Yohann Dudouit, Paul Fischer, Ali Karakus, Stefan Kerkemeier, Tzanio Kolev, Yu-Hsiang Lan, Elia Merzari, Misun Min, Malachi Phillips, Thilina Rathnayake, Robert Rieben, Thomas Stitt, Ananias Tomboulides, Stanimire Tomov, Vladimir Tomov, Arturo Vargas, Tim Warburton, Kenneth Weiss

Tags: Algorithms, AMD Radeon Instinct MI100, ATI, Computer science, CUDA, Finite element method, nVidia, OCCA, Tesla V100

September 19, 2021 by hgpu

HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors
