high performance computing on graphics processing units: hgpu.org

Exploring SYCL for batched kernels with memory allocations

Aymeric Millan, Thomas Padioleau, Julien Bigot

Tags: AMD Radeon Instinct MI250X, ATI, Computer science, CUDA, FFT, Neural networks, nVidia, nVidia A100, Package, performance portability, SYCL

May 25, 2025 by hgpu

Performance of Confidential Computing GPUs

Antonio Martínez Ibarra, Julian James Stephen, Aurora González Vidal, K. R. Jayaram, Antonio Fernando Skarmeta Gómez

Tags: Computer science, CUDA, LLM, nVidia, nVidia H100, Performance, Security

May 25, 2025 by hgpu

CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud

Tags: AI, AMD Radeon RX 7900 XT, ATI, Computer science, CUDA, HIP, Machine learning, nVidia, nVidia A100, OpenCL, Package, Programming Languages, PTX

May 25, 2025 by hgpu

FLASH: Fast All-to-All Communication in GPU Clusters

Yiran Lei, Dongjoo Lee, Liangyu Zhao, Daniar Kurniawan, Chanmyeong Kim, Heetaek Jeong, Changsu Kim, Hyeonseong Choi, Liangcheng Yu, Arvind Krishnamurthy, Justine Sherry, Eriko Nurvitadhi

Tags: AMD Radeon Instinct MI300X, ATI, Computer science, GPU cluster, Heterogeneous systems, MPI, nVidia, nVidia A100, nVidia B200, nVidia H100

May 25, 2025 by hgpu

Comparing Parallel Functional Array Languages: Programming and Performance

David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas

Tags: Benchmarking, Computer science, CUDA, HIP, N-body simulation, nVidia, nVidia A30, OpenCL, Package, Performance, performance portability, Programming Languages

May 18, 2025 by hgpu

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Zhonggen Li, Xiangyu Ke, Yifan Zhu, Yunjun Gao, Feifei Li

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia A100, Package, Performance, Prefetch

May 18, 2025 by hgpu

Can Large Language Models Predict Parallel Code Performance?

Gregory Bolet, Giorgis Georgakoudis, Harshitha Menon, Konstantinos Parasyris, Niranjan Hasabnis, Hayden Estes, Kirk W. Cameron, Gal Oren

Tags: Benchmarking, Computer science, CUDA, LLM, nVidia, nVidia GeForce RTX 3080, OpenMP, Package, Performance, performance portability

May 18, 2025 by hgpu

GPU Performance Portability needs Autotuning

Burkhard Ringlein, Thomas Parnell, Radu Stoica

Tags: AMD Radeon Instinct MI250, ATI, Auto-Tuning, Computer science, CUDA, DSL, HIP, LLM, nVidia, nVidia A100, Performance, performance portability

May 18, 2025 by hgpu

Exploration of Cryptocurrency Mining-Specific GPUs in AI Applications: A Case Study of CMP 170HX

Xing Kangwei

Tags: AI, Artificial intelligence, Benchmarking, Computer science, CUDA, Finance, nVidia, nVidia CMP 170HX, OpenCL, Performance

May 18, 2025 by hgpu

Scaling On-Device GPU Inference for Large Generative Models

Jiuqiang Tang, Raman Sarokin, Ekaterina Ignasheva, Grant Jensen, Lin Chen, Juhyun Lee, Andrei Kulik, Matthias Grundmann

Tags: AI, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia GeForce RTX 4090, OpenCL

May 4, 2025 by hgpu

Mìmir: A real-time interactive visualization library for CUDA programs

Francisco Carter, Nancy Hitschfeld, Cristóbal A. Navarro

Tags: Computer science, CUDA, nVidia, nVidia GeForce RTX 2070, Rendering, Visualization, Vulkan

May 4, 2025 by hgpu

Dynamic Memory Management on GPUs with SYCL

Russell K. Standish

Tags: Computer science, CUDA, HIP, Memory model, nVidia, Package, Performance, SYCL

May 4, 2025 by hgpu

SYCL Container

Exploring SYCL for batched kernels with memory allocations

CASS: Cuda-Amd aSSembly

CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Cluster of smartphones for edge computing application using TensorFlow

Low-cost edge computing using upcycled smartphones

CFAL-bench

Comparing Parallel Functional Array Languages: Programming and Performance

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Efficient deep learning inference on end devices

Ouroboros: Virtualized Queues for dynamic memory management

Dynamic Memory Management on GPUs with SYCL

MSCCL++: A GPU-driven communication stack for scalable AI applications

MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications

Benchmark compute shader of Unity against InteropUnityCUDA

InteropUnityCUDA: A Tool for Interoperability Between Unity and CUDA

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org