high performance computing on graphics processing units: hgpu.org

A methodology for comparing optimization algorithms for auto-tuning

Floris-Jan Willemsen, Richard Schoonhoven, Jiří Filipovič, Jacob O. Tørring, Rob van Nieuwpoort, Ben van Werkhoven

Tags: Computer science, CUDA, nVidia, nVidia GeForce RTX 2080 Ti, nVidia GeForce RTX 3090, Package, Performance

June 16, 2024 by hgpu

How much can we gain from Tensor Kernel Fusion on GPUs?

Wei Sun, Ang Li, Sander Stuijk, Henk Corporaal

Tags: Computer science, CUDA, Deep learning, Matrix multiplication, Neural networks, nVidia, nVidia A100, nVidia H100

June 16, 2024 by hgpu

Memory Interference and Performance Prediction in GPU-Accelerated Heterogeneous Systems

Alessio Masola

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce RTX 2070, nVidia GeForce RTX 2080, nVidia Jetson AGX Xavier, Rendering, Thesis, Visualization

June 9, 2024 by hgpu

Gaining Cross-Platform Parallelism for HAL’s Molecular Dynamics Package using SYCL

Viktor Skoblin, Felix Höfling, Steffen Christgau

Tags: AMD Radeon Instinct MI210, ATI, Computer science, CUDA, Molecular dynamics, nVidia, nVidia A100, nVidia A40, Package, Physics, SYCL

June 9, 2024 by hgpu

Addressing Challenges in Utilizing GPUs for Accelerating Privacy-Preserving Computation

Ardhi Wiratama Baskara Yudha

Tags: Computer science, CUDA, Machine learning, nVidia, nVidia GeForce RTX 2080, nVidia GeForce RTX 3090, Security, Thesis

June 2, 2024 by hgpu

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

L.A. Torres, Carlos J. Barrios H, Yves Denneulin

Tags: Computer science, CUBLAS, CUDA, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia A100, Package, Performance, SYCL

June 2, 2024 by hgpu

An implementation of tensor product patch smoothers on GPU

Cu Cui, Paul Grosse-Bley, Guido Kanschat, Robert Strzodka

Tags: CUDA, FEM, Finite element method, Mathematics, Numerical Analysis, nVidia, nVidia A100

June 2, 2024 by hgpu

A Survey of Cloud-Based GPU Threats and Their Impact on AI, HPC, and Cloud Computing

Numaan Huq, Philippe Lin, Roel Reyes, Charles Perine

Tags: AMD Radeon Pro V520, Artificial intelligence, ATI, Cloud, Computer science, CUDA, Deep learning, nVidia, OpenCL, Security, Tesla T4

June 2, 2024 by hgpu

GPU Implementations for Midsize Integer Addition and Multiplication

Cosmin E. Oancea, Stephen M. Watt

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia A100, Performance, Programming Languages

May 26, 2024 by hgpu

STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning

Roberto L. Castro, Diego Andrade, Basilio B. Fraguela

Tags: Auto-Tuning, Computer science, CUDA, Deep learning, Machine learning, nVidia, nVidia A100, Tesla T4

May 26, 2024 by hgpu

Kernel-Centric Optimizations for Deep Neural Networks on GPGPU

Zhaodong Chen

Tags: Computer science, Computer vision, CUDA, Deep learning, Neural networks, nVidia, nVidia A100, nVidia V100, Thesis

May 26, 2024 by hgpu

Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach

Urvij Saroliya, Eishi Arima, Dai Liu, Martin Schulz

Tags: Computer science, CUDA, Heterogeneous systems, Machine learning, nVidia, nVidia A100, PC cluster, Task scheduling

May 20, 2024 by hgpu

Recent source codes

GPUODEBenchmarks: Comparison of Julia's GPU Kernel based ODE solvers with other open-source GPU ODE solvers

Automating Heterogeneous Parallelism in Numerical Differential Equations

Dat3M: Memory Model Aware Verification

Towards Unified Analysis of GPU Consistency

NVIDIA Federated Learning Application Runtime Environment

Supercharging Federated Learning with Flower and NVIDIA FLARE

Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services

PSCToolkit: solving sparse linear systems with a large number of GPUs

CATBench: Benchmarking Framework

CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization

General-purpose Polyhedral Compilers

A Survey of General-purpose Polyhedral Compilers

Optimal Kernel Orchestration for Tensor Programs with Korch

Astaroth: A Scalable Multi-GPU Library for Stencil Computations

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
