high performance computing on graphics processing units: hgpu.org

Applications

Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs

Endri Taka, Dimitrios Gourounas, Andreas Gerstlauer, Diana Marculescu, Aman Arora

Tags: AI, Computer science, Deep learning, FPGA, GEMM, Matrix multiplication

April 21, 2024 by hgpu

Software Optimization and Orchestration for Heterogeneous and Distributed Architectures

Francesco Lumpp

Tags: Computer science, Heterogeneous systems, nVidia, nVidia Jetson AGX Xavier, nVidia Jetson TX2, Performance, Thesis

April 21, 2024 by hgpu

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

Peter Thoman, Fabian Knorr, Luigi Crisci

Source codes

Tags: Computer science, HPC, Package, Performance, SYCL

April 21, 2024 by hgpu

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu

Tags: Computer science, CUDA, Deep learning, Heterogeneous systems, nVidia, nVidia A100, survey, Task scheduling

April 21, 2024 by hgpu

Python-Based Quantum Chemistry Calculations with GPU Acceleration

Xiaojie Wu, Qiming Sun, Zhichen Pu, Tianze Zheng, Wenzhi Ma, Wen Yan, Xia Yu, Zhengxiao Wu, Mian Huo, Xiang Li, Weiluo Ren, Sheng Gong, Yumin Zhang, Weihao Gao

Source codes

Tags: Chemical Physics, Chemistry, Computational Physics, CUDA, nVidia, nVidia A100, Package, Python, Quantum Physics

April 21, 2024 by hgpu

High Performance Privacy Preserving AI

Jayavanth Shenoy, Patrick Grinaway, Shriphani Palakodety

Tags: AI, Artificial intelligence, Book, Computer science, Neural networks, Security

April 14, 2024 by hgpu

QArray: a GPU-accelerated constant capacitance model simulator for large quantum dot arrays

Barnaby van Straaten, Joseph Hickie, Lucas Schorling, Jonas Schuff, Federico Fedele, Natalia Ares

Source codes

Tags: Condensed matter, Machine learning, Mesoscale and Nanoscale Physics, nVidia, nVidia GeForce GTX 1080 Ti, Package, Physics, Python, Rust

April 14, 2024 by hgpu

OpenMP offload at the Exascale using Intel GPU Max 1550: evaluation of STREAmS compressible solver

Francesco Salvadore, Giacomo Rossi, Srikanth Sathyanarayana, Matteo Bernardini

Tags: AMD Radeon Instinct MI250X, ATI, Benchmarking, cfd, Compression, Fluid dynamics, Intel, Intel Data Center GPU Max 1550, nVidia, nVidia A100, OpenMP

April 14, 2024 by hgpu

A Systematic Literature Survey of Sparse Matrix-Vector Multiplication

Jianhua Gao, Bingjie Liu, Weixing Ji, Hua Huang

Tags: Computer science, FPGA, Heterogeneous systems, Linear Algebra, Machine learning, Overview, Sparse matrix

April 14, 2024 by hgpu

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Peter Thoman, Philip Salzmann

Source codes

Tags: Benchmarking, Computer science, GPU cluster, HPC, nVidia, nVidia V100, Package, SYCL

April 14, 2024 by hgpu

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

Keller Jordan

Source codes

Tags: Computer science, Computer vision, Image processing, Machine learning, nVidia

April 7, 2024 by hgpu

Speed, power and cost implications for GPU acceleration of Computational Fluid Dynamics on HPC systems

Zachary Cooper-Baldock, Brenda Vara Almirall, Kiao Inthavong

Tags: cfd, Fluid dynamics, HPC, MPI, nVidia, nVidia A100, nVidia V100, Performance

April 7, 2024 by hgpu

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

OpenMC Monte Carlo Code

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Polygeist: C/C++ frontend for MLIR

Retargeting and Respecializing GPU Workloads for Performance Portability

Parallel Gaussian process with kernel approximation in CUDA

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
