high performance computing on graphics processing units: hgpu.org

hgpu.org » Performance

kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos

Keichi Takahashi, Wassapon Watanakeesuntorn, Kohei Ichikawa, Joseph Park, Ryousei Takano, Jason Haga, George Sugihara, Gerald M. Pao

Tags: Computer science, CUDA, HPC, nVidia, OpenCL, Package, Performance, performance portability

May 30, 2021 by hgpu

NPBench: A Benchmarking Suite for High-Performance NumPy

Alexandros Nikolaos Ziogas, Tal Ben-Nun, Timo Schneider, Torsten Hoefler

Tags: Benchmarking, Computer science, CUDA, HPC, Linear Algebra, nVidia, Package, Performance, Python, Tesla V100

May 16, 2021 by hgpu

Winograd Algorithm for AdderNet

Wenshuo Li, Hanting Chen, Mingqiang Huang, Xinghao Chen, Chunjing Xu, Yunhe Wang

Tags: Algorithms, CNN, Computer science, Deep learning, FPGA, Machine learning, Neural networks, Performance

May 16, 2021 by hgpu

Performance analysis and optimization of highly diverging algorithms on GPUs

Hendrik Schwanekamp

Tags: CUDA, nVidia, nVidia A100, nVidia GeForce GTX 1080, Performance, Physics

May 2, 2021 by hgpu

Performance Analysis and Optimization Opportunities for NVIDIA Automotive GPUs

Hamid Tabani, Fabio Mazzocchetti, Pedro Benedicte, Jaume Abella, Francisco J. Cazorla

Tags: Benchmarking, Computer science, CUDA, nVidia, nVidia Jetson AGX Xavier, nVidia Jetson TX2, Performance

April 25, 2021 by hgpu

Performance Monitoring of Multi-FPGA Systems

Arzhang Rafii

Tags: Computer science, FPGA, Package, Performance, Thesis

April 11, 2021 by hgpu

Accelerating Deep Neural Networks implementation: A survey

Meriam Dhouibi, Ahmed Karim Ben Salem, Afef Saidi, Slim Ben Saoud

Tags: ASIC, Computer science, Deep learning, FPGA, HLS, Neural networks, OpenCL, Optimization, Performance, survey

March 28, 2021 by hgpu

Accelerating Sparse Approximate Matrix Multiplication on GPUs

Xiaoyan Liu, Yi Liu, Ming Dun, Bohong Yin, Hailong Yang, Zhongzhi Luan, Depei Qian

Tags: Computer science, CUBLAS, CUDA, Linear Algebra, Matrix multiplication, nVidia, Performance, Sparse matrix, Tesla V100

March 28, 2021 by hgpu

Porting a sparse linear algebra math library to Intel GPUs

Yuhsiang M. Tsai, Terry Cojean, Hartwig Anzt

Tags: AMD Radeon VII, ATI, Benchmarking, Computer science, HIP, Linear Algebra, nVidia, Package, Performance, Sparse, Sparse matrix, Tesla V100

March 21, 2021 by hgpu

A Deep Learning Based Cost Model for Automatic Code Optimization

Riyadh Baghdadi, Massinissa Merouani, Mohamed-Hicham Leghettas, Kamel Abdous, Taha Arbaoui, Karima Benatchba, Saman Amarasinghe

Tags: Compilers, Computer science, Deep learning, Optimization, Performance

March 21, 2021 by hgpu

Using hardware performance counters to speed up autotuning convergence on GPUs

Jiří Filipovič, Jana Hozzová, Amin Nezarat, Jaroslav Oľha, Filip Petrovič

Tags: Auto-Tuning, Computer science, CUDA, HPC, Machine learning, nVidia, nVidia GeForce GTX 1070, nVidia GeForce RTX 2080, OpenCL, Package, Performance, Vulkan

February 23, 2021 by hgpu

Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach

Geoffrey X. Yu, Yubo Gao, Pavel Golikov, Gennady Pekhimenko

Tags: Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, nVidia GeForce RTX 2070, Performance, Python, Tesla P100, Tesla V100

February 7, 2021 by hgpu

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

OpenMC Monte Carlo Code

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Polygeist: C/C++ frontend for MLIR

Retargeting and Respecializing GPU Workloads for Performance Portability

Parallel Gaussian process with kernel approximation in CUDA

Recent source codes

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters
