high performance computing on graphics processing units: hgpu.org

Teaching Parallel Programming in Containers: Virtualization of a Heterogeneous Local Infrastructure

Naylor Garcia Bachiega

Tags: Computer science, CUDA, Heterogeneous systems, HPC, nVidia, nVidia GeForce GTX 650, Package, Thesis, Virtualization

January 30, 2022 by hgpu

Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs

Cody Rivera, Sheng Di, Jiannan Tian, Xiaodong Yu, Dingwen Tao, Franck Cappello

Tags: Algorithms, Compression, Computer science, CUDA, nVidia, Package, Tesla V100

January 30, 2022 by hgpu

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU

Jou-An Chen, Hsin-Hsuan Sung, Nathan Tallent, Kevin Barker, Xipeng Shen, Ang Li

Tags: BLAS, Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce GTX 1080, nVidia GeForce GTX Titan V, Package, Sparse

January 30, 2022 by hgpu

Multi-hetero Acceleration by GPU and FPGA for Astrophysics Simulation on oneAPI Environment

Ryuta Kashino, Ryohei Kobayashi, Norihisa Fujita, Taisuke Boku

Tags: Astrophysics, CUDA, FPGA, Heterogeneous systems, nVidia, OpenCL, Physics, SYCL

January 23, 2022 by hgpu

NNP/MM: Fast molecular dynamics simulations with machine learning potentials and molecular mechanics

Raimondas Galvelis, Alejandro Varela-Rial, Stefan Doerr, Roberto Fino, Peter Eastman, Thomas E. Markland, John D. Chodera, Gianni De Fabritiis

Tags: Biology, Chemistry, CUDA, GeForce RTX 2080 Ti, Machine learning, Molecular dynamics, Molecular simulation, Neural networks, nVidia, Package

January 23, 2022 by hgpu

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs

Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens

Tags: Algorithms, Computer science, CUDA, Deep learning, Machine learning, nVidia, nVidia GeForce GTX Titan XP, Package, Performance, Tesla P100, Tesla V100

January 23, 2022 by hgpu

Research and Development of Porting SYCL on QNX Operating System for High Parallelism

Dengpan Wang

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, OpenCL, Operating systems, PTX, SYCL, Thesis

January 16, 2022 by hgpu

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

Mhd Ghaith Olabi, Juan Gómez Luna, Onur Mutlu, Wen-mei Hwu, Izzat El Hajj

Tags: Compilers, Computer science, CUDA, nVidia, Package, Performance, Tesla V100

January 16, 2022 by hgpu

Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment

Hulin Dai, Xuan Peng, Xuanhua Shi, Ligang He, Qian Xiong, Hai Jin

Tags: Benchmarking, Computer science, CUDA, Deep learning, Neural networks, nVidia, Performance, PyTorch, TensorFlow, Tesla P100

January 9, 2022 by hgpu

Analysis of High Level implementations for Recursive Methods on GPUs

Cheng Cao, Justin Kalloor

Tags: Computer science, CUDA, DSL, nVidia, nVidia GeForce RTX 3080, OpenGL, Vulkan

January 9, 2022 by hgpu

Dynamic GPU Energy Optimization for Machine Learning Training Workloads

Farui Wang, Weizhe Zhang, Shichao Lai, Meng Hao, Zheng Wang

Tags: Benchmarking, Computer science, CUDA, Energy-efficient computing, Machine learning, nVidia, nVidia GeForce RTX 3080 Ti, Package

January 9, 2022 by hgpu

Domain-Specific On-Device Object Detection Method

Seongju Kang, Jaegi Hwang, Kwangsue Chung

Tags: Computer science, Computer vision, CUDA, Neural networks, nVidia, nVidia GeForce RTX 2060

January 9, 2022 by hgpu

EnergyUCB-Bandit

Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach

Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models

Faial: finds bugs in CUDA kernels

Sound and Partially-Complete Static Analysis of Data-Races in GPU Programs

Effects of OpenCL-Based Parallelization Methods on Explicit Numerical Methods to Solve the Heat Equation

UVaFTLE: Lagrangian finite time Lyapunov exponent extraction for fluid dynamic applications

Challenging Portability Paradigms: FPGA Acceleration Using SYCL and OpenCL

Recent source codes

EnergyUCB-Bandit

Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models

Faial: finds bugs in CUDA kernels

Effects of OpenCL-Based Parallelization Methods on Explicit Numerical Methods to Solve the Heat Equation

Intel® SHMEM: Device initiated shared memory based communication library

miniLB: Lattice Boltzmann miniapp w/SYCL

AFOCL

2domination

MFC: Exascale simulation of multiphase/physics fluid dynamics

UVaFTLE: Lagrangian finite time Lyapunov exponent extraction for fluid dynamic applications
