high performance computing on graphics processing units: hgpu.org

Using Intel oneAPI for Multi-hybrid Acceleration Programming with GPU and FPGA Coupling

Wentao Liang, Norihisa Fujita, Ryohei Kobayashi, Taisuke Boku

Tags: Computer science, CUDA, FPGA, Heterogeneous systems, nVidia, oneAPI, OpenCL, SYCL, Tesla V100

April 7, 2024 by hgpu

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

Massinissa Merouani, Khaled Afif Boudaoud, Iheb Nassim Aouadj, Nassim Tchoulak, Islam Kara Bernou, Hamza Benyamina, Fatima Benbouzid-Si Tayeb, Karima Benatchba, Hugh Leather, Riyadh Baghdadi

Tags: Compilers, Computer science, Deep learning, Machine learning, OpenCL, Package, Programming Languages

March 24, 2024 by hgpu

Parallel programming in mobile devices with FancyJCL

Sergio Afonso, Óscar Gómez-Cárdenes, Paula Expósito, Vicente Blanco, Francisco Almeida

Tags: Computer science, DSP, Heterogeneous systems, Image processing, Java, OpenCL, Package

March 3, 2024 by hgpu

Analyzing GPU Performance in Virtualized Environments: A Case Study

Adel Belkhiri, Michel Dagenais

Tags: Computer science, CUDA, Java, nVidia, OpenCL, Package, Performance, Virtualization

February 25, 2024 by hgpu

Deductive verification for SYCL

Ellen Wittingen

Tags: Computer science, CUDA, Heterogeneous systems, HIP, nVidia, OpenCL, OpenMP, Package, SYCL, Thesis

February 4, 2024 by hgpu

LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory

Tyler Sorensen, Heidy Khlaaf

Tags: Computer science, Data recovery, nVidia, nVidia GeForce RTX 4070, OpenCL, Package, Security, Vulkan

February 4, 2024 by hgpu

Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame

Jolly Chen, Monica Dessole, Ana Lucia Varbanescu

Tags: CUDA, HEP, nVidia, nVidia Quadro RTX 4000, OpenCL, Package, Performance, Physics, SYCL

January 28, 2024 by hgpu

A Heterogeneous Inference Framework for a Deep Neural Network

Rafael Gadea-Gironés, José Luís Rocabado-Rocha, Jorge Fe, Jose M. Monzo

Tags: Artificial intelligence, Computer science, Deep learning, FPGA, Heterogeneous systems, HLS, Machine learning, Neural networks, OpenCL, PyTorch

January 28, 2024 by hgpu

Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC

Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin

Tags: AMD Radeon Instinct MI100, AMD Radeon VII, Computer science, Heterogeneous systems, HIP, nVidia, OpenCL, Performance, SYCL

January 14, 2024 by hgpu

Code Generation for a Variety of Accelerators for a Graph DSL

Ashwina Kumar, M. Venkata Krishna, Prasanna Bartakke, Rahul Kumar, Rajesh Pandian M, Nibedita Behera, Rupesh Nasre

Tags: Code generation, Computer science, CUDA, DSL, nVidia, nVidia GeForce RTX 2080 Ti, OpenACC, OpenCL, Package, SYCL, Tesla V100

January 14, 2024 by hgpu

Deep Learning for Obfuscated Code Analysis

Alexander Shroyer

Tags: Computer science, CUDA, Deep learning, nVidia, nVidia GeForce RTX 2060, OpenCL, Package, PyTorch, Thesis

January 7, 2024 by hgpu

UniFL: Accelerating Federated Learning Using Heterogeneous Hardware Under a Unified Framework

Biyao Che, Zixiao Wang, Ying Chen, Liang Guo, Yuan Liu, Yuan Tian, Jizhuang Zhao

Tags: Computer science, CUDA, Deep learning, FPGA, nVidia, OpenCL, Security, Tesla T4

January 7, 2024 by hgpu

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

94% on CIFAR-10 in 3.29 Seconds on a Single GPU

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers

OpenMC Monte Carlo Code

Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

Polygeist: C/C++ frontend for MLIR

Retargeting and Respecializing GPU Workloads for Performance Portability

Parallel Gaussian process with kernel approximation in CUDA

* * *

Recent source codes

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

Most viewed papers (last 30 days)