high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

HipKittens: Fast and Furious AMD Kernels

HIPRT: A Ray Tracing Framework in HIP

HiRace: Accurate and Fast Source-Level Race Checking of GPU Programs

HISQ inverter on Intel Xeon Phi and NVIDIA GPUs

Histogram Computations on GPUs Kernel using Global and Shared Memory Atomics

Historic Learning Approach for Auto-tuning OpenACC Accelerated Scientific Applications

Historygrams: Enabling Interactive Global Illumination in Direct Volume Rendering using Photon Mapping

HLS Portability from Intel to Xilinx: A Case Study

hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis

hlslib: Software Engineering for Hardware Design

HOCL: A Family of Embedded Languages

Home-made Diffusion Model from Scratch to Hatch

Homomorphic Autocomplete

Homomorphic-Encrypted Volume Rendering

Homunculus Warping: Conveying importance using self-intersection-free non-homogeneous mesh deformation

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

HORIZON: Accelerated General Relativistic Magnetohydrodynamics

Hotspot Analysis Based Partial CUDA Acceleration of HMMER 3.0 on GPGPUs

How a Single Chip Causes Massive Power Bills. GPUSimPow: A GPGPU Power Simulator

How GPUs Can Improve the Quality of Magnetic Resonance Imaging

How GPUs Work

How much can we gain from Tensor Kernel Fusion on GPUs?

How to Benefit from AMD, Intel and Nvidia Accelerator Technologies in Scilab

How to Correctly Deal With Pseudorandom Numbers in Manycore Environments – Application to GPU programming with Shoverand

How to distribute most efficiently a computation intensive calculation on an Android device to external compute units with an Android API

How to obtain efficient GPU kernels: an illustration using FMM & FGT algorithms

How to Render FDTD Computations More Effective Using a Graphics Accelerator

How to Rent GPUs on a Budget

How to scale distributed deep learning?

How to Train BERT with an Academic Budget

How well do STARLAB and NBODY compare? II: Hardware and accuracy

HPAC-Offload: Accelerating HPC Applications with Portable Approximate Computing on the GPU

HPC acceleration of large (min, +) matrix products to compute domination-type parameters in graphs

HPC on the Intel Xeon Phi: Homomorphic Word Searching

HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages

HPC++: An LLVM-Based Automatic Parallelization Framework with Heterogeneous CPU–GPU Execution

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms

HPP-Controller: An intra-node controller designed for connecting heterogeneous CPUs

HPVM: A Portable Virtual Instruction Set for Heterogeneous Parallel Systems

HPVM: Heterogeneous Parallel Virtual Machine

HPX – The C++ Standard Library for Parallelism and Concurrency

HSApriori: High Speed Association Rule Mining using Apriori Based Algorithm for GPU

HSPA+/LTE-A Turbo Decoder on GPU and Multicore CPU

HSTREAM: A directive-based language extension for heterogeneous stream computing

HTML5 WebSocket protocol and its application to distributed computing

HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads

Human Re-identification System On Highly Parallel GPU and CPU Architectures

Humanoid navigation planning using future perceptive capability

Hunting CUDA Bugs at Scale with cuFuzz

Hybrid Acceleration of a Molecular Dynamics Simulation Using Short-Ranged Potentials

Hybrid algorithms for efficient Cholesky decomposition and matrix inverse using multicore CPUs with GPU accelerators

Hybrid Algorithms for List Ranking and Graph Connected Components

Hybrid coherence for scalable multicore architectures

Hybrid computational voxelization using the graphics pipeline

Hybrid Core Acceleration of UWB SIRE Radar Signal Processing

Hybrid CPU and GPGPU Volunteer Computing Framework over the Extensible Messaging and Presence Protocol for Parallel Branch and Bound Optimization of Truss Structures

Hybrid CPU-GPU Distributed Framework for Large Scale Mobile Networks Simulation

Hybrid CPU-GPU execution support in the skeleton programming framework SkePU

Hybrid CPU-GPU Framework for Network Motifs

Hybrid CPU-GPU generation of the Hamiltonian and Overlap matrices in FLAPW methods

Hybrid CPU-GPU Implementation of Tracking-Learning-Detection Algorithm

Hybrid CPU-GPU Pipeline Framework

Hybrid CPU/GPU KD-Tree Construction for Versatile Ray Tracing

Hybrid CPU/GPU/APU accelerated query, insert, update and erase operations in hash tables with string keys

Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters

Hybrid Embarrassingly Parallel on heterogeneous platform

Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model

Hybrid Framework for pairwise DNA Sequence Alignment Using the CUDA compatible GPU

Hybrid GATE: A GPU/CPU implementation for imaging and therapy applications

Hybrid general-purpose computation on GPU (GPGPU) and computer graphics synthetic aperture radar simulation for complex scenes

Hybrid GPU-Based Single- and Double-Bounce SAR Simulation

Hybrid GPU-CPU Adaptive Precision Ray-Triangle Intersection Tests for Robust High-Performance GPU Dosimetry Computations

Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters

Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters

Hybrid Monte Carlo CT Simulation on GPU

Hybrid Monte Carlo with Wilson Dirac operator on the Fermi GPU

Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters

Hybrid MPI/GPU Interpolation for Grid DEM Construction

Hybrid Multicore Algorithms for Some Semi-Numerical Applications and Graphs

Hybrid of genetic algorithm and local search to solve MAX-SAT problem using nVidia CUDA framework

Hybrid OpenCL over high speed networks

Hybrid OpenCL: Connecting Different OpenCL Implementations over Network

Hybrid OpenCL: Enhancing OpenCL for Distributed Processing

Hybrid Parallel Light-Weight Programming of Hybrid Systems

Hybrid parallel programming – evaluation of OpenACC

Hybrid Parallel Streamline Extraction Combining MPI and OpenCL

Hybrid Parallelism for Volume Rendering on Large, Multi-and Many-core Systems

Hybrid Particle Lattice Boltzmann Shallow Water for interactive fluid simulations

Hybrid Programming using OpenSHMEM and OpenACC

Hybrid quantum programming with PennyLane Lightning on HPC platforms

Hybrid Ray Tracing and Path Tracing of Bezier Surfaces Using A Mixed Hierarchy

Hybrid Sample-based Surface Rendering

Hybrid Scheduling for Event-driven Simulation over Heterogeneous Computers

Hybrid Single/Double Precision Floating-Point Computation on GPU Accelerators for 2-D FDTD

Hybrid smoothed particle hydrodynamics

Hybrid strategy for stencil computations on the APU

Hybrid Update Algorithms for Regular Lattice and Small-World Ising Models on Graphical Processing Units

Brief statistics for this page

Titles: 100

Download open PDFs: 91

Package packages: 25

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
