high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision

Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology

Performing efficient NURBS modeling operations on the GPU

Performing with CUDA

PeriPy – A High Performance OpenCL Peridynamics Package

permGPU: Using graphics processing units in RNA microarray association studies

Permutation Index and GPU to Solve efficiently Many Queries

Persistent Kernels for Iterative Memory-bound GPU Applications

Persistent RNNs: Stashing Recurrent Weights On-Chip

Perturbation Functions in Computer Graphics

Petaflop biofluidics simulations on a two million-core system

Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems

Petascale computations for Large-scale Atomic and Molecular collisions

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures

Petascale elliptic solvers for anisotropic PDEs on GPU clusters

Petascale turbulence simulation using a highly parallel fast multipole method

Petascale visualization: Approaches and initial results

PFAC Library: GPU-based string matching algorithm

PFunc: modern task parallelism for modern high performance computing

PG-PuReMD: A Parallel-GPU Reactive Molecular Dynamics Package

PGEM: Preemptive GPGPU Execution Model for Runtime Engines

Pgx: Hardware-accelerated parallel game simulation for reinforcement learning

Phase Aware Memory Scheduling

Phase Based Volume Registration on the GPU with Application to Quantitative MRI

Phase Based Volume Registration Using CUDA

Phase diagram and critical behavior of the square-lattice Ising model with competing nearest- and next-nearest-neighbor interactions

Phase Transition in 3d Heisenberg Spin Glasses with Strong Random Anisotropies, through a Multi-GPU Parallelization

phiGEMM: a CPU-GPU library for porting Quantum ESPRESSO on hybrid systems

Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors

Photon mapping on programmable graphics hardware

Physical and graphical effects in OpenCL by example

Physical modeling and high-performance GPU computing for characterization, interception, and disruption of hazardous near-Earth objects

Physically Based Rendering: Implementation of Path Tracer

Physically-Based Interactive Flow Visualization Based on Schlieren and Interferometry Experimental Techniques

Physically-based interactive schlieren flow visualization

Physically-based painting style 3D image synthesis using GPU

Physically-Based Sound Synthesis on GPUs

Physically-based visual simulation on graphics hardware

Physics and Computing Performance of the Exa.TrkX TrackML Pipeline

Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers

PhysProver: Advancing Automatic Theorem Proving for Physics

Piccolo: building fast, distributed programs with partitioned tables

PIConGPU: A Fully Relativistic Particle-in-Cell Code for a GPU Cluster

PIConGPU: Predictive Simulations of Laser-Particle Accelerators with Manycore Hardware

Piecewise Tri-linear Contouring for Multi-material Volumes

PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks

Piko: A Design Framework for Programmable Graphics Pipelines

PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural Framework

PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks

Pipeline strategies to accelerate range query processing on a multi-GPU environment

Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units

Pipelined MapReduce: A Decoupled MapReduce RunTime for Shared Memory Multi-Processors

Pipelined Training with Stale Weights of Deep Convolutional Neural Networks

Pipelining the Fast Multipole Method over a Runtime System

PIPS Is not (just) Polyhedral Software

PIR: PMaC’s Idiom Recognizer

PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators

Pixel-Exact Rendering of Spacetime Finite Element Solutions

PixelPie: Maximal Poisson-disk Sampling with Rasterization

Places205-VGGNet Models for Scene Recognition

Planetary-Scale Terrain Composition

Plant Leaf Modeling and Rendering Based-On GPU

Plasma Visualization in Parallel using Particle Systems on Graphical Processing Units

Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications

Platform Characterization for Domain-Specific Computing

Platform-independent parallelization of the Lattice Boltzmann method with OpenCL

Platform-Specific Optimization and Mapping of Stencil Codes through Refinement

Playdoh: A lightweight Python library for distributed computing and optimisation

PLB-HeC: A Profile-based Load-Balancing algorithm for Heterogeneous CPU-GPU Clusters

Plenoptic Rendering With Interactive Performance Using GPUs

PlinkGPU: A Framework for GPU Acceleration of Whole Genome Data Analysis

PM4Py-GPU: a High-Performance General-Purpose Library for Process Mining

PMT: Power Measurement Toolkit

PNG1 triangles for tangent plane continuous surfaces on the GPU

PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime

PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability

pocl: A Performance-Portable OpenCL Implementation

Point Based Approximate Color Bleeding With Cuda

Point Based Color Bleeding with CUDA and Caching

Point Rendering in CUDA Path Tracer

Point Spread Function Estimation of Solar Surface Images with a Cooperative Particle Swarm Optimization on GPUs

Point to Line Mappings and Other Line Parameterizations not only for Hough Transform

Point to point processing of digital images using parallel computing

Point-wise Adaptive Filtering for Fast Monte Carlo Noise Reduction

Pointer Analysis for Semi-Automatic Code Parallelizers

Poisson-Boltzmann model for protein-surface electrostatic interactions and grid-convergence study using the PyGBe code

Policy-based Tuning for Performance Portability and Library Co-optimization

Polly – Polyhedral optimization in LLVM

Polly-ACC: Transparent compilation to heterogeneous hardware

Polyconvexification of the multi-label optical flow problem

Polymer Field-Theory Simulations on Graphics Processing Units

POMPEI: Programming with OpenMP4 for Exascale Investigations

PONDER – A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope

PopSparse: Accelerated block sparse matrix multiplication on IPU

Population Parallel GP on the G80 GPU

Porous Rock Simulations and Lattice Boltzmann on GPUs

Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

Portability of Fortran’s ‘do concurrent’ on GPUs

Portable and Performant GPU/Heterogeneous Asynchronous Many-Task Runtime System

Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing

Brief statistics for this page

Titles: 100

Download open PDFs: 95

Packages: 28

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

