Papers on hgpu.org (.txt-file)

Iris: First-Class Multi-GPU Programming Experience in Triton

IRIS: Illustrative Rendering for Integral Surfaces

Irradiation Instability at the Inner Edges of Accretion Disks

Irregular algorithms on the Xeon Phi

Irregularity Mitigation and Portability Abstractions for Accelerated Sparse Matrix Factorization

Is GPGPU CCL worth it? A performance comparison between some GPU and CPU algorithms for solving connected components labeling on binary images

Is OpenCL a suitable platform for algorithm development in health care systems?

Is the game worth the candle? Evaluation of OpenCL for object detection algorithm optimization

Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs

ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs

Isocube: Exploiting the Cubemap Hardware

Isolated Scheduling for Distributed Training Tasks in GPU Clusters

Isosurface Extraction and View-Dependent Filtering from Time-Varying Fields Using Persistent Time-Octree (PTOT)

Issues and challenges in compiling for graphics processors

Issues in Heterogeneous GPU Clusters

It’s all about data movement: Optimising FPGA data access to boost performance

Iterative and Predictive Ray-Traced Collision Detection for Multi-GPU Architectures

Iterative CT Reconstruction on the GPU

Iterative GPGPU Linear Solvers for Sparse Matrices

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

Iterative induced dipoles computation for molecular mechanics on GPUs

Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units

Iterative layer-based raytracing on CUDA

Iterative Methods for Visualization of Implicit Surfaces On GPU

Iterative optimization methods for efficient image restoration on multicore architectures

Iterative SLE Solvers over a CPU-GPU Platform

Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA

Iterative Statistical Kernels on Contemporary GPUs

iTree: Exploring Time-Varying Data using Indexable Tree

Jacobian-free Newton-Krylov methods with GPU acceleration for computing nonlinear ship wave patterns

Jailbreaking LLM-Controlled Robots

Java with Auto-Parallelization on Graphics Coprocessing Architecture

JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA

JIT-Compilation for Interactive Scientific Visualization

Jit4OpenCL: a compiler from Python to OpenCL

Job Parallelism using Graphical Processing Unit individual Multi-Processors and Highly Localised Memory

Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory

Join Algorithms on GPUs: A Revisit After Seven Years

Join Execution Using Fragmented Columnar Indices on GPU and MIC

Joint Forces: From Multithreaded Programming to GPU Computing

Joint-MAP Tomographic Reconstruction with Patch Similarity Based Mixture Prior Model

JPEG 2000 Wireless Image Transmission System using Encryption Domain Authentication

JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems

JSDoop and TensorFlow.js: Volunteer Distributed Web Browser-Based Neural Network Training

Julia as a unifying end-to-end workflow language on the Frontier exascale system

Jump flooding in GPU with applications to Voronoi diagram and distance transform

Just-in-time Acceleration of JavaScript

Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading

K-Means on Commodity GPUs with CUDA

K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching

k+-buffer: Fragment Synchronized k-buffer

K3 Moore’s Law in the Era of GPU Computing

KAdvice: inferring synchronization patterns from an existing codebase

KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks

Kalman Filter Tracking on Parallel Architectures

Kalman-Filter-Based Particle Tracking on Parallel Architectures at Hadron Colliders

kANN on the GPU with Shifted Sorting

Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras

Kargus: a Highly-scalable Software-based Intrusion Detection System

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators

Kd-Jump: a Path-Preserving Stackless Traversal for Faster Isosurface Raytracing on GPUs

KD-tree acceleration structures for a GPU raytracer

Kd-tree Based N-Body Simulations with Volume-Mass Heuristic on the GPU

kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos

Keeneland: Bringing heterogeneous GPU computing to the computational science community

Keras Sig: Efficient Path Signature Computation on GPU in Keras 3

Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU

Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications

Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)

Kernel Tuner: A search-optimizing GPU code auto-tuner

Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation

Kernel-as-a-Service: A Serverless Interface to GPUs

Kernel-Centric Optimizations for Deep Neural Networks on GPGPU

KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit

KernelBench: Can LLMs Write Efficient GPU Kernels?

Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta

KERNELGEN – A Toolchain for Automatic GPU-centric Applications Porting

KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs

KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters

Kernelized Renyi distance for speaker recognition

KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications

Kevin: Multi-Turn RL for Generating CUDA Kernels

Key derivation functions and their GPU implementation

Key Reconciliation with Low-Density Parity-Check Codes for Long-Distance Quantum Cryptography

Keynote address: Immersive exploration of large datasets

KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors

Kinematic Modelling of Disc Galaxies using Graphics Processing Units

Kinetics of liquid-solid phase transition in large nickel clusters

KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling

Kite: Braided Parallelism for Heterogeneous Systems

KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs

kNN Query Processing in Metric Spaces Using GPUs

Kokkidio: Fast, expressive, portable code, based on Kokkos and Eigen

Titles: 100
open PDFs: 92
packages: 28
