Papers on hgpu.org (.txt-file)

Iterative GPGPU Linear Solvers for Sparse Matrices

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

Iterative induced dipoles computation for molecular mechanics on GPUs

Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units

Iterative layer-based raytracing on CUDA

Iterative Methods for Visualization of Implicit Surfaces On GPU

Iterative optimization methods for efficient image restoration on multicore architectures

Iterative SLE Solvers over a CPU-GPU Platform

Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA

Iterative Statistical Kernels on Contemporary GPUs

iTree: Exploring Time-Varying Data using Indexable Tree

Jacobian-free Newton-Krylov methods with GPU acceleration for computing nonlinear ship wave patterns

Jailbreaking LLM-Controlled Robots

Java with Auto-Parallelization on Graphics Coprocessing Architecture

JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA

JIT-Compilation for Interactive Scientific Visualization

Jit4OpenCL: a compiler from Python to OpenCL

Job Parallelism using Graphical Processing Unit individual Multi-Processors and Highly Localised Memory

Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory

Join Algorithms on GPUs: A Revisit After Seven Years

Join Execution Using Fragmented Columnar Indices on GPU and MIC

Joint Forces: From Multithreaded Programming to GPU Computing

Joint-MAP Tomographic Reconstruction with Patch Similarity Based Mixture Prior Model

JPEG 2000 Wireless Image Transmission System using Encryption Domain Authentication

JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems

JSDoop and TensorFlow.js: Volunteer Distributed Web Browser-Based Neural Network Training

Julia as a unifying end-to-end workflow language on the Frontier exascale system

Jump flooding in GPU with applications to Voronoi diagram and distance transform

Just-in-time Acceleration of JavaScript

Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading

K-Means on Commodity GPUs with CUDA

K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching

k+-buffer: Fragment Synchronized k-buffer

K3 Moore’s Law in the Era of GPU Computing

KAdvice: infering synchronization patterns from an existing codebase

KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks

Kalman Filter Tracking on Parallel Architectures

Kalman-Filter-Based Particle Tracking on Parallel Architectures at Hadron Colliders

kANN on the GPU with Shifted Sorting

Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras

Kargus: a Highly-scalable Software-based Intrusion Detection System

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators

Kd-Jump: a Path-Preserving Stackless Traversal for Faster Isosurface Raytracing on GPUs

KD-tree acceleration structures for a GPU raytracer

Kd-tree Based N-Body Simulations with Volume-Mass Heuristic on the GPU

kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos

Keeneland: Bringing heterogeneous GPU computing to the computational science community

Keras Sig: Efficient Path Signature Computation on GPU in Keras 3

Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU

Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications

Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)

Kernel Tuner: A search-optimizing GPU code auto-tuner

Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation

Kernel-as-a-Service: A Serverless Interface to GPUs

Kernel-Centric Optimizations for Deep Neural Networks on GPGPU

KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit

KernelBench: Can LLMs Write Efficient GPU Kernels?

Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

KERNELGEN – A Toolchain for Automatic GPU-centric Applications Porting

KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs

KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters

Kernelized Renyi distance for speaker recognition

KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications

Kevin: Multi-Turn RL for Generating CUDA Kernels

Key derivation functions and their GPU implementation

Key Reconciliation with Low-Density Parity-Check Codes for Long-Distance Quantum Cryptography

Keynote address: Immersive exploration of large datasets

KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors

Kinematic Modelling of Disc Galaxies using Graphics Processing Units

Kinetics of liquid-solid phase transition in large nickel clusters

KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling

Kite: Braided Parallelism for Heterogeneous Systems

KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs

kNN Query Processing in Metric Spaces Using GPUs

Kokkidio: Fast, expressive, portable code, based on Kokkos and Eigen

Kokkos: Enabling performance portability across manycore architectures

Krylov Subspace Accelerated Algebraic Multigrid for Mimetic Finite Differences on GPUs

KUDA: GPU Accelerated Split Race Checker

LAMDA: Learning-Assisted Multi-Stage Autotuning for FPGA Design Closure

LAMMPS’ PPPM Long-Range Solver for the Second Generation Xeon Phi

LAMMPScuda – a new GPU accelerated Molecular Dynamics Simulations Package and its Application to Ion-Conducting Glasses

Landau Gauge Fixing on GPUs and String Tension

Langevin dynamics simulations of biomolecules on graphics processors

Language Modeling with Gated Convolutional Networks

Language virtualization for heterogeneous parallel computing

Large calculation of the flow over a hypersonic vehicle using a GPU

Large data visualization on distributed memory multi-GPU clusters

Large Integer Arithmetic in GPU for Cryptography

Large Language Model Powered C-to-CUDA Code Translation: A Novel Auto-Parallelization Framework

Large neighborhood local search optimization on graphics processing units

Large scale 3D shape retrieval by exploiting multi-core and GPU

Titles: 100
open PDFs: 91
packages: 32
