Papers on hgpu.org (.txt-file)
Invited paper: Accelerating neuromorphic vision on FPGAs
IODA: an Input/Output Deep Architecture for image labeling
IP routing processing with graphic processors
IPMACC: Open Source OpenACC to CUDA/OpenCL Translator
IPMACC: Translating OpenACC API to OpenCL
Iris Matching Algorithm on Many-Core Platforms
Iris recognition on GPU with the usage of Non-Negative Matrix Factorization
IRIS: Illustrative Rendering for Integral Surfaces
Irradiation Instability at the Inner Edges of Accretion Disks
Irregular algorithms on the Xeon Phi
Irregularity Mitigation and Portability Abstractions for Accelerated Sparse Matrix Factorization
Is GPGPU CCL worth it? A performance comparison between some GPU and CPU algorithms for solving connected components labeling on binary images
Is OpenCL a suitable platform for algorithm development in health care systems?
Is the game worth the candle? Evaluation of OpenCL for object detection algorithm optimization
Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs
ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs
Isocube: Exploiting the Cubemap Hardware
Isolated Scheduling for Distributed Training Tasks in GPU Clusters
Isosurface Extraction and View-Dependent Filtering from Time-Varying Fields Using Persistent Time-Octree (PTOT)
Issues and challenges in compiling for graphics processors
Issues in Heterogeneous GPU Clusters
It’s all about data movement: Optimising FPGA data access to boost performance
Iterative and Predictive Ray-Traced Collision Detection for Multi-GPU Architectures
Iterative CT Reconstruction on the GPU
Iterative GPGPU Linear Solvers for Sparse Matrices
Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies
Iterative induced dipoles computation for molecular mechanics on GPUs
Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units
Iterative layer-based raytracing on CUDA
Iterative Methods for Visualization of Implicit Surfaces On GPU
Iterative optimization methods for efficient image restoration on multicore architectures
Iterative SLE Solvers over a CPU-GPU Platform
Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA
Iterative Statistical Kernels on Contemporary GPUs
iTree: Exploring Time-Varying Data using Indexable Tree
Jacobian-free Newton-Krylov methods with GPU acceleration for computing nonlinear ship wave patterns
Jailbreaking LLM-Controlled Robots
Java with Auto-Parallelization on Graphics Coprocessing Architecture
JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA
JIT-Compilation for Interactive Scientific Visualization
Jit4OpenCL: a compiler from Python to OpenCL
Job Parallelism using Graphical Processing Unit individual Multi-Processors and Highly Localised Memory
Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory
Join Algorithms on GPUs: A Revisit After Seven Years
Join Execution Using Fragmented Columnar Indices on GPU and MIC
Joint Forces: From Multithreaded Programming to GPU Computing
Joint-MAP Tomographic Reconstruction with Patch Similarity Based Mixture Prior Model
JPEG 2000 Wireless Image Transmission System using Encryption Domain Authentication
JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems
JSDoop and TensorFlow.js: Volunteer Distributed Web Browser-Based Neural Network Training
Julia as a unifying end-to-end workflow language on the Frontier exascale system
Jump flooding in GPU with applications to Voronoi diagram and distance transform
Just-in-time Acceleration of JavaScript
Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading
K-Means on Commodity GPUs with CUDA
K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching
k+-buffer: Fragment Synchronized k-buffer
K3 Moore’s Law in the Era of GPU Computing
KAdvice: inferring synchronization patterns from an existing codebase
KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks
Kalman Filter Tracking on Parallel Architectures
Kalman-Filter-Based Particle Tracking on Parallel Architectures at Hadron Colliders
kANN on the GPU with Shifted Sorting
Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras
Kargus: a Highly-scalable Software-based Intrusion Detection System
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
Kd-Jump: a Path-Preserving Stackless Traversal for Faster Isosurface Raytracing on GPUs
KD-tree acceleration structures for a GPU raytracer
Kd-tree Based N-Body Simulations with Volume-Mass Heuristic on the GPU
kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos
Keeneland: Bringing heterogeneous GPU computing to the computational science community
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3
Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU
Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications
Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)
Kernel Tuner: A search-optimizing GPU code auto-tuner
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
Kernel-as-a-Service: A Serverless Interface to GPUs
Kernel-Centric Optimizations for Deep Neural Networks on GPGPU
KernelBench: Can LLMs Write Efficient GPU Kernels?
Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
KERNELGEN – A Toolchain for Automatic GPU-centric Applications Porting
KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs
KernelInterceptor: automating GPU kernel verification by intercepting kernels and their parameters
Kernelized Renyi distance for speaker recognition
KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications
Key derivation functions and their GPU implementation
Key Reconciliation with Low-Density Parity-Check Codes for Long-Distance Quantum Cryptography
Keynote address: Immersive exploration of large datasets
KFusion: Obtaining Modularity and Performance with Regards to General Purpose GPU Computing and Co-processors
Kinematic Modelling of Disc Galaxies using Graphics Processing Units
Kinetics of liquid-solid phase transition in large nickel clusters
Kite: Braided Parallelism for Heterogeneous Systems
KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs
Titles: 100
open PDFs: 91
packages: 27