Papers on hgpu.org (.txt-file)
Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers

Investigating Host-Device communication in a GPU-based H.264 encoder

Investigating Input Representations and Representation Models of Source Code for Machine Learning

Investigating performance portability of a highly scalable particle-in-cell simulation code on various multi-core architectures

Investigating performance variations of an optimized GPU-ported granulometry algorithm

Investigating Single Precision Floating General Matrix Multiply in Heterogeneous

Investigating SRAM PUFs in large CPUs and GPUs

Investigating the Impact of Data Parallelism and GPU Technology on Computer Gaming

Investigating the Performance of Motion Estimation Block-Matching Algorithms on GPU Cards

Investigating the use of GPU-accelerated nodes for SAR image formation

Investigating the use of GPUs with a Monte Carlo Astrophysical Simulation

Investigating Warp Size Impact in GPUs

Investigation of General-Purpose Computing on Graphics Processing Units and its Application to the Finite Element Analysis of Electromagnetic Problems

Investigation of GPU-based Pattern Matching

Investigation of heterogeneous computing through novel parallel programming platforms

Investigation of Parallel Computation – MPI, CUDA and Parallel Visualization

Investigation of the OpenCL SYCL Programming Model

Investigation of the SYCL for OpenCL Programming Model

Investigation on the Use of GPGPU for Fast Sparse Matrix Factorization

Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL

Invited paper: Accelerating neuromorphic vision on FPGAs

IODA: an Input/Output Deep Architecture for image labeling

IP routing processing with graphic processors

IPMACC: Open Source OpenACC to CUDA/OpenCL Translator

IPMACC: Translating OpenACC API to OpenCL

Iris Matching Algorithm on Many-Core Platforms

Iris recognition on GPU with the usage of Non-Negative Matrix Factorization

Iris: First-Class Multi-GPU Programming Experience in Triton

IRIS: Illustrative Rendering for Integral Surfaces

Irradiation Instability at the Inner Edges of Accretion Disks

Irregular algorithms on the Xeon Phi

Irregularity Mitigation and Portability Abstractions for Accelerated Sparse Matrix Factorization

Is GPGPU CCL worth it? A performance comparison between some GPU and CPU algorithms for solving connected components labeling on binary images

Is OpenCL a suitable platform for algorithm development in health care systems?

Is the game worth the candle? Evaluation of OpenCL for object detection algorithm optimization

Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs

ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs

Isocube: Exploiting the Cubemap Hardware

Isolated Scheduling for Distributed Training Tasks in GPU Clusters

Isosurface Extraction and View-Dependent Filtering from Time-Varying Fields Using Persistent Time-Octree (PTOT)

Issues and challenges in compiling for graphics processors

Issues in Heterogeneous GPU Clusters

It’s all about data movement: Optimising FPGA data access to boost performance

Iterative and Predictive Ray-Traced Collision Detection for Multi-GPU Architectures

Iterative CT Reconstruction on the GPU

Iterative GPGPU Linear Solvers for Sparse Matrices

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

Iterative induced dipoles computation for molecular mechanics on GPUs

Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units

Iterative layer-based raytracing on CUDA

Iterative Methods for Visualization of Implicit Surfaces On GPU

Iterative optimization methods for efficient image restoration on multicore architectures

Iterative SLE Solvers over a CPU-GPU Platform

Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA

Iterative Statistical Kernels on Contemporary GPUs

iTree: Exploring Time-Varying Data using Indexable Tree

Jacobian-free Newton-Krylov methods with GPU acceleration for computing nonlinear ship wave patterns

Jailbreaking LLM-Controlled Robots

Java with Auto-Parallelization on Graphics Coprocessing Architecture

JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA

JIT-Compilation for Interactive Scientific Visualization

Jit4OpenCL: a compiler from Python to OpenCL

Job Parallelism using Graphical Processing Unit individual Multi-Processors and Highly Localised Memory

Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory

Join Algorithms on GPUs: A Revisit After Seven Years

Join Execution Using Fragmented Columnar Indices on GPU and MIC

Joint Forces: From Multithreaded Programming to GPU Computing

Joint Training on AMD and NVIDIA GPUs

Joint-MAP Tomographic Reconstruction with Patch Similarity Based Mixture Prior Model

JPEG 2000 Wireless Image Transmission System using Encryption Domain Authentication

JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems

JSDoop and TensorFlow.js: Volunteer Distributed Web Browser-Based Neural Network Training

Julia as a unifying end-to-end workflow language on the Frontier exascale system

Jump flooding in GPU with applications to Voronoi diagram and distance transform

Just-in-time Acceleration of JavaScript

Just-in-Time Catching Test Generation at Meta

Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading

K-Means on Commodity GPUs with CUDA

K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching

k+-buffer: Fragment Synchronized k-buffer

K3 Moore’s Law in the Era of GPU Computing

KAdvice: inferring synchronization patterns from an existing codebase

KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks

Kalman Filter Tracking on Parallel Architectures

Kalman-Filter-Based Particle Tracking on Parallel Architectures at Hadron Colliders

kANN on the GPU with Shifted Sorting

Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras

Kargus: a Highly-scalable Software-based Intrusion Detection System

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators

Kd-Jump: a Path-Preserving Stackless Traversal for Faster Isosurface Raytracing on GPUs

KD-tree acceleration structures for a GPU raytracer

Kd-tree Based N-Body Simulations with Volume-Mass Heuristic on the GPU

kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos

Keeneland: Bringing heterogeneous GPU computing to the computational science community

Keras Sig: Efficient Path Signature Computation on GPU in Keras 3

Titles: 100
open PDFs: 92
packages: 18
