Papers on hgpu.org (.txt-file)
Intra-node Memory Safe GPU Co-Scheduling
Introducing ‘Bones’: A Parallelizing Source-to-Source Compiler Based on Algorithmic Skeletons
Introducing CURRENNT – the Munich open-source CUDA RecurREnt Neural Network Toolkit
Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit
Introducing Energy Efficiency into Graphics Processors
Introducing Parallelism to the Ranges TS
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM
Introduction to GPGPU programming
Introduction to GPGPU, a hardware and software background
Introduction to GPU Computing and CUDA Programming: A Case Study on FDTD [EM Programmer’s Notebook]
Introduction to GPU programming for EDA
Introduction to GPU Programming with GLSL
Introduction to GPU Radix Sort
Introduction to the Report “Interlanguages and Synchronic Models of Computation.”
Introduction to the Special Issue on Digital Signal Processing in Radio Astronomy
Intrusion Detection Architecture Utilizing Graphics Processors
Intrusion Detection using Spiking Neural Networks
Inverse scattering and refraction corrected reflection for breast cancer imaging
Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers
Investigating Host-Device communication in a GPU-based H.264 encoder
Investigating Input Representations and Representation Models of Source Code for Machine Learning
Investigating performance portability of a highly scalable particle-in-cell simulation code on various multi-core architectures
Investigating performance variations of an optimized GPU-ported granulometry algorithm
Investigating Single Precision Floating General Matrix Multiply in Heterogeneous
Investigating SRAM PUFs in large CPUs and GPUs
Investigating the Impact of Data Parallelism and GPU Technology on Computer Gaming
Investigating the Performance of Motion Estimation Block-Matching Algorithms on GPU Cards
Investigating the use of GPU-accelerated nodes for SAR image formation
Investigating the use of GPUs with a Monte Carlo Astrophysical Simulation
Investigating Warp Size Impact in GPUs
Investigation of General-Purpose Computing on Graphics Processing Units and its Application to the Finite Element Analysis of Electromagnetic Problems
Investigation of GPU-based Pattern Matching
Investigation of heterogeneous computing through novel parallel programming platforms
Investigation of Parallel Computation – MPI, CUDA and Parallel Visualization
Investigation of the OpenCL SYCL Programming Model
Investigation of the SYCL for OpenCL Programming Model
Investigation on the Use of GPGPU for Fast Sparse Matrix Factorization
Invitation to a Standard Programming Interface for Massively Parallel Computing Environment: OpenCL
Invited paper: Accelerating neuromorphic vision on FPGAs
IODA: an Input/Output Deep Architecture for image labeling
IP routing processing with graphic processors
IPMACC: Open Source OpenACC to CUDA/OpenCL Translator
IPMACC: Translating OpenACC API to OpenCL
Iris Matching Algorithm on Many-Core Platforms
Iris recognition on GPU with the usage of Non-Negative Matrix Factorization
IRIS: Illustrative Rendering for Integral Surfaces
Irradiation Instability at the Inner Edges of Accretion Disks
Irregular algorithms on the Xeon Phi
Irregularity Mitigation and Portability Abstractions for Accelerated Sparse Matrix Factorization
Is GPGPU CCL worth it? A performance comparison between some GPU and CPU algorithms for solving connected components labeling on binary images
Is OpenCL a suitable platform for algorithm development in health care systems?
Is the game worth the candle? Evaluation of OpenCL for object detection algorithm optimization
Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs
ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs
Isocube: Exploiting the Cubemap Hardware
Isolated Scheduling for Distributed Training Tasks in GPU Clusters
Isosurface Extraction and View-Dependent Filtering from Time-Varying Fields Using Persistent Time-Octree (PTOT)
Issues and challenges in compiling for graphics processors
Issues in Heterogeneous GPU Clusters
It’s all about data movement: Optimising FPGA data access to boost performance
Iterative and Predictive Ray-Traced Collision Detection for Multi-GPU Architectures
Iterative CT Reconstruction on the GPU
Iterative GPGPU Linear Solvers for Sparse Matrices
Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies
Iterative induced dipoles computation for molecular mechanics on GPUs
Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units
Iterative layer-based raytracing on CUDA
Iterative Methods for Visualization of Implicit Surfaces On GPU
Iterative optimization methods for efficient image restoration on multicore architectures
Iterative SLE Solvers over a CPU-GPU Platform
Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA
Iterative Statistical Kernels on Contemporary GPUs
iTree: Exploring Time-Varying Data using Indexable Tree
Jacobian-free Newton-Krylov methods with GPU acceleration for computing nonlinear ship wave patterns
Jailbreaking LLM-Controlled Robots
Java with Auto-Parallelization on Graphics Coprocessing Architecture
JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA
JIT-Compilation for Interactive Scientific Visualization
Jit4OpenCL: a compiler from Python to OpenCL
Job Parallelism using Graphical Processing Unit individual Multi-Processors and Highly Localised Memory
Job Parallelism using Graphical Processing Unit Individual Multi-Processors and Localised Memory
Join Algorithms on GPUs: A Revisit After Seven Years
Join Execution Using Fragmented Columnar Indices on GPU and MIC
Joint Forces: From Multithreaded Programming to GPU Computing
Joint-MAP Tomographic Reconstruction with Patch Similarity Based Mixture Prior Model
JPEG 2000 Wireless Image Transmission System using Encryption Domain Authentication
JPEG-GPU: a GPGPU Implementation of JPEG Core Coding Systems
JSDoop and TensorFlow.js: Volunteer Distributed Web Browser-Based Neural Network Training
Julia as a unifying end-to-end workflow language on the Frontier exascale system
Jump flooding in GPU with applications to Voronoi diagram and distance transform
Just-in-time Acceleration of JavaScript
Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading
K-Means on Commodity GPUs with CUDA
K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching
k+-buffer: Fragment Synchronized k-buffer
Titles: 100
open PDFs: 89
packages: 17