Papers on hgpu.org (.txt-file)

Improving the Performance of the Sparse Matrix Vector Product with GPUs

Improving the Performance, Portability, and Productivity of Hardware Accelerators

Improving the Programmability of GPU Architectures

Improving the scalability of modern applications by parallel multi-core and many-core programming

Improving the speed of neural networks on CPUs

Improving the Speed of Virtual Rear Projection: A GPU-Centric Architecture

Improving the usability of hierarchical representations for interactively labeling large image data sets

In Search of Self-Organization

In Situ Power Analysis of General Purpose Graphical Processing Units

In vivo interactive visualization of four-dimensional blood flow patterns

In-Datacenter Performance Analysis of a Tensor Processing Unit

In-Memory Data Analytics on Coupled CPU-GPU Architectures

In-memory database acceleration on FPGAs: a survey

In-memory grid files on graphics processors

In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL

In-process optical characterization method for sub-100-nm nanostructures

In-Situ Statistical Analysis of Autotune Simulation Data using Graphical Processing Units

In-Situ Techniques on GPU-Accelerated Data-Intensive Applications

Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS

Increased reliability on Intel GPUs via software diverse redundancy

Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition

Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling

Increasing Memory Miss Tolerance for SIMD Cores

Increasing precision of uniform pseudorandom number generators

Increasing predictability of GPU’s

Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis

Increasing Realism and Supporting Content Planning for Dynamic Scenes in a Mixed Reality System incorporating a Time-of-Flight Camera

Increasing the Accuracy of the Space-Sweeping Approach to Stereo Reconstruction, using Spherical Backprojection Surfaces

Increasing the performance of AllToAll variant of self-organizing migration algorithm using CUDA

Incremental Bounded Model Checking of Artificial Neural Networks in CUDA

Incremental Raycasting of Piecewise Quadratic Surfaces on the GPU

Indexing million of packets per second using GPUs

Indexing of Spatiotemporal Trajectories for Efficient Distance Threshold Similarity Searches on the GPU

Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction

Industrial Robot Collision Handling in Harsh Environments

Inertial Coupling Method for particles in an incompressible fluctuating fluid

Inertial-aided KLT feature tracking for a moving camera

Inexpensive Immersive Projection

iNFAnt: NFA pattern matching on GPGPU devices

Inferring the Scheduling Policies of an Embedded CUDA GPU

Infiniband-Verbs on GPU: A case study of controlling an Infiniband network device from the GPU

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization

Information Visualization of Multi-dimensional Cellular Automata using GPU Programming

Initial condition for efficient mapping of level set algorithms on many-core architectures

Initial Experiences Porting a Bioinformatics Application to a Graphics Processor

Initial Explorations of ARM Processors for Scientific Computing

Inline Vector Compression for Computational Physics

Innovative prospective of Antenna-Gain removing the pain of EMI engineers

Input Sensitivity of GPU Program Optimizations

Input Space Splitting for OpenCL

Input-Aware Auto-Tuning for Directive-based GPU Programming

Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

Inside VOLT: Designing an Open-Source GPU Compiler

Inside VOLT: Designing an Open-Source GPU Compiler (Tool)

INSPIRE: an interactive image assisted non-photorealistic rendering system

INSTA-YOLO: Real-Time Instance Segmentation

Instructions’ Latencies Characterization for NVIDIA GPGPUs

Instruments of Productivity for High Performance Computing

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Integer sorting on multicores: some (experiments and) observations

Integrated Arrival and Departure Schedule Optimization Under Uncertainty

Integrated Framework for Heterogeneous Embedded Platforms Using OpenCL

Integrated GPUs: how useful are they in HPC?

Integrated Modelling of Hydrodynamic Processes, Faecal Indicator Organisms and Related Parameters with Improved Accuracy using Parallel (GPU) Computing

Integrating a large-scale testing campaign in the CK framework

Integrating Accelerators in Heterogeneous Systems

Integrating GPGPU computations with CPU coroutines in C++

Integrating GPUs as fast co-processors into the existing parallel FE package FEAST

Integrating Multi-GPU Execution in an OpenACC Compiler

Integrating multi-threading and accelerators into DUNE-ISTL

Integrating Object Detection with 3D Tracking Towards a Better Driver Assistance System

Integrating Occlusion Culling with Parallel LOD for Rendering Complex 3D Environments on GPU

Integrating Post-Newtonian Equations on Graphics Processing Units

Integrating Profiling into MDE Compilers

Integrating SkePU’s algorithmic skeletons with GPI on a cluster

Integrating Two-Way Interaction Between Fluids and Rigid Bodies in the Real-Time Particle Systems Library

Integration of CUDA Processing within the C++ library for parallelism and concurrency (HPX)

Integrative multicellular biological modeling: a case study of 3D epidermal development using GPU algorithms

Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning

Intel oneAPI DPC++ FPGA Optimization Guide

Intel Xeon Phi acceleration of Hybrid Total FETI solver

Intel Xeon Phi Coprocessor High-Performance Programming

Intel’s Array Building Blocks: A retargetable, dynamic compiler and embedded language

Intel(R) SHMEM: GPU-initiated OpenSHMEM using SYCL

Intelligent Edge Detection using a CUDA Simulator of Multilayer Neural Network Based on Multi-Valued Neurons

Intelligent GPGPU Classification in Volume Visualization: A framework based on Error-Correcting Output Codes

Intensity model with blur effect on GPUs applied to large-scale star simulators

Inter-APU Communication on AMD MI300A Systems via Infinity Fabric: a Deep Dive

Inter-Block GPU Communication via Fast Barrier Synchronization

Inter-block synchronization on a GPGPU

Inter-cluster communication on clustered SIMD architectures

Inter-Warp Instruction Temporal Locality in Deep-Multithreaded GPUs

Interacting with Volume Data: Deformations using Forward Projection

Interaction and Visualization Techniques for Immersive Exploration and Perception of 3D datasets

Interactive 3D distance field computation using linear factorization

Interactive Approximate Rendering of Reflections, Refractions, and Caustics

Interactive Bi-scale Editing of Highly Glossy Materials

Titles: 100
open PDFs: 96
packages: 19
