Papers on hgpu.org (.txt-file)

Improving GPGPU Concurrency with Elastic Kernels

Improving GPU particle filter with shader model 3.0 for visual tracking

Improving GPU Performance by Regrouping CPU-Memory Data

Improving GPU Performance Prediction with Data Transfer Modeling

Improving GPU Performance through Instruction Redistribution and Diversification

Improving GPU Performance via Large Warps and Two-Level Warp Scheduling

Improving GPU Performance: Reducing Memory Conflicts and Latency

Improving GPU programming models through hardware cache coherence

Improving GPU Robustness by Making Use of Faulty Parts

Improving GPU Simulations of Spiking Neural P Systems

Improving GPU Sparse Matrix-Vector Multiplication for Probabilistic Model Checking

Improving GPU-accelerated Adaptive IDW Interpolation Algorithm Using Fast kNN Search

Improving Hybrid OpenCL Performance by High Speed Networks

Improving Locality of Unstructured Mesh Algorithms on GPUs

Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS

Improving many flavor QCD simulations using multiple GPUs

Improving Numerical Accuracy for Non-Negative Matrix Multiplication on GPUs using Recursive Algorithms

Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs

Improving OpenACC compatibility within accULL

Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering

Improving OpenCL Programmability with the Heterogeneous Programming Library

Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers

Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra

Improving Performance and Energy Efficiency of GPUs through Locality Analysis

Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA

Improving performance for emergent environments parameter tuning and simulation in games using GPU

Improving Performance of Hardware Accelerators by Optimizing Data Movement: A Bioinformatics Case Study

Improving Performance of Iterative Applications through Interleaved Execution of Approximated CUDA Kernels

Improving Performance of Matrix Multiplication and FFT on GPU

Improving Performance of OpenCL on CPUs

Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow

Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations

Improving Performance Portability in OpenCL Programs

Improving processing time for visual measurements of displacements of IPMC actuators using CUDA

Improving programmability of heterogeneous many-core systems via explicit platform descriptions

Improving Resource Efficiency in Virtualized Datacenters

Improving Resource Utilization in Heterogeneous CPU-GPU Systems

Improving Scheduling Techniques in Heterogeneous Systems with Dynamic, On-Line Optimisations

Improving SIMD efficiency for parallel Monte Carlo light transport on the GPU

Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels

Improving SMT performance: an application of genetic algorithms to configure resizable caches

Improving Student Learning in Computer Science Courses by Using Virtual OpenCL Laboratory

Improving Synchronization and Data Access in Parallel Programming Models

Improving tasks throughput on accelerators using OpenCL command concurrency

Improving the Efficiency of GPU Clusters

Improving the Efficiency of OpenCL Kernels through Pipes

Improving the GPU space of computation under triangular domain problems

Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

Improving the Neural GPU Architecture for Algorithm Learning

Improving the Performance of a Ray Tracing Algorithm Using a GPU

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs

Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose

Improving the Performance of Hyperspectral Image and Signal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-Based Systems

Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

Improving the performance of PIR Protocol in Outsourced Databases

Improving the performance of spatial raster analysis in GIS using GPU

Improving the Performance of the Contextual Spaces Re-Ranking Algorithm on Heterogeneous Systems

Improving the Performance of the Linear Systems Solvers Using CUDA

Improving the Performance of the Sparse Matrix Vector Product with GPUs

Improving the Performance, Portability, and Productivity of Hardware Accelerators

Improving the Programmability of GPU Architectures

Improving the scalability of modern applications by parallel multi-core and many-core programming

Improving the speed of neural networks on CPUs

Improving the Speed of Virtual Rear Projection: A GPU-Centric Architecture

Improving the usability of hierarchical representations for interactively labeling large image data sets

In Search of Self-Organization

In Situ Power Analysis of General Purpose Graphical Processing Units

In vivo interactive visualization of four-dimensional blood flow patterns

In-Datacenter Performance Analysis of a Tensor Processing Unit

In-Memory Data Analytics on Coupled CPU-GPU Architectures

In-memory database acceleration on FPGAs: a survey

In-memory grid files on graphics processors

In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL

In-process optical characterization method for sub-100-nm nanostructures

In-Situ Statistical Analysis of Autotune Simulation Data using Graphical Processing Units

In-Situ Techniques on GPU-Accelerated Data-Intensive Applications

Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS

Increased reliability on Intel GPUs via software diverse redundancy

Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition

Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling

Increasing Memory Miss Tolerance for SIMD Cores

Increasing precision of uniform pseudorandom number generators

Increasing predictability of GPU’s

Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis

Increasing Realism and Supporting Content Planning for Dynamic Scenes in a Mixed Reality System incorporating a Time-of-Flight Camera

Increasing the Accuracy of the Space-Sweeping Approach to Stereo Reconstruction, using Spherical Backprojection Surfaces

Increasing the performance of AllToAll variant of self-organizing migration algorithm using CUDA

Incremental Bounded Model Checking of Artificial Neural Networks in CUDA

Incremental Raycasting of Piecewise Quadratic Surfaces on the GPU

Indexing million of packets per second using GPUs

Indexing of Spatiotemporal Trajectories for Efficient Distance Threshold Similarity Searches on the GPU

Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction

Industrial Robot Collision Handling in Harsh Environments

Inertial Coupling Method for particles in an incompressible fluctuating fluid

Inertial-aided KLT feature tracking for a moving camera

Inexpensive Immersive Projection

iNFAnt: NFA pattern matching on GPGPU devices

Inferring the Scheduling Policies of an Embedded CUDA GPU

Infiniband-Verbs on GPU: A case study of controlling an Infiniband network device from the GPU

Titles: 100
open PDFs: 90
packages: 12
