Papers on hgpu.org (.txt file)
Improving GPU Performance through Instruction Redistribution and Diversification
Improving GPU Performance via Large Warps and Two-Level Warp Scheduling
Improving GPU Performance: Reducing Memory Conflicts and Latency
Improving GPU programming models through hardware cache coherence
Improving GPU Robustness by Making Use of Faulty Parts
Improving GPU Simulations of Spiking Neural P Systems
Improving GPU Sparse Matrix-Vector Multiplication for Probabilistic Model Checking
Improving GPU-accelerated Adaptive IDW Interpolation Algorithm Using Fast kNN Search
Improving Hybrid OpenCL Performance by High Speed Networks
Improving Locality of Unstructured Mesh Algorithms on GPUs
Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS
Improving many flavor QCD simulations using multiple GPUs
Improving Numerical Accuracy for Non-Negative Matrix Multiplication on GPUs using Recursive Algorithms
Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs
Improving OpenACC compatibility within accULL
Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering
Improving OpenCL Programmability with the Heterogeneous Programming Library
Improving Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra
Improving Performance and Energy Efficiency of GPUs through Locality Analysis
Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA
Improving performance for emergent environments parameter tuning and simulation in games using GPU
Improving Performance of Hardware Accelerators by Optimizing Data Movement: A Bioinformatics Case Study
Improving Performance of Iterative Applications through Interleaved Execution of Approximated CUDA Kernels
Improving Performance of Matrix Multiplication and FFT on GPU
Improving Performance of OpenCL on CPUs
Improving performance of SYCL applications on CPU architectures using LLVM-directed compilation flow
Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations
Improving Performance Portability in OpenCL Programs
Improving processing time for visual measurements of displacements of IPMC actuators using CUDA
Improving programmability of heterogeneous many-core systems via explicit platform descriptions
Improving Resource Efficiency in Virtualized Datacenters
Improving Resource Utilization in Heterogeneous CPU-GPU Systems
Improving Scheduling Techniques in Heterogeneous Systems with Dynamic, On-Line Optimisations
Improving SIMD efficiency for parallel Monte Carlo light transport on the GPU
Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels
Improving SMT performance: an application of genetic algorithms to configure resizable caches
Improving Student Learning in Computer Science Courses by Using Virtual OpenCL Laboratory
Improving Synchronization and Data Access in Parallel Programming Models
Improving tasks throughput on accelerators using OpenCL command concurrency
Improving the Efficiency of GPU Clusters
Improving the Efficiency of OpenCL Kernels through Pipes
Improving the GPU space of computation under triangular domain problems
Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs
Improving the Neural GPU Architecture for Algorithm Learning
Improving the Performance of a Ray Tracing Algorithm Using a GPU
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose
Improving the Performance of Hyperspectral Image and Signal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-Based Systems
Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network
Improving the performance of PIR Protocol in Outsourced Databases
Improving the performance of spatial raster analysis in GIS using GPU
Improving the Performance of the Contextual Spaces Re-Ranking Algorithm on Heterogeneous Systems
Improving the Performance of the Linear Systems Solvers Using CUDA
Improving the Performance of the Sparse Matrix Vector Product with GPUs
Improving the Performance, Portability, and Productivity of Hardware Accelerators
Improving the Programmability of GPU Architectures
Improving the scalability of modern applications by parallel multi-core and many-core programming
Improving the speed of neural networks on CPUs
Improving the Speed of Virtual Rear Projection: A GPU-Centric Architecture
Improving the usability of hierarchical representations for interactively labeling large image data sets
In Search of Self-Organization
In Situ Power Analysis of General Purpose Graphical Processing Units
In vivo interactive visualization of four-dimensional blood flow patterns
In-Datacenter Performance Analysis of a Tensor Processing Unit
In-Memory Data Analytics on Coupled CPU-GPU Architectures
In-memory database acceleration on FPGAs: a survey
In-memory grid files on graphics processors
In-Place Recursive Approach for All-Pairs Shortest Paths Problem Using OpenCL
In-process optical characterization method for sub-100-nm nanostructures
In-Situ Statistical Analysis of Autotune Simulation Data using Graphical Processing Units
Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS
Increased reliability on Intel GPUs via software diverse redundancy
Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition
Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling
Increasing Memory Miss Tolerance for SIMD Cores
Increasing precision of uniform pseudorandom number generators
Increasing predictability of GPU’s
Increasing programmability of an embedded domain specific language for GPGPU kernels using static analysis
Increasing Realism and Supporting Content Planning for Dynamic Scenes in a Mixed Reality System incorporating a Time-of-Flight Camera
Increasing the Accuracy of the Space-Sweeping Approach to Stereo Reconstruction, using Spherical Backprojection Surfaces
Increasing the performance of AllToAll variant of self-organizing migration algorithm using CUDA
Incremental Bounded Model Checking of Artificial Neural Networks in CUDA
Incremental Raycasting of Piecewise Quadratic Surfaces on the GPU
Indexing million of packets per second using GPUs
Indexing of Spatiotemporal Trajectories for Efficient Distance Threshold Similarity Searches on the GPU
Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction
Industrial Robot Collision Handling in Harsh Environments
Inertial Coupling Method for particles in an incompressible fluctuating fluid
Inertial-aided KLT feature tracking for a moving camera
Inexpensive Immersive Projection
iNFAnt: NFA pattern matching on GPGPU devices
Inferring the Scheduling Policies of an Embedded CUDA GPU
Infiniband-Verbs on GPU: A case study of controlling an Infiniband network device from the GPU
Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization
Information Visualization of Multi-dimensional Cellular Automata using GPU Programming
Initial condition for efficient mapping of level set algorithms on many-core architectures
Initial Experiences Porting a Bioinformatics Application to a Graphics Processor
Initial Explorations of ARM Processors for Scientific Computing
Inline Vector Compression for Computational Physics
Titles: 100
open PDFs: 90
packages: 14