Papers on hgpu.org (.txt-file)
VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing

Vivaldi: A Domain-Specific Language for Volume Processing and Visualization on Distributed Heterogeneous Systems

VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units

VolQD: Direct Volume Rendering of Multi-million Atom Quantum Dot Simulations

Volume and Isosurface Rendering with GPU-Accelerated Cell Projection

Volume exploration using ellipsoidal Gaussian transfer functions

Volume Raycasting Performance Using DirectCompute

Volume rendering visualization of 3D spherical mantle convection with an unstructured mesh

Volume Visualization: A Technical Overview with a Focus on Medical Applications

Volume-preserving FFD for programmable graphics hardware

Volumetric Ambient Occlusion for Real-Time Rendering and Games

Volumetric Rendering Techniques for Scientific Visualization

Voreen: A Rapid-Prototyping Environment for Ray-Casting-Based Volume Visualizations

Voronoi Toolpaths for PCB Mechanical Etch: Simple and Intuitive Algorithms with the 3D GPU

Vortex Methods for Fluid Simulation in Computer Graphics

Vortex methods for incompressible flow simulations on the GPU

Vortex particle method and parallel computing

Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics

Voxelized Minkowski sum computation on the GPU with robust culling

VoxelPipe: a programmable pipeline for 3D voxelization

VSIPL++ Acceleration Using Commodity Graphics Processors

vSMC: Parallel Sequential Monte Carlo in C++

Vulkan 1.1.97 – A Specification (with all registered Vulkan extensions)

Vulnerability Analysis and Attacks on Intel Xeon Phi Coprocessor

Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU

Wait-free programming for general purpose computations on graphics processors

waLBerla: A block-structured high-performance framework for multiphysics simulations

Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

Wanted: Floating-Point Add Round-off Error instruction

Warp Size Impact in GPUs: Large or Small?

Warp-Level Divergence in GPUs: Characterization, Impact, and Mitigation

Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU

WarpCore: A Library for fast Hash Tables on GPUs

WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

Warped Register File: A Power Efficient Register File for GPGPUs

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

Wasserstein-Fisher-Rao Document Distance

Waste Not, Want Not! Managing relational data in asymmetric memories

Waste Not… Efficient Co-Processing of Relational Data

Water simulation based on HLSL

Water simulation for cell based sandbox games

Water Surface Animation using Damped Wave Equation and CUDA Acceleration

wav2letter++: The Fastest Open-source Speech Recognition System

Wave field synthesis for 3D audio: architectural prospectives

Wavefront raycasting using larger filter kernels for on-the-fly GPU gradient reconstruction

Wavelet Encoding and Multi-GPU Programming

Wavelet Model-based Stereo for Fast, Robust Face Reconstruction

WAYPOINT: scaling coherence to thousand-core architectures

WCCV: Improving the Vectorization of IF-statements with Warp-Coherent Conditions

Weak execution ordering – exploiting iterative methods on many-core GPUs

WebCL for Hardware-Accelerated Web Applications

Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems

Weighted Residuals for Very Deep Networks

WgPy: GPU-accelerated NumPy-like array library for web browsers

What you see is what you snap: snapping to geometry deformed on the GPU

When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization

When Machine Learning Meets Quantum Computers: A Case Study

Where is the data? Why you cannot debate CPU vs. GPU performance without the answer

Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU

Why does PHM matter? – Nvidia’s GPU problems reviewed

Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks?

Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS

Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors

WiLLM: An Open Wireless LLM Communication System

Wilson and Domainwall Kernels on Oakforest-PACS

Winograd Algorithm for AdderNet

Wire Speed Name Lookup: A GPU-based Approach

Wireless Interference Identification with Convolutional Neural Networks

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement

Work Efficient Parallel Algorithms for Large Graph Exploration

Work in Progress: Vortex Detection and Visualization for Design of Micro Air Vehicles and Turbomachinery

Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths

Working With Incremental Spatial Data During Parallel (GPU) Computation

Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone

Workload and network-optimized computing systems

Workload Aware Algorithms for Heterogeneous Platforms

Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation

Workload Characterization of 3D Games

Workload distribution and balancing in FPGAs and CPUs with OpenCL and TBB

Workload Scheduling on Heterogeneous Devices

Workload-aware Automatic Parallelization for Multi-GPU DNN Training

Worst-Case Execution Time Guarantees for Runtime-Reconfigurable Architectures

WPA/WPA2 Password Security Testing using Graphics Processing Units

Wrinkling Coarse Meshes on the GPU

Writing a modular GPGPU program in Java

Writing a performance-portable matrix multiplication

Writing self-adaptive codes for heterogeneous systems

X-Device Query Processing by Bitwise Distribution

X-toon: an extended toon shader

XBOOLE-CUDA: Fast Boolean Operations on the GPU

Xbox360 Front Side Bus – A 21.6 GB/s End-to-End Interface Design

Xeon Phi: A comparison between the newly introduced MIC architecture and a standard CPU through three types of problems

Titles: 100
open PDFs: 93
packages: 20
