Papers on hgpu.org (.txt-file)
vSMC: Parallel Sequential Monte Carlo in C++
Vulkan 1.1.97 – A Specification (with all registered Vulkan extensions)
Vulnerability Analysis and Attacks on Intel Xeon Phi Coprocessor
Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU
Wait-free programming for general purpose computations on graphics processors
waLBerla: A block-structured high-performance framework for multiphysics simulations
Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning
Wanted: Floating-Point Add Round-off Error instruction
Warp Size Impact in GPUs: Large or Small?
Warp-Level Divergence in GPUs: Characterization, Impact, and Mitigation
Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU
WarpCore: A Library for fast Hash Tables on GPUs
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU
Warped Register File: A Power Efficient Register File for GPGPUs
Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels
Wasserstein-Fisher-Rao Document Distance
Waste Not, Want Not! Managing relational data in asymmetric memories
Waste Not… Efficient Co-Processing of Relational Data
Water simulation based on HLSL
Water simulation for cell based sandbox games
Water Surface Animation using Damped Wave Equation and CUDA Acceleration
wav2letter++: The Fastest Open-source Speech Recognition System
Wave field synthesis for 3D audio: architectural prospectives
Wavefront raycasting using larger filter kernels for on-the-fly GPU gradient reconstruction
Wavelet Encoding and Multi-GPU Programming
Wavelet Model-based Stereo for Fast, Robust Face Reconstruction
WAYPOINT: scaling coherence to thousand-core architectures
WCCV: Improving the Vectorization of IF-statements with Warp-Coherent Conditions
Weak execution ordering – exploiting iterative methods on many-core GPUs
WebCL for Hardware-Accelerated Web Applications
Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems
Weighted Residuals for Very Deep Networks
What you see is what you snap: snapping to geometry deformed on the GPU
When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization
When Machine Learning Meets Quantum Computers: A Case Study
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU
Why does PHM matter? – Nvidia’s GPU problems reviewed
Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks?
Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS
Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors
Wilson and Domainwall Kernels on Oakforest-PACS
Winograd Algorithm for AdderNet
Wire Speed Name Lookup: A GPU-based Approach
Wireless Interference Identification with Convolutional Neural Networks
word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
Work Efficient Parallel Algorithms for Large Graph Exploration
Work in Progress: Vortex Detection and Visualization for Design of Micro Air Vehicles and Turbomachinery
Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths
Working With Incremental Spatial Data During Parallel (GPU) Computation
Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone
Workload and network-optimized computing systems
Workload Aware Algorithms for Heterogeneous Platforms
Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation
Workload Characterization of 3D Games
Workload distribution and balancing in FPGAs and CPUs with OpenCL and TBB
Workload Scheduling on Heterogeneous Devices
Workload-aware Automatic Parallelization for Multi-GPU DNN Training
Worst-Case Execution Time Guarantees for Runtime-Reconfigurable Architectures
WPA/WPA2 Password Security Testing using Graphics Processing Units
Wrinkling Coarse Meshes on the GPU
Writing a modular GPGPU program in Java
Writing a performance-portable matrix multiplication
Writing self-adaptive codes for heterogeneous systems
X-Device Query Processing by Bitwise Distribution
X-toon: an extended toon shader
XBOOLE-CUDA: Fast Boolean Operations on the GPU
Xbox360 Front Side Bus – A 21.6 GB/s End-to-End Interface Design
Xeon Phi: A comparison between the newly introduced MIC architecture and a standard CPU through three types of problems
XeonPhi Meets Astrophysical Fluid Dynamics
XGBoost: Scalable GPU Accelerated Learning
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures
XMalloc: A Scalable Lock-free Dynamic Memory Allocator for Many-core Machines
XML3D: interactive 3D graphics for the web
XMT-GPU: A PRAM Architecture for Graphics Computation
XSD: Accelerating MapReduce by Harnessing the GPU inside an SSD
YaDiV – an open platform for 3D visualization and 3D segmentation of medical data
YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights
You Can Type, but You Can’t Hide: A Stealthy GPU-based Keylogger
Ypnos: declarative, parallel structured grid programming
ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales
ZAME: Interactive Large-Scale Graph Visualization
Zero-copy I/O processing for low-latency GPU computing
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
Zippy: A Framework for Computation and Visualization on a GPU Cluster
ZNN – A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines
Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management
ZUCL: A ZYNQ UltraScale+ Framework for OpenCL HLS Applications
Titles: 93
open PDFs: 89
packages: 23