Papers on hgpu.org
Volume Raycasting Performance Using DirectCompute
Volume rendering visualization of 3D spherical mantle convection with an unstructured mesh
Volume Visualization: A Technical Overview with a Focus on Medical Applications
Volume-preserving FFD for programmable graphics hardware
Volumetric Ambient Occlusion for Real-Time Rendering and Games
Volumetric Rendering Techniques for Scientific Visualization
Voreen: A Rapid-Prototyping Environment for Ray-Casting-Based Volume Visualizations
Voronoi Toolpaths for PCB Mechanical Etch: Simple and Intuitive Algorithms with the 3D GPU
Vortex Methods for Fluid Simulation in Computer Graphics
Vortex methods for incompressible flow simulations on the GPU
Vortex particle method and parallel computing
Voxelized Minkowski sum computation on the GPU with robust culling
VoxelPipe: a programmable pipeline for 3D voxelization
VSIPL++ Acceleration Using Commodity Graphics Processors
vSMC: Parallel Sequential Monte Carlo in C++
Vulkan 1.1.97 – A Specification (with all registered Vulkan extensions)
Vulnerability Analysis and Attacks on Intel Xeon Phi Coprocessor
Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU
Wait-free programming for general purpose computations on graphics processors
waLBerla: A block-structured high-performance framework for multiphysics simulations
Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning
Wanted: Floating-Point Add Round-off Error instruction
Warp Size Impact in GPUs: Large or Small?
Warp-Level Divergence in GPUs: Characterization, Impact, and Mitigation
Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU
WarpCore: A Library for fast Hash Tables on GPUs
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU
Warped Register File: A Power Efficient Register File for GPGPUs
Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels
Wasserstein-Fisher-Rao Document Distance
Waste Not, Want Not! Managing relational data in asymmetric memories
Waste Not… Efficient Co-Processing of Relational Data
Water simulation based on HLSL
Water simulation for cell based sandbox games
Water Surface Animation using Damped Wave Equation and CUDA Acceleration
wav2letter++: The Fastest Open-source Speech Recognition System
Wave field synthesis for 3D audio: architectural prospectives
Wavefront raycasting using larger filter kernels for on-the-fly GPU gradient reconstruction
Wavelet Encoding and Multi-GPU Programming
Wavelet Model-based Stereo for Fast, Robust Face Reconstruction
WAYPOINT: scaling coherence to thousand-core architectures
WCCV: Improving the Vectorization of IF-statements with Warp-Coherent Conditions
Weak execution ordering – exploiting iterative methods on many-core GPUs
WebCL for Hardware-Accelerated Web Applications
Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems
Weighted Residuals for Very Deep Networks
What you see is what you snap: snapping to geometry deformed on the GPU
When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization
When Machine Learning Meets Quantum Computers: A Case Study
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU
Why does PHM matter? – Nvidia’s GPU problems reviewed
Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks?
Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS
Wideband Channelization for Software-Defined Radio via Mobile Graphics Processors
Wilson and Domainwall Kernels on Oakforest-PACS
Winograd Algorithm for AdderNet
Wire Speed Name Lookup: A GPU-based Approach
Wireless Interference Identification with Convolutional Neural Networks
word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
Work Efficient Parallel Algorithms for Large Graph Exploration
Work in Progress: Vortex Detection and Visualization for Design of Micro Air Vehicles and Turbomachinery
Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths
Working With Incremental Spatial Data During Parallel (GPU) Computation
Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone
Workload and network-optimized computing systems
Workload Aware Algorithms for Heterogeneous Platforms
Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation
Workload Characterization of 3D Games
Workload distribution and balancing in FPGAs and CPUs with OpenCL and TBB
Workload-aware Automatic Parallelization for Multi-GPU DNN Training
Worst-Case Execution Time Guarantees for Runtime-Reconfigurable Architectures
WPA/WPA2 Password Security Testing using Graphics Processing Units
Wrinkling Coarse Meshes on the GPU
Writing a modular GPGPU program in Java
Writing a performance-portable matrix multiplication
Writing self-adaptive codes for heterogeneous systems
X-Device Query Processing by Bitwise Distribution
X-toon: an extended toon shader
XBOOLE-CUDA: Fast Boolean Operations on the GPU
Xbox360 Front Side Bus – A 21.6 GB/s End-to-End Interface Design
Xeon Phi: A comparison between the newly introduced MIC architecture and a standard CPU through three types of problems
XeonPhi Meets Astrophysical Fluid Dynamics
XGBoost: Scalable GPU Accelerated Learning
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures
XMalloc: A Scalable Lock-free Dynamic Memory Allocator for Many-core Machines
XML3D: interactive 3D graphics for the web
XMT-GPU: A PRAM Architecture for Graphics Computation
XSD: Accelerating MapReduce by Harnessing the GPU inside an SSD
YaDiV – an open platform for 3D visualization and 3D segmentation of medical data
YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights
You Can Type, but You Can’t Hide: A Stealthy GPU-based Keylogger
Ypnos: declarative, parallel structured grid programming
Titles: 100
open PDFs: 94
packages: 21