Papers on hgpu.org (.txt-file)
Accelerator: using data parallelism to program GPUs for general-purpose uses
AccFFT: A library for distributed-memory 3-D FFT on CPU and GPU architectures
AccFFT: A library for distributed-memory FFT on CPU and GPU architectures
Accounting for Secondary Uncertainty: Efficient Computation of Portfolio Risk Measures on Multi and Many Core Architectures
Accounting for Uncertainty in Medical Data: A CUDA Implementation of Normalized Convolution
ACCTuner: OpenACC Auto-Tuner For Accelerated Scientific Applications
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Deep Neural Networks
accULL: An User-directed Approach to Heterogeneous Programming
Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study
Accuracy, Memory, and Speed Strategies in GPU-Based Finite-Element Matrix-Generation
Accurate Analytic Models to Estimate Execution Time on GPU Applications
Accurate and Efficient Filtering using Anisotropic Filter Decomposition
Accurate Cross-Architecture Performance Modeling for Sparse Matrix-Vector Multiplication (SpMV) on GPUs
Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication
Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels
Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing
Accurate multi-view reconstruction using robust binocular stereo and surface meshing
Accurate real-time stereo correspondence using intra- and inter-scanline optimization
Accurate Sequence Alignment using Distributed Filtering on GPU Clusters
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale
Achieving a single compute device image in OpenCL for multiple GPUs
Achieving High Throughput Sequencing with Graphics Processing Units
Achieving high-performance with a sparse direct solver on Intel KNL
Achieving near native runtime performance and cross-platform performance portability for random number generation through SYCL interoperability
Achieving O(1) IP lookup on GPU-based software routers
Achieving Speedup in Aggregate Risk Analysis using Multiple GPUs
Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs
ACL2 Meets the GPU: Formalizing a CUDA-based Parallelizable All-Pairs Shortest Path Algorithm in ACL2
ACO on Multiple GPUs with CUDA for Faster Solution of QAPs
ACO with tabu search on a GPU for solving QAPs using move-cost adjusted thread assignment
Acquisition Method of Spread Spectrum Signals Based on GPU Acceleration
ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time
Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis
Action-Based Multifield Video Visualization
Active Structured Learning for High-Speed Object Detection
Active thread compaction for GPU path tracing
Activity recognition from videos with parallel hypergraph matching on GPUs
Adaboost GPU-based Classifier for Direct Volume Rendering
AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles
Adaptable particle-in-cell algorithms for graphical processing units
Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation
Adaptation of algorithms for underwater sonar data processing to GPU-based systems
Adaptation of an acoustic propagation model to the parallel architecture of a graphics processor
Adaptation of High Performance and High Capacity Reconfigurable Systems to OpenCL Programming Environments
Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels
Adapting a message-driven parallel application to GPU-accelerated clusters
Adapting data processing methods to modern GPU architecture
Adapting database components to heterogeneous environments
Adapting Irregular Computations to Large CPU-GPU Clusters in the MADNESS Framework
Adapting MoM with RWG Basis Functions to GPU Technology Using CUDA
Adapting Particle Filter Algorithms to Many-Core Architectures
Adapting the GA Approach to Solve Traveling Salesman Problems on CUDA Architecture
Adaptive algebraic multigrid on SIMD architectures
Adaptive and Hybrid Machine Learning Approaches Utilizing General Purpose Computing on Graphical Processing Units
Adaptive and Transparent Cache Bypassing for GPUs
Adaptive Data Migration in Load-Imbalanced HPC Applications
Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search
Adaptive enhancement and noise reduction in very low light-level video
Adaptive fast multipole methods on the GPU
Adaptive GPU Array Layout Auto-Tuning
Adaptive Hardware-accelerated Terrain Tessellation
Adaptive implementation selection in the SkePU skeleton programming library
Adaptive Input-aware Compilation for Graphics Engines
Adaptive Kinetic-Fluid Solvers for Heterogeneous Computing Architectures
Adaptive Line Tracking with Multiple Hypotheses for Augmented Reality
Adaptive load balancing for raycasting of non-uniformly bricked volumes
Adaptive Mesh Fluid Simulations on GPU
Adaptive Multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model
Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU
Adaptive OpenCL (ACL) Execution in GPU Architectures
Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems
Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
Adaptive parallelism mapping in dynamic environments using machine learning
Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels
Adaptive proxy geometry for direct volume manipulation
Adaptive Row-grouped CSR Format for Storing of Sparse Matrices on GPU
Adaptive sampling in three dimensions for volume rendering on GPUs
Adaptive sampling of intersectable models exploiting image and object-space coherence
Adaptive Sequential Posterior Simulators for Massively Parallel Computing Environments
Adaptive Simulation of Large-Scale Ocean Surface
Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity
Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing
Adaptive Treelet Meshes for Efficient Streak-Surface Visualization on the GPU
Adaptive Video Encoding Based on OpenCL Face Recognition
Adaptive Work-Efficient Connected Components on the GPU
Adaptive, real-time visual simultaneous localization and mapping
Adding fault tolerance to OpenCL: Through redundant heterogeneous computing
Adding GPU Computing to Computer Organization Courses
Adding special-purpose processor support to the Erlang VM
Address Selection for Efficient Barriers on the Intel Xeon Phi
Addressing Challenges in Utilizing GPUs for Accelerating Privacy-Preserving Computation
ADHA: Automatic Data layout framework for Heterogeneous Architectures
Adhoc On-Demand Distance Vector Protocol For Energy Efficiency
Adjoint Algorithmic Differentiation of a GPU Accelerated Application
Adjoint Lattice Boltzmann for Topology Optimization on multi-GPU architecture
Adjustable GPU Acceleration for Hermitian Eigensystems
ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers
Advanced 2D Rasterization on Modern CPUs
Titles: 100
open PDFs: 95
packages: 14