Papers on hgpu.org (.txt-file)
Acceleration of tensor-product operations for high-order finite element methods

Acceleration of the 3D ADI-FDTD method using graphics processor units
Acceleration of the GAMESS-UK electronic structure package on graphical processing units
Acceleration of the Method of Moments Calculations by Using Graphics Processing Units
Acceleration of the MMFF94 routines within OpenBabel using Eigen and OpenCL

Acceleration of the Smith-Waterman Algorithm using Single and Multiple Graphics Processors

Acceleration of the speed of tissue characterization algorithm for coronary plaque by employing GPGPU technique

Acceleration of Time-Domain Finite Element Method (TD-FEM) Using Graphics Processor Units (GPU)
Acceleration of TM cylinder EFIE with CUDA

Acceleration of Tsunami Wave Propagation Modeling based on Re-engineering of Computational Components

Acceleration of Variance of Color Differences-Based Demosaicing Using CUDA

Acceleration of Various Direct/Iterative Solvers for MoM by GPU and Its Computational Cost

Acceleration technique for volume rendering using 2D texture based ray plane casting on GPU
Acceleration Techniques for GPU-based Volume Rendering

Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application

Accelerator Aware MPI Micro-benchmarking using CUDA, OpenACC and OpenCL

Accelerator weather forecasting

Accelerator-Oriented Algorithm Transformation for Temporal Data Mining
Accelerator: using data parallelism to program GPUs for general-purpose uses

AccFFT: A library for distributed-memory 3-D FFT on CPU and GPU architectures

AccFFT: A library for distributed-memory FFT on CPU and GPU architectures

Accounting for Secondary Uncertainty: Efficient Computation of Portfolio Risk Measures on Multi and Many Core Architectures

Accounting for Uncertainty in Medical Data: A CUDA Implementation of Normalized Convolution

ACCTuner: OpenACC Auto-Tuner For Accelerated Scientific Applications

AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Deep Neural Networks

accULL: An User-directed Approach to Heterogeneous Programming

Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study
Accuracy, Memory, and Speed Strategies in GPU-Based Finite-Element Matrix-Generation

Accurate Analytic Models to Estimate Execution Time on GPU Applications

Accurate and Efficient Filtering using Anisotropic Filter Decomposition

Accurate Cross-Architecture Performance Modeling for Sparse Matrix-Vector Multiplication (SpMV) on GPUs

Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication

Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels

Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing

Accurate multi-view reconstruction using robust binocular stereo and surface meshing

Accurate real-time stereo correspondence using intra- and inter-scanline optimization

Accurate Sequence Alignment using Distributed Filtering on GPU Clusters

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale

Achieving a single compute device image in OpenCL for multiple GPUs
Achieving High Throughput Sequencing with Graphics Processing Units

Achieving high-performance with a sparse direct solver on Intel KNL

Achieving near native runtime performance and cross-platform performance portability for random number generation through SYCL interoperability

Achieving O(1) IP lookup on GPU-based software routers

Achieving Speedup in Aggregate Risk Analysis using Multiple GPUs

Achieving TeraCUPS on Longest Common Subsequence Problem using GPGPUs

ACL2 Meets the GPU: Formalizing a CUDA-based Parallelizable All-Pairs Shortest Path Algorithm in ACL2

ACO on Multiple GPUs with CUDA for Faster Solution of QAPs

ACO with tabu search on a GPU for solving QAPs using move-cost adjusted thread assignment

Acquisition Method of Spread Spectrum Signals Based on GPU Acceleration

ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time

Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis

Action-Based Multifield Video Visualization

Active Structured Learning for High-Speed Object Detection

Active thread compaction for GPU path tracing

Activity recognition from videos with parallel hypergraph matching on GPUs

Adaboost GPU-based Classifier for Direct Volume Rendering

AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

Adaptable particle-in-cell algorithms for graphical processing units

Adaptable Two-Dimension Sliding Windows on NVIDIA GPUs with Runtime Compilation

Adaptation of algorithms for underwater sonar data processing to GPU-based systems

Adaptation of an acoustic propagation model to the parallel architecture of a graphics processor

Adaptation of High Performance and High Capacity Reconfigurable Systems to OpenCL Programming Environments

Adaptation of the MapReduce programming framework to compute-intensive data-analytics kernels

Adapting a message-driven parallel application to GPU-accelerated clusters

Adapting data processing methods to modern GPU architecture

Adapting database components to heterogeneous environments

Adapting Irregular Computations to Large CPU-GPU Clusters in the MADNESS Framework

Adapting MoM with RWG Basis Functions to GPU Technology Using CUDA
Adapting Particle Filter Algorithms to Many-Core Architectures

Adapting the GA Approach to Solve Traveling Salesman Problems on CUDA Architecture

Adaptive algebraic multigrid on SIMD architectures

Adaptive and Hybrid Machine Learning Approaches Utilizing General Purpose Computing on Graphical Processing Units

Adaptive and Transparent Cache Bypassing for GPUs

Adaptive Data Migration in Load-Imbalanced HPC Applications

Adaptive Dynamic Load Balancing in Heterogeneous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search

Adaptive enhancement and noise reduction in very low light-level video

Adaptive fast multipole methods on the GPU

Adaptive GPU Array Layout Auto-Tuning

Adaptive Hardware-accelerated Terrain Tessellation

Adaptive implementation selection in the SkePU skeleton programming library

Adaptive Input-aware Compilation for Graphics Engines

Adaptive Kinetic-Fluid Solvers for Heterogeneous Computing Architectures

Adaptive Line Tracking with Multiple Hypotheses for Augmented Reality

Adaptive load balancing for raycasting of non-uniformly bricked volumes
Adaptive Mesh Fluid Simulations on GPU

Adaptive Multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

Adaptive Multi-level Blocking Optimization for Sparse Matrix Vector Multiplication on GPU

Adaptive OpenCL (ACL) Execution in GPU Architectures

Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing

Adaptive Optimization Techniques for High-Performance Computing

Adaptive parallelism mapping in dynamic environments using machine learning

Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels

Adaptive proxy geometry for direct volume manipulation

Adaptive Row-grouped CSR Format for Storing of Sparse Matrices on GPU

Adaptive sampling in three dimensions for volume rendering on GPUs

Adaptive sampling of intersectable models exploiting image and object-space coherence

Adaptive Sequential Posterior Simulators for Massively Parallel Computing Environments

Titles: 100
open PDFs: 89
packages: 13
