Papers on hgpu.org (.txt-file)
Architecture-Aware Optimization Targeting Multithreaded Stream Computing
Architecture-based Performance Evaluation of Genetic Algorithms on Multi/Many-core Systems
Are Very Deep Neural Networks Feasible on Mobile Devices?
Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space
Aristotle: A Performance Impact Indicator for the OpenCL Kernels Using Local Memory
ARK: GPU-driven Code Execution for Distributed Deep Learning
ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere
Array Languages Make Neural Networks Fast
Array Program Transformation with Loo.py by Example: High-Order Finite Elements
Array-Oriented Languages and Polyhedral Compilation
ART vs. NDK vs. GPU acceleration: A study of performance of image processing algorithms on Android
Articulated object tracking by rendering consistent appearance parts
Artifact-Free Decompression and Zooming of JPEG Compressed Images with Total Generalized Variation
Artifact-Free JPEG Decompression with Total Generalized Variation
Artificial Intelligence in Electric Machine Drives: Advances and Trends
Artificial neural network computation on graphic process unit
Artificial Neural Network Simulation on CUDA
ARVO-CL: The OpenCL version of the ARVO package – An efficient tool for computing the accessible surface area and the excluded volume of proteins via analytical equations
ASAMgpu V1.0 – a moist fully compressible atmospheric model using graphics processing units (GPUs)
Aspect-Driven Mixed-Precision Tuning Targeting GPUs
Aspects of GPU for general purpose high performance computing
Assembling large mosaics of electron microscope images using GPU
Assembly of finite element methods on graphics processors
Assembly-Free Large-Scale Modal Analysis on the GPU
Assembly-Free Structural Dynamics On CPU and GPU
Assessing Accelerator-Based HPC Reverse Time Migration
Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems
Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment
Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems
Assessing the feasibility of OpenCL CPU implementations for agent-based simulations
Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs
Assessing the Impact of Compiler Optimizations on GPUs Reliability
Assessing the Performance-Energy Balance of Graphics Processors for Spectral Unmixing
Assessment of GPU computational enhancement to a 2D flood model
Assessment of various GPU acceleration strategies in text categorization processing flow
Astronomical Photometric Data Reduction Using GPGPU
Astrophysical data mining with GPU. A case study: genetic classification of globular clusters
Astrophysical Particle Simulations on Heterogeneous CPU-GPU Systems
Astrophysical Particle Simulations with Custom GPU Clusters
Astrophysical particle simulations with large custom GPU clusters on three continents
Astrophysical particle simulations with large custom GPU clusters on three continents
Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters
Astrophysical-oriented Computational multi-Architectural Framework
ASW: Accelerating Smith-Waterman Algorithm on Coupled CPU-GPU Architecture
AsymML: An Asymmetric Decomposition Framework for Privacy-Preserving DNN Training and Inference
Asymptotic Peak Utilisation in Heterogeneous Parallel CPU/GPU Pipelines: A Decentralised Queue Monitoring Strategy
Asynchronous Communication for Finite-Difference Simulations on GPU Clusters using CUDA and MPI
Asynchronous Communication Schemes for Finite Difference Methods on Multiple GPUs
Asynchronous Methods for Deep Reinforcement Learning
Asynchronous OpenCL/MPI numerical simulations of conservation laws
Asynchronous Parallel Computing Algorithm implemented in 1D Heat Equation with CUDA
Asynchronous Parallel Computing Model of Global Motion Estimation with CUDA
Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures
ATI Stream Profiler: a tool to optimize an OpenCL kernel on ATI Radeon GPUs
Atmospheric turbulence removal using convolutional neural network
Atomic-free Irregular Computations on GPUs
Atos: A Task-Parallel GPU Dynamic Scheduling Framework for Dynamic Irregular Computations
Attack Signature Matching using Graphics Processors in High-Performance Intrusion Detection Systems
Attention-based NMT Models as Feature Functions in Phrase-based SMT
ATTILA: a cycle-level execution-driven simulator for modern GPU architectures
Audiovisual Voice Activity Detection and Localization of Simultaneous Speech Sources
Augmented reality live-action compositing
Augmented reality usage for prototyping speed up
Augmenting Operating Systems With the GPU
Augur: a Modeling Language for Data-Parallel Probabilistic Inference
Aurally and visually enhanced audio search with soundtorch
AUTO-GC: Automatic translation of data mining applications to GPU clusters
Auto-Generation and Auto-Tuning of 3D Stencil Codes on GPU Clusters
Auto-Generation and Auto-Tuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters
Auto-Generation of Parallel Finite-Differencing Code for MPI, TBB and CUDA
Auto-optimization of a Feature Selection Algorithm
Auto-SpMV: Automated Optimizing SpMV Kernels on GPU
Auto-tunable GPU BLAS (thesis)
Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems
Auto-tuning 3-D FFT library for CUDA GPUs
Auto-tuning a High-Level Language Targeted to GPU Codes
Auto-tuning a LOFAR radio astronomy pipeline in JavaCL
Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs
Auto-Tuning Dedispersion for Many-Core Accelerators
Auto-tuning Dense Matrix Multiplication for GPGPU with Cache
Auto-tuning Dense Vector and Matrix-Vector Operations for Fermi GPUs
Auto-tuning Hybrid CPU-GPU Execution of Algorithmic Skeletons in SkePU
Auto-tuning interactive ray tracing using an analytical GPU architecture model
Auto-tuning of fast fourier transform on graphics processors
Auto-Tuning of Level 1 and Level 2 BLAS for GPUs
Auto-tuning on the macro scale: high level algorithmic auto-tuning for scientific applications
Auto-tuning Shallow water simulations on GPUs
Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems
Auto-tuning Streamed Applications on Intel Xeon Phi
Auto-Tunning of Data Communication on Heterogeneous Systems
Auto-Vectorizing a Large-scale Production Unstructured-mesh CFD Application
AutoDDL: Automatic Distributed Deep Learning with Asymptotically Optimal Communication
AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning
AutoMat – Automatic Differentiation for Generalized Standard Materials on GPUs
Automated and interactive approaches for optimal surface finding based segmentation of medical image data
Automated and parallel code generation for finite-differencing stencils with arbitrary data types
Titles: 100
Doubles: 1
open PDFs: 92
packages: 16