Papers on hgpu.org (.txt-file)
Architecting graphics processors for non-graphics compute acceleration
Architecting SOT-RAM Based GPU Register File
Architectural Analysis and Performance Characterization of NVIDIA GPUs using Microbenchmarking
Architectural Comparisons for a Quantum Monte Carlo Application
Architectural Considerations for Compiler-guided Unroll-and-Jam of CUDA Kernels
Architectural Exploration and Scheduling Methods for Coarse Grained Reconfigurable Arrays
Architectural explorations for streaming accelerators with customized memory layouts
Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems
Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters
Architectural Support for the Stream Execution Model on General-Purpose Processors
Architectural Support for Virtual Memory in GPUs
Architecture Comparisons between Nvidia and ATI GPUs: Computation Parallelism and Data Communications
Architecture-Adaptive Code Variant Tuning
Architecture- and Workload-Aware Heterogeneous Algorithms for Sparse Matrix Vector Multiplication
Architecture-Aware Algorithms and Software for Peta and Exascale Computing
Architecture-Aware Mapping and Optimization on a 1600-Core GPU
Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems
Architecture-Aware Optimization on a 1600-core Graphics Processor
Architecture-Aware Optimization Targeting Multithreaded Stream Computing
Architecture-based Performance Evaluation of Genetic Algorithms on Multi/Many-core Systems
Are Very Deep Neural Networks Feasible on Mobile Devices?
Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space
Aristotle: A Performance Impact Indicator for the OpenCL Kernels Using Local Memory
ARK: GPU-driven Code Execution for Distributed Deep Learning
ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere
Array Languages Make Neural Networks Fast
Array Program Transformation with Loo.py by Example: High-Order Finite Elements
Array-Oriented Languages and Polyhedral Compilation
ART vs. NDK vs. GPU acceleration: A study of performance of image processing algorithms on Android
Articulated object tracking by rendering consistent appearance parts
Artifact-Free Decompression and Zooming of JPEG Compressed Images with Total Generalized Variation
Artifact-Free JPEG Decompression with Total Generalized Variation
Artificial Intelligence in Electric Machine Drives: Advances and Trends
Artificial neural network computation on graphic process unit
Artificial Neural Network Simulation on CUDA
ARVO-CL: The OpenCL version of the ARVO package – An efficient tool for computing the accessible surface area and the excluded volume of proteins via analytical equations
ASAMgpu V1.0 – a moist fully compressible atmospheric model using graphics processing units (GPUs)
Aspect-Driven Mixed-Precision Tuning Targeting GPUs
Aspects of GPU for general purpose high performance computing
Assembling large mosaics of electron microscope images using GPU
Assembly of finite element methods on graphics processors
Assembly-Free Large-Scale Modal Analysis on the GPU
Assembly-Free Structural Dynamics On CPU and GPU
Assessing Accelerator-Based HPC Reverse Time Migration
Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems
Assessing Intel OneAPI capabilities and cloud-performance for heterogeneous computing
Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment
Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems
Assessing the feasibility of OpenCL CPU implementations for agent-based simulations
Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs
Assessing the Impact of Compiler Optimizations on GPUs Reliability
Assessing the Performance-Energy Balance of Graphics Processors for Spectral Unmixing
Assessment of GPU computational enhancement to a 2D flood model
Assessment of various GPU acceleration strategies in text categorization processing flow
Astronomical Photometric Data Reduction Using GPGPU
Astrophysical data mining with GPU. A case study: genetic classification of globular clusters
Astrophysical Particle Simulations on Heterogeneous CPU-GPU Systems
Astrophysical Particle Simulations with Custom GPU Clusters
Astrophysical particle simulations with large custom GPU clusters on three continents
Astrophysical particle simulations with large custom GPU clusters on three continents
Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters
Astrophysical-oriented Computational multi-Architectural Framework
ASW: Accelerating Smith-Waterman Algorithm on Coupled CPU-GPU Architecture
AsymML: An Asymmetric Decomposition Framework for Privacy-Preserving DNN Training and Inference
Asymptotic Peak Utilisation in Heterogeneous Parallel CPU/GPU Pipelines: A Decentralised Queue Monitoring Strategy
Asynchronous Communication for Finite-Difference Simulations on GPU Clusters using CUDA and MPI
Asynchronous Communication Schemes for Finite Difference Methods on Multiple GPUs
Asynchronous Methods for Deep Reinforcement Learning
Asynchronous OpenCL/MPI numerical simulations of conservation laws
Asynchronous Parallel Computing Algorithm implemented in 1D Heat Equation with CUDA
Asynchronous Parallel Computing Model of Global Motion Estimation with CUDA
Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures
ATI Stream Profiler: a tool to optimize an OpenCL kernel on ATI Radeon GPUs
Atmospheric turbulence removal using convolutional neural network
Atomic-free Irregular Computations on GPUs
Atos: A Task-Parallel GPU Dynamic Scheduling Framework for Dynamic Irregular Computations
Attack Signature Matching using Graphics Processors in High-Performance Intrusion Detection Systems
Attention-based NMT Models as Feature Functions in Phrase-based SMT
ATTILA: a cycle-level execution-driven simulator for modern GPU architectures
Audiovisual Voice Activity Detection and Localization of Simultaneous Speech Sources
Augmented reality live-action compositing
Augmented reality usage for prototyping speed up
Augmenting Operating Systems With the GPU
Augur: a Modeling Language for Data-Parallel Probabilistic Inference
Aurally and visually enhanced audio search with soundtorch
AUTO-GC: Automatic translation of data mining applications to GPU clusters
Auto-Generation and Auto-Tuning of 3D Stencil Codes on GPU Clusters
Auto-Generation and Auto-Tuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters
Auto-Generation of Parallel Finite-Differencing Code for MPI, TBB and CUDA
Auto-optimization of a Feature Selection Algorithm
Auto-SpMV: Automated Optimizing SpMV Kernels on GPU
Auto-tunable GPU BLAS (thesis)
Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems
Auto-tuning 3-D FFT library for CUDA GPUs
Auto-tuning a High-Level Language Targeted to GPU Codes
Titles: 100
Doubles: 1
open PDFs: 92
packages: 13