Papers on hgpu.org (.txt-file)
Applying software-managed caching and CPU/GPU task scheduling for accelerating dynamic workloads

Applying Source Level Auto-Vectorization to Aparapi Java

Applying the “Simple Accelerator Modelling in MATLAB” (SAMM) Code to High Luminosity LHC Upgrade

Applying the Midas Touch of Reproducibility to High-Performance Computing

Applying the Parallel GPU Model to Radiation Therapy Treatment

Approaches for parallelizing reductions on modern GPUs

Approaches for the Parallelization of Software Implementation of Integer Multiplication

Approximate Belief Propagation by Hierarchical Averaging of Outgoing Messages

Approximate Dynamic Programming and Neural Networks on Game Hardware

Approximate dynamic programming with post-decision states as a solution method for dynamic economic models

Approximate Principal Direction Trees

Approximate Similarity Search for Online Multimedia Services on Distributed CPU-GPU Platforms

Approximate Subdivision Surface Evaluation in the Language of Linear Algebra

Approximation of BEM matrices using GPGPUs

Approximation of Loop Subdivision Surfaces for Fast Rendering

Approximative inference for multivariate functional data on massively parallel processors

APPy: Annotated Parallelism for Python on GPUs

APTCC: Auto Parallelizing Translator From C To CUDA

APUNet: Revitalizing GPU as Packet Processing Accelerator

AQsort: Scalable Multi-Array In-Place Sorting with OpenMP

AQUAgpusph, a free 3D SPH solver accelerated with OpenCL

Aquila 2.0: Software Architecture for Cognitive Robotics

Aquila: An Open-Source GPU-Accelerated Toolkit for Cognitive Robotics Research

Arax: a runtime framework for decoupling applications from heterogeneous accelerators

Arbitrarily large iterative tomographic reconstruction on multiple GPUs using the TIGRE toolbox

Arbitrary dimension Reed-Solomon coding and decoding for extended RAID on GPUs

Arbitrary-Precision Arithmetics on the GPU

ArborX: A Performance Portable Search Library

ARC: Adaptive Ray-tracing with CUDA, a New Ray Tracing Code for Parallel GPUs

ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution

Architecting an LTE Base Station with Graphics Processing Units

Architecting graphics processors for non-graphics compute acceleration

Architecting SOT-RAM Based GPU Register File

Architecting Tensor Core-Based Reductions for Irregular Molecular Docking Kernels

Architectural Analysis and Performance Characterization of NVIDIA GPUs using Microbenchmarking

Architectural Comparisons for a Quantum Monte Carlo Application

Architectural Considerations for Compiler-guided Unroll-and-Jam of CUDA Kernels

Architectural Exploration and Scheduling Methods for Coarse Grained Reconfigurable Arrays

Architectural explorations for streaming accelerators with customized memory layouts

Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters

Architectural Support for the Stream Execution Model on General-Purpose Processors

Architectural Support for Virtual Memory in GPUs

Architecture Comparisons between Nvidia and ATI GPUs: Computation Parallelism and Data Communications

Architecture-Adaptive Code Variant Tuning

Architecture- and Workload-Aware Heterogeneous Algorithms for Sparse Matrix Vector Multiplication

Architecture-Aware Algorithms and Software for Peta and Exascale Computing

Architecture-Aware Mapping and Optimization on a 1600-Core GPU

Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems

Architecture-Aware Optimization on a 1600-core Graphics Processor

Architecture-Aware Optimization Targeting Multithreaded Stream Computing

Architecture-based Performance Evaluation of Genetic Algorithms on Multi/Many-core Systems

Are Very Deep Neural Networks Feasible on Mobile Devices?

Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space

Aristotle: A Performance Impact Indicator for the OpenCL Kernels Using Local Memory

ARK: GPU-driven Code Execution for Distributed Deep Learning

ARKCoS: Artifact-Suppressed Accelerated Radial Kernel Convolution on the Sphere

Array Languages Make Neural Networks Fast

Array Program Transformation with Loo.py by Example: High-Order Finite Elements

Array-Oriented Languages and Polyhedral Compilation

ART vs. NDK vs. GPU acceleration: A study of performance of image processing algorithms on Android

Articulated object tracking by rendering consistent appearance parts

Artifact-Free Decompression and Zooming of JPEG Compressed Images with Total Generalized Variation

Artifact-Free JPEG Decompression with Total Generalized Variation

Artificial Intelligence in Electric Machine Drives: Advances and Trends

Artificial neural network computation on graphic process unit

Artificial Neural Network Simulation on CUDA

ARVO-CL: The OpenCL version of the ARVO package – An efficient tool for computing the accessible surface area and the excluded volume of proteins via analytical equations

ASAMgpu V1.0 – a moist fully compressible atmospheric model using graphics processing units (GPUs)

Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Aspects of GPU for general purpose high performance computing

Assembling large mosaics of electron microscope images using GPU

Assembly of finite element methods on graphics processors

Assembly-Free Large-Scale Modal Analysis on the GPU

Assembly-Free Structural Dynamics On CPU and GPU

Assessing Accelerator-Based HPC Reverse Time Migration

Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems

Assessing Intel OneAPI capabilities and cloud-performance for heterogeneous computing

Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment

Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems

Assessing the feasibility of OpenCL CPU implementations for agent-based simulations

Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs

Assessing the Impact of Compiler Optimizations on GPUs Reliability

Assessing the Performance-Energy Balance of Graphics Processors for Spectral Unmixing

Assessment of GPU computational enhancement to a 2D flood model

Assessment of various GPU acceleration strategies in text categorization processing flow

Astronomical Photometric Data Reduction Using GPGPU

Astrophysical data mining with GPU. A case study: genetic classification of globular clusters

Astrophysical Particle Simulations on Heterogeneous CPU-GPU Systems

Astrophysical Particle Simulations with Custom GPU Clusters

Astrophysical particle simulations with large custom GPU clusters on three continents

Astrophysical particle simulations with large custom GPU clusters on three continents

Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

Astrophysical-oriented Computational multi-Architectural Framework

ASW: Accelerating Smith-Waterman Algorithm on Coupled CPU-GPU Architecture

AsymML: An Asymmetric Decomposition Framework for Privacy-Preserving DNN Training and Inference

Asymptotic Peak Utilisation in Heterogeneous Parallel CPU/GPU Pipelines: A Decentralised Queue Monitoring Strategy

Asynchronous Communication for Finite-Difference Simulations on GPU Clusters using CUDA and MPI

Titles: 100
Doubles: 1
open PDFs: 94
packages: 18
