Papers on hgpu.org (.txt-file)
GROMACS on AMD GPU-Based HPC Platforms: Using SYCL for Performance and Portability

GROMACS on Hybrid CPU-GPU and CPU-MIC Clusters: Preliminary Porting Experiences, Results and Next Steps

GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

GROPHECY: GPU performance projection from CPU code skeletons

Group Marching Tree: Sampling-Based Approximately Optimal Motion Planning on GPUs

Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels

GRS – GPU radix sort for multifield records

gScan: Accelerating Graham Scan on the GPU

gSLIC: a real-time implementation of SLIC superpixel segmentation

gSLICr: SLIC superpixels at over 250Hz

gSMat: A Scalable Sparse Matrix-based Join for SPARQL Query Processing

GSNP: A DNA Single-Nucleotide Polymorphism Detection System with GPU Acceleration

GSParLib: A multi-level programming interface unifying OpenCL and CUDA for expressing stream and data parallelism

GStream: A General-Purpose Data Streaming Framework on GPU Clusters

gSuite: A Flexible and Framework Independent Benchmark Suite for Graph Neural Network Inference on GPUs

GT4Py: High Performance Stencils for Weather and Climate Applications using Python

Guardian: Safe GPU Sharing in Multi-Tenant Environments

GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm

Guided Profiling for Auto-Tuning Array Layouts on GPUs

Gunrock: A High-Performance Graph Processing Library on the GPU

GViM: GPU-accelerated virtual machines

Gyrofluid Modeling of Turbulent, Kinetic Physics

Gyrokinetic Particle-in-Cell Optimization on Emerging Multi- and Manycore Platforms

Gyrokinetic Toroidal Simulations on Leading Multi- and Manycore HPC Systems

gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

H- and C-level WFST-based large vocabulary continuous speech recognition on Graphics Processing Units

H-LU Factorization on Many-Core Systems

H. 264 Parallel Optimization on Graphics Processors

H.264/AVC motion estimation implementation on Compute Unified Device Architecture (CUDA)

HACC: Simulating Sky Surveys on State-of-the-Art Supercomputing Architectures

HAccRG: Hardware-Accelerated Data Race Detection in GPUs

Hacking Neural Networks: A Short Introduction

Hadoop Mapreduce OpenCL Plugin

Hadoop+Aparapi: Making heterogenous MapReduce programming easier

HadoopCL: MapReduce on Distributed Heterogeneous Platforms Through Seamless Integration of Hadoop and OpenCL

HadoopCL2: Motivating the design of a distributed, heterogeneous programming system with machine-learning applications

HALF: Holistic Auto Machine Learning for FPGAs

HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC

Halo Gathering Scalability for Large Scale Multi-dimensional Sznajd Opinion Models Using Data Parallelism with GPUs

HAM – Heterogenous Active Messages for Efficient Offloading on the Intel Xeon Phi

Hand Tracking based on Hierarchical Clustering of Range Data

Handwritten Digit Recognition with a Committee of Deep Neural Nets on GPUs

HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis

Haptic and graphic rendering of deformable objects based on GPUs

Haptic feedback for the GPU-based surgical simulator

Haptic guided 3-D deformable image registration

Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU

Hard-Sphere Collision Simulations with Multiple GPUs, PCIe Extension Buses and GPU-GPU Communications

Hardware Accelerated Molecular Docking: A Survey

Hardware accelerated multi-resolution geometry synthesis

Hardware Accelerated Skin Deformation for Animated Crowds

Hardware accelerated symmetric condensed node TLM procedure for NVIDIA graphics processing units

Hardware Acceleration for Neural Networks: A Comprehensive Survey

Hardware Acceleration for Unstructured Big Data and Natural Language Processing

Hardware Acceleration of EDA Algorithms: Custom ICs, FPGAs and GPUs

Hardware Acceleration of EDA Algorithms: GPU Architecture and the CUDA Programming Model

Hardware Acceleration of HPC Computational Flow Dynamics using HBM-enabled FPGAs

Hardware Acceleration Technologies in Computer Algebra: Challenges and Impact

Hardware acceleration vs. algorithmic acceleration: can GPU-based processing beat complexity optimization for CT?

Hardware Accelerators for Artificial Intelligence

Hardware accelerators for biocomputing: A survey

Hardware Accelerators for Cartesian Genetic Programming

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Hardware Checkpointing and Productive Debugging Flows for FPGAs

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Hardware Implementation and Quantization of Tiny-Yolo-v2 using OpenCL

Hardware thread reordering to boost OpenCL throughput on FPGAs

Hardware Transactional Memory for GPU Architectures

Hardware-accelerated 3D visualization of mass spectrometry data

Hardware-Accelerated Adaptive EWA Volume Splatting

Hardware-accelerated parallel non-photorealistic volume rendering

Hardware-Accelerated Raycasting: Towards an Effective Brain MRI Visualization

Hardware-Accelerated Volume Rendering for Real-Time Medical Data Visualization

Hardware-assisted feature analysis and visualization of procedurally encoded multifield volumetric data

Hardware-Assisted High-Efficiency Ray Casting of Unstructured Time-Varying Flows Using Temporal Coherence

Hardware-Assisted Projected Tetrahedra

Hardware-assisted Rendering of CSG Models

Hardware-Assisted Software Testing and Debugging for Heterogeneous Computing

Hardware-assisted visibility sorting for unstructured volume rendering

Hardware-based nonlinear filtering and segmentation using high-level shading languages

Hardware-based simulation and collision detection for large particle systems

Hardware-Efficient Belief Propagation

Hardware-Oblivious Parallelism for In-Memory Column-Stores

Hardware-Oriented Multigrid Finite Element Solvers on GPU-Accelerated Clusters

Hardware/Software Co-Design for Data-Intensive Genomics Workloads

Hardware/Software Co-design for Energy-Efficient Seismic Modeling

Hardware/Software Vectorization for Closeness Centrality on Multi-/Many-Core Architectures

Harmonic CUDA: Asynchronous Programming on GPUs

Harnessing Aspect Oriented Programming on GPU: Application to Warp-Level Parallelism (WLP)

Harnessing Batched BLAS/LAPACK Kernels on GPUs for Parallel Solutions of Block Tridiagonal Systems

Harnessing GPU Computing in System-Level Software

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper

Harnessing the GPU for Real-Time Haptic Tissue Simulation

Harnessing the Power of GPUs without Losing Abstractions in SaC and ArrayOL: A Comparative Study

Harnessing the power of idle GPUs for acceleration of biological sequence alignment

Harvesting graphics power for MD simulations

Titles: 100
open PDFs: 95
packages: 19
