Papers on hgpu.org (.txt-file)
Fully Parallel Particle Learning for GPGPUs and Other Parallel Devices

Fully-3D GPU PET reconstruction

Fully-Automated Code Generation for Efficient Computation of Sparse Matrix Permanents on GPUs

Function Call Re-Vectorization

Functional and dynamic programming in the design of parallel prefix networks

Functional High Performance Financial IT

Functional Programming for High-Performance Computing on Heterogeneous Architectures

Functional Signal Processing with Pure and Faust Using the LLVM Toolkit

Fusion of Morphological Images for Airborne Target Detection

FusionAccel: A General Re-configurable Deep Learning Inference Accelerator on FPGA for Convolutional Neural Networks

FusionSim: Characterizing the Performance Benefits of Fused CPU/GPU Systems

FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads

FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs

Future of GPGPU Micro-Architectural Parameters

FUX-Sim: Implementation of a fast universal simulation/reconstruction framework for X-ray systems

Fuzz4cuda: Fuzzing Your Nvidia Gpu Libraries Through Debug Interface

Fuzzing Loop Optimizations in Compilers for C++ and Data-Parallel Languages

Fuzzy ART Neural Network Parallel Computing on the GPU

FuzzyGPU: a fuzzy arithmetic library for GPU

FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs

G-CP: Providing Fault Tolerance on the GPU through Software Checkpointing

G-Heart: A GPU-based System for Electrophysiological Simulation and Multi-modality Cardiac Visualization

G-NET: Effective GPU Sharing in NFV Systems

G-NetMon: A GPU-accelerated Network Performance Monitoring System

G-NetMon: A GPU-accelerated Network Performance Monitoring System for Large Scale Scientific Collaborations

G-SNPM – A GPU-based SNP mapping tool

GA3C: GPU-based A3C for Deep Reinforcement Learning

GACO: A GPU-based High Performance Parallel Multi-ant Colony Optimization Algorithm

GaDei: On Scale-up Training As A Service For Deep Learning

GAIN: GPU-based Constraint Checking for Context Consistency

Gaining Cross-Platform Parallelism for HAL’s Molecular Dynamics Package using SYCL

Gaiwan: a Size-Polymorphic Typesystem for GPU Programs

GALAMOST: GPU-accelerated large-scale molecular simulation toolkit

GALARIO: a GPU Accelerated Library for Analysing Radio Interferometer Observations

Galerkin-based multi-scale time integration for nonlinear structural dynamics

Gallatin: A General-Purpose GPU Memory Manager

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

GamePipe: A Virtualized Cloud Platform Design and Performance Evaluation

GAMER with out-of-core computation

GAMER-2: a GPU-accelerated adaptive mesh refinement code — accuracy, performance, and scalability

GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics

GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda

GARDENIA: A Domain-specific Benchmark Suite for Next-generation Accelerators

GAROP: Genetic Algorithm framework for Running On Parallel environments

GASPP: A GPU-Accelerated Stateful Packet Processing Framework

Gate-Level Simulation with GPU Computing

Gauge Field Generation on Large-Scale GPU-Enabled Systems

Gauge Fixing in Lattice QCD on GPUs

Gauge fixing in lattice QCD with multi-GPUs

Gauge fixing using overrelaxation and simulated annealing on GPUs

Gaussian Mixture Model Based Volume Visualization

Gaussian Process Models with Parallelization and GPU acceleration

Gaussian split Ewald: A fast Ewald mesh method for molecular simulation

GBOOST: A GPU-based tool for detecting gene-gene interactions in genome-wide case control studies

GBOTuner: Autotuning of OpenMP Parallel Codes with Bayesian Optimization and Code Representation Transfer Learning

GC3: An Optimizing Compiler for GPU Collective Communication

GCN Inference Acceleration using High-Level Synthesis

GCS: High-Performance Gate-Level Simulation with GP-GPUs

GCStack+GCScaler: Fast and Accurate GPU Performance Analyses Using Fine-Grained Stall Cycle Accounting and Interval Analysis

Gdev: First-Class GPU Resource Management in the Operating System

GDlog: A GPU-Accelerated Deductive Engine

GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks

Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

GeantV: from CPU to accelerators

GEARS: A General and Efficient Algorithm for Rendering Shadows

gearshifft – The FFT Benchmark Suite for Heterogeneous Platforms

GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing

GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server

gem5-gpu: A Heterogeneous CPU-GPU Simulator

gEMfitter: A Highly Parallel FFT-Based 3D Density Fitting Tool With GPU Texture Memory Acceleration

Gemma in April: A matrix-like parallel programming architecture on OpenCL

GEMMbench: a framework for reproducible and collaborative benchmarking of matrix multiplication

gEMpicker: A Highly Parallel GPU-Accelerated Particle Picking Tool for Cryo-Electron Microscopy

GEMTC: GPU Enabled Many-Task Computing

GenBase: A Complex Analytics Genomics Benchmark

General Purpose Computation on Graphics Processing Units Using OpenCL

General purpose computing on graphics processing units using OpenCL

General Purpose Computing on Low-Power Embedded GPUs: Has It Come of Age?

General purpose lattice QCD code set Bridge++ 2.0 for high performance computing

General purpose molecular dynamics simulations fully implemented on graphics processing units

General purpose Molecular Dynamics Simulations on GPUs: Issues of Pair Forces and Scaling to large Clusters

General Transformations for GPU Execution of Tree Traversals

General-Purpose Computing on Tensor Processors

General-purpose GPU computing: practice and experience

General-purpose molecular dynamics simulations on GPU-based clusters

Generalisation in genetic programming

Generalized Resource Allocation for the Cloud

Generalized Voronoi Diagram Computation on GPU

Generalizing Execution of Vectorizable Computations by Generating Vector Oriented Byte Code

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

Generating 3D Topologies with Multiple Constraints on the GPU

Generating and Rendering Procedural Clouds in Real Time on Programmable 3D Graphics Hardware

Generating Binary Optimal Codes Using Heterogeneous Parallel Computing

Generating Custom Code for Efficient Query Execution on Heterogeneous Processors

Generating Device-specific GPU code for Local Operators in Medical Imaging

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory

Titles: 100
open PDFs: 97
packages: 30
