Papers on hgpu.org (.txt-file)
Rubus: A compiler for seamless and extensible parallelism

RUMD: A general purpose molecular dynamics package optimized to utilize GPU hardware down to a few thousand particles

Run-time Image and Video Resizing Using CUDA-enabled GPUs

Run-time Reconfigurable Multiprocessors

Run-time support for multi-level disjoint memory address spaces

Run, Stencil, Run! – A Comparison of Modern Parallel Programming Paradigms

Running Financial Risk Management Applications on FPGA in the Amazon Cloud

Running the NIM Next-Generation Weather Model on GPUs

Running unstructured grid-based CFD solvers on modern graphics hardware

Running unstructured grid-based CFD solvers on modern graphics hardware

Runtime Code Generation and Data Management for Heterogeneous Computing in Java

Runtime Comparison of CPU and GPU Using Portable Programming Models

Runtime Compilation of Array-Oriented Python Programs

Runtime Configurable Deep Neural Networks for Energy-Accuracy Trade-off

Runtime Performances Benchmark for Knowledge Graph Embedding Methods

Runtime Specialization for Heterogeneous CPU-GPU Platforms

Runtime Support for Adaptive Power Capping on Heterogeneous SoCs

Runtime Support for Performance Portability on Heterogeneous Distributed Platforms

Runtime Support toward Transparent Memory Access in GPU-accelerated Heterogeneous Systems

Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures

Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment

S-buffer: Sparsity-aware Multi-fragment Rendering

SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Saddle Vertex Graph (SVG): A Novel Solution to the Discrete Geodesic Problem

Safe and Practical GPU Acceleration in TrustZone

Safe Asynchronous Multicore Memory Operations

Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc

SafeGPU: Contract- and Library-Based GPGPU for Object-Oriented Languages

SAGA: SystemC Acceleration on GPU Architectures

SAGE: Self-Tuning Approximation for Graphics Engines

SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems

Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method

SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications

Sample distribution shadow maps

SAPPORO: A way to turn your graphics cards into a GRAPE-6

Sapporo2: A versatile direct N-body library

SAR focusing of P-band ice sounding data using back-projection

SAR raw signal simulation based on GPU parallel computation

Sawtooth Wavefront Reordering: Enhanced CuTile FlashAttention on NVIDIA GB10

SBArt4 – Breeding abstract animations in realtime

SBLOCK: A Framework for Efficient Stencil-Based PDE Solvers on Multi-core Platforms

SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

Scalability Analysis of Parallel Algorithms on GPU Clusters

Scalability Analysis of Synchronous Data-Parallel Artificial Neural Network (ANN) Learners

Scalability and Optimization Strategies for GPU Enhanced Neural Networks (GeNN)

Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs

Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA

Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures

Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads

Scalable and deterministic timing-driven parallel placement for FPGAs

Scalable and High Performance Betweenness Centrality on the GPU

Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA

Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

Scalable approximate k-NN in multidimensional big data

Scalable Breadth-First Search on a GPU Cluster

Scalable Clustering for Vision using GPUs

Scalable Clustering Using Graphics Processors

Scalable communication for high-order stencil computations using CUDA-aware MPI

Scalable Data Clustering using GPU Clusters

Scalable Dense Linear Algebra on Heterogeneous Hardware

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

Scalable Distributed Fast Multipole Methods

Scalable Engine and the Performance of Different LLM Models in a SLURM based HPC architecture

Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures

Scalable Fast Multipole Methods on Heterogeneous Architecture

Scalable framework for mapping streaming applications onto multi-GPU systems

Scalable GPU Acceleration of B-Spline Signal Processing Operations

Scalable GPU rendering of CSG models

Scalable GPU-Based Integrity Verification for Large Machine Learning Models

Scalable heterogeneous parallelism for atmospheric modeling and simulation

Scalable instruction set simulator for thousand-core architectures running on GPGPUs

Scalable Kernel Fusion for Memory-Bound GPU Applications

Scalable Lattice Boltzmann Solvers for CUDA GPU Clusters

Scalable learning for object detection with GPU hardware

Scalable Metropolis Monte Carlo for simulation of hard shapes

Scalable Molecular Dynamics Simulation Using FPGAs and Multicore Processors

Scalable Multi Agent Simulation on the GPU

Scalable Multi-Cache Simulation Using GPUs

Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

Scalable multi-GPU implementation of the MAGFLOW simulator

Scalable Multi-GPU Simulation of Long-Range Molecular Dynamics

Scalable packet classification via GPU metaprogramming

Scalable Parallel Minimum Spanning Forest Computation

Scalable parallel programming with CUDA

Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-Core Architectures

Scalable Programming Models for Massively Multicore Processors

Scalable Query Evaluation in Relational Databases

Scalable Simulation of 3D Wave Propagation in Semi-Infinite Domains Using the Finite Difference Method on a GPU Based Cluster

Scalable Simulation of Tsunamis Generated by Submarine Landslides on GPU clusters

Scalable SMT-based verification of GPU kernel functions

Scalable Software Defined FM-radio receiver running on desktop computers

Scalable Solution of Radiative Heat Transfer Problems by the Photon Monte Carlo Algorithm on Hybrid Computing Architectures

Titles: 100
Doubles: 1
open PDFs: 90
packages: 17
