Papers on hgpu.org
SBArt4 – Breeding abstract animations in realtime

SBLOCK: A Framework for Efficient Stencil-Based PDE Solvers on Multi-core Platforms
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

Scalability Analysis of Parallel Algorithms on GPU Clusters

Scalability Analysis of Synchronous Data-Parallel Artificial Neural Network (ANN) Learners

Scalability and Optimization Strategies for GPU Enhanced Neural Networks (GeNN)

Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs

Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA

Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures

Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads

Scalable and deterministic timing-driven parallel placement for FPGAs

Scalable and High Performance Betweenness Centrality on the GPU

Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA
Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

Scalable approximate k-NN in multidimensional big data

Scalable Breadth-First Search on a GPU Cluster

Scalable Clustering for Vision using GPUs

Scalable Clustering Using Graphics Processors

Scalable communication for high-order stencil computations using CUDA-aware MPI

Scalable Data Clustering using GPU Clusters

Scalable Dense Linear Algebra on Heterogeneous Hardware

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

Scalable Distributed Fast Multipole Methods

Scalable Engine and the Performance of Different LLM Models in a SLURM based HPC architecture

Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures

Scalable Fast Multipole Methods on Heterogeneous Architecture

Scalable framework for mapping streaming applications onto multi-GPU systems

Scalable GPU Acceleration of B-Spline Signal Processing Operations

Scalable GPU rendering of CSG models

Scalable GPU-Based Integrity Verification for Large Machine Learning Models

Scalable heterogeneous parallelism for atmospheric modeling and simulation

Scalable instruction set simulator for thousand-core architectures running on GPGPUs

Scalable Kernel Fusion for Memory-Bound GPU Applications

Scalable Lattice Boltzmann Solvers for CUDA GPU Clusters

Scalable learning for object detection with GPU hardware

Scalable Metropolis Monte Carlo for simulation of hard shapes

Scalable Molecular Dynamics Simulation Using FPGAs and Multicore Processors

Scalable Multi Agent Simulation on the GPU

Scalable Multi-Cache Simulation Using GPUs

Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

Scalable multi-GPU implementation of the MAGFLOW simulator

Scalable Multi-GPU Simulation of Long-Range Molecular Dynamics

Scalable packet classification via GPU metaprogramming
Scalable Parallel Minimum Spanning Forest Computation

Scalable parallel programming with CUDA

Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-Core Architectures

Scalable Programming Models for Massively Multicore Processors
Scalable Query Evaluation in Relational Databases

Scalable Simulation of 3D Wave Propagation in Semi-Infinite Domains Using the Finite Difference Method on a GPU Based Cluster

Scalable Simulation of Tsunamis Generated by Submarine Landslides on GPU clusters

Scalable SMT-based verification of GPU kernel functions

Scalable Software Defined FM-radio receiver running on desktop computers
Scalable Solution of Radiative Heat Transfer Problems by the Photon Monte Carlo Algorithm on Hybrid Computing Architectures

Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass

Scalable Streaming-Array of Simple Soft-Processors for Stencil Computations with Constant Memory-Bandwidth

Scalable Techniques for Scheduling and Mapping DSP Applications onto Embedded Multiprocessor Platforms

Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay

Scalable Verification Techniques for Data-Parallel Programs

Scalable, High Performance Fourier Domain Optical Coherence Tomography: Why FPGAs and Not GPGPUs

Scalar collapse in AdS with an OpenCL open source code

SCALE-Ahead-Of-Time Compilation of CUDA for AMD GPUs

Scale-dependent and example-based grayscale stippling
Scale-space ridge detection with GPU acceleration
Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs
ScaleHLS: Scalable High-Level Synthesis through MLIR

Scaling behavior of topologically constrained polymer rings in a melt

Scaling Coupled Climate Models to Exascale: OpenACC-enabled ECEarth3 Earth System Model

Scaling CUDA for Distributed Heterogeneous Processors

Scaling Deep Learning on GPU and Knights Landing clusters

Scaling Deep Learning on Multiple In-Memory Processors

Scaling Fast Multipole Methods up to 4000 GPUs

Scaling GPU-Accelerated Databases beyond GPU Memory Size

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer

Scaling Hierarchical N-body Simulations on GPU Clusters

Scaling High Performance Domain-Specific Language Implementation with Delite

Scaling IDS construction based on Non-negative Matrix factorization using GPU computing

Scaling LAPACK panel operations using parallel cache assignment

Scaling Lattice QCD beyond 100 GPUs

Scaling Monte Carlo Tree Search on Intel Xeon Phi

Scaling Multifluid Compressible Fluid Dynamics to 700,000 cores, 1.5 Pflop/s, and a Trillion Grid Cells

Scaling On-Device GPU Inference for Large Generative Models

Scaling Performance of FFT Computation on an Industrial Integrated GPU Co-processor: Experiments with Algorithm Adaptation

Scaling Radio Astronomy Signal Correlation on Heterogeneous Supercomputers Using Various Data Distribution Methodologies

Scaling Recurrent Neural Network Language Models

Scaling Results for a Discontinuous Galerkin Finite-Element Wave Solver on Multi-GPU Systems

Scaling Soft Matter Physics to Thousands of GPUs in Parallel

Scaling SU(2) to 1000 GPUs using HiRep

Scaling up scientific computations by using map-reduce-like control flow on NUMA architectures

Scaling-up spatially-explicit ecological models using graphics processors

SCALSALE: Scalable SALE Benchmark Framework for Supercomputers

Scan primitives for GPU computing

Scan Test Power Simulation on GPGPUs

Scandalously Parallelizable Mesh Generation

ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU

Titles: 100
open PDFs: 89
packages: 19
