Papers on hgpu.org (.txt-file)
Scalable Dense Linear Algebra on Heterogeneous Hardware

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

Scalable Distributed Fast Multipole Methods

Scalable Engine and the Performance of Different LLM Models in a SLURM based HPC architecture

Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures

Scalable Fast Multipole Methods on Heterogeneous Architecture

Scalable framework for mapping streaming applications onto multi-GPU systems

Scalable GPU Acceleration of B-Spline Signal Processing Operations

Scalable GPU rendering of CSG models

Scalable GPU-Based Integrity Verification for Large Machine Learning Models

Scalable heterogeneous parallelism for atmospheric modeling and simulation

Scalable instruction set simulator for thousand-core architectures running on GPGPUs

Scalable Kernel Fusion for Memory-Bound GPU Applications

Scalable Lattice Boltzmann Solvers for CUDA GPU Clusters

Scalable learning for object detection with GPU hardware

Scalable Metropolis Monte Carlo for simulation of hard shapes

Scalable Molecular Dynamics Simulation Using FPGAs and Multicore Processors

Scalable Multi Agent Simulation on the GPU

Scalable Multi-Cache Simulation Using GPUs

Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

Scalable multi-GPU implementation of the MAGFLOW simulator

Scalable Multi-GPU Simulation of Long-Range Molecular Dynamics

Scalable packet classification via GPU metaprogramming
Scalable Parallel Minimum Spanning Forest Computation

Scalable parallel programming with CUDA

Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-Core Architectures

Scalable Programming Models for Massively Multicore Processors
Scalable Query Evaluation in Relational Databases

Scalable Simulation of 3D Wave Propagation in Semi-Infinite Domains Using the Finite Difference Method on a GPU Based Cluster

Scalable Simulation of Tsunamis Generated by Submarine Landslides on GPU clusters

Scalable SMT-based verification of GPU kernel functions

Scalable Software Defined FM-radio receiver running on desktop computers
Scalable Solution of Radiative Heat Transfer Problems by the Photon Monte Carlo Algorithm on Hybrid Computing Architectures

Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass

Scalable Streaming-Array of Simple Soft-Processors for Stencil Computations with Constant Memory-Bandwidth

Scalable Techniques for Scheduling and Mapping DSP Applications onto Embedded Multiprocessor Platforms

Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay

Scalable Verification Techniques for Data-Parallel Programs

Scalable, High Performance Fourier Domain Optical Coherence Tomography: Why FPGAs and Not GPGPUs

Scalar collapse in AdS with an OpenCL open source code

SCALE-Ahead-Of-Time Compilation of CUDA for AMD GPUs

Scale-dependent and example-based grayscale stippling
Scale-space ridge detection with GPU acceleration
Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs
ScaleHLS: Scalable High-Level Synthesis through MLIR

Scaling behavior of topologically constrained polymer rings in a melt

Scaling Coupled Climate Models to Exascale: OpenACC-enabled ECEarth3 Earth System Model

Scaling CUDA for Distributed Heterogeneous Processors

Scaling Deep Learning on GPU and Knights Landing clusters

Scaling Deep Learning on Multiple In-Memory Processors

Scaling Fast Multipole Methods up to 4000 GPUs

Scaling GPU-Accelerated Databases beyond GPU Memory Size

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer

Scaling Hierarchical N-body Simulations on GPU Clusters

Scaling High Performance Domain-Specific Language Implementation with Delite

Scaling IDS construction based on Non-negative Matrix factorization using GPU computing

Scaling LAPACK panel operations using parallel cache assignment

Scaling Lattice QCD beyond 100 GPUs

Scaling Monte Carlo Tree Search on Intel Xeon Phi

Scaling Multifluid Compressible Fluid Dynamics to 700,000 cores, 1.5 Pflop/s, and a Trillion Grid Cells

Scaling On-Device GPU Inference for Large Generative Models

Scaling Performance of FFT Computation on an Industrial Integrated GPU Co-processor: Experiments with Algorithm Adaptation

Scaling Radio Astronomy Signal Correlation on Heterogeneous Supercomputers Using Various Data Distribution Methodologies

Scaling Recurrent Neural Network Language Models

Scaling Results for a Discontinuous Galerkin Finite-Element Wave Solver on Multi-GPU Systems

Scaling Soft Matter Physics to Thousands of GPUs in Parallel

Scaling SU(2) to 1000 GPUs using HiRep

Scaling up scientific computations by using map-reduce-like control flow on NUMA architectures

Scaling-up spatially-explicit ecological models using graphics processors

SCALSALE: Scalable SALE Benchmark Framework for Supercomputers

Scan primitives for GPU computing

Scan Test Power Simulation on GPGPUs

Scandalously Parallelizable Mesh Generation

ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU

Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo

Scattering Points in Parallel Coordinates

Scene Boundary Detection Technique Based on Bottom-Up Attention System and OpenCL Parallel Implementation

Scene image classifying via the Partially Connected Neural Network
Scene independent real-time indirect illumination

Scene Recognition Acceleration Using CUDA and OpenMP
SCF: a device- and language-independent task coordination framework for reconfigurable, heterogeneous systems

SCGPSim: A fast SystemC simulator on GPUs

Scheduling (ir)regular applications on heterogeneous platforms

Scheduling a Parallel Sparse Direct Solver to Multiple GPUs

Scheduling by Work-Stealing in Hybrid Parallel Architectures

Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs

Scheduling data flow program in xkaapi: A new affinity based Algorithm for Heterogeneous Architectures

Scheduling Dataflow Execution Across Multiple Accelerators

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

Scheduling for new computing platforms with GPUs

Scheduling Languages: A Past, Present, and Future Taxonomy

Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources

Scheduling on Manycore and Heterogeneous Graphics Processors

Scheduling Parallel Tasks under Multiple Resources: List Scheduling vs. Pack Scheduling

Scheduling processing of real-time data streams on heterogeneous multi-GPU systems

Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System’s Perspective

SciAI4Industry – Solving PDEs for industry-scale problems with deep learning

Scientific and Engineering Computing Using ATI Stream Technology

Titles: 100
open PDFs: 90
packages: 16
