Papers on hgpu.org (.txt-file)
Scalable packet classification via GPU metaprogramming

Scalable Parallel Minimum Spanning Forest Computation

Scalable parallel programming with CUDA

Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-Core Architectures

Scalable Programming Models for Massively Multicore Processors

Scalable Query Evaluation in Relational Databases

Scalable Simulation of 3D Wave Propagation in Semi-Infinite Domains Using the Finite Difference Method on a GPU Based Cluster

Scalable Simulation of Tsunamis Generated by Submarine Landslides on GPU clusters

Scalable SMT-based verification of GPU kernel functions

Scalable Software Defined FM-radio receiver running on desktop computers

Scalable Solution of Radiative Heat Transfer Problems by the Photon Monte Carlo Algorithm on Hybrid Computing Architectures

Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass

Scalable Streaming-Array of Simple Soft-Processors for Stencil Computations with Constant Memory-Bandwidth

Scalable Techniques for Scheduling and Mapping DSP Applications onto Embedded Multiprocessor Platforms

Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay

Scalable Verification Techniques for Data-Parallel Programs

Scalable, High Performance Fourier Domain Optical Coherence Tomography: Why FPGAs and Not GPGPUs

Scalar collapse in AdS with an OpenCL open source code

SCALE-Ahead-Of-Time Compilation of CUDA for AMD GPUs

Scale-dependent and example-based grayscale stippling

Scale-space ridge detection with GPU acceleration

Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs

ScaleHLS: Scalable High-Level Synthesis through MLIR

Scaling behavior of topologically constrained polymer rings in a melt

Scaling Coupled Climate Models to Exascale: OpenACC-enabled ECEarth3 Earth System Model

Scaling CUDA for Distributed Heterogeneous Processors

Scaling Deep Learning on GPU and Knights Landing clusters

Scaling Deep Learning on Multiple In-Memory Processors

Scaling Fast Multipole Methods up to 4000 GPUs

Scaling GPU-Accelerated Databases beyond GPU Memory Size

Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer

Scaling Hierarchical N-body Simulations on GPU Clusters

Scaling High Performance Domain-Specific Language Implementation with Delite

Scaling IDS construction based on Non-negative Matrix factorization using GPU computing

Scaling LAPACK panel operations using parallel cache assignment

Scaling Lattice QCD beyond 100 GPUs

Scaling Monte Carlo Tree Search on Intel Xeon Phi

Scaling Multifluid Compressible Fluid Dynamics to 700,000 cores, 1.5 Pflop/s, and a Trillion Grid Cells

Scaling On-Device GPU Inference for Large Generative Models

Scaling Performance of FFT Computation on an Industrial Integrated GPU Co-processor: Experiments with Algorithm Adaptation

Scaling Radio Astronomy Signal Correlation on Heterogeneous Supercomputers Using Various Data Distribution Methodologies

Scaling Recurrent Neural Network Language Models

Scaling Results for a Discontinuous Galerkin Finite-Element Wave Solver on Multi-GPU Systems

Scaling Soft Matter Physics to Thousands of GPUs in Parallel

Scaling SU(2) to 1000 GPUs using HiRep

Scaling up scientific computations by using map-reduce-like control flow on NUMA architectures

Scaling-up spatially-explicit ecological models using graphics processors

SCALSALE: Scalable SALE Benchmark Framework for Supercomputers

Scan primitives for GPU computing

Scan Test Power Simulation on GPGPUs

Scandalously Parallelizable Mesh Generation

ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU

Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo

Scattering Points in Parallel Coordinates

Scene Boundary Detection Technique Based on Bottom-Up Attention System and OpenCL Parallel Implementation

Scene image classifying via the Partially Connected Neural Network

Scene independent real-time indirect illumination

Scene Recognition Acceleration Using CUDA and OpenMP

SCF: a device- and language-independent task coordination framework for reconfigurable, heterogeneous systems

SCGPSim: A fast SystemC simulator on GPUs

Scheduling (ir)regular applications on heterogeneous platforms

Scheduling a Parallel Sparse Direct Solver to Multiple GPUs

Scheduling by Work-Stealing in Hybrid Parallel Architectures

Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs

Scheduling data flow program in xkaapi: A new affinity based Algorithm for Heterogeneous Architectures

Scheduling Dataflow Execution Across Multiple Accelerators

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

Scheduling for new computing platforms with GPUs

Scheduling Languages: A Past, Present, and Future Taxonomy

Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources

Scheduling on Manycore and Heterogeneous Graphics Processors

Scheduling Parallel Tasks under Multiple Resources: List Scheduling vs. Pack Scheduling

Scheduling processing of real-time data streams on heterogeneous multi-GPU systems

Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System’s Perspective

SciAI4Industry – Solving PDEs for industry-scale problems with deep learning

SciDef: Automating Definition Extraction from Academic Literature with Large Language Models

Scientific and Engineering Computing Using ATI Stream Technology

Scientific computation for simulations on programmable graphics hardware

Scientific Computation on Graphics Processing Unit using CUDA

Scientific Computation Through a GPU

Scientific Computing on Heterogeneous Architectures

Scientific Computing on Hybrid Architectures

Scientific Computing Using Consumer Video-Gaming Hardware Devices

Scientific Computing with Python on GPUs

Scientific GPU Programming with Data-Flow Languages

Scientific Programming for Heterogeneous Systems – Bridging the Gap between Algorithms and Applications

Scientific Visualization in Astronomy: Towards the Petascale Astronomy Era

Scope for performance enhancement of CMU Sphinx by parallelising with OpenCL

Scope is all you need: Transforming LLMs for HPC Code

Scout: a data-parallel programming language for graphics processors

Seamless acceleration of Fortran intrinsics via AMD AI engines

Seamless Dynamic Runtime Reconfiguration in a Software-Defined Radio

Seamless GPU acceleration for C++ based physics with the Metal Shading Language on Apple’s M series unified chips

Searching CUDA code autotuning spaces with hardware performance counters: data from benchmarks running on various GPU architectures

Searching for a counterexample of Kurepa’s Conjecture

Searching for Concurrent Design Patterns in Video Games

Searching for sinks of Henon map using a multiple-precision GPU arithmetic library

Titles: 100
open PDFs: 88
packages: 21
