Papers on hgpu.org (.txt-file)
SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs
Saddle Vertex Graph (SVG): A Novel Solution to the Discrete Geodesic Problem
Safe and Practical GPU Acceleration in TrustZone
Safe Asynchronous Multicore Memory Operations
Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc
SafeGPU: Contract- and Library-Based GPGPU for Object-Oriented Languages
SAGA: SystemC Acceleration on GPU Architectures
SAGE: Self-Tuning Approximation for Graphics Engines
SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems
Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method
SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs
Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications
Sample distribution shadow maps
SAPPORO: A way to turn your graphics cards into a GRAPE-6
Sapporo2: A versatile direct N-body library
SAR focusing of P-band ice sounding data using back-projection
SAR raw signal simulation based on GPU parallel computation
SBArt4 – Breeding abstract animations in realtime
SBLOCK: A Framework for Efficient Stencil-Based PDE Solvers on Multi-core Platforms
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing
Scalability Analysis of Parallel Algorithms on GPU Clusters
Scalability Analysis of Synchronous Data-Parallel Artificial Neural Network (ANN) Learners
Scalability and Optimization Strategies for GPU Enhanced Neural Networks (GeNN)
Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs
Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism
Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA
Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures
Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads
Scalable and deterministic timing-driven parallel placement for FPGAs
Scalable and High Performance Betweenness Centrality on the GPU
Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA
Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets
Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms
Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework
Scalable approximate k-NN in multidimensional big data
Scalable Breadth-First Search on a GPU Cluster
Scalable Clustering for Vision using GPUs
Scalable Clustering Using Graphics Processors
Scalable communication for high-order stencil computations using CUDA-aware MPI
Scalable Data Clustering using GPU Clusters
Scalable Dense Linear Algebra on Heterogeneous Hardware
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
Scalable Distributed Fast Multipole Methods
Scalable Engine and the Performance of Different LLM Models in a SLURM based HPC architecture
Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures
Scalable Fast Multipole Methods on Heterogeneous Architecture
Scalable framework for mapping streaming applications onto multi-GPU systems
Scalable GPU Acceleration of B-Spline Signal Processing Operations
Scalable GPU rendering of CSG models
Scalable heterogeneous parallelism for atmospheric modeling and simulation
Scalable instruction set simulator for thousand-core architectures running on GPGPUs
Scalable Kernel Fusion for Memory-Bound GPU Applications
Scalable Lattice Boltzmann Solvers for CUDA GPU Clusters
Scalable learning for object detection with GPU hardware
Scalable Metropolis Monte Carlo for simulation of hard shapes
Scalable Molecular Dynamics Simulation Using FPGAs and Multicore Processors
Scalable Multi Agent Simulation on the GPU
Scalable Multi-Cache Simulation Using GPUs
Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
Scalable multi-GPU implementation of the MAGFLOW simulator
Scalable Multi-GPU Simulation of Long-Range Molecular Dynamics
Scalable packet classification via GPU metaprogramming
Scalable Parallel Minimum Spanning Forest Computation
Scalable parallel programming with CUDA
Scalable Parallel Tridiagonal Algorithms with Diagonal Pivoting and Their Optimization for Many-Core Architectures
Scalable Programming Models for Massively Multicore Processors
Scalable Query Evaluation in Relational Databases
Scalable Simulation of 3D Wave Propagation in Semi-Infinite Domains Using the Finite Difference Method on a GPU Based Cluster
Scalable Simulation of Tsunamis Generated by Submarine Landslides on GPU clusters
Scalable SMT-based verification of GPU kernel functions
Scalable Software Defined FM-radio receiver running on desktop computers
Scalable Solution of Radiative Heat Transfer Problems by the Photon Monte Carlo Algorithm on Hybrid Computing Architectures
Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass
Scalable Streaming-Array of Simple Soft-Processors for Stencil Computations with Constant Memory-Bandwidth
Scalable Techniques for Scheduling and Mapping DSP Applications onto Embedded Multiprocessor Platforms
Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay
Scalable Verification Techniques for Data-Parallel Programs
Scalable, High Performance Fourier Domain Optical Coherence Tomography: Why FPGAs and Not GPGPUs
Scalar collapse in AdS with an OpenCL open source code
SCALE-Ahead-Of-Time Compilation of CUDA for AMD GPUs
Scale-dependent and example-based grayscale stippling
Scale-space ridge detection with GPU acceleration
Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs
ScaleHLS: Scalable High-Level Synthesis through MLIR
Scaling behavior of topologically constrained polymer rings in a melt
Scaling Coupled Climate Models to Exascale: OpenACC-enabled ECEarth3 Earth System Model
Scaling CUDA for Distributed Heterogeneous Processors
Scaling Deep Learning on GPU and Knights Landing clusters
Scaling Deep Learning on Multiple In-Memory Processors
Scaling Fast Multipole Methods up to 4000 GPUs
Scaling GPU-Accelerated Databases beyond GPU Memory Size
Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer
Scaling Hierarchical N-body Simulations on GPU Clusters
Scaling High Performance Domain-Specific Language Implementation with Delite
Scaling IDS construction based on Non-negative Matrix factorization using GPU computing
Scaling LAPACK panel operations using parallel cache assignment
Scaling Lattice QCD beyond 100 GPUs
Titles: 100
open PDFs: 88
packages: 20