Papers on hgpu.org (.txt-file)
Strategy Preserving Compilation for Parallel Functional Code

Stream computing on graphics hardware

Stream Join Processing on Heterogeneous Processors

Stream processing for fast and efficient rotated Haar-like features using rotated integral images

Stream Processing of Integral Images for Real-Time Object Detection

Stream processing of moment invariants for real-time classifiers
Stream-Centric Stereo Matching and View Synthesis: A High-Speed Approach on GPUs
StreamBlocks: A compiler for heterogeneous dataflow computing

StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

Streamed Watershed Transform on GPU for Processing of Large Volume Data

Streaming Algorithms for Biological Sequence Alignment on GPUs
Streaming Applications on Heterogeneous Platforms

Streaming architectures and technology trends
Streaming Data from HDD to GPUs for Sustained Peak Performance

Streaming Dynamic Coarse-Grained CPU/GPU Workloads with Heterogeneous Pipelines in FastFlow

Streaming GPU Singular Value and Dynamic Mode Decompositions

Streaming Parallel GPU Acceleration of Large-Scale filter-based Spiking Neural Networks

Streaming-Oriented Parallelization of Domain-Independent Irregular Kernels

STREAMIT: Dynamic visualization and interactive exploration of text streams

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping

StreamMR: An Optimized MapReduce Framework for AMD GPUs

StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing

Strega: An HTTP Server for FPGAs

Stress Tensor Field Visualization for Implant Planning in Orthopedics

Stressing the BER simulation of LDPC codes in the error floor region using GPU clusters

String Matching on a Multicore GPU Using CUDA
Striped Smith-Waterman speeds database searches six times over other SIMD implementations

Strong scaling of general-purpose molecular dynamics simulations on GPUs

Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices

Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicore with GPU Accelerators

STT-RAM for Shared Memory in GPUs

Studies Concerning the ATLAS IBL Calibration Architecture

Studies of quantum dots: Ab initio coupled-cluster analysis using OpenCL and GPU programming

Studies on CUDA Offloading for Real-Time Simulation and Visualization

Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures

Study and evaluation of improved automatic GPU offloading method

Study for measurement method for coal volume on base of GPU
Study of Bandwidth Partitioning for Co-executing GPU Kernels

Study of basic vector operations on Intel Xeon Phi and NVIDIA Tesla using OpenCL

Study of Convolution Algorithms using CPU and Graphics Hardware

Study of low density nuclear matter with quantum molecular dynamics: the role of the symmetry energy

Study of OpenCL Processing Models for FPGA Devices

Study of Sparse-Matrix Vector Multiplication (SpMV) on Different Architectures and Libraries

Study on acceleration technique for calculating near field of horn antenna based on GPU
Study on acceleration technique for two-dimensional FDTD algorithm based on GPU
Study on GPU-accelerated extraction of interconnects parasitic using CUDA and MPI
Study on semi-global matching algorithm extended for multi baseline matching and parallel processing method based on GPU

Study on volume rendering of CT slices based on ray casting
Study, Modelling and Implementation of the Level Set Method Used in Micromachining Processes

Studying the core-cusp problem in cold dark matter halos using N-body simulations on GPU clusters

Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL

Studying Thermal Management for Graphics-Processor Architectures

STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning

SU(2) Lattice Gauge Theory Simulations on Fermi GPUs

SU(2) Lattice QCD Simulations on Fermi GPUs

Sub-seasonal forecasting with a large ensemble of deep-learning weather prediction models

Subdivision Surface Evaluation as Sparse Matrix-Vector Multiplication

Subpixel reconstruction antialiasing for deferred shading

Suitability of NVIDIA GPUs for SKA1-Low

Super Earths and Dynamical Stability of Planetary Systems: First Parallel GPU Simulations Using GENGA

Supercharging Federated Learning with Flower and NVIDIA FLARE

Supercomputing and stellar dynamics

Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem

Superconducting proximity effect in graphene under inhomogeneous strain

SUPERGLUE: A Shared Memory Framework Using Data Versioning for Dependency-Aware Task-Based Parallelization

SUperman: Efficient Permanent Computation on GPUs

SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks

Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models

Supervised Hashing with Deep Neural Networks

Support for Parallel Scan in OpenMP

Support Operator Rupture Dynamics on GPU

Support Vector Machines on GPU with Sparse Matrix Format
Supporting Applications Involving Dynamic Data Structures and Irregular Memory Access on Emerging Parallel Platforms

Supporting CUDA for an extended RISC-V GPU architecture

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

Supporting Heterogenous Computing Environments in SaC

Supporting input dependent access pattern algorithms on GPUs using GPUfs

Supporting Iteration in a Heterogeneous Data Flow Engine

Supporting mixed-datatype matrix multiplication within the BLIS framework

Supporting Preemptive Task Executions and Memory Copies in GPGPUs

Supporting x86-64 Address Translation for 100s of GPU Lanes

Surface Compression Using Dynamic Color Palettes

Surface Normal Integration for Convex Space-time Multi-view Reconstruction

Surface quality assessment of subdivision surfaces on programmable graphics hardware

Surface Reconstruction from Scattered Point via RBF Interpolation on GPU

Survey and Benchmarking of Machine Learning Accelerators

Survey of Domain-Specific Languages for FPGA Computing

Survey of GPU water simulation in game engine
Survey of HPC in US Research Institutions

Survey on Benchmarks for a GPU Based Multi Camera Stereo Matching Algorithm

Survey on Efficient Linear Solvers for Porous Media Flow Models on Recent Hardware Architectures

Survey On The Off-Chip Scheduling of Memory Accesses in the Memory Interface Of GPUs

Survey paper on Deep Learning on GPUs

Sustainable GPU Computing at Scale

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale

SW# – GPU enabled exact alignments on genome scale

Titles: 100
open PDFs: 86
packages: 18
