Papers on hgpu.org (.txt-file)

Starchart: Hardware and Software Optimization Using Recursive Partitioning Regression Trees

Stargazer: Automated Regression-Based GPU Design Space Exploration

STARK: Strategic Team of Agents for Refining Kernels

StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators

StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

State Lattice-based Motion Planning for Autonomous On-Road Driving

State of The Art Report on GPU

State of the Art Report on Real-time Rendering with Hardware Tessellation

State-Based Gauss-Seidel Framework for Real-time 2D Ultrasound Image Sequence Denoising on GPUs

State-of-the-art in heterogeneous computing

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs

Static Analysis and Dynamic Adaptation of Parallelism

Static and Dynamic Analyses for Efficient GPU Execution

Static Compilation Analysis for Host-Accelerator Communication Optimization

Static GPU threads and an improved scan algorithm

Static Memory Access Pattern Analysis on a Massively Parallel GPU

Statistical Computing With Graphics Processing Units

Statistical constraints on binary black hole inspiral dynamics

Statistical Power Consumption Analysis and Modeling for GPU-based Computing

Statistical power modeling of GPU kernels using performance counters

Statistical testing of random number sequences using CUDA

stdgpu: Efficient STL-like Data Structures on the GPU

Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities

Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL

Stellar-mass black holes in star clusters: implications for gravitational wave radiation

Stencil and Lattice Structures for Field Equation Model Simulations on GPUs

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Stencil shadow volumes for complex and deformable objects

Stencil-Aware GPU Optimization of Iterative Solvers

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

StePS: A Multi-GPU Cosmological N-body Code for Compactified Simulations

Stereo depth with a Unified Architecture GPU

Stereo Matching Algorithm Using Population-Based Incremental Learning on GPU

Stereo Matching using Multi-Resolution Images on CUDA

Stereoscopic Ray Tracing on Graphics Processors

Stereoscopic Scene Flow Computation for 3D Motion Understanding

Stochastic Analysis of a Queue Length Model Using a Graphics Processing Unit

Stochastic Differential Equations simulation using GPU

Stochastic DT-MRI Connectivity Mapping on the GPU

Stochastic Gradient Descent on GPUs

Stochastic Progressive Photon Mapping for Dynamic Scenes

STOCHSIMGPU: Parallel stochastic simulation for the Systems Biology Toolbox 2 for MATLAB

Stock trading strategy creation using GP on GPU

StoreGPU: exploiting graphics processing units to accelerate distributed storage systems

Strain Visualization of Ultra Sound Signals Processed by General Purpose Graphic Process Unit

Strassen’s Matrix Multiplication on GPUs

Strategies for Maximizing Utilization in multi-CPU & multi-GPU Heterogeneous Architectures

Strategies for Optimization of Parallel Programs

Strategies for preparing computer science students for the multicore world

Strategies for Protecting Intellectual Property when Using CUDA Applications on Graphics Processing Units

Strategies for the Heterogeneous Execution of Large-Scale Simulations on Hybrid Supercomputers

Strategies to minimise the total run time of cyclic graph based genetic programming with GPUs

Strategy Preserving Compilation for Parallel Functional Code

Stream computing on graphics hardware

Stream Join Processing on Heterogeneous Processors

Stream processing for fast and efficient rotated Haar-like features using rotated integral images

Stream Processing of Integral Images for Real-Time Object Detection

Stream processing of moment invariants for real-time classifiers

Stream-Centric Stereo Matching and View Synthesis: A High-Speed Approach on GPUs

StreamBlocks: A compiler for heterogeneous dataflow computing

StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

Streamed Watershed Transform on GPU for Processing of Large Volume Data

Streaming Algorithms for Biological Sequence Alignment on GPUs

Streaming Applications on Heterogeneous Platforms

Streaming architectures and technology trends

Streaming Data from HDD to GPUs for Sustained Peak Performance

Streaming Dynamic Coarse-Grained CPU/GPU Workloads with Heterogeneous Pipelines in FastFlow

Streaming GPU Singular Value and Dynamic Mode Decompositions

Streaming Parallel GPU Acceleration of Large-Scale filter-based Spiking Neural Networks

Streaming-Oriented Parallelization of Domain-Independent Irregular Kernels

STREAMIT: Dynamic visualization and interactive exploration of text streams

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping

StreamMR: An Optimized MapReduce Framework for AMD GPUs

StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing

Strega: An HTTP Server for FPGAs

Stress Tensor Field Visualization for Implant Planning in Orthopedics

Stressing the BER simulation of LDPC codes in the error floor region using GPU clusters

String Matching on a Multicore GPU Using CUDA

Striped Smith-Waterman speeds database searches six times over other SIMD implementations

Strong scaling of general-purpose molecular dynamics simulations on GPUs

Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices

Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicore with GPU Accelerators

STT-RAM for Shared Memory in GPUs

Studies Concerning the ATLAS IBL Calibration Architecture

Studies of quantum dots: Ab initio coupled-cluster analysis using OpenCL and GPU programming

Studies on CUDA Offloading for Real-Time Simulation and Visualization

Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures

Study and evaluation of improved automatic GPU offloading method

Study for measurement method for coal volume on base of GPU

Study of Bandwidth Partitioning for Co-executing GPU Kernels

Study of basic vector operations on Intel Xeon Phi and NVIDIA Tesla using OpenCL

Study of Convolution Algorithms using CPU and Graphics Hardware

Study of low density nuclear matter with quantum molecular dynamics: the role of the symmetry energy

Study of OpenCL Processing Models for FPGA Devices

Titles: 100
open PDFs: 90
packages: 17
