Papers on hgpu.org (.txt-file)
Sorting and Permuting without Bank Conflicts on GPUs

Sorting On A Graphics Processing Unit (GPU)

Sorting on FPGAs using Merge Trees

Sorting on GPUs for large scale datasets: A thorough comparison

Sound and Partially-Complete Static Analysis of Data-Races in GPU Programs

Sound Speed Optimization Using Image Texture on CUDA

Sound Synthesis Using Physical Modeling on Heterogeneous Computing Platforms

Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators

Source-to-Source Optimization of CUDA C for GPU Accelerated Cardiac Cell Modeling

Source-to-source transformations for irregular and multithreaded code optimization

Space and the Synchronic A-Ram

Space Charge Dominated Envelope Dynamics Using GPUs

Space-Time Finite Element Analysis on Graphics Processing Unit Computing Platform

Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters

Spark: modular, composable shaders for graphics hardware

SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters

SparkJNI: A Reference Design for a Heterogeneous Apache Spark Framework

Sparse Approximate Inverse Preconditioners for Iterative Solvers on GPUs

Sparse array representations and some selected array operations on GPUs

Sparse Convex Optimization on GPUs

Sparse direct solvers with accelerators over DAG runtimes

Sparse GPU Kernels for Deep Learning

Sparse LU Factorization for Parallel Circuit Simulation on GPU

Sparse Matrix Algorithms Using GPGPU

Sparse matrix computations on manycore GPU’s

Sparse Matrix Formats Evaluation and Optimization on a GPU

Sparse Matrix Matrix Multiplication on Hybrid CPU+GPU Platforms

Sparse Matrix Multiplication using CUDA and Mex Interface

Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms

Sparse matrix solvers on the GPU: conjugate gradients and multigrid

Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures: Algorithms and Experiments

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Sparse Matrix-Vector Multiplication on GPGPUs

Sparse Matrix-Vector Multiplication on GPU

Sparse Matrix-Vector Multiplication on NVIDIA GPU

Sparse Recovery on GPUs: Accelerating the Iterative Soft-Thresholding Algorithm

Sparse regularization in MRI iterative reconstruction using GPUs

Sparse systems solving on GPUs with GMRES

Sparse Winograd Convolutional neural networks on small-scale systolic arrays

Sparse-Matrix support for the SkePU library for portable CPU/GPU programming

Sparse-Matrix-CG-Solver in CUDA

Sparselet Models for Efficient Multiclass Object Detection

Sparser, Better, Faster GPU Parsing

Spatial Data Structures, Sorting and GPU Parallelism for Situated-agent Simulation and Visualisation

Spatial Indexing of Large-Scale Geo-Referenced Point Data on GPGPUs Using Parallel Primitives

Spatial interpolation in massively parallel computing environments

Spatial interpolation of scattered geoscientific data

Spatial Join with R-Tree on Graphics Processing Units

Spatial Sorting Algorithms for Parallel Computing in Networks

Spatial splits in bounding volume hierarchies

Spatial: A Language and Compiler for Application Accelerators

Spatio-temporal upsampling on the GPU

Spatter: A Benchmark Suite for Evaluating Sparse Access Patterns

Special Relativistic Visualization by Local Ray Tracing

Specification and verification of GPGPU programs

Specification and Verification of GPGPU Programs using Permission-Based Separation Logic

Speckle Reduction with Trained Nonlinear Diffusion Filtering

Spectral classification using convolutional neural networks

Spectral Ewald Acceleration of Stokesian Dynamics for polydisperse suspensions

Spectral Method Characterization on FPGA and GPU Accelerators

Spectral volume rendering using GPU-based raycasting

Specular Effects on the GPU: State of the Art

Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs

Speculative Execution on GPU: An Exploratory Study

Speculative Execution on Multi-GPU Systems

Speculative Parallel Evaluation Of Classification Trees On GPGPU Compute Engines

Speculative Parallelization on GPGPUs

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Specx: a C++ task-based runtime system for heterogeneous distributed architectures

Speech Recognition on Modern Graphic Processing Units

Speech Recognition on Multi-Core Processors and GPUs

Speed and Portability issues for Random Number Generation on Graphical Processing Units with CUDA and other Processing Accelerators

Speed sign detection and recognition by convolutional neural networks

Speed up Large Integer Multiplication Using Fourier Transforms and CUDA Technology

Speed-Up Improvement Using Parallel Approach in Image Steganography

Speed, power and cost implications for GPU acceleration of Computational Fluid Dynamics on HPC systems

Speeding up a few orders of magnitude the Jacobi method: high order Chebyshev-Jacobi over GPUs

Speeding up a Video Summarization Approach Using GPUs and Multicore CPUs

Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves

Speeding Up Computer Vision Applications on Mobile Computing Platforms

Speeding Up Cycle Based Logic Simulation Using Graphics Processing Units

Speeding Up Geospatial Polygon Rasterization on GPGPUs

Speeding Up Homomorpic Hashing Using GPUs

Speeding up K-Means Algorithm by GPUs

Speeding up Large-Scale Point-in-Polygon Test Based Spatial Join on GPUs

Speeding up lattice sieve with Xeon Phi coprocessor

Speeding up LIP-Canny with CUDA programming

Speeding Up Model Building for ECGA on CUDA Platform

Speeding up Mutual Information Computation Using NVIDIA CUDA Hardware

Speeding Up Object Detection: Fast Resizing in the Integral Image Domain

Speeding Up Particle Trajectory Simulations under Moving Force Fields using GPUs

Speeding Up Reinforcement Learning with Graphics Processing Units

Speeding up Scoring Module of Mass Spectrometry Based Protein Identification by GPU

Speeding up subset seed algorithm for intensive protein sequence comparison

Speeding up the evaluation of evolutionary learning systems using GPGPUs

Speeding up the evaluation phase of GP classification algorithms on GPUs

Speeding up the MATLAB complex networks package using graphic processors

Titles: 100
open PDFs: 93
packages: 14
