Papers on hgpu.org (.txt-file)
Surface Compression Using Dynamic Color Palettes
Surface Normal Integration for Convex Space-time Multi-view Reconstruction
Surface quality assessment of subdivision surfaces on programmable graphics hardware
Surface Reconstruction from Scattered Point via RBF Interpolation on GPU
Survey and Benchmarking of Machine Learning Accelerators
Survey of Domain-Specific Languages for FPGA Computing
Survey of GPU water simulation in game engine
Survey on Benchmarks for a GPU Based Multi Camera Stereo Matching Algorithm
Survey on Efficient Linear Solvers for Porous Media Flow Models on Recent Hardware Architectures
Survey On The Off-Chip Scheduling of Memory Accesses in the Memory Interface Of GPUs
Survey paper on Deep Learning on GPUs
Sustainable GPU Computing at Scale
Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale
SW# – GPU enabled exact alignments on genome scale
SW#db: GPU-accelerated exact sequence similarity database search
Swan: A tool for porting CUDA programs to OpenCL
SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors
Swarm-NG: a CUDA Library for Parallel n-body Integrations with focus on Simulations of Planetary Systems
Swarm’s flight: Accelerating the particles using C-CUDA
swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight
swCUDA: Auto parallel code translation framework from CUDA to ATHREAD for new generation sunway supercomputer
Swendsen-Wang Multi-Cluster Algorithm for the 2D/3D Ising Model on Xeon Phi and GPU
Swept Volume approximation of polygon soups
SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences
Switching to High Gear: Opportunities for Grand-Scale Real-Time Parallel Simulations
Swizzle Inventor: Data Movement Synthesis for GPU Kernels
SWM: Simplified Wu-Manber for GPU-based Deep Packet Inspection
SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2
SYCL Code Generation for Multigrid Methods
SYCL compute kernels for ExaHyPE
SYCL in the edge: performance and energy evaluation for heterogeneous acceleration
SYCL in the Edge: Performance Evaluation for Heterogeneous Acceleration
SYCL-Bench 2020: Benchmarking SYCL 2020 on AMD, Intel, and NVIDIA GPUs
SYCL-Bench: A Versatile Cross-Platform Benchmark Suite for Heterogeneous Computing
SYCL-Bench: A Versatile Single-Source Benchmark Suite for Heterogeneous Computing
SYCLops: A SYCL Specific LLVM to MLIR Converter
Sylkan: Towards a Vulkan Compute Target Platform for SYCL
Symbolic Crosschecking of Data-Parallel Floating Point Code
Symbolic crosschecking of floating-point and SIMD code
Symbolic Differentiation in GPU Shaders
Symbolic Testing of OpenCL Code
Symphony: A Scheduler for Client-Server Applications on Coprocessor-based Heterogeneous Clusters
Synchronization and Coordination in Heterogeneous Processors
Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming
Synergia CUDA: GPU-accelerated accelerator modeling package
Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra
Synergistic execution of stream programs on multicores with accelerators
SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving
Synkhronos: a Multi-GPU Theano Extension for Data Parallelism
Synthesis and rendering of bidirectional texture functions on arbitrary surfaces
Synthesis of Custom Networks of Heterogeneous Processing Elements for Complex Physical System Emulation
Synthesis of Embedded Software using Dataflow Schedule Graphs
Synthesis of GPU Programs from High-Level Models
Synthesis of Platform Architectures from OpenCL Programs
Synthesizing Benchmarks for Predictive Modeling
Synthesizing Software from a ForSyDe Model Targeting GPGPUs
Synthesizing Structured Traversals from Attribute Grammars
Synthesizing Subdivision Meshes Using Real Time Tessellation
Synthetic Aperture Beamformation using the GPU
Synthetic Aperture Radar imaging on a CUDA-enabled mobile platform
Synthetic Aperture Radar Processing with GPGPU
Syntix: A Profiling Based Resource Estimator for CUDA Kernels
System Design Principles for Heterogeneous Resource Management and Scheduling in Accelerator-Based Systems
System integration of FastSPECT III, a dedicated SPECT rodent-brain imager based on BazookaSPECT detector technology
System-Level Optimization and Code Generation for Graphics Processors using a Domain-Specific Language
Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU
Systematic construction, verification and implementation methodology for LDPC codes
Systematic Performance Optimization of Cone-Beam Back-Projection on the Kepler Architecture
Systematic Physics Constrained Parameter Estimation of Stochastic Differential Equations
SystemC simulation on GP-GPUs: CUDA vs. OpenCL
Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing
SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets
TABLA: A Unified Template-based Framework for Accelerating Statistical Machine Learning
Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
Tactics to Directly Map CNN graphs on Embedded FPGAs
Taichi: A Language for High-Performance Computation on Spatially Sparse Data Structures
Takagi Factorization on GPU using CUDA
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
Taking the graphics processor beyond graphics
Taming irregular EDA applications on GPUs
Taming the complexities of the C11 and OpenCL memory models
Tamp: A Library for Compact Deep Neural Networks with Structured Matrices
Tangible video teleconference system using real-time image-based relighting
Tango: A Deep Neural Network Benchmark Suite for Various Accelerators
Tangram: a High-level Language for Performance Portable Code Synthesis
TAP: A TLP-Aware Cache Management Policy for a CPU-GPU Heterogeneous Architecture
Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors
Tapping the supercomputer under your desk: solving dynamic equilibrium models with graphics processors?
Target Marker: A Visual Marker for Long Distances and Detection in Realtime on Mobile Devices
targetDP: an Abstraction of Lattice Based Parallelism with Portable Performance
Targeting GPUs with OpenMP Directives on Summit: A Simple and Effective Fortran Experience
Targeting heterogeneous architectures via macro data flow
Task and Data Distribution in Hybrid Parallel Systems
Task management for irregular-parallel workloads on the GPU
Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster
Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout
Task Parallelism and Data Distribution: An Overview of Explicit Parallel Programming Languages
Task Parallelism and Synchronization: An Overview of Explicit Parallel Programming Languages
Titles: 100
open PDFs: 96
packages: 33