Papers on hgpu.org (.txt-file)
Robust Low Complexity Feature Tracking using CUDA
Robust mesh reconstruction from unoriented noisy points
Robust modified L2 local optical flow estimation and feature tracking
Robust non-local denoising of colored depth data
Robust real time face recognition and tracking on gpu using fusion of rgb and depth image
Robust Real-Time Multiprocessor Interrupt Handling Motivated by GPUs
Rodinia: A benchmark suite for heterogeneous computing
Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs
Room acoustics modelling using GPU-accelerated finite difference and finite volume methods on a face-centered cubic grid
Rootbeer: Seamlessly using GPUs from Java
Rotationally invariant sparse patch matching on GPU and FPGA
Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born
RSVDPACK: Subroutines for computing partial singular value decompositions via randomized sampling on single core, multi core, and GPU architectures
RTCUDB: Building Databases with RT Processors
RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing
RTSL: a Ray Tracing Shading Language
RTX Beyond Ray Tracing: Exploring the Use of Hardware Ray Tracing Cores for Tet-Mesh Point Location
RubiCL, a Library Providing Automatic Parallelisation on CPU and GPU devices
Rubus: A compiler for seamless and extensible parallelism
RUMD: A general purpose molecular dynamics package optimized to utilize GPU hardware down to a few thousand particles
Run-time Image and Video Resizing Using CUDA-enabled GPUs
Run-time Reconfigurable Multiprocessors
Run-time support for multi-level disjoint memory address spaces
Run, Stencil, Run! – A Comparison of Modern Parallel Programming Paradigms
Running Financial Risk Management Applications on FPGA in the Amazon Cloud
Running the NIM Next-Generation Weather Model on GPUs
Running unstructured grid-based CFD solvers on modern graphics hardware
Running unstructured grid-based CFD solvers on modern graphics hardware
Runtime Code Generation and Data Management for Heterogeneous Computing in Java
Runtime Comparison of CPU and GPU Using Portable Programming Models
Runtime Compilation of Array-Oriented Python Programs
Runtime Configurable Deep Neural Networks for Energy-Accuracy Trade-off
Runtime Performances Benchmark for Knowledge Graph Embedding Methods
Runtime Specialization for Heterogeneous CPU-GPU Platforms
Runtime Support for Adaptive Power Capping on Heterogeneous SoCs
Runtime Support for Performance Portability on Heterogeneous Distributed Platforms
Runtime Support toward Transparent Memory Access in GPU-accelerated Heterogeneous Systems
Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures
Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment
S-buffer: Sparsity-aware Multi-fragment Rendering
SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures
SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs
Saddle Vertex Graph (SVG): A Novel Solution to the Discrete Geodesic Problem
Safe and Practical GPU Acceleration in TrustZone
Safe Asynchronous Multicore Memory Operations
Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc
SafeGPU: Contract- and Library-Based GPGPU for Object-Oriented Languages
SAGA: SystemC Acceleration on GPU Architectures
SAGE: Self-Tuning Approximation for Graphics Engines
SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems
Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method
SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs
Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications
Sample distribution shadow maps
SAPPORO: A way to turn your graphics cards into a GRAPE-6
Sapporo2: A versatile direct N-body library
SAR focusing of P-band ice sounding data using back-projection
SAR raw signal simulation based on GPU parallel computation
SBArt4 – Breeding abstract animations in realtime
SBLOCK: A Framework for Efficient Stencil-Based PDE Solvers on Multi-core Platforms
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing
Scalability Analysis of Parallel Algorithms on GPU Clusters
Scalability Analysis of Synchronous Data-Parallel Artificial Neural Network (ANN) Learners
Scalability and Optimization Strategies for GPU Enhanced Neural Networks (GeNN)
Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs
Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism
Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA
Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures
Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads
Scalable and deterministic timing-driven parallel placement for FPGAs
Scalable and High Performance Betweenness Centrality on the GPU
Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA
Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets
Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms
Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework
Scalable approximate k-NN in multidimensional big data
Scalable Breadth-First Search on a GPU Cluster
Scalable Clustering for Vision using GPUs
Scalable Clustering Using Graphics Processors
Scalable communication for high-order stencil computations using CUDA-aware MPI
Scalable Data Clustering using GPU Clusters
Scalable Dense Linear Algebra on Heterogeneous Hardware
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
Scalable Distributed Fast Multipole Methods
Scalable Fast Multipole Methods on Distributed Heterogeneous Architectures
Scalable Fast Multipole Methods on Heterogeneous Architecture
Scalable framework for mapping streaming applications onto multi-GPU systems
Scalable GPU Acceleration of B-Spline Signal Processing Operations
Scalable GPU rendering of CSG models
Scalable heterogeneous parallelism for atmospheric modeling and simulation
Scalable instruction set simulator for thousand-core architectures running on GPGPUs
Scalable Kernel Fusion for Memory-Bound GPU Applications
Scalable Lattice Boltzmann Solvers for CUDA GPU Clusters
Scalable learning for object detection with GPU hardware
Scalable Metropolis Monte Carlo for simulation of hard shapes
Scalable Molecular Dynamics Simulation Using FPGAs and Multicore Processors
Scalable Multi Agent Simulation on the GPU
Scalable Multi-Cache Simulation Using GPUs
Titles: 100
Doubles: 1
open PDFs: 92
packages: 20