Papers on hgpu.org (.txt-file)
Rinnegan: Efficient Resource Use in Heterogeneous Architectures

Ripple: Simplified Large-Scale Computation on Heterogeneous Architectures with Polymorphic Data Layout

Rise of the Graphics Processor

Risk Estimation Without Using Stein’s Lemma — Application to Image Denoising

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

RNA secondary structure prediction using dynamic programming algorithm – A review and proposed work

RNS-Based Elliptic Curve Point Multiplication for Massive Parallel Architectures

RoadRunner: a fast and flexible exoplanet transit model

Roberts edge detection algorithm based on GPU

Robotic approach to multi-beam optical tweezers with Computer Generated Hologram

Robust Adaptive 3-D Segmentation of Vessel Laminae From Fluorescence Confocal Microscope Images and Parallel GPU Implementation

Robust Computational Tools for Multiple Testing With Genetic Association Studies

Robust Edge Detection and GPU-Based Smoothing for Extracting Surface Primitives from Range Images

Robust foreground segmentation for GPU architecture in an immersive 3D videoconferencing system

Robust GPGPU plugin development for RapidMiner

Robust GPU-assisted camera tracking using free-form surface models

Robust LLM Training Infrastructure at ByteDance

Robust Low Complexity Feature Tracking using CUDA

Robust mesh reconstruction from unoriented noisy points

Robust modified L2 local optical flow estimation and feature tracking

Robust non-local denoising of colored depth data

Robust real time face recognition and tracking on GPU using fusion of RGB and depth image

Robust Real-Time Multiprocessor Interrupt Handling Motivated by GPUs

Rodinia: A benchmark suite for heterogeneous computing

Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs

Room acoustics modelling using GPU-accelerated finite difference and finite volume methods on a face-centered cubic grid

Rootbeer: Seamlessly using GPUs from Java

Rotationally invariant sparse patch matching on GPU and FPGA

Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born

RSVDPACK: Subroutines for computing partial singular value decompositions via randomized sampling on single core, multi core, and GPU architectures

RTCUDB: Building Databases with RT Processors

RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing

RTSL: a Ray Tracing Shading Language

RTX Beyond Ray Tracing: Exploring the Use of Hardware Ray Tracing Cores for Tet-Mesh Point Location

RubiCL, a Library Providing Automatic Parallelisation on CPU and GPU devices

Rubus: A compiler for seamless and extensible parallelism

RUMD: A general purpose molecular dynamics package optimized to utilize GPU hardware down to a few thousand particles

Run-time Image and Video Resizing Using CUDA-enabled GPUs

Run-time Reconfigurable Multiprocessors

Run-time support for multi-level disjoint memory address spaces

Run, Stencil, Run! – A Comparison of Modern Parallel Programming Paradigms

Running Financial Risk Management Applications on FPGA in the Amazon Cloud

Running the NIM Next-Generation Weather Model on GPUs
Running unstructured grid-based CFD solvers on modern graphics hardware

Running unstructured grid-based CFD solvers on modern graphics hardware

Runtime Code Generation and Data Management for Heterogeneous Computing in Java

Runtime Comparison of CPU and GPU Using Portable Programming Models

Runtime Compilation of Array-Oriented Python Programs

Runtime Configurable Deep Neural Networks for Energy-Accuracy Trade-off

Runtime Performances Benchmark for Knowledge Graph Embedding Methods

Runtime Specialization for Heterogeneous CPU-GPU Platforms

Runtime Support for Adaptive Power Capping on Heterogeneous SoCs

Runtime Support for Performance Portability on Heterogeneous Distributed Platforms

Runtime Support toward Transparent Memory Access in GPU-accelerated Heterogeneous Systems

Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures

Runtime Visualization of Application Progress and Monitoring of a GPU-enabled Parallel Environment

S-buffer: Sparsity-aware Multi-fragment Rendering

SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Saddle Vertex Graph (SVG): A Novel Solution to the Discrete Geodesic Problem

Safe and Practical GPU Acceleration in TrustZone

Safe Asynchronous Multicore Memory Operations

Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc

SafeGPU: Contract- and Library-Based GPGPU for Object-Oriented Languages

SAGA: SystemC Acceleration on GPU Architectures

SAGE: Self-Tuning Approximation for Graphics Engines

SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems

Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method

SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications

Sample distribution shadow maps

SAPPORO: A way to turn your graphics cards into a GRAPE-6

Sapporo2: A versatile direct N-body library

SAR focusing of P-band ice sounding data using back-projection

SAR raw signal simulation based on GPU parallel computation

SBArt4 – Breeding abstract animations in realtime

SBLOCK: A Framework for Efficient Stencil-Based PDE Solvers on Multi-core Platforms

SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing

Scalability Analysis of Parallel Algorithms on GPU Clusters

Scalability Analysis of Synchronous Data-Parallel Artificial Neural Network (ANN) Learners

Scalability and Optimization Strategies for GPU Enhanced Neural Networks (GeNN)

Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs

Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA

Scalability Study of Deep Learning Algorithms in High Performance Computer Infrastructures

Scalable Access-Pattern Aware I/O Acceleration and Multi-Tiered Data Management for HPC and AI Workloads

Scalable and deterministic timing-driven parallel placement for FPGAs

Scalable and High Performance Betweenness Centrality on the GPU

Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA

Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

Scalable approximate k-NN in multidimensional big data

Scalable Breadth-First Search on a GPU Cluster

Scalable Clustering for Vision using GPUs

Scalable Clustering Using Graphics Processors

Scalable communication for high-order stencil computations using CUDA-aware MPI

Scalable Data Clustering using GPU Clusters

Titles: 100
Doubles=1
open PDFs: 91
packages: 24
