high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Real-Time Soft-Finger Grasping of Physically Based Quasi-rigid Objects

Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid

Real-Time Spherical Panorama Image Stitching Using OpenCL

Real-Time Stereo Matching using Adaptive Window based Disparity Refinement

Real-time stereo matching using orthogonal reliability-based dynamic programming

Real-time stereo matching: A cross-based local approach

Real-Time Stereo on GPGPU using Progressive Multi-Resolution Adaptive Windows

Real-time Stereo Vision: Optimizing Semi-Global Matching

Real-time stereographic rendering and display of medical images with programmable GPUs

Real-Time Stochastic Kinodynamic Motion Planning via Multiobjective Search on GPUs

Real-time Stochastic Optimization of Complex Energy Systems on High Performance Computers

Real-time Stochastic Rasterization on Conventional GPU Architectures

Real-time Subsurface Scattering for Particle-based Fluids using Finite Volume Method

Real-Time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs

Real-Time Systems with Radiation-Hardened Processors: A GPU-based Framework to Explore Tradeoffs

Real-time task reconfiguration support applied to an UAV-based surveillance system

Real-time Terrain Modeling using CPU-GPU Coupled Computation

Real-Time Tone Mapping for High-Resolution HDR Images

Real-Time Tracking of Visually Attended Objects in Virtual Environments and Its Application to LOD

Real-Time Tracking with Non-Rigid Geometric Templates Using the GPU

Real-time Traffic Sign Recognition with Map Fusion on Multicore/Many-core Architectures

Real-Time Translucent Rendering Using GPU-based Texture Space Importance Sampling

Real-Time Ultrasound Biomicroscopy with Optoacoustic Arrays

Real-Time Use of GPUs in NA62 Experiment

Real-time video breakup detection for multiple HD video streams on a single GPU

Real-time video denoising for 2D ultrasound streaming video on GPUs

Real-time video watermarking on programmable graphics hardware

Real-time view synthesis system with multi-texture structure of GPU

Real-time virtual environment signal extraction and denoising using programmable graphics hardware

Real-Time Virtual Viewpoint Generation on the GPU for Scene Navigation

Real-Time Visibility-Based Fusion of Depth Maps

Real-time Visual Tracker by Stream Processing

Real-time visualization of large volume datasets on standard PC hardware

Real-time Visualization of Streaming Text with Force-Based Dynamic System

Real-time Volumetric Haptic and Visual Burrhole Simulation

Real-time volumetric image reconstruction and 3D tumor localization based on a single x-ray projection image for lung cancer radiotherapy

Real-Time Volumetric Shadows using 1D Min-Max Mipmaps

Real-time voxelization for complex polygonal models

Real-Time Weighted Pose-Space Deformation on the GPU

Real-time, accurate depth of field using anisotropic diffusion and programmable graphics cards

Real-time, fast radio transient searches with GPU de-dispersion

Real-world comparison of CPU and GPU implementations of SNPrank: a network analysis tool for GWAS

Real-World Constraints of GPUs in Real-Time Systems

Realisation of a holographic microlaser scalpel using a digital micromirror device

Realistic Lighting Simulation for Interactive VR Applications

Realistic real-time rendering for large-scale forest scenes

Realistic real-time rendering for ocean waves on GPU

Realistic real-time sound re-synthesis and processing for interactive virtual worlds

Realistic rendering of surface appearance using GPU

Realizing Accelerated Cost-Effective Distributed RAID

Realtime affine-photometric KLT feature tracker on GPU in CUDA framework

Realtime background subtraction from dynamic scenes

Realtime Computation of a VST Audio Effect Plugin on the Graphics Processor

Realtime Deformation of Constrained Meshes Using GPU

RealTime GPU-Based Motion Planning for Task Executions

Realtime Loop Subdivision on the GPU

Realtime phase-based optical flow on the GPU

Realtime Ray Tracing on a Hybrid Parallel Architecture

Realtime Ray Tracing on GPU with BVH-based Packet Traversal

Realtime scheduling using GPUs – proof of feasibility

Realtime Simulation of Burning Solids on GPU with CUDA

Realtime Two-Way Coupling of Meshless Fluids and Nonlinear FEM

Recent Advances on GPU Computing in Operations Research

Recent algorithm and machine developments for lattice QCD

Recent progress and challenges in exploiting graphics processors in computational fluid dynamics

Recent trends in software and hardware for GPGPU computing: A comprehensive survey

Reconfigurable Control Variate Monte-Carlo Designs for Pricing Exotic Options

Reconfigurable real-time MIMO detector on GPU

Reconstructing hash reversal based proof of work schemes

Reconstruction and visualization of planetary nebulae

Record Setting Software Implementation of DES Using CUDA

Recovering Historical Climate Records using Artificial Neural Networks in GPU

Recurrence quantification analysis in images with CUDA

Recurrent Neural Networks for anomaly detection in the Post-Mortem time series of LHC superconducting magnets

Recurrent neural networks for language modeling

Recurrent Neural Networks Hardware Implementation on FPGA

Recursive MIS Computation for Streaming BDPT on the GPU

Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs

Redefining the Role of the CPU in the Era of CPU-GPU Integration

Redesigning combustion modeling algorithms for the Graphics Processing Unit (GPU): Chemical kinetic rate evaluation and ordinary differential equation integration

Redução de Complexidade de Tempo em GPUs

Reduce, Reuse, Recycle (R^3): a Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms

Reduced Vlasov-Maxwell simulations

Reducing Beamforming Calculation Time with GPU Accelerated Algorithms

Reducing branch divergence in GPU programs

Reducing branch divergence to speed up parallel execution of unit testing on GPUs

Reducing data access latency in SDSM systems using runtime optimizations

Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization

Reducing IO bandwidth for GPU based moment invariant classifier systems

Reducing overheads of dynamic scheduling on heterogeneous chips

Reducing shading on GPUs using quad-fragment merging

Reducing Synchronous GPU Memory Transfers: Design and implementation of a Futhark compiler optimisation

Reducing the Code Degree Of Parallelism to Increase GPUs Reliability

Reducing the Cost of Heuristic Generation with Machine Learning

Reducing the Disk IO Bandwidth Bottleneck through Fast Floating Point Compression using Accelerators

Reducing the Size of Nurbs Controls Nets Using Genetic Algorithms and CUDA

Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm

Reducing Thread Divergence in GPU-based B and B Applied to the Flow-shop problem

Reducing Thread Divergence in GPU-based B&B Applied to the Flow-shop problem

Reduction of a Symmetrical Matrix to Tridiagonal Form on GPUs

Brief statistics for this page

Titles: 100

Download open PDFs: 88

Package packages: 9

