high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Redwood: Flexible and Portable Heterogeneous Tree Traversal Workloads

Refinements in Syntactic Parsing

Refining HPCToolkit for application performance analysis at exascale

Reflective Shadow Map Clustering for Real-Time Global Illumination

Reflector Antenna Analysis using Physical Optics on Graphics Processing Units

Refresh Rate Modulation for Perceptually Optimized Computer Graphics

ReGen: Optimizing Genetic Selection Algorithms for Heterogeneous Computing

Region Templates: Data Representation and Management for Large-Scale Image Analysis

Regional Heritability Advanced Complex Trait Analysis for GPU and Traditional Parallel Architectures

Regression Modelling of Power Consumption for Heterogeneous Processors

Regular Expression Matching and Operational Semantics

Regular Expression Matching on Graphics Hardware for Intrusion Detection

Regular Lattice and Small-World Spin Model Simulations Using CUDA and GPUs

Regularity versus Load-Balancing on GPU for treefix computations

Regularization and nonlinearities for neural language models: when are they needed?

Reinforcement Learning Strategies for Compiler Optimization in High level Synthesis

Reionization simulations powered by GPUs I: the structure of the Ultraviolet radiation field

Reionization Simulations Powered by Graphics Processing Units. I. On the Structure of the Ultraviolet Radiation Field

Relational Algorithms for Multi-Bulk-Synchronous Processors

Relational joins on graphics processors

Relational query coprocessing on graphics processors

Relativistic Hydrodynamics on Graphic Cards

Relativistic hydrodynamics on graphics processing units

Relax-Miracle: GPU Parallelization of Semi-Analytic Fourier-Domain solvers for Earthquake Modeling

Reliability modeling of MEMS devices on CUDA based HPC setup

Reliable Initialization of GPU-enabled Parallel Stochastic Simulations Using Mersenne Twister for Graphics Processors

REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time

Remote GPU-Accelerated Online Pre-processing of Raster Maps for Terrain Rendering

Remote Sensing Processing: From Multicore to GPU

Remotely Keyed Cryptographics: Secure Remote Display Access Using (Mostly) Untrusted Hardware

Removing the Barrier for FPGA-Based OpenCL Data Center Servers

RenderAnts: Interactive REYES Rendering on GPUs

Rendering Forest Scenes in Real-Time

Rendering of 3D Dynamic Virtual Environments

Rendering Volumetric Haptic Shapes in Mid-Air using Ultrasound

RenderKernel: High-level programming for real-time rendering systems

REOH: Runtime Energy Optimization for Heterogeneous Systems

Reordering GPU Kernel Launches to Enable Efficient Concurrent Execution

Reordering strategy for blocking optimization in sparse linear solvers

RepoLaunch: Automating Build & Test Pipeline of Code Repositories on ANY Language and ANY Platform

RepoLaunch: Automating Build & Test Pipeline of Code Repositories on ANY Language and ANY Platform

Report on the Feasibility of Implementing PIC Codes on a GPU

Report: Performance comparison between C2075 and P100 GPU cards using cosmological correlation functions

Representing Higher-Order Singularities in Vector Fields on Piecewise Linear Surfaces

Reproducible and Accurate Matrix Multiplication for GPU Accelerators

Reproducible Study and Performance Analysis of GPU Programming Paradigms: OpenACC vs. CUDA in Key Linear Algebra Computations

Reproducible Triangular Solvers for High-Performance Computing

Research and Application of Parallel Computing Technologies based on CUDA and OpenCL

Research and Development of Porting SYCL on QNX Operating System for High Parallelism

Research for Chinese Spam Filtering Based on GPU

Research on a Parallel BD-tree Index Structure

Research on ATI-CAL for accelerating FBP reconstruction

Research on CUDA-based Kriging Interpolation Algorithm

Research on double negative materials by using FDTD method based on GPUs

Research on DSP-GPU Heterogeneous Computing System

Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method

Research on OpenCL optimization for FPGA deep learning application

Research on Parallel DVH Statistic Based on CUDA

Research on Real-Time LLL Imaging Generation Method Based on GPU

Research on the fast Fourier transform of image based on GPU

Research on the simulation of PF-LBM model based on MPI+CUDA mixed granularity parallel

Research on Three-Dimensional Playing Video Technology in Virtual Education Environment

Reservoir Simulation on NVIDIA Tesla GPUs

Resolution of Linear Algebra for the Discrete Logarithm Problem using GPU and Multi-core Architectures

Resolution of the Vlasov-Maxwell system by PIC Discontinuous Galerkin method on GPU with OpenCL

Resolving the conflict between generality and plausibility in verified computation

Resource Centered Computing delivering high parallel performance

Resource Elastic Virtualization for FPGAs using OpenCL

Resource Sharing in GPU-Accelerated Windowing Systems

Resource-Aware Compiler Prefetching for Fine-Grained Many-Cores

Resource-Aware Just-in-Time OpenCL Compiler for Coarse-Grained FPGA Overlays

ReSYCLator: Transforming CUDA C++ source code into SYCL

Retargeting and Respecializing GPU Workloads for Performance Portability

Rethinking resampling in the particle filter on graphics processing units

Rethinking Runtime Verification on Hundreds of Cores: Challenges and Opportunities

Rethinking the Union of Computed Tomography Reconstruction and GPGPU Computing

Returning control to the programmer: SIMD intrinsics for virtual machines

RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

Reusable OpenCL FPGA Infrastructure

Reusable software components for accelerator-based clusters

Reuse and Refactoring of GPU Kernels to Design Complex Applications

Reusing Auto-Schedules for Efficient DNN Compilation

Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment

Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight

Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

Reverse Computation for Rollback-based Fault Tolerance in Large Parallel Systems: Evaluating the Potential Gains and Systems Effects

Reverse-Mode AD of Reduce-by-Index and Scan in Futhark

Review and Comparative Study of Ray Traversal Algorithms on a Modern GPU Architecture

Review of Memory/Cache Management Technologies used on Heterogeneous Computing Systems

Review: Kd-tree Traversal Algorithms for Ray Tracing

Reviewing GPU architectures to build efficient back projection for parallel geometries

Revision of Relational Joins for Multi-Core and Many-Core Architectures

Revisit Long Short-Term Memory: An Optimization Perspective

Revisiting Actor Programming in C++

Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture

Revisiting Edge and Node Parallelism for Dynamic GPU Graph Analytics

Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on High-Performance Accelerators

Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on Next-Generation Architectures

Brief statistics for this page

Titles: 100

Doubles: 1

Download open PDFs: 95

Package packages: 17
