high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Single Chain Slip-Spring Model for Fast Rheology Simulations of Entangled Polymers on GPU

Single molecule detection of tuberculosis nucleic acid using dark field Tethered Particle Motion

Single Scattering of Aspherical Particles in DDA Calculations on GPUs Using OpenCL

Single Server Multi-GPU Training of ConvNets

Single stream parallelization of generalized LSTM-like RNNs on a GPU

Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?

Single-particle 3D reconstruction from cryo-electron microscopy images on GPU

Single-pass GPU solid voxelization for real-time applications

Single-Pass GPU-Raycasting for Structured Adaptive Mesh Refinement Data

Singular value decomposition for collaborative filtering on a GPU

Singular value decomposition on GPU using CUDA

Sinus Endoscopy – Application of Advanced GPU Volume Rendering for Virtual Endoscopy

Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors

Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance

Size-based Transfer Functions: A New Volume Exploration Technique

Skeletal rigid skinning with blending patches on the GPU

Skeleton and Shape Adjustment and Tracking in Multicamera Environments

Skeleton Programming for Heterogeneous GPU-based Systems

Skeleton-based Automatic Parallelization of Image Processing Algorithms for GPUs

Skeleton-based edge bundling for graph visualization

SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems

SkePU 2: Language Embedding and Compiler Support for Flexible and Type-Safe Skeleton Programming

SkePU: a multi-backend skeleton programming library for multi-GPU systems

Sketch Based Facial Expression Recognition Using Graphics Hardware

Sketching MLS Image Deformations On the GPU

Skew Handling in Aggregate Streaming Queries on GPUs

Skinning with dual quaternions

SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration

SkyFlow: Heterogeneous streaming for skyline computation using FlowGraph and SYCL

SLATE port to AMD and Intel platforms

Sliding-Tris: A Sliding Window Level-of-Detail Scheme

Sliding-Windows for Rapid Object Class Localization: A Parallel Technique

SMAA: Enhanced Subpixel Morphological Antialiasing

Small Discrete Fourier Transforms on GPUs

Small-Bench NLP: Benchmark for small single GPU trained models in Natural Language Processing

Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization

Smart Multi-Task Scheduling for OpenCL Programs on CPU/GPU Heterogeneous Platforms

SMCGen: Generating Reconfigurable Design for Sequential Monte Carlo Applications

Smith-Waterman Acceleration in Multi-GPUs: A Performance per Watt Analysis

Smooth Mixed-Resolution GPU Volume Rendering

Smoothed Particle Hydrodynamics Simulation for Continuous Casting

Smoothed-Particle Hydrodynamics Models: Implementation Features on GPUs

SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

Snowflake: A Lightweight Portable Stencil DSL

SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters

SnuHPL: high performance LINPACK for heterogeneous GPUs

SoaAlloc: Accelerating Single-Method Multiple-Objects Applications on GPUs

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

SoAx: A generic C++ Structure of Arrays for handling Particles in HPC Codes

SOCL: An OpenCL Implementation with Automatic Multi-Device Adaptation Support

SODECL: An Open Source Library for Calculating Multiple Orbits of a System of Stochastic Differential Equations in Parallel

SOFF: An OpenCL High-Level Synthesis Framework for FPGAs

Soft Error Resilient QR Factorization for Hybrid System

Soft Error Resilient QR Factorization for Hybrid System with GPGPU

Soft GPGPUs for Embedded FPGAs: An Architectural Evaluation

Softassign and EM-ICP on GPU

Softshell: Dynamic Scheduling on GPUs

Software architecture and system validation of an open, unified model for accelerated multicore computing

Software Challenges for Extreme Scale Computing: Going From Petascale to Exascale Systems

Software Compilation Techniques for Heterogeneous Embedded Multi-Core Systems

Software Defined Radio over CUDA

Software Development Tools Using GPGPU Potentialities

Software Model Checking for GPGPU Programs, Towards a Verification Tool

Software Optimization and Orchestration for Heterogeneous and Distributed Architectures

Software parallel CAVLC encoder based on stream processing

Software Performance Analysis with Parallel Programming Approaches

Software Pipelined Execution of Stream Programs on GPUs

Software Platform for Hybrid Resource Management of Many-core Accelerators

Software Polarization Spectrometer "PolariS"

Software Prefetching for Indirect Memory Accesses

Software Reliability Enhancements for GPU Applications

Software Testing – Test Suite Compilation and Execution Optimizations

Software-Based Algorithm for Modeling and Correction of Gradient Nonlinearity Distortions in Magnetic Resonance Imaging

Software-based branch predication for AMD GPUs

Software-Based ECC for GPUs

Software-Based Hardening Strategies for Neutron Sensitive FFT Algorithms on GPUs

Software-Defined FPGA Accelerator Design for Mobile Deep Learning Applications

SoK: A Systems Perspective on Compound AI Threats and Countermeasures

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

SOL: Effortless Device Support for AI Frameworks without Source Code Changes

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks

Solution Level Parallelization of Local Search Metaheuristic Algorithm on GPU

Solutions for Optimizing the Monte Carlo Option Pricing Method’s Implementation Using the Compute Unified Device Architecture

Solutions For Optimizing The Radix Sort Algorithmic Function Using The Compute Unified Device Architecture

Solver for Systems of Linear Equations with Infinite Precision on a GPU Cluster

Solving k-Nearest Vector Problem on Multiple Graphics Processors

Solving 2D Nonlinear Unsteady Convection-Diffusion Equations on Heterogenous Platforms with Multiple GPUs

Solving 3D Anisotropic Elastic Wave Equations on Parallel GPU Devices

Solving 3D incompressible Navier-Stokes equations on hybrid CPU/GPU systems

Solving 3D viscous incompressible Navier-Stokes equations using CUDA

Solving a kind of BVP for ODEs on heterogeneous CPU + CUDA-enabled GPU systems

Solving Batched Linear Programs on GPU and Multicore CPU

Solving Bivariate Polynomial Systems on a GPU

Solving convex optimization problems on FPGA using OpenCL

Solving Dense Generalized Eigenproblems on Multi-threaded Architectures

Solving Dense Linear Systems on Graphics Processors

Solving dense linear systems on platforms with multiple hardware accelerators

Solving diffractive optics problems using graphics processing units

Solving Discrete Logarithms in Smooth-Order Groups with CUDA

Solving incompressible Navier-Stokes equations on heterogeneous parallel architectures

Brief statistics for this page

Titles: 100

Download open PDFs: 93

Packages: 22