high performance computing on graphics processing units: hgpu.org

Fast Radix Sort for Sparse Linear Algebra on GPU

Fast Random Graph Generation

Fast Ray Sorting and Breadth-First Packet Traversal for GPU Ray Tracing

Fast RCS prediction using multiresolution shooting and bouncing ray method on the GPU

Fast reconstruction of 3D volumes from 2D CT projection data with GPUs

Fast recursive filters for simulating nonlinear dynamic systems

Fast reduction of undersampling artifacts in radial MR angiography with 3D total variation on graphics hardware

Fast Regularization of Matrix-Valued Images

Fast Retinal Vessel Analysis

Fast scale invariant feature detection and matching on programmable graphics hardware

Fast scale invariant textured synthesis with GPU acceleration

Fast scan algorithms on graphics processors

Fast scene voxelization and applications

Fast Schedulability Analysis Using Commodity Graphics Hardware

Fast seismic modeling and Reverse Time Migration on a GPU cluster

Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks

Fast Sequence Alignment Method Using CUDA-enabled GPU

Fast short exact repeats finding on GPU

Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

Fast simulation of nonlinear radio frequency ultrasound images in inhomogeneous nonlinear media: CREANUIS

Fast Simulations of Gravitational Many-body Problem on RV770 GPU

Fast Soft Self-Shadowing on Dynamic Height Fields

Fast Software AES Encryption

Fast Solving of Influence Diagrams for Multiagent Planning on GPU-enabled Architectures

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort

Fast Sorting Algorithms using AVX-512 on Intel Knights Landing

Fast Sparse Level Sets on Graphics Hardware

Fast Sparse Matrix Multiplication on GPU

Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

Fast Speaker Diarization Using a High-Level Scripting Language

Fast Speaker Diarization Using a Specialization Framework for Gaussian Mixture Model Training

Fast Spoken Query Detection Using Lower-Bound Dynamic Time Warping on Graphical Processing Units

Fast Subgraph Matching on Large Graphs using Graphics Processors

Fast support vector machine training and classification on graphics processors

Fast Surface Extraction and Visualization of Medical Images using OpenCL and GPUs

Fast thermal simulation of 2D/3D integrated circuits exploiting neural networks and GPUs

Fast Training of Convolutional Networks through FFTs

Fast tridiagonal solvers on the GPU

Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors

Fast Turnaround HLS Debugging using Dependency Analysis and Debug Overlays

Fast TV-L1 Optical Flow for Interactivity

Fast Two Dimensional Convex Hull on the GPU

Fast Ultrasound Image Simulation Using the Westervelt Equation

Fast Universal Background Model (UBM) Training on GPUs using Compute Unified Device Architecture (CUDA)

Fast Variable Center-Biased Windowing for High-Speed Stereo on Programmable Graphics Hardware

Fast variational static IR-drop analysis on the graphical processing unit

Fast view synthesis using GPU for 3D display

Fast Virus Signature Matching Based on the High Performance Computing of GPU

Fast volumetric deformation on general purpose hardware

Fast-Coding Robust Motion Estimation Model in a GPU

Fast-Fourier-Transform-Based Electrical Noise Measurements

Fast, Accurate and Shift-Varying Line Projections for Iterative Reconstruction Using the GPU

Fast, large volume, GPU enabled simulations for the Ly-alpha forest: power spectrum forecasts for baryon acoustic oscillation experiments

Fast, Memory-Efficient Construction of Voxelized Shadows

Fast, parallel and secure cryptography algorithm using Lorenz’s attractor

Fast, parallel implementation of particle filtering on the GPU architecture

Fast, parallel, GPU-based construction of space filling curves and octrees

Fast, Processor-Cardinality Agnostic PRNG with a Tracking Application

Fast, Realistic Terrain Synthesis

FAST: fast architecture sensitive tree search on modern CPUs and GPUs

FastCollect: Offloading Generational Garbage Collection to Integrated GPUs

Faster across the PCIe bus: A GPU library for lightweight decompression

Faster Algorithms for RNA-folding using the Four-Russians method

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Faster Dark Matter Calculations Using the GPU

Faster File Matching using GPGPUs

Faster GPU Based Genetic Programming Using A Two Dimensional Stack

Faster GPU-based convolutional gridding via thread coarsening

Faster Maliciously Secure Two-Party Computation Using the GPU

Faster matrix-vector multiplication on GeForce 8800GTX

Faster Multipattern Matching System on GPU Based on Aho-Corasick Algorithm

Faster Multiple Pattern Matching System on GPU based on Bit-Parallelism

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster Radix Sort via Virtual Memory and Write-Combining

Faster sequence alignment through GPU-accelerated restriction of the seed-and-extend search space

Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO

Faster Upper Body Pose Estimation and Recognition Using CUDA

Faster Upper Body Pose Estimation Using CUDA

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours

fastHOG – a real-time GPU implementation of HOG

FastMag: Fast micromagnetic simulator for complex magnetic structures

Fastplay: A Parallelization Model and Implementation of SMC on CUDA Based GPU Cluster Architecture

Fastrack: Fast IO for Secure ML using GPU TEEs

FastSpMM: An Efficient Library for Sparse Matrix Matrix Product on GPUs

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

FastTree: A Hardware KD-Tree Construction Acceleration Engine for Real-Time Ray Tracing

Fat versus Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

Fat vs. Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

FATSEA-An Architectural Simulator for General Purpose Computing on GPUs

Fault Injection techniques for GPU Reliability Evaluation

Fault Table Computation on GPUs

Fault table generation using Graphics Processing Units

Fault Tree Analysis Speed-up with GPU Parallel Computing

FBLAS: Streaming Linear Algebra Kernels on FPGA

FBLAS: Streaming Linear Algebra on FPGA

FC_ACCEL: Enabling Efficient, Low-Latency and Flexible Inference in DNN Fully Connected Layers, using Optimized Checkerboard Block matrix decomposition, fast scheduling, and a resource efficient 1D PE array with a custom HBM2 memory subsystem

FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs

FDTD calculations using graphical processing units

FDTD on Distributed Heterogeneous Multi-GPU Systems

Brief statistics for this page

Titles: 100

Download open PDFs: 93

Packages: 17
