high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Special Relativistic Visualization by Local Ray Tracing

Specification and verification of GPGPU programs

Specification and Verification of GPGPU Programs using Permission-Based Separation Logic

Speckle Reduction with Trained Nonlinear Diffusion Filtering

Spectral classification using convolutional neural networks

Spectral Ewald Acceleration of Stokesian Dynamics for polydisperse suspensions

Spectral Method Characterization on FPGA and GPU Accelerators

Spectral volume rendering using GPU-based raycasting

Specular Effects on the GPU: State of the Art

Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs

Speculative Execution on GPU: An Exploratory Study

Speculative Execution on Multi-GPU Systems

Speculative Parallel Evaluation Of Classification Trees On GPGPU Compute Engines

Speculative Parallelization on GPGPUs

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Specx: a C++ task-based runtime system for heterogeneous distributed architectures

Speech Recognition on Modern Graphic Processing Units

Speech Recognition on Multi-Core Processors and GPUs

Speed and Portability issues for Random Number Generation on Graphical Processing Units with CUDA and other Processing Accelerators

Speed Records for NTRU

Speed sign detection and recognition by convolutional neural networks

Speed up Large Integer Multiplication Using Fourier Transforms and CUDA Technology

Speed-Up Improvement Using Parallel Approach in Image Steganography

Speed, power and cost implications for GPU acceleration of Computational Fluid Dynamics on HPC systems

Speeding up a few orders of magnitude the Jacobi method: high order Chebyshev-Jacobi over GPUs

Speeding up a Video Summarization Approach Using GPUs and Multicore CPUs

Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves

Speeding Up Computer Vision Applications on Mobile Computing Platforms

Speeding Up Cycle Based Logic Simulation Using Graphics Processing Units

Speeding Up Geospatial Polygon Rasterization on GPGPUs

Speeding Up Homomorphic Hashing Using GPUs

Speeding up K-Means Algorithm by GPUs

Speeding up Large-Scale Point-in-Polygon Test Based Spatial Join on GPUs

Speeding up lattice sieve with Xeon Phi coprocessor

Speeding up LIP-Canny with CUDA programming

Speeding Up Model Building for ECGA on CUDA Platform

Speeding up Mutual Information Computation Using NVIDIA CUDA Hardware

Speeding Up Object Detection: Fast Resizing in the Integral Image Domain

Speeding Up Particle Trajectory Simulations under Moving Force Fields using GPUs

Speeding Up Reinforcement Learning with Graphics Processing Units

Speeding up Scoring Module of Mass Spectrometry Based Protein Identification by GPU

Speeding up subset seed algorithm for intensive protein sequence comparison

Speeding up the evaluation of evolutionary learning systems using GPGPUs

Speeding up the evaluation phase of GP classification algorithms on GPUs

Speeding up the MATLAB complex networks package using graphic processors

Speeding up the MATLAB Hyperspectral Image Analysis Toolbox using GPUs and the Jacket Toolbox

Speeding up the small progress measures algorithm for parity games using the GPU

Speeding-up Pearson Correlation Coefficient calculation on graphical processing units

Speeding-up the Verification Phase of Set Similarity Joins in the GPGPU paradigm

Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters

Speedup for quantum optimal control from GPU-based automatic differentiation

Speedup of Fuzzy Clustering Through Stream Processing on Graphics Processing Units

Speedup of Micromagnetic Simulations with C++ AMP On Graphics Processing Units

Speedup of Type-1 Fuzzy Logic Systems on Graphics Processing Units Using CUDA

Speedups between x70 and x120 for a generic local search (memetic) algorithm on a single GPGPU chip

sPEGG: high throughput eco-evolutionary simulations on commodity graphics processors

SPH Based Fluid Animation Using CUDA Enabled GPU

SPH Fluids for Viscous Jet Buckling

SPH on GPU with CUDA

Spherical harmonic transform on heterogeneous architectures using hybrid programming

Spherical harmonic transform with GPUs

Spiking Neural Networks for Real-Time Infrared Images Processing in Thermo Vision Systems

SPIRE, a Sequential to Parallel Intermediate Representation Extension

Split tiling for GPUs: automatic parallelization using trapezoidal tiles

Splotch: porting and optimizing for the Xeon Phi

SpMV: A Memory-Bound Application on the GPU Stuck Between a Rock and a Hard Place

SPOC: GPGPU Programming Through Stream Processing With OCaml

Sponge: portable stream programming on graphics engines

Spotting Radio Transients with the help of GPUs

SPRAT: Runtime processor selection for energy-aware computing

Spring-Bead Animation of Viscoelastic Materials

Springald: GPU-Accelerated Window-Based Aggregates Over Out-of-Order Data Streams

Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks

SqueezCL: Squeezing OpenCL Kernels for Approximate Computing on Contemporary GPUs

SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading

SRP Based Natural Interaction between Real and Virtual Worlds in Augmented Reality

SSE Vectorized and GPU Implementations of Arakawa’s Formula for Numerical Integration of Equations of Fluid Motion

SSLPV: subsurface light propagation volumes

SSLShader: Cheap SSL Acceleration with Commodity Processors

Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU

Stabilized Backward Diffusion for Partial Volume Correction

Stable fluids

Stable large-scale solver for Ginzburg-Landau equations for superconductors

Stack-less SIMT reconvergence at low cost

Stackless KD-Tree Traversal for High Performance GPU Ray Tracing

Stadium Hashing: Scalable and Flexible Hashing on GPUs

Staggered fermions simulations on GPUs

STAR-RT: Visual attention for real-time video game playing

Starchart: Hardware and Software Optimization Using Recursive Partitioning Regression Trees

Stargazer: Automated Regression-Based GPU Design Space Exploration

STARK: Strategic Team of Agents for Refining Kernels

StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators

StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

State Lattice-based Motion Planning for Autonomous On-Road Driving

State of The Art Report on GPU

State of the Art Report on Real-time Rendering with Hardware Tessellation

State-Based Gauss-Seidel Framework for Real-time 2D Ultrasound Image Sequence Denoising on GPUs

State-of-the-art in heterogeneous computing

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs

Brief statistics for this page

Titles: 100

Download open PDFs: 93

Packages: 19

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs


Recent source codes


Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench
