high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Solving the Quadratic Assignment Problem on heterogeneous environment (CPUs and GPUs) with the application of Level 2 Reformulation and Linearization Technique

Solving the Vlasov equation for one-dimensional models with long range interactions on a GPU

Solving very large instances of the scheduling of independent tasks problem on the GPU

Solving Wave Equations on Unstructured Geometries

Some examples of instant computations of fluid dynamics on GPU

Some Graph Algorithms And Related Primitives For The GPU

Some of the What?, Why?, How?, Who? and Where? of Graphics Processing Unit Computing for Bayesian Analysis

SOMGPU: An unsupervised pattern classifier on Graphical Processing Unit

Somoclu: An Efficient Distributed Library for Self-Organizing Maps

Sop-GPU: Accelerating biomolecular simulations in the centisecond timescale using graphics processors

Soren: Adaptive MapReduce for Programmable GPUs

Sort-First Parallel Volume Rendering

Sorting and Permuting without Bank Conflicts on GPUs

Sorting On A Graphics Processing Unit (GPU)

Sorting on FPGAs using Merge Trees

Sorting on GPUs for large scale datasets: A thorough comparison

Sorting with GPUs: A Survey

Sound and Partially-Complete Static Analysis of Data-Races in GPU Programs

Sound Speed Optimization Using Image Texture on CUDA

Sound Synthesis Using Physical Modeling on Heterogeneous Computing Platforms

Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators

Source-to-Source Optimization of CUDA C for GPU Accelerated Cardiac Cell Modeling

Source-to-Source Transformations for GPU Code Generation

Source-to-source transformations for irregular and multithreaded code optimization

Space and the Synchronic A-Ram

Space Charge Dominated Envelope Dynamics Using GPUs

Space-Time Finite Element Analysis on Graphics Processing Unit Computing Platform

Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters

Spark: modular, composable shaders for graphics hardware

SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters

SparkJNI: A Reference Design for a Heterogeneous Apache Spark Framework

Sparse Approximate Inverse Preconditioners for Iterative Solvers on GPUs

Sparse array representations and some selected array operations on GPUs

Sparse Convex Optimization on GPUs

Sparse direct solvers with accelerators over DAG runtimes

Sparse GPU Kernels for Deep Learning

Sparse LU Factorization for Parallel Circuit Simulation on GPU

Sparse Matrix Algorithms Using GPGPU

Sparse matrix computations on manycore GPU’s

Sparse Matrix Formats Evaluation and Optimization on a GPU

Sparse Matrix Matrix Multiplication on Hybrid CPU+GPU Platforms

Sparse Matrix Multiplication using CUDA and Mex Interface

Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms

Sparse matrix solvers on the GPU: conjugate gradients and multigrid

Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures : Algorithms and Experiments

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

Sparse Matrix-Vector Multiplication on GPGPUs

Sparse Matrix-Vector Multiplication on GPU

Sparse Matrix-Vector Multiplication on NVIDIA GPU

Sparse Recovery on GPUs: Accelerating the Iterative Soft-Thresholding Algorithm

Sparse regularization in MRI iterative reconstruction using GPUs

Sparse systems solving on GPUs with GMRES

Sparse Winograd Convolutional neural networks on small-scale systolic arrays

Sparse-Matrix support for the SkePU library for portable CPU/GPU programming

Sparse-Matrix-CG-Solver in CUDA

Sparselet Models for Efficient Multiclass Object Detection

Sparser, Better, Faster GPU Parsing

Spatial Data Structures, Sorting and GPU Parallelism for Situated-agent Simulation and Visualisation

Spatial Indexing of Large-Scale Geo-Referenced Point Data on GPGPUs Using Parallel Primitives

Spatial interpolation in massively parallel computing environments

Spatial interpolation of scattered geoscientific data

Spatial Join with R-Tree on Graphics Processing Units

Spatial Sorting Algorithms for Parallel Computing in Networks

Spatial splits in bounding volume hierarchies

Spatial: A Language and Compiler for Application Accelerators

Spatio-temporal upsampling on the GPU

Spatter: A Benchmark Suite for Evaluating Sparse Access Patterns

SpecGen: Accelerating Agentic Kernel Optimization with Speculative Generation

Brief statistics for this page

Titles: 100

Download open PDFs: 94

Package packages: 13

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
