high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Static Analysis and Dynamic Adaptation of Parallelism

Static and Dynamic Analyses for Efficient GPU Execution

Static Compilation Analysis for Host-Accelerator Communication Optimization

Static GPU threads and an improved scan algorithm

Static Memory Access Pattern Analysis on a Massively Parallel GPU

Statistical Computing With Graphics Processing Units

Statistical constraints on binary black hole inspiral dynamics

Statistical Power Consumption Analysis and Modeling for GPU-based Computing

Statistical power modeling of GPU kernels using performance counters

Statistical testing of random number sequences using CUDA

stdgpu: Efficient STL-like Data Structures on the GPU

Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities

Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL

Stellar-mass black holes in star clusters: implications for gravitational wave radiation

Stencil and Lattice Structures for Field Equation Model Simulations on GPUs

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Stencil shadow volumes for complex and deformable objects

Stencil-Aware GPU Optimization of Iterative Solvers

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

StePS: A Multi-GPU Cosmological N-body Code for Compactified Simulations

Stereo depth with a Unified Architecture GPU

Stereo Matching Algorithm Using Population-Based Incremental Learning on GPU

Stereo Matching using Multi-Resolution Images on CUDA

Stereoscopic Ray Tracing on Graphics Processors

Stereoscopic Scene Flow Computation for 3D Motion Understanding

Stereovision On GPU

StitchCUDA: An Automated Multi-Agents End-to-End GPU Programing Framework with Rubric-based Agentic Reinforcement Learning

Stochastic Analysis of a Queue Length Model Using a Graphics Processing Unit

Stochastic Differential Equations simulation using GPU

Stochastic DT-MRI Connectivity Mapping on the GPU

Stochastic Gradient Descent on GPUs

Stochastic Progressive Photon Mapping for Dynamic Scenes

Stochastic transparency

STOCHSIMGPU: Parallel stochastic simulation for the Systems Biology Toolbox 2 for MATLAB

Stock trading strategy creation using GP on GPU

StoreGPU: exploiting graphics processing units to accelerate distributed storage systems

Strain Visualization of Ultra Sound Signals Processed by General Purpose Graphic Process Unit

Strassen’s Matrix Multiplication on GPUs

Strategies for Maximizing Utilization in multi-CPU & multi-GPU Heterogeneous Architectures

Strategies for Optimization of Parallel Programs

Strategies for preparing computer science students for the multicore world

Strategies for Protecting Intellectual Property when Using CUDA Applications on Graphics Processing Units

Strategies for the Heterogeneous Execution of Large-Scale Simulations on Hybrid Supercomputers

Strategies to minimise the total run time of cyclic graph based genetic programming with GPUs

Strategy Preserving Compilation for Parallel Functional Code

Stream computing on graphics hardware

Stream Join Processing on Heterogeneous Processors

Stream processing for fast and efficient rotated Haar-like features using rotated integral images

Stream Processing of Integral Images for Real-Time Object Detection

Stream processing of moment invariants for real-time classifiers

Stream-Centric Stereo Matching and View Synthesis: A High-Speed Approach on GPUs

StreamBlocks: A compiler for heterogeneous dataflow computing

StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

Streamed Watershed Transform on GPU for Processing of Large Volume Data

Streaming Algorithms for Biological Sequence Alignment on GPUs

Streaming Applications on Heterogeneous Platforms

Streaming architectures and technology trends

Streaming Data from HDD to GPUs for Sustained Peak Performance

Streaming Dynamic Coarse-Grained CPU/GPU Workloads with Heterogeneous Pipelines in FastFlow

Streaming GPU Singular Value and Dynamic Mode Decompositions

Streaming Parallel GPU Acceleration of Large-Scale filter-based Spiking Neural Networks

Streaming-Oriented Parallelization of Domain-Independent Irregular Kernels

STREAMIT: Dynamic visualization and interactive exploration of text streams

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping

StreamMR: An Optimized MapReduce Framework for AMD GPUs

StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing

Strega: An HTTP Server for FPGAs

Stress Tensor Field Visualization for Implant Planning in Orthopedics

Stressing the BER simulation of LDPC codes in the error floor region using GPU clusters

String Algorithm on GPGPU

String Matching on a Multicore GPU Using CUDA

Striped Smith-Waterman speeds database searches six times over other SIMD implementations

Strong scaling of general-purpose molecular dynamics simulations on GPUs

Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices

Structural, dynamic, and electrostatic properties of fully hydrated DMPC bilayers from molecular dynamics simulations accelerated with graphical processing units (GPUs)

Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicore with GPU Accelerators

STT-RAM for Shared Memory in GPUs

Studies Concerning the ATLAS IBL Calibration Architecture

Studies of quantum dots: Ab initio coupled-cluster analysis using OpenCL and GPU programming

Studies on CUDA Offloading for Real-Time Simulation and Visualization

Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures

Study and evaluation of improved automatic GPU offloading method

Study for measurement method for coal volume on base of GPU

Study of Bandwidth Partitioning for Co-executing GPU Kernels

Study of basic vector operations on Intel Xeon Phi and NVIDIA Tesla using OpenCL

Study of Convolution Algorithms using CPU and Graphics Hardware

Study of low density nuclear matter with quantum molecular dynamics: the role of the symmetry energy

Study of OpenCL Processing Models for FPGA Devices

Study of Sparse-Matrix Vector Multiplication (SpMV) on Different Architectures and Libraries

Study on acceleration technique for calculating near field of horn antenna based on GPU

Study on acceleration technique for two-dimensional FDTD algorithm based on GPU

Study on GPU-accelerated extraction of interconnects parasitic using CUDA and MPI

Study on semi-global matching algorithm extended for multi baseline matching and parallel processing method based on GPU

Study on Transient Temperature Field Parallel Computing in Cooling Control Based on a GPU Fourier Method

Study on volume rendering of CT slices based on ray casting

Study, Modelling and Implementation of the Level Set Method Used in Micromachining Processes

Studying the core-cusp problem in cold dark matter halos using N-body simulations on GPU clusters

Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL

Studying Thermal Management for Graphics-Processor Architectures

Brief statistics for this page

Titles: 100

Download open PDFs: 85

Package packages: 16

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems
