high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

A comparison of CPU and GPU performance for Fourier pseudospectral simulations of the Navier-Stokes, Cubic Nonlinear Schrodinger and Sine Gordon Equations

A Comparison of CPU and OpenCL Parallelization Methods for Correlation and Graph Layout Algorithms used in the Network Analysis of High Dimensional Data

A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation

A Comparison of FPGA and GPU for Real-Time Phase-based Optical Flow, Stereo, and Local Image Features

A Comparison of GPU Execution Time Prediction using Machine Learning and Analytical Modeling

A Comparison of Gradient Estimation Methods for Volume Rendering on Unstructured Meshes

A Comparison of High-Level Design Tools for SoC-FPGA on Disparity Map Calculation Example

A comparison of HPC-based quantum computing simulators using Quantum Volume

A Comparison of Many-threaded Differential Evolution and Genetic Algorithms on CUDA

A Comparison of Massively Parallel Programming Models Through Applications in Sound Propagation and Jitter Measurement

A Comparison of Modern GPU and CPU Architectures: And the Common Convergence of Both

A Comparison of OpenCL, CUDA, and HIP as Compilation Targets for a Functional Array Language

A Comparison of Optimal Scanline Voxelization Algorithms

A comparison of period finding algorithms

A Comparison of Potential Interfaces for Batched BLAS Computations

A Comparison of Sequential and GPU Implementations of Iterative Methods to Compute Reachability Probabilities

A Comparison of Serial & Parallel Particle Filters for Time Series Analysis

A Comparison of Statistical Techniques for Detecting Side-Channel Information Leakage in Cryptographic Devices

A Comparison of Support Vector Machines Training GPU-Accelerated Open Source Implementations

A Comparison of the performance of HPC Accelerators

A Comparison of the Performance of the Molecular Dynamics Simulation Package GROMACS Implemented in the SYCL and CUDA Programming Models

A Comparison of Two Methods for Geometric Milling Simulation Accelerated by GPU

A Comparison of xPU Platforms Exemplified with Ray Tracing Algorithms

A Compile-Time Managed Multi-Level Register File Hierarchy

A Compiler and Runtime for Heterogeneous Computing

A compiler for high performance computing with many-core accelerators

A Compiler for Throughput Optimization of Graph Algorithms on GPUs

A compiler framework for optimization of affine loop nests for gpgpus

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

A Compiler Infrastructure for Accelerator Generators

A Compiler Infrastructure for Embedded Multicore SoCs

A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems

A Complete and Efficient CUDA-Sharing Solution for HPC Clusters

A Complete Description of the UnPython and Jit4GPU Framework

A complete modular resultant algorithm targeted for realization on graphics hardware

A comprehensive analysis and parallelization of an image retrieval algorithm

A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices

A Comprehensive Deep Learning Library Benchmark and Optimal Library Selection

A Comprehensive Performance Analysis of HSA and OpenCL 2.0

A Comprehensive Performance Comparison of CUDA and OpenCL

A comprehensive study of Dynamic Memory Management in OpenCL kernels

A Comprehensive Survey on Various Evolutionary Algorithms on GPU

A Computational Comparison of Basis Updating Schemes for the Simplex Algorithm on a CPU-GPU System

A Computational Model of Afterimages

A Computational Realization of a Semi-Lagrangian Method for Solving the Advection Equation

A computationally efficient and scalable approach for privacy preserving kNN classification

A Computationally Efficient Approach for Exemplar-based Color Image Inpainting using GPU

A Computationally Efficient Parallel Kernel Regression for Image Reconstruction

A Compute Graph Simulation and Implementation Framework Targeting AMD Versal AI Engines

A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality

A Computing Kernel for Network Binarization on PyTorch

A computing origami: Optimized code generation for emerging parallel platforms

A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors

A constant-space belief propagation algorithm for stereo matching

A Consumer Application for GPGPUs: Desktop Search

A Container-Based Workflow for Distributed Training of Deep Learning Algorithms in HPC Clusters

A Contour-Guided Deformable Image Registration Algorithm for Adaptive Radiotherapy

A control-structure splitting optimization for GPGPU

A convex formulation for color image segmentation in the context of passive emitter localization

A Convex Relaxation Approach to Space Time Multi-view 3D Reconstruction

A Convolutional Neural Network Cascade for Face Detection

A CPU and GPU Heterogeneous Processing of Multimedia Data by using OpenCL

A CPU-GPU Hybrid Runtime for the Aeminium Language

A CPU+FPGA OpenCL Heterogeneous Computing Platform for Multi-Kernel Pipeline

A Cross-Input Adaptive Framework for GPU Programs Optimization

A Cross-platform Evaluation of Graphics Shader Compiler Optimization

A CUDA Back-End for the Equelle Compiler

A CUDA Based Implementation of an Image Authentication Algorithm

A CUDA based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization

A CUDA Implementation of Independent Component Analysis in the Time-Frequency Domain

A CUDA implementation of the High Performance Conjugate Gradient benchmark

A CUDA Kernel Scheduler Exploiting Static Data Dependencies

A CUDA Monte Carlo simulator for radiation therapy dosimetry based on Geant4

A CUDA SIMT Interpreter for Genetic Programming

A CUDA SIMT interpreter for genetic programming. Revised

A CUDA-Based Cooperative Evolutionary Multi-Swarm Optimization Applied to Engineering Problems

A CUDA-Based Implementation of Stable Fluids in 3D with Internal and Moving Boundaries

A CUDA-based parallel implementation of K-nearest neighbor algorithm

A CUDA-Based Real Parameter Optimization Benchmark

A CUDA-enabled Parallel Implementation of Collaborative Filtering

A curved-element unstructured discontinuous Galerkin method on GPUs for the Euler equations

A Customized 3D GPU Poisson Solver for Free BCs

A Data Communication Scheduler for Stream Programs on CPU-GPU Platform

A Data Parallel Algorithm for Seismic Raytracing

A data parallel approach to genetic programming using programmable graphics hardware

A data parallel view on polyhedral process networks

A Data-Driven Model for Anisotropic Heterogeneous Subsurface Scattering

A Data-oriented Method for Scheduling Dependent Tasks on High-density Multi-GPU Systems

A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms

A Data-Parallel Extension to Ruby for GPGPU

A Data-Parallel Graphics Pipeline Implemented in OpenCL

A dataflow-like programming model for future hybrid clusters

Brief statistics for this page

Titles: 100

Download open PDFs: 92

Package packages: 21

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

* * *

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems