high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

High Performance GPU Code Generation for Matrix-Matrix Multiplication using MLIR: Some Early Results

High Performance GPU Implementation of KNN Algorithm: A Review

High Performance GPU-based Fourier Volume Rendering

High Performance GPU-based Proximity Queries using Distance Fields

High performance high-order numerical methods: applications in ocean modeling

High performance histogramming on massively parallel processors

High Performance Histograms on SIMT and SIMD Architectures

High Performance Hybrid Functional Petri Net Simulations of Biological Pathway Models on CUDA

High performance implementation of hydrodynamic interactions and applications with the sub-cellular element method

High Performance Implementation of Ultrasound Color Doppler Imaging on GPU platform

High performance in silico virtual drug screening on many-core processors

High Performance Iterative Solver for Linear System using Multi GPU

High Performance Lattice Boltzmann Solvers on Massively Parallel Architectures with Applications to Building Aeraulics

High Performance Low Power Embedded Vision Systems

High performance massively parallel direct N-body simulations on large GPU clusters

High Performance Matrix Inversion on a Multi-core Platform with Several GPUs

High Performance Matrix Multiplication

High performance memetic algorithm particle filter for multiple object tracking on modern GPUs

High performance methods for frequent pattern mining

High Performance Monte Carlo and Time-Stepping Dynamics for the Classical Spin Heisenberg Model on GPUs

High Performance Monte Carlo Simulation of Ising Model on TPU Clusters

High performance MRI simulations of motion on multi-GPU systems

High Performance Multi-agent System based Simulations

High Performance Multi-dimensional (2D/3D) FFT-Shift Implementation on Graphics Processing Units (GPUs)

High Performance N-Body Simulation and Visualization through CUDA Architecture

High Performance Non-Blocking Collective Communication for Next Generation Infiniband Clusters

High Performance Parallel Design Based on Session Programming

High Performance Parallel Implementation of Compressive Sensing SAR Imaging

High performance pattern matching and data remanence on graphics processing units

High Performance Poisson Equation Solver for Hybrid CPU/GPU Systems

High Performance Portable Tsunami Simulations on Many-core CPU, GPU, and FPGA

High Performance Power Spectrum Analysis Using a FPGA Based Reconfigurable Computing Platform

High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs

High Performance Privacy Preserving AI

High Performance Processor Development for Consumer Electronics Game Processor Perspective

High Performance Programming for Soft Computing

High Performance Radiation Transport Simulations: Preparing for TITAN

High performance realtime vision for mobile robots on the GPU

High Performance Relevance Vector Machine on GPUs

High Performance Remote Sensing Image Processing Using CUDA

High performance sequence mining using pairwise statistical significance

High Performance Simulation for Scalable Multi-Agent Reinforcement Learning

High Performance Stencil Code Algorithms for GPGPUs

High Performance Stencil Code Generation with Lift

High Performance Stereo Vision Designed for Massively Data Parallel Platforms

High performance stream computing for particle beam transport simulations

High Performance Streaming Smith-Waterman Implementation with Implicit Synchronization on Intel FPGA using OpenCL

High performance system for the Interactive rendering of a 3D Model into MPEG-4

High Performance System in GPU and CUDA Media Processing System

High performance technique for database applications using a hybrid GPU/CPU platform

High performance transcription factor-DNA docking with GPU computing

High performance volume splatting for visualization of neurovascular data

High Precision Integer Multiplication with a GPU Using Strassen’s Algorithm with Multiple FFT Sizes

High precision integer multiplication with a graphics processing unit

High productivity multi-device exploitation with the Heterogeneous Programming Library

High Quality Cone-beam CT Reconstruction on the GPU

High Quality Elliptical Texture Filtering on GPU

High Quality Image Reconstruction of Point Models

High Quality Interactive Rendering of Massive Point Models Using Multi-way kd-Trees

High Rayleigh Number Mantle Convection on GPU

High Resolution Program Flow Visualization of Hardware Accelerated Hybrid Multi-core Applications

High Resolution Sparse Voxel DAGs

High speed 3-D registration using GPU

High Speed Articulated Object Tracking Using GPUs: A Particle Filter Approach

High speed cipher cracking: the case of Keeloq on CUDA

High Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing

High speed view interpolation for tele-teaching and tele-conferencing

High Throughput Low Latency LDPC Decoding on GPU for SDR Systems

High throughput multiple-precision GCD on the CUDA architecture

High Throughput Variable Size Non-square Gabor Engine with Feature Pooling Based on GPU

High-accuracy Optimization by Parallel Iterative Discrete Approximation and GPU Cluster Computing

High-accuracy Optimization by Parallel Iterative Discrete Approximation and Multi-GPU Computing

High-Dimensional Adaptive Particle Swarm Optimization on Heterogeneous Systems

High-dimensional Planning on the GPU

High-dimensional wave atoms and compression of seismic datasets

High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

High-Level Design for FPGA-based Multiprocessor Accelerators

High-Level Energy Model of Embedded GPU for Real-Time Graphic Rendering

High-level GPU computing with jacket for MATLAB and C/C++

High-level GPU programming in Julia

High-Level Manipulation of OpenCL-Based Subvectors and Submatrices

High-level Parallel Programming Support for Heterogeneous Systems

High-Level Programming Framework for Executing Streaming Applications on Heterogeneous OpenCL Platforms

High-Level programming of graphics hardware to increase performance of electromagnetics simulation

High-level Programming of Vulkan-based GPUs Through OpenMP

High-Level Support for Pipeline Parallelism on Many-Core Architectures

High-Level Synthesis for FPGAs: From Prototyping to Deployment

High-Order Algorithms for Compressible Reacting Flow with Complex Chemistry

High-Order Discontinuous Galerkin Methods by GPU Metaprogramming

High-Order Error-Optimized FDTD Algorithm With GPU Implementation

High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster

High-Order Schemes for the Shallow Water Equations on GPUs

High-order thread-safe lattice Boltzmann model for HPC turbulent flow simulations

High-performance 3D Compressive Sensing MRI reconstruction

High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures

High-performance and Embedded Systems for Cryptography

High-performance and Hardware-aware Computing: Proceedings of the First International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC’08)

High-performance astrophysical visualization using Splotch

High-performance bankruptcy prediction model using Graphics Processing Units

High-performance biocomputing for simulating the spread of contagion over large contact networks

Brief statistics for this page

Titles: 100

Download open PDFs: 88

Packages: 12

