high performance computing on graphics processing units: hgpu.org

Papers on hgpu.org (.txt-file)

Nonlinear Dynamic Analysis Efficiency by Using a GPU Parallelization

Nonlinear dynamic finite element analysis with GPU

Nonlinear optimization framework for image-based modeling on programmable graphics hardware

Nonlinear optimization with a massively parallel Evolution Strategy-Pattern Search algorithm on graphics hardware

Nonmetric Priors for Continuous Multilabel Optimization

Nonnegative Tensor Factorization Accelerated Using GPGPU

Nonperturbative Quantum Field Theory in Astrophysics

Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks

NOVA: A Functional Language for Data Parallelism

Novel Architectures: Solving Computational Problems with GPU Computing

Novel Computing Architectures

Novel Data-Partitioning Algorithms for Performance and Energy Optimization of Data-Parallel Applications on Modern Heterogeneous HPC Platforms

Novel GPU Implementation of Jacobi Algorithm for Karhunen-Loeve Transform of Dense Matrices

Novel implementations of recursive discrete wavelet transform for real time computation with multicore systems on chip (SOC)

Novel insights on atomic synchronization for sort-based group-by on GPUs

Novel Methodologies for Predictable CPU-To-GPU Command Offloading

Novel Multi-Layer Network Decomposition Boosting Acceleration of Multi-core Algorithms

Novel Parallel Approaches to Efficiently Solve Spatial Problems on Heterogeneous CPU-GPU Systems

Novel Parallelization Strategies for High-Performance DNN Training on HPC Systems

NPBench: A Benchmarking Suite for High-Performance NumPy

NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers

NQueens on CUDA: Optimization Issues

Nsight Python: A Python-First Profiling Toolkit for Seamless GPU Kernel Analysis (Tool)

NT-SIM: A Co-Simulator for Networked Signal Processing Applications

Nucleation of nanoparticles in a coarse grained fluid using OpenCL

Nucleation Studies on Graphics Processing Units

Nuclei: GPU-Accelerated Many-Core Network Coding

NUMA Data-Access Bandwidth Characterization and Modeling

NUMA-Aware Image Compositing on Multi-GPU Platform

Numerical Accuracy Analysis Based on the Discrete Stochastic Arithmetic on Multiprocessor Platforms

Numerical Accuracy Differences in CPU and GPGPU Codes

Numerical computations in Java with CUDA

Numerical Computations with GPUs

Numerical cosmology on the GPU with Enzo and Ramses

Numerical integration on GPUs for higher order finite elements

Numerical investigations on nonlinear nonparaxial beam propagation using graphics processing units

Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects

Numerical Model of Shallow Water: the Use of NVIDIA CUDA Graphics Processors

Numerical Modeling of Atmospheric Vortices

Numerical modeling of gravitational wave sources accelerated by OpenCL

Numerical Ocean Modeling and Simulation with CUDA

Numerical Parallel Processing Based on GPU with CUDA Architecture

Numerical Precision and Benchmarking Very-High-Order Integration of Particle Dynamics on GPU Accelerators

Numerical resolution of conservation laws with OpenCL

Numerical Simulation for the MHD System in 2D Using OpenCL

Numerical simulation of 3D particulate flows based on GPU technology

Numerical Simulation of Melting with Natural Convection Based on Lattice Boltzmann Method and Performed with CUDA Enabled GPU

Numerical Simulation of the Complex Ginzburg-Landau Equation on GPUs with CUDA

Numerical Simulation of the Frank-Kamenetskii PDE: GPU vs. CPU Computing

Numerical simulations of acoustic waves with the graphic acceleration GAMER code

Numerical solution of PDEs with hybrid and heterogeneous computing models

Numerical Solutions of Heat and Mass Transfer with the Third Kind Boundary and Initial Conditions in Capillary Porous Media Using Programmable Graphics Hardware

Numerical Study of Geometric Multigrid Methods on CPU–GPU Heterogeneous Computers

NUPAR: A Benchmark Suite for Modern GPU Architectures

NVIDIA CUDA software and GPU parallel computing architecture

NVIDIA Nemotron Parse 1.1

NVIDIA SimNet: an AI-accelerated multi-physics simulation framework

NVIDIA Tensor Core Programmability, Performance & Precision

NVIDIA Tesla: A Unified Graphics and Computing Architecture

Object Detection Based Handwriting Localization

Object Oriented Framework for CUDA based Pyramidal Image Blending

Object oriented framework for real-time image processing on GPU

Object Space Based Collision Detection for Cloth Simulation on the GPU

Object support for OpenMP-style programming of GPU clusters in Java

Object-oriented stream programming using aspects

Object-oriented stream programming using Aspects: a high-productivity programming paradigm for hybrid platforms

Objective-Driven Workload Allocation in Heterogeneous Computing Systems

Obsidian: GPU Kernel Programming in Haskell (thesis)

Obsidian: GPU Programming in Haskell

Obtaining a 35x Speedup in 2D Phase Unwrapping Using Commodity Graphics Processors

OCCA: A unified approach to multi-threading languages

Ocean wave simulation in real-time using GPU

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

Ocelot/HyPE: Optimized Data Processing on Heterogeneous Hardware

OCLoptimizer: An Iterative Optimization Tool for OpenCL

OCT on CUDA: Speeding up the image reconstruction algorithm for an Optical Coherence Tomography system using NVIDIA’s CUDA platform

Oct-tree Method on GPU

Octree Light Propagation Volumes

Octree-based, GPU implementation of a continuous cellular automaton for the simulation of complex, evolving surfaces

Odeint – Solving ordinary differential equations in C++

Odyssey: A Public GPU-Based Code for General-Relativistic Radiative Transfer in Kerr Spacetime

Off-axis quantitative phase imaging processing using CUDA: toward real-time applications

Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads

Offload Compiler Runtime for the Intel Xeon Phi Coprocessor

Offloading Critical Security Operations to the GPU

Offloading IDS Computation to the GPU

Offloading Java to Graphics Processors

Offloading Region Matching of Data Distribution Management with CUDA

Offset, Bisector and Medial Axis Construction on NURBS Surface Based on GPU

OKL: A Unified Language for Parallel Architectures

OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

Omniwise: Predicting GPU Kernels Performance with LLMs

OMP2HMPP: Compiler Framework for Energy-Performance Trade-off Analysis of Automatically Generated Codes

OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions

OmpSs task offload

On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures

On accelerating iterative algorithms with CUDA: A case study on Conditional Random Fields training algorithm for biological sequence alignment

On algorithmic reductions in task-parallel programming models

Brief statistics for this page

Titles: 100

Download open PDFs: 89

Packages: 15

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs


