Papers on hgpu.org (.txt-file)

Next-generation acceleration and code optimization for light transport in turbid media using GPUs

nGFSIM: A GPU-based fault simulator for 1-to-n detection and its applications

Nikola: embedding compiled GPU functions in Haskell

NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

NLSEmagic: Nonlinear Schrodinger Equation Multidimensional Matlab-based GPU-accelerated Integrators using Compact High-order Schemes

NMF-mGPU: non-negative matrix factorization on multi-GPU systems

nmfgpu4R: GPU-Accelerated Computation of the Non-Negative Matrix Factorization (NMF) Using CUDA Capable Hardware

NNP/MM: Fast molecular dynamics simulations with machine learning potentials and molecular mechanics

NNS: The Case For Neural Network-based Sorting

No More Shading Languages: Compiling C++ to Vulkan Shaders

Nodal Discontinuous Galerkin Methods on Graphics Processors

Noise Removal from Remote Sensed Images by NonLocal Means with OpenCL Algorithm

Noise-resistant fitting for spherical harmonics

Non-blocking programming on multi-core graphics processors: (extended abstract)

Non-Determinism in TensorFlow ResNets

Non-deterministic parallelism considered useful

Non-Hydrostatic Pressure Shallow Flows: GPU Implementation Using Finite-Volume and Finite-Difference Scheme

Non-local means denoising algorithm accelerated by GPU

Non-Local Total Generalized Variation for Optical Flow Estimation

Non-Parametric Adaptive Network Pruning

Non-recursive beam search on GPU for formal concept analysis

Non-rigid multi-modal registration on the GPU

Non-separable 2D, 3D and 4D filtering with CUDA

Non-steady relaxation and critical exponents at the depinning transition

Non-symmetric magnetohydrostatic equilibria: a multigrid approach

Non-Uniform Domain Decomposition for Heterogeneous Accelerated Processing Units

Non-Uniformly Partitioned Block Convolution on Graphics Processing Units

Nonlinear Dynamic Analysis Efficiency by Using a GPU Parallelization

Nonlinear dynamic finite element analysis with GPU

Nonlinear optimization framework for image-based modeling on programmable graphics hardware

Nonmetric Priors for Continuous Multilabel Optimization

Nonnegative Tensor Factorization Accelerated Using GPGPU

Nonperturbative Quantum Field Theory in Astrophysics

Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks

NOVA: A Functional Language for Data Parallelism

Novel Architectures: Solving Computational Problems with GPU Computing

Novel Data-Partitioning Algorithms for Performance and Energy Optimization of Data-Parallel Applications on Modern Heterogeneous HPC Platforms

Novel GPU Implementation of Jacobi Algorithm for Karhunen-Loeve Transform of Dense Matrices

Novel implementations of recursive discrete wavelet transform for real time computation with multicore systems on chip (SOC)

Novel insights on atomic synchronization for sort-based group-by on GPUs

Novel Methodologies for Predictable CPU-To-GPU Command Offloading

Novel Multi-Layer Network Decomposition Boosting Acceleration of Multi-core Algorithms

Novel Parallel Approaches to Efficiently Solve Spatial Problems on Heterogeneous CPU-GPU Systems

Novel Parallelization Strategies for High-Performance DNN Training on HPC Systems

NPBench: A Benchmarking Suite for High-Performance NumPy

NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers

NQueens on CUDA: Optimization Issues

NT-SIM: A Co-Simulator for Networked Signal Processing Applications

Nucleation of nanoparticles in a coarse grained fluid using OpenCL

Nucleation Studies on Graphics Processing Units

Nuclei: GPU-Accelerated Many-Core Network Coding

NUMA Data-Access Bandwidth Characterization and Modeling

NUMA-Aware Image Compositing on Multi-GPU Platform

Numerical Accuracy Analysis Based on the Discrete Stochastic Arithmetic on Multiprocessor Platforms

Numerical Accuracy Differences in CPU and GPGPU Codes

Numerical computations in Java with CUDA

Numerical Computations with GPUs

Numerical cosmology on the GPU with Enzo and Ramses

Numerical integration on GPUs for higher order finite elements

Numerical investigations on nonlinear nonparaxial beam propagation using graphics processing units

Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects

Numerical Model of Shallow Water: the Use of NVIDIA CUDA Graphics Processors

Numerical Modeling of Atmospheric Vortices

Numerical modeling of gravitational wave sources accelerated by OpenCL

Numerical Ocean Modeling and Simulation with CUDA

Numerical Parallel Processing Based on GPU with CUDA Architecture

Numerical Precision and Benchmarking Very-High-Order Integration of Particle Dynamics on GPU Accelerators

Numerical resolution of conservation laws with OpenCL

Numerical Simulation for the MHD System in 2D Using OpenCL

Numerical simulation of 3D particulate flows based on GPU technology

Numerical Simulation of Melting with Natural Convection Based on Lattice Boltzmann Method and Performed with CUDA Enabled GPU

Numerical Simulation of the Complex Ginzburg-Landau Equation on GPUs with CUDA

Numerical Simulation of the Frank-Kamenetskii PDE: GPU vs. CPU Computing

Numerical simulations of acoustic waves with the graphic acceleration GAMER code

Numerical solution of PDEs with hybrid and heterogeneous computing models

Numerical Solutions of Heat and Mass Transfer with the Third Kind Boundary and Initial Conditions in Capillary Porous Media Using Programmable Graphics Hardware

Numerical Study of Geometric Multigrid Methods on CPU–GPU Heterogeneous Computers

NUPAR: A Benchmark Suite for Modern GPU Architectures

NVIDIA CUDA software and gpu parallel computing architecture

NVIDIA SimNet: an AI-accelerated multi-physics simulation framework

NVIDIA Tensor Core Programmability, Performance & Precision

NVIDIA Tesla: A Unified Graphics and Computing Architecture

Object Detection Based Handwriting Localization

Object Oriented Framework for CUDA based Pyramidal Image Blending

Object oriented framework for real-time image processing on GPU

Object Space Based Collision Detection for Cloth Simulation on the GPU

Object support for OpenMP-style programming of GPU clusters in Java

Object-oriented stream programming using aspects

Object-oriented stream programming using Aspects: a high-productivity programming paradigm for hybrid platforms

Objective-Driven Workload Allocation in Heterogeneous Computing Systems

Obsidian: GPU Kernel Programming in Haskell (thesis)

Obsidian: GPU Programming in Haskell

Obtaining a 35x Speedup in 2D Phase Unwrapping Using Commodity Graphics Processors

OCCA: A unified approach to multi-threading languages

Titles: 100
open PDFs: 88
packages: 19
