Papers on hgpu.org (.txt-file)

Non-recursive beam search on GPU for formal concept analysis

Non-rigid multi-modal registration on the GPU

Non-separable 2D, 3D and 4D filtering with CUDA

Non-steady relaxation and critical exponents at the depinning transition

Non-symmetric magnetohydrostatic equilibria: a multigrid approach

Non-Uniform Domain Decomposition for Heterogeneous Accelerated Processing Units

Non-Uniformly Partitioned Block Convolution on Graphics Processing Units

Nonlinear Dynamic Analysis Efficiency by Using a GPU Parallelization

Nonlinear dynamic finite element analysis with GPU

Nonlinear optimization framework for image-based modeling on programmable graphics hardware

Nonmetric Priors for Continuous Multilabel Optimization

Nonnegative Tensor Factorization Accelerated Using GPGPU

Nonperturbative Quantum Field Theory in Astrophysics

Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks

NOVA: A Functional Language for Data Parallelism

Novel Architectures: Solving Computational Problems with GPU Computing

Novel Data-Partitioning Algorithms for Performance and Energy Optimization of Data-Parallel Applications on Modern Heterogeneous HPC Platforms

Novel GPU Implementation of Jacobi Algorithm for Karhunen-Loeve Transform of Dense Matrices

Novel implementations of recursive discrete wavelet transform for real time computation with multicore systems on chip (SOC)

Novel insights on atomic synchronization for sort-based group-by on GPUs

Novel Methodologies for Predictable CPU-To-GPU Command Offloading

Novel Multi-Layer Network Decomposition Boosting Acceleration of Multi-core Algorithms

Novel Parallel Approaches to Efficiently Solve Spatial Problems on Heterogeneous CPU-GPU Systems

Novel Parallelization Strategies for High-Performance DNN Training on HPC Systems

NPBench: A Benchmarking Suite for High-Performance NumPy

NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers

NQueens on CUDA: Optimization Issues

NT-SIM: A Co-Simulator for Networked Signal Processing Applications

Nucleation of nanoparticles in a coarse grained fluid using OpenCL

Nucleation Studies on Graphics Processing Units

Nuclei: GPU-Accelerated Many-Core Network Coding

NUMA Data-Access Bandwidth Characterization and Modeling

NUMA-Aware Image Compositing on Multi-GPU Platform

Numerical Accuracy Analysis Based on the Discrete Stochastic Arithmetic on Multiprocessor Platforms

Numerical Accuracy Differences in CPU and GPGPU Codes

Numerical computations in Java with CUDA

Numerical Computations with GPUs

Numerical cosmology on the GPU with Enzo and Ramses

Numerical integration on GPUs for higher order finite elements

Numerical investigations on nonlinear nonparaxial beam propagation using graphics processing units

Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects

Numerical Model of Shallow Water: the Use of NVIDIA CUDA Graphics Processors

Numerical Modeling of Atmospheric Vortices

Numerical modeling of gravitational wave sources accelerated by OpenCL

Numerical Ocean Modeling and Simulation with CUDA

Numerical Parallel Processing Based on GPU with CUDA Architecture

Numerical Precision and Benchmarking Very-High-Order Integration of Particle Dynamics on GPU Accelerators

Numerical resolution of conservation laws with OpenCL

Numerical Simulation for the MHD System in 2D Using OpenCL

Numerical simulation of 3D particulate flows based on GPU technology

Numerical Simulation of Melting with Natural Convection Based on Lattice Boltzmann Method and Performed with CUDA Enabled GPU

Numerical Simulation of the Complex Ginzburg-Landau Equation on GPUs with CUDA

Numerical Simulation of the Frank-Kamenetskii PDE: GPU vs. CPU Computing

Numerical simulations of acoustic waves with the graphic acceleration GAMER code

Numerical solution of PDEs with hybrid and heterogeneous computing models

Numerical Solutions of Heat and Mass Transfer with the Third Kind Boundary and Initial Conditions in Capillary Porous Media Using Programmable Graphics Hardware

Numerical Study of Geometric Multigrid Methods on CPU–GPU Heterogeneous Computers

NUPAR: A Benchmark Suite for Modern GPU Architectures

NVIDIA CUDA software and GPU parallel computing architecture

NVIDIA SimNet: an AI-accelerated multi-physics simulation framework

NVIDIA Tensor Core Programmability, Performance & Precision

NVIDIA Tesla: A Unified Graphics and Computing Architecture

Object Detection Based Handwriting Localization

Object Oriented Framework for CUDA based Pyramidal Image Blending

Object oriented framework for real-time image processing on GPU

Object Space Based Collision Detection for Cloth Simulation on the GPU

Object support for OpenMP-style programming of GPU clusters in Java

Object-oriented stream programming using aspects

Object-oriented stream programming using Aspects: a high-productivity programming paradigm for hybrid platforms

Objective-Driven Workload Allocation in Heterogeneous Computing Systems

Obsidian: GPU Kernel Programming in Haskell (thesis)

Obsidian: GPU Programming in Haskell

Obtaining a 35x Speedup in 2D Phase Unwrapping Using Commodity Graphics Processors

OCCA: A unified approach to multi-threading languages

Ocean wave simulation in real-time using GPU

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

Ocelot/HyPE: Optimized Data Processing on Heterogeneous Hardware

OCLoptimizer: An Iterative Optimization Tool for OpenCL

OCT on CUDA: Speeding up the image reconstruction algorithm for an Optical Coherence Tomography system using NVIDIA’s CUDA platform

Octree Light Propagation Volumes

Odeint – Solving ordinary differential equations in C++

Odyssey: A Public GPU-Based Code for General-Relativistic Radiative Transfer in Kerr Spacetime

Off-axis quantitative phase imaging processing using CUDA: toward real-time applications

Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads

Offload Compiler Runtime for the Intel Xeon Phi Coprocessor

Offloading Critical Security Operations to the GPU

Offloading IDS Computation to the GPU

Offloading Java to Graphics Processors

Offloading Region Matching of Data Distribution Management with CUDA

Offset, Bisector and Medial Axis Construction on NURBS Surface Based on GPU

OKL: A Unified Language for Parallel Architectures

OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

Titles: 100
open PDFs: 88
packages: 13
