Papers on hgpu.org (.txt-file)
Nested Data-Parallelism on the GPU

Nested Intervals Tree Encoding with System of Residual Classes

Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations

NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems

Network Simulator Tools and GPU Parallel Systems

Network-on-Chip Hardware Accelerators for Biological Sequence Alignment

Neural Architecture Search for Lightweight Non-Local Networks

Neural Architecture Search without Training

Neural Code Comprehension: A Learnable Representation of Code Semantics

Neural Decoding using a Parallel Sequential Monte Carlo method on Point Processes with Ensemble Effect

Neural Multi-scale Image Compression

Neural Network Computing Using On-Chip Accelerators

Neural Network Implementation Using CUDA and OpenMP

Neural Network Inference on Mobile SoCs

Neural Network Libraries: A Deep Learning Framework Designed from Engineers’ Perspectives

Neural network modeling on evolution of hydration reaction for Portland cement

Neural Network Simulation: The recognition application

Neural Networks for Beginners. A fast implementation in Matlab, Torch, TensorFlow

Neural Networks through Shared Maps in Mobile Devices

Neural Query Language: A Knowledge Base Query Language for Tensorflow

Neural scene representation and rendering

Neurokernel: An Open Scalable Software Framework for Emulation and Validation of Drosophila Brain Models on Multiple GPUs

Neurokernel: An Open Source Platform for Emulating the Fruit Fly Brain

Neuromorphic models on a GPGPU cluster

Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA

New Basic Linear Algebra Methods for Simulation on GPUs

New efficient integral algorithms for quantum chemistry

New Efficient Method To Solve Longest Overlap Region Problem For Noncoding DNA Sequence

New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code

New Row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA

New Sparse Matrix Storage Format to Improve The Performance of Total SPMV Time

New Techniques for Spectral Image Acquisition and Analysis

Next-generation acceleration and code optimization for light transport in turbid media using GPUs

nGFSIM: A GPU-based fault simulator for 1-to-n detection and its applications

Nikola: embedding compiled GPU functions in Haskell

NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

NLSEmagic: Nonlinear Schrodinger Equation Multidimensional Matlab-based GPU-accelerated Integrators using Compact High-order Schemes

NMF-mGPU: non-negative matrix factorization on multi-GPU systems

nmfgpu4R: GPU-Accelerated Computation of the Non-Negative Matrix Factorization (NMF) Using CUDA Capable Hardware

NNP/MM: Fast molecular dynamics simulations with machine learning potentials and molecular mechanics

NNS: The Case For Neural Network-based Sorting

No More Shading Languages: Compiling C++ to Vulkan Shaders

Nodal Discontinuous Galerkin Methods on Graphics Processors

Noise Removal from Remote Sensed Images by NonLocal Means with OpenCL Algorithm

Noise-resistant fitting for spherical harmonics

Non-blocking programming on multi-core graphics processors: (extended abstract)

Non-Determinism in TensorFlow ResNets

Non-deterministic parallelism considered useful

Non-Hydrostatic Pressure Shallow Flows: GPU Implementation Using Finite-Volume and Finite-Difference Scheme

Non-local means denoising algorithm accelerated by GPU

Non-Local Total Generalized Variation for Optical Flow Estimation

Non-Parametric Adaptive Network Pruning

Non-recursive beam search on GPU for formal concept analysis

Non-rigid multi-modal registration on the GPU

Non-separable 2D, 3D and 4D filtering with CUDA

Non-steady relaxation and critical exponents at the depinning transition

Non-symmetric magnetohydrostatic equilibria: a multigrid approach

Non-Uniform Domain Decomposition for Heterogeneous Accelerated Processing Units

Non-Uniformly Partitioned Block Convolution on Graphics Processing Units

Nonlinear Dynamic Analysis Efficiency by Using a GPU Parallelization

Nonlinear dynamic finite element analysis with GPU

Nonlinear optimization framework for image-based modeling on programmable graphics hardware

Nonmetric Priors for Continuous Multilabel Optimization

Nonnegative Tensor Factorization Accelerated Using GPGPU

Nonperturbative Quantum Field Theory in Astrophysics

Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks

NOVA: A Functional Language for Data Parallelism

Novel Architectures: Solving Computational Problems with GPU Computing

Novel Data-Partitioning Algorithms for Performance and Energy Optimization of Data-Parallel Applications on Modern Heterogeneous HPC Platforms

Novel GPU Implementation of Jacobi Algorithm for Karhunen-Loeve Transform of Dense Matrices

Novel implementations of recursive discrete wavelet transform for real time computation with multicore systems on chip (SOC)

Novel insights on atomic synchronization for sort-based group-by on GPUs

Novel Methodologies for Predictable CPU-To-GPU Command Offloading

Novel Multi-Layer Network Decomposition Boosting Acceleration of Multi-core Algorithms

Novel Parallel Approaches to Efficiently Solve Spatial Problems on Heterogeneous CPU-GPU Systems

Novel Parallelization Strategies for High-Performance DNN Training on HPC Systems

NPBench: A Benchmarking Suite for High-Performance NumPy

NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers

NQueens on CUDA: Optimization Issues

Nsight Python: A Python-First Profiling Toolkit for Seamless GPU Kernel Analysis (Tool)

NT-SIM: A Co-Simulator for Networked Signal Processing Applications

Nucleation of nanoparticles in a coarse grained fluid using OpenCL

Nucleation Studies on Graphics Processing Units

Nuclei: GPU-Accelerated Many-Core Network Coding

NUMA Data-Access Bandwidth Characterization and Modeling

NUMA-Aware Image Compositing on Multi-GPU Platform

Numerical Accuracy Analysis Based on the Discrete Stochastic Arithmetic on Multiprocessor Platforms

Numerical Accuracy Differences in CPU and GPGPU Codes

Numerical computations in Java with CUDA

Numerical Computations with GPUs

Numerical cosmology on the GPU with Enzo and Ramses

Numerical integration on GPUs for higher order finite elements

Numerical investigations on nonlinear nonparaxial beam propagation using graphics processing units

Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects

Titles: 100
open PDFs: 87
packages: 27
