Papers on hgpu.org (.txt-file)
Multiple Time Scales Recurrent Neural Network for Complex Action Acquisition
Multiple-GPU Scalability of Phase-Field Simulation for Dendritic Solidification
Multiple-GPUs Algorithm for Lattice Boltzmann Method
Multiple-Tasks on Multiple-Devices (MTMD): Exploiting Concurrency in Heterogeneous Managed Runtimes
Multiprocessing Acceleration of H.264/AVC Motion Estimation Full Search Algorithm under CUDA Architecture
Multireduce and Multiscan on Modern GPUs
Multiresolution Flow Simulations on Multi/many-core Architectures
Multiresolution MIP Rendering of Large Volumetric Data Accelerated on Graphics Hardware
Multiscale Hemodynamics Using GPU Clusters
Multithread Content Based File Chunking System in CPU-GPGPU Heterogeneous Architecture
Multithreaded Dense Linear Algebra on Asymmetric Multi-core Processors
Multithreaded Transposition of Square Matrices with Common Code for Intel Xeon Processors and Intel Xeon Phi Coprocessors
Multithreading for Visual Effects
MuMax: a new high-performance micromagnetic simulation tool
MUPPET: Optimizing Performance in OpenMP via Mutation Testing
Muscle pushing based skin deformation on GPU
Mutual information computation and maximization using GPU
MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
MyCaffe: A Complete C# Re-Write of Caffe with Reinforcement Learning
MYRIAD: A new N-body code for simulations of Star Clusters
Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks
Myths and Legends in High-Performance Computing
N-body Simulation for Astronomical Collisional Systems with a New SIMD Instruction Set Extension to the x86 Architecture, Advanced Vector Extensions
N-Body Simulation Using GP-GPU: Evaluating Host/Device Memory Transference Overhead
N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks
NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features
NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems
NAS Parallel Benchmarks for GPGPUs using a Directive-based Programming Model
Native Offload of Haskell Repa Programs to GPGPU
Natural HPC substrate: Exploitation of mixed multicore CPU and GPUs
NaturalCC: A Toolkit to Naturalize the Source Code Corpus
Navier-Stokes on programmable graphics hardware using SMAC
Navigating An Evolutionary Fast Path to Exascale – Expanded Version
NBODY6++GPU: Ready for the gravitational million-body problem
NBSymple, a double parallel, symplectic N-body code running on Graphic Processing Units
NCAM: Near-Data Processing for Nearest Neighbor Search
NCRF++: An Open-source Neural Sequence Labeling Toolkit
ndzip-gpu: Efficient Lossless Compression of Scientific Floating-Point Data on GPUs
Near Memory Similarity Search on Automata Processors
Near real-time Fast Bilateral Stereo on the GPU
Near-LSPA Performance at MSA Complexity
Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
Nemo: A parallelized Lagrangian particle-tracking model
NeMo: A Platform for Neural Modelling of Spiking Neurons Using GPUs
Neneta: Heterogeneous Computing Complex-Valued Neural Network Framework
Nengo: a Python tool for building large-scale functional brain models
NengoDL: Combining deep learning and neuromorphic modelling methods
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Neon: A Domain-Specific Programming Language for Image Processing
neoSYCL: a SYCL implementation for SX-Aurora TSUBASA
Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures
NEPTUNE: Network- and GPU-aware Management of Serverless Functions at the Edge
Nested Data-Parallelism on the GPU
Nested Intervals Tree Encoding with System of Residual Classes
Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations
NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems
Network Simulator Tools and GPU Parallel Systems
Network-on-Chip Hardware Accelerators for Biological Sequence Alignment
Neural Architecture Search for Lightweight Non-Local Networks
Neural Architecture Search without Training
Neural Code Comprehension: A Learnable Representation of Code Semantics
Neural Decoding using a Parallel Sequential Monte Carlo method on Point Processes with Ensemble Effect
Neural Multi-scale Image Compression
Neural Network Computing Using On-Chip Accelerators
Neural Network Implementation Using CUDA and OpenMP
Neural Network Inference on Mobile SoCs
Neural Network Libraries: A Deep Learning Framework Designed from Engineers’ Perspectives
Neural network modeling on evolution of hydration reaction for Portland cement
Neural Network Simulation: The recognition application
Neural Networks for Beginners. A fast implementation in Matlab, Torch, TensorFlow
Neural Networks through Shared Maps in Mobile Devices
Neural Query Language: A Knowledge Base Query Language for Tensorflow
Neural scene representation and rendering
Neurokernel: An Open Scalable Software Framework for Emulation and Validation of Drosophila Brain Models on Multiple GPUs
Neurokernel: An Open Source Platform for Emulating the Fruit Fly Brain
Neuromorphic models on a GPGPU cluster
Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA
New Basic Linear Algebra Methods for Simulation on GPUs
New efficient integral algorithms for quantum chemistry
New Efficient Method To Solve Longest Overlap Region Problem For Noncoding DNA Sequence
New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code
New Row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA
New Sparse Matrix Storage Format to Improve The Performance of Total SPMV Time
New Techniques for Spectral Image Acquisition and Analysis
Next-generation acceleration and code optimization for light transport in turbid media using GPUs
nGFSIM: A GPU-based fault simulator for 1-to-n detection and its applications
Nikola: embedding compiled GPU functions in Haskell
NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce
Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
NLSEmagic: Nonlinear Schrödinger Equation Multidimensional Matlab-based GPU-accelerated Integrators using Compact High-order Schemes
NMF-mGPU: non-negative matrix factorization on multi-GPU systems
nmfgpu4R: GPU-Accelerated Computation of the Non-Negative Matrix Factorization (NMF) Using CUDA Capable Hardware
NNP/MM: Fast molecular dynamics simulations with machine learning potentials and molecular mechanics
NNS: The Case For Neural Network-based Sorting
Nodal Discontinuous Galerkin Methods on Graphics Processors
Titles: 100
open PDFs: 91
packages: 35