Papers on hgpu.org
Multidimensional upwind hydrodynamics on unstructured meshes using Graphics Processing Units I. Two-dimensional uniform meshes
Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS
Multifold Acceleration of Neural Network Computations Using GPU
Multifrontal computations on GPUs and their multi-core hosts
Multifrontal Factorization of Sparse SPD Matrices on GPUs
Multifrontal Sparse Matrix Factorization on Graphics Processing Units
MultiGPU computing using MPI or OpenMP
Multigrid on GPU: Tackling Power Grid Analysis on parallel SIMT platforms
Multigrid Optimization Methods for High Performance Computing
Multikernel Data Partitioning With Channel on OpenCL-Based FPGAs
Multilayered Abstractions for Partial Differential Equations
Multilevel Granularity Parallelism Synthesis on FPGAs
Multilevel Multidimensional Scaling on the GPU
Multilevel summation of electrostatic potentials using graphics processing units
Multilevel Tile Load Map on Massive Terrain Visualization
Multimodal collaboration and human-computer interaction
Multimodal Image Registration Using GPU Parallel Computing Technology
Multimodality imaging and state-of-art GPU technology in discriminating benign from malignant breast lesions on real time decision support system
Multipattern String Matching On A GPU
Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU
Multiphase Fluid Simulations on a Multiple GPGPU PC Using Unsplit Time Integration VSIAM3
Multiple Bounding Boxes Algorithm in Collision Detection and Its Performances in Sequential vs CUDA Parallel Processing
Multiple String Matching on a GPU using CUDAs
Multiple Time Scales Recurrent Neural Network for Complex Action Acquisition
Multiple-GPU Scalability of Phase-Field Simulation for Dendritic Solidification
Multiple-GPUs Algorithm for Lattice Boltzmann Method
Multiple-Tasks on Multiple-Devices (MTMD): Exploiting Concurrency in Heterogeneous Managed Runtimes
Multiprocessing Acceleration of H.264/AVC Motion Estimation Full Search Algorithm under CUDA Architecture
Multireduce and Multiscan on Modern GPUs
Multiresolution Flow Simulations on Multi/many-core Architectures
Multiresolution MIP Rendering of Large Volumetric Data Accelerated on Graphics Hardware
Multiscale Hemodynamics Using GPU Clusters
Multithread Content Based File Chunking System in CPU-GPGPU Heterogeneous Architecture
Multithreaded Dense Linear Algebra on Asymmetric Multi-core Processors
Multithreaded Transposition of Square Matrices with Common Code for Intel Xeon Processors and Intel Xeon Phi Coprocessors
Multithreading for Visual Effects
MuMax: a new high-performance micromagnetic simulation tool
MUPPET: Optimizing Performance in OpenMP via Mutation Testing
Muscle pushing based skin deformation on GPU
Mutual information computation and maximization using GPU
MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
MyCaffe: A Complete C# Re-Write of Caffe with Reinforcement Learning
MYRIAD: A new N-body code for simulations of Star Clusters
Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks
Myths and Legends in High-Performance Computing
N-body Simulation for Astronomical Collisional Systems with a New SIMD Instruction Set Extension to the x86 Architecture, Advanced Vector Extensions
N-Body Simulation Using GP-GPU: Evaluating Host/Device Memory Transference Overhead
N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks
NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features
NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems
NAS Parallel Benchmarks for GPGPUs using a Directive-based Programming Model
Native Offload of Haskell Repa Programs to GPGPU
Natural HPC substrate: Exploitation of mixed multicore CPU and GPUs
NaturalCC: A Toolkit to Naturalize the Source Code Corpus
Navier-Stokes on programmable graphics hardware using SMAC
Navigating An Evolutionary Fast Path to Exascale – Expanded Version
NBODY6++GPU: Ready for the gravitational million-body problem
NBSymple, a double parallel, symplectic N-body code running on Graphic Processing Units
NCAM: Near-Data Processing for Nearest Neighbor Search
NCRF++: An Open-source Neural Sequence Labeling Toolkit
ndzip-gpu: Efficient Lossless Compression of Scientific Floating-Point Data on GPUs
Near Memory Similarity Search on Automata Processors
Near real-time Fast Bilateral Stereo on the GPU
Near-LSPA Performance at MSA Complexity
Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
Nemo: A parallelized Lagrangian particle-tracking model
NeMo: A Platform for Neural Modelling of Spiking Neurons Using GPUs
Neneta: Heterogeneous Computing Complex-Valued Neural Network Framework
Nengo: a Python tool for building large-scale functional brain models
NengoDL: Combining deep learning and neuromorphic modelling methods
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Neon: A Domain-Specific Programming Language for Image Processing
neoSYCL: a SYCL implementation for SX-Aurora TSUBASA
Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures
NEPTUNE: Network- and GPU-aware Management of Serverless Functions at the Edge
Nested Data-Parallelism on the GPU
Nested Intervals Tree Encoding with System of Residual Classes
Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations
NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems
Network Simulator Tools and GPU Parallel Systems
Network-on-Chip Hardware Accelerators for Biological Sequence Alignment
Neural Architecture Search for Lightweight Non-Local Networks
Neural Architecture Search without Training
Neural Code Comprehension: A Learnable Representation of Code Semantics
Neural Decoding using a Parallel Sequential Monte Carlo method on Point Processes with Ensemble Effect
Neural Multi-scale Image Compression
Neural Network Computing Using On-Chip Accelerators
Neural Network Implementation Using CUDA and OpenMP
Neural Network Inference on Mobile SoCs
Neural Network Libraries: A Deep Learning Framework Designed from Engineers’ Perspectives
Neural network modeling on evolution of hydration reaction for Portland cement
Neural Network Simulation: The recognition application
Neural Networks for Beginners. A fast implementation in Matlab, Torch, TensorFlow
Neural Networks through Shared Maps in Mobile Devices
Neural Query Language: A Knowledge Base Query Language for Tensorflow
Titles: 100
open PDFs: 90
packages: 27