Papers on hgpu.org (.txt-file)
Multi-threaded Kernel Offloading to GPGPU Using Hyper-Q on Kepler Architecture

Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance

Multi-user real-time speech recognition with a GPU

Multi-view Rendering Approach for Cloud-based Gaming Services

Multi-walk Parallel Pattern Search Approach on a GPU Computing Platform

Multi2Sim: a simulation framework for CPU-GPU computing

Multicore and GPU Algorithms for Nussinov RNA Folding

Multicore and GPU Parallelization of Neural Networks for Face Recognition

Multicore and Manycore Algorithms for Octrees

Multicore architecture and cache optimization techniques for solving graph problems

Multicore Computing: Algorithms, Architectures, and Applications

Multicore performance optimization using partner cores

Multicore Processing for Classification and Clustering Algorithms

Multicore Processing for Clustering Algorithms

Multicore Scheduling of Parallel Real-Time Tasks with Multiple Parallelization Options

Multidimensional Costas Arrays and Their Enumeration Using GPUs and FPGAs

Multidimensional Dataflow Graph Modeling and Mapping for Efficient GPU Implementation

Multidimensional Parallelization for Streaming Text Processing Applications Based on Parabix Framework

Multidimensional upwind hydrodynamics on unstructured meshes using Graphics Processing Units I. Two-dimensional uniform meshes

Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS

Multifold Acceleration of Neural Network Computations Using GPU

Multifrontal computations on GPUs and their multi-core hosts

Multifrontal Factorization of Sparse SPD Matrices on GPUs

Multifrontal Sparse Matrix Factorization on Graphics Processing Units

MultiGPU computing using MPI or OpenMP

Multigrid on GPU: Tackling Power Grid Analysis on parallel SIMT platforms

Multigrid Optimization Methods for High Performance Computing

Multikernel Data Partitioning With Channel on OpenCL-Based FPGAs

Multilayered Abstractions for Partial Differential Equations

Multilevel Granularity Parallelism Synthesis on FPGAs

Multilevel Multidimensional Scaling on the GPU

Multilevel summation of electrostatic potentials using graphics processing units

Multilevel Tile Load Map on Massive Terrain Visualization

Multimodal collaboration and human-computer interaction

Multimodal Image Registration Using GPU Parallel Computing Technology

Multimodality imaging and state-of-art GPU technology in discriminating benign from malignant breast lesions on real time decision support system

Multipattern String Matching On A GPU

Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU

Multiphase Fluid Simulations on a Multiple GPGPU PC Using Unsplit Time Integration VSIAM3

Multiple Bounding Boxes Algorithm in Collision Detection and Its Performances in Sequential vs CUDA Parallel Processing

Multiple String Matching on a GPU using CUDAs

Multiple Time Scales Recurrent Neural Network for Complex Action Acquisition

Multiple-GPU Scalability of Phase-Field Simulation for Dendritic Solidification

Multiple-GPUs Algorithm for Lattice Boltzmann Method

Multiple-Tasks on Multiple-Devices (MTMD): Exploiting Concurrency in Heterogeneous Managed Runtimes

Multiprocessing Acceleration of H.264/AVC Motion Estimation Full Search Algorithm under CUDA Architecture

Multireduce and Multiscan on Modern GPUs

Multiresolution Flow Simulations on Multi/many-core Architectures

Multiresolution MIP Rendering of Large Volumetric Data Accelerated on Graphics Hardware

Multiscale Hemodynamics Using GPU Clusters

Multithread Content Based File Chunking System in CPU-GPGPU Heterogeneous Architecture

Multithreaded Dense Linear Algebra on Asymmetric Multi-core Processors

Multithreaded Transposition of Square Matrices with Common Code for Intel Xeon Processors and Intel Xeon Phi Coprocessors

Multithreading for Visual Effects

MuMax: a new high-performance micromagnetic simulation tool

MUPPET: Optimizing Performance in OpenMP via Mutation Testing

Muscle pushing based skin deformation on GPU

Mutual information computation and maximization using GPU

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

MyCaffe: A Complete C# Re-Write of Caffe with Reinforcement Learning

MYRIAD: A new N-body code for simulations of Star Clusters

Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks

Myths and Legends in High-Performance Computing

N-body Simulation for Astronomical Collisional Systems with a New SIMD Instruction Set Extension to the x86 Architecture, Advanced Vector Extensions

N-Body Simulation Using GP-GPU: Evaluating Host/Device Memory Transference Overhead

N-Cloth: Predicting 3D Cloth Deformation with Mesh-Based Networks

NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features

NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems

NAS Parallel Benchmarks for GPGPUs using a Directive-based Programming Model

Native Offload of Haskell Repa Programs to GPGPU

Natural HPC substrate: Exploitation of mixed multicore CPU and GPUs

NaturalCC: A Toolkit to Naturalize the Source Code Corpus

Navier-Stokes on programmable graphics hardware using SMAC

Navigating An Evolutionary Fast Path to Exascale – Expanded Version

NBODY6++GPU: Ready for the gravitational million-body problem

NBSymple, a double parallel, symplectic N-body code running on Graphic Processing Units

NCAM: Near-Data Processing for Nearest Neighbor Search

NCRF++: An Open-source Neural Sequence Labeling Toolkit

ndzip-gpu: Efficient Lossless Compression of Scientific Floating-Point Data on GPUs

Near Memory Similarity Search on Automata Processors

Near real-time Fast Bilateral Stereo on the GPU

Near-LSPA Performance at MSA Complexity

Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs

Nemo: A parallelized Lagrangian particle-tracking model

NeMo: A Platform for Neural Modelling of Spiking Neurons Using GPUs

Neneta: Heterogeneous Computing Complex-Valued Neural Network Framework

Nengo: a Python tool for building large-scale functional brain models

NengoDL: Combining deep learning and neuromorphic modelling methods

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

Neon: A Domain-Specific Programming Language for Image Processing

neoSYCL: a SYCL implementation for SX-Aurora TSUBASA

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

NEPTUNE: Network- and GPU-aware Management of Serverless Functions at the Edge

Titles: 100
open PDFs: 91
packages: 23
