Papers on hgpu.org (.txt-file)

Merge: a programming model for heterogeneous multi-core systems

Mersenne Twister Random Number Generation on FPGA, CPU and GPU

Mesh deformations in X3D via CUDA with freeform deformation lattices

Mesh Independent Loop Fusion for Unstructured Mesh Applications

Mesh mutation in programmable graphics hardware

Meshfree/GFEM in hardware-efficiency prospective

Message passing for GPGPU clusters: CudaMPI

Message Passing Interface support for the runtime adaptive multi-processor system-on-chip RAMPSoC

Message passing on data-parallel architectures

Meta Networks for Neural Style Transfer

Meta-Programming and Auto-Tuning in the Search for High Performance GPU Code

Meta-programming and Multi-stage Programming for GPGPUs

Meta-simulation of large WSN on multi-core computers

MetaBinG: Using GPUs to Accelerate Metagenomic Sequence Classification

MetaCL – A Model-Based Approach to Programming Heterogeneous Architectures Using OpenCL

MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators and Its Application to the Generation of Parametric CUDA Kernels

MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-core Clusters

Metamorphic Testing for (Graphics) Compilers

Method for simulation of coastal terrain on GPU

Methodology of control and supervision of web connected mobile robots with CUDA technology application

Methods and Metrics for Fair Server Assessment under Real-Time Financial Workloads

Methods for Accelerating Machine Learning in High Performance Computing

Methods for GPU Acceleration of Big Data Applications

Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures

MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring

MGPUSim: Enabling Multi-GPU Performance Modeling and Optimization

MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures

MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC)

Microarchitectural Performance Characterization of Irregular GPU Kernels

Microbenchmarking NVIDIA’s Blackwell Architecture: An in-depth Architectural Analysis

Microbenchmarks for GPU characteristics: the occupancy roofline and the pipeline model

Microbranching in mode-I fracture using large scale simulations of amorphous and perturbed lattice models

Microlensing Observations Rapid Search for Exoplanets: MORSE code for GPUs

Micropolygon ray tracing with defocus and motion blur

MIDeA: a multi-parallel intrusion detection architecture

Migrating CUDA to oneAPI: A Smith-Waterman Case Study

Migrating from OpenGL ES to Vulkan

Migrating real-time depth image-based rendering from traditional to next-gen GPGPU

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

MILC staggered conjugate gradient performance on Intel KNL

MILJS: Brand New JavaScript Libraries for Matrix Calculation and Machine Learning

MiMatrix: A Massively Distributed Deep Learning Framework on a Petascale High-density Heterogeneous Cluster

Mimetic Methods for Lagrangian Relaxation of Magnetic Fields

Mìmir: A real-time interactive visualization library for CUDA programs

MIML Learning with CNNs: Yelp Restaurant Photo Classification

Mind the gap!: bridging the dichotomy of design and implementation

Minerals detection for hyperspectral images using adapted linear unmixing: LinMin

Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning

MinGPU: a minimum GPU library for computer vision

miniLB: A Performance Portability Study of Lattice-Boltzmann Simulations

Minimal models for finite particles in fluctuating hydrodynamics

minimap2-fpga: Integrating hardware-accelerated chaining for efficient end-to-end long-read sequence mapping

Minimising Testing in Genetic Programming

Mining Rare Features in Fingerprints Using Core Points and Triplet-based Features

Mint: realizing CUDA performance in 3D stencil methods with annotated C

Minuet: Accelerating 3D Sparse Convolutions on GPUs

MIOpen: An Open Source Library For Deep Learning Primitives

Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

Mirovia: A Benchmarking Suite for Modern Heterogeneous Computing

MITHRA: Multiple data independent tasks on a heterogeneous resource architecture

Mix-and-Match: A Model-driven Runtime Optimisation Strategy for BFS on GPUs

Mixed precision in Graphics Processing Unit

Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems

Mixed Precision Solver Scalable to 16000 MPI Processes for Lattice Quantum Chromodynamics Simulations on the Oakforest-PACS System

Mixed-Precision Embedding Using a Cache

Mixed-precision finite element kernels and assembly: Rounding error analysis and hardware acceleration

Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers

Mixed-precision Orthogonalization Scheme and Adaptive Step Size for CA-GMRES on GPUs

Mixed-precision orthogonalization scheme and its case studies with CA-GMRES on a GPU

Mixed-Resolution Patch-Matching

Mixed-Tool Performance Analysis on Hybrid Multicore Architectures

Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training

Mixing Multi-Core CPUs and GPUs for Scientific Simulation Software

MKPipe: A Compiler Framework for Optimizing Multi-Kernel Workloads in OpenCL for FPGA

ML Inference Scheduling with Predictable Latency

ML-Triton, A Multi-Level Compilation and Language Extension to Triton GPU Programming

MLitB: Machine Learning in the Browser

MLS-based scalar fields over triangle meshes and their application in mesh processing

MNN: A Universal and Efficient Inference Engine

Mobile GPGPU Acceleration of Embodied Robot Simulation

Mobile GPU Computing Based Filter Bank Convolution for Three-dimensional Wavelet Transform

MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU

MobiRT: an implementation of OpenGL ES-based CPU-GPU hybrid ray tracer for mobile devices

Model Coupling between the Weather Research and Forecasting Model and the DPRI Large Eddy Simulator for Urban Flows on GPU-accelerated Multicore Systems

Model-Based 3D Object Tracking Using an Extended-Extended Kalman Filter and Graphics Rendered Measurements

Model-based optimization of MPDATA on Intel Xeon Phi through load imbalancing

Model-Based Warp-Level Tiling for Image Processing Programs on GPUs

Model-driven autotuning of sparse matrix-vector multiply on GPUs

Model-driven optimisation of memory hierarchy and multithreading on GPUs

Model-Driven Tile Size Selection for DOACROSS Loops on GPUs

Model-independent partial wave analysis using a massively-parallel fitting framework

Model-T: Rethinking the OS for terabit speeds

Modeling and Evaluation of Synchronous Stochastic Gradient Descent in Distributed Deep Learning on Multiple GPUs

Modeling and generating complex motion blur for real-time tracking

Modeling and Optimization of Parallel Matrix-based Computations on GPU

Titles: 100
open PDFs: 95
packages: 27
