Papers on hgpu.org (.txt-file)
MCS 572: Introduction to Supercomputing

MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores

md_poly: A Performance-Portable Polyhedral Compiler Based on Multi-Dimensional Homomorphisms

MDLab: A molecular dynamics simulation prototyping environment

MDR: performance model driven runtime for heterogeneous parallel platforms

Mean Shift Parallel Tracking on GPU

Measurement and Analysis of GPU-accelerated Applications with HPCToolkit

Measurements of performance of hardware and general purpose classical molecular dynamics simulation software

Measuring Bandwidth for Super Computer Workloads

Measuring the evolving Internet ecosystem with exchange points

Measuring the Impact of Configuration Parameters in CUDA Through Benchmarking

Measuring the Performance of Realtime DSP Using Pure Data and GPU

Mechanical Characterization and Performance Optimization for GPU Fan-Sink Cooling Module Assembly

Median Based Parallel Steering Kernel Regression for Image Reconstruction

Medical Image Registration using OpenCL

MEDINA: MECCA Development in Accelerators – KPP Fortran to CUDA source-to-source Preprocessor

Medium-Grained Functions Mapping using Modern GPUs

Medusa: A Parallel Graph Processing System on Graphics Processors

Medusa: Simplified Graph Processing on GPUs

Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Megakernels Considered Harmful: Wavefront Path Tracing on GPUs

Megapixel Topology Optimization on a Graphics Processing Unit

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Melia: A MapReduce Framework on OpenCL-based FPGAs

MELT - a Translated Domain Specific Language Embedded in the GCC Compiler

MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning

MemcachedGPU: Scaling-up Scale-out Key-value Stores

Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU

Memory Bandwidth and Latency in HPC: System Requirements and Performance Impact

Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes

Memory Efficient Mixed-Precision Optimizers

Memory Interference and Performance Prediction in GPU-Accelerated Heterogeneous Systems

Memory layout in GPU implementation of lattice Boltzmann method for sparse 3D geometries

Memory Optimization for Deep Networks

Memory Saving Discrete Fourier Transform on GPUs

Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs

Memory-efficient Adaptive Subdivision for Software Rendering on the GPU

Memory-Efficient Implementation of DenseNets

Memory-Efficient Object-Oriented Programming on GPUs

Memory-Efficient Single-Pass GPU Rendering of Multi-fragment Effects

Memory-level and Thread-level Parallelism Aware GPU Architecture Performance Analytical Model

Memory-Scalable GPU Spatial Hierarchy Construction

Merge or Separate? Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms

Merge: a programming model for heterogeneous multi-core systems

Mersenne Twister Random Number Generation on FPGA, CPU and GPU

Mesh deformations in X3D via CUDA with freeform deformation lattices

Mesh Independent Loop Fusion for Unstructured Mesh Applications

Mesh mutation in programmable graphics hardware

Meshfree/GFEM in hardware-efficiency prospective

Message passing for GPGPU clusters: CudaMPI

Message Passing Interface support for the runtime adaptive multi-processor system-on-chip RAMPSoC

Message passing on data-parallel architectures

Meta Networks for Neural Style Transfer

Meta-Programming and Auto-Tuning in the Search for High Performance GPU Code

Meta-programming and Multi-stage Programming for GPGPUs

Meta-simulation of large WSN on multi-core computers

MetaBinG: Using GPUs to Accelerate Metagenomic Sequence Classification

MetaCL – A Model-Based Approach to Programming Heterogeneous Architectures Using OpenCL

MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators and Its Application to the Generation of Parametric CUDA Kernels

MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-core Clusters

Metamorphic Testing for (Graphics) Compilers

Method for simulation of coastal terrain on GPU

Methodology of control and supervision of web connected mobile robots with CUDA technology application

Methods and Metrics for Fair Server Assessment under Real-Time Financial Workloads

Methods for Accelerating Machine Learning in High Performance Computing

Methods for GPU Acceleration of Big Data Applications

Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures

MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring

MGPUSim: Enabling Multi-GPU Performance Modeling and Optimization

MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures

MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC)

Microarchitectural Performance Characterization of Irregular GPU Kernels

Microbenchmarking NVIDIA’s Blackwell Architecture: An in-depth Architectural Analysis

Microbenchmarks for GPU characteristics: the occupancy roofline and the pipeline model

Microbranching in mode-I fracture using large scale simulations of amorphous and perturbed lattice models

Microlensing Observations Rapid Search for Exoplanets: MORSE code for GPUs

Micropolygon ray tracing with defocus and motion blur

MIDeA: a multi-parallel intrusion detection architecture

Migrating CUDA to oneAPI: A Smith-Waterman Case Study

Migrating from OpenGL ES to Vulkan

Migrating real-time depth image-based rendering from traditional to next-gen GPGPU

MILC Code Performance on High End CPU and GPU Supercomputer Clusters

MILC staggered conjugate gradient performance on Intel KNL

MILJS: Brand New JavaScript Libraries for Matrix Calculation and Machine Learning

MiMatrix: A Massively Distributed Deep Learning Framework on a Petascale High-density Heterogeneous Cluster

Mimetic Methods for Lagrangian Relaxation of Magnetic Fields

Mìmir: A real-time interactive visualization library for CUDA programs

MIML Learning with CNNs: Yelp Restaurant Photo Classification

Mind the gap!: bridging the dichotomy of design and implementation

Minerals detection for hyperspectral images using adapted linear unmixing: LinMin

Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning

Titles: 100
open PDFs: 90
packages: 27
