Papers on hgpu.org (.txt-file)
Massively Parallel Identification of Intersection Points for GPGPU Ray Tracing

Massively parallel implementation of cyclic LDPC codes on a general purpose graphics processing unit
Massively Parallel Jacobian Computation

Massively Parallel kNN using CUDA on Spam-Classification

Massively Parallel Localization of Pulsed Signal Transitions Using a GPU

Massively Parallel Logic Simulation with GPUs

Massively Parallel Lossless Compression of Medical Images Using Least-Squares Prediction and Arithmetic Coding

Massively parallel Monte Carlo for many-particle simulations on GPUs

Massively Parallel Network Coding on GPUs

Massively Parallel Neural Encoding and Decoding of Visual Stimuli

Massively Parallel Ray Tracing Algorithm Using GPU

Massively parallel read mapping on GPUs with PEANUT

Massively parallel read mapping on GPUs with the q-group index and PEANUT

Massively Parallel Sequential Monte Carlo for Bayesian Inference

Massively parallel simulations of relativistic fluid dynamics on graphics processing units with CUDA

Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs

Massively parallel two-dimensional TLM algorithm on graphics processing units

Massively parallelizable list-mode reconstruction using a Monte Carlo-based elliptical Gaussian model

Massively Parallelized Monte Carlo Simulation and its Applications in Finance

Massively parallelized replica-exchange simulations of polymers on GPUs

Massively-Parallel Lossless Data Decompression

Mastering Atari with Discrete World Models

Mastering Software Variant Explosion for GPU Accelerators

Matched Filter Computation on FPGA, Cell and GPU
MatConvNet – Convolutional Neural Networks for MATLAB

Material Removal Simulation and Cutting Force Prediction of Multi-Axis Machining Processes on General-Purpose Graphics Processing Units

Mathematical limits of parallel computation for embedded systems

MATLAB and Python for GPU Computing

MATLAB graphical interface for GPU based FDTD method
MATLAB Medical Images Classification on Graphics Processors

MATLAB Parallelization through Scalarization

Matrix Computations and Optimization in Apache Spark

Matrix Convolution using Parallel Programming

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

Matrix inversion speed up with CUDA

Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation

Matrix Multiplication on GPUs with On-Line Fault Tolerance
Matrix Multiplication Using Only Addition

Matrix Multiplication with CUDA – A basic introduction to the CUDA programming model

Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs

Matrix-Matrix Multiplications on GPUs for Accelerating a Parallel Fluid Dynamics Code

maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

Maximal Information Coefficient Analysis

Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study

Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution

Maximum likelihood event estimation and list-mode image reconstruction on GPU hardware

Maximum mipmaps for fast, accurate, and scalable dynamic height field rendering

MaxSSmap: A GPU program for short read mapping with the maximum scoring subsequence

MC-RANSAC: A Pre-processing Model for RANSAC using Monte Carlo method implemented on a GPU

MCBooster: a library for fast Monte Carlo generation of phase-space decays on massively parallel platforms

MCS 572: Introduction to Supercomputing

MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores

md_poly: A Performance-Portable Polyhedral Compiler Based on Multi-Dimensional Homomorphisms

MDLab: A molecular dynamics simulation prototyping environment

MDR: performance model driven runtime for heterogeneous parallel platforms
Mean Shift Parallel Tracking on GPU
Measurement and Analysis of GPU-accelerated Applications with HPCToolkit

Measurements of performance of hardware and general purpose classical molecular dynamics simulation software

Measuring Bandwidth for Super Computer Workloads

Measuring the evolving Internet ecosystem with exchange points

Measuring the Impact of Configuration Parameters in CUDA Through Benchmarking

Measuring the Performance of Realtime DSP Using Pure Data and GPU

Mechanical Characterization and Performance Optimization for GPU Fan-Sink Cooling Module Assembly
Median Based Parallel Steering Kernel Regression for Image Reconstruction

Medical Image Registration using OpenCL

MEDINA: MECCA Development in Accelerators – KPP Fortran to CUDA source-to-source Preprocessor

Medium-Grained Functions Mapping using Modern GPUs

Medusa: A Parallel Graph Processing System on Graphics Processors

Medusa: Simplified Graph Processing on GPUs

Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Megakernels Considered Harmful: Wavefront Path Tracing on GPUs

Megapixel Topology Optimization on a Graphics Processing Unit
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Melia: A MapReduce Framework on OpenCL-based FPGAs

MELT - a Translated Domain Specific Language Embedded in the GCC Compiler

MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning

MemcachedGPU: Scaling-up Scale-out Key-value Stores

Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU
Memory Bandwidth and Latency in HPC: System Requirements and Performance Impact

Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes

Memory Efficient Mixed-Precision Optimizers

Memory Interference and Performance Prediction in GPU-Accelerated Heterogeneous Systems

Memory layout in GPU implementation of lattice Boltzmann method for sparse 3D geometries

Memory Optimization for Deep Networks

Memory Saving Discrete Fourier Transform on GPUs

Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

Memory-efficient Adaptive Subdivision for Software Rendering on the GPU

Memory-Efficient Implementation of DenseNets

Memory-Efficient Object-Oriented Programming on GPUs

Memory-Efficient Single-Pass GPU Rendering of Multi-fragment Effects

Memory-level and Thread-level Parallelism Aware GPU Architecture Performance Analytical Model

Memory-Scalable GPU Spatial Hierarchy Construction

Merge or Separate? Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms

Titles: 100
open PDFs: 90
packages: 27
