Papers on hgpu.org (.txt-file)
Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees

Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters

Machine Learning from Streaming Data in Heterogeneous Computing Environments

Machine Learning in Compilers: Past, Present and Future

Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems

Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection

MacroSS: macro-SIMDization of streaming applications

Maestro: Data Orchestration and Tuning for OpenCL Devices

MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs

MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing

MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing

Magneto-hydrodynamics simulation in astrophysics

Magnetohydrodynamics on Heterogeneous architectures: a performance comparison

Magnetohydrodynamics simulations on graphics processing units

Maintaining constant frame rates in 3D texture-based volume rendering

Makespan computation for GPU threads running on a single streaming multiprocessor

Making Human Connectome Faster: GPU Acceleration of Brain Network Analysis

Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts

Making the case of GPUs in courses on computational physics

MALBEC: a new CUDA-C ray-tracer in General Relativity

MambaCPU: Enhanced Correlation Mining with State Space Models for CPU Performance Prediction

Managing Extreme Heterogeneity in Next Generation HPC Systems

Managing heterogeneous device memory using C++17 memory resources

Managing Multi Instance GPUs for High Throughput and Energy Savings

Managing the Topology of Heterogeneous Cluster Nodes with Hardware Locality (hwloc)

Managing, Profiling, and Optimizing Heterogeneous GPU Workloads

Manas: Mining Software Repositories to Assist AutoML

ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills

Many Cores, Many Models: GPU Programming Model vs. Vendor Compatibility Overview

Many-body quantum chemistry on graphics processing units

Many-Core Algorithms for Combinatorial Optimization

Many-core algorithms for statistical phylogenetics

Many-core applications to online track reconstruction in HEP experiments

Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques

Many-core GPU computing with NVIDIA CUDA

Many-core parallel computing – Can compilers and tools do the heavy lifting?

Many-Core vs. Many-Thread Machines: Stay Away From the Valley

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications

Many-threaded Differential Evolution on the GPU

Many-threaded implementation of differential evolution for the CUDA platform

Manycore high-performance computing in bioinformatics

Manycore processing of repeated k-NN queries over massive moving objects observations

Manycore processing of repeated range queries over massive moving objects observations

MAP-based Brain Tissue Segmentation using Manifold Learning and Hierarchical Max-Flow regularization

Map-reduce as a Programming Model for Custom Computing Machines

MapCG: writing parallel program portable between CPU and GPU

MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs

Mapping a Data-Flow Programming Model onto Heterogeneous Platforms

Mapping a Dataflow Programming Model onto Heterogeneous Architectures

Mapping a Guided Image Filter on the HARP Reconfigurable Architecture Using OpenCL

Mapping computational concepts to GPUs

Mapping dynamic programming algorithms on graphics processing units

Mapping High-Fidelity Volume Rendering for Medical Imaging to CPU, GPU and Many-Core Architectures

Mapping Iterative Medical Imaging Algorithm on Cell Accelerator

Mapping of a film grain removal algorithm to a heterogeneous reconfigurable architecture

Mapping parallel programs to heterogeneous multi-core systems

Mapping Streaming Applications to OpenCL

Mapping the Arnold web with a GPU-supercomputer

Mapping the Arnold web with a graphic processing unit

Mapping the SBR and TW-ILDCs to Heterogeneous CPU-GPU Architecture for Fast Computation of Electromagnetic Scattering

MapReduce for Counting Word Frequencies with MPI and GPUs

MapSQ: A MapReduce-based Framework for SPARQL Queries on GPU

MARC: A Many-Core Approach to Reconfigurable Computing

March of the Froblins: simulation and rendering massive crowds of intelligent and detailed creatures on GPU

Marian: Cost-effective High-Quality Neural Machine Translation in C++

Markerless View-Independent Registration of Multiple Distorted Projectors on Extruded Surfaces Using an Uncalibrated Camera

Markov Chain Monte Carlo on the GPU

Mars: a MapReduce framework on graphics processors

Mars: Accelerating MapReduce with Graphics Processors

Mascar: Speeding up GPU Warps by Reducing Memory Pitstops

MASCOT: Fast and Highly Scalable SVM Cross-validation using GPUs and SSDs

Mashing load balancing algorithm to boost hybrid kernels in molecular dynamics simulations

Masivo: Parallel Simulation Model Based on OpenCL for Massive Public Transportation Systems’ Routes

Mass Estimation from Images using Deep Neural Network and Sparse Ground Truth

Mass-spring systems on the GPU

Massive Exploration of Neural Machine Translation Architectures

Massive exploration of perturbed conditions of the blood coagulation cascade through GPU parallelization

Massive Image Editing on the Cloud

Massive Parallel Implementation of ODE Solvers

Massive parallel LDPC decoding on GPU

Massive Parallelism with GPUs for Centrality Ranking in Complex Networks

Massive parallelization of combinatorial statistical genetics analyses porting machine learning methods on general purpose graphics processing units (GPU)

Massive parallelization of serial inference algorithms for a complex generalized linear model

Massively Deep Artificial Neural Networks for Handwritten Digit Recognition

Massively LDPC Decoding on Multicore Architectures

Massively Parallel A* Search on a GPU

Massively Parallel Algorithms for CFD Simulation and Optimization on Heterogeneous Many-Core Architectures

Massively Parallel Analysis of Similarity Matrices on Heterogeneous Hardware

Massively parallel approximate Gaussian process regression

Massively Parallel Computation of Accurate Densities for N-body Dark Matter Simulations using the Phase-Space-Element Method

Massively parallel computation using graphics processors with application to optimal experimentation in dynamic control

Massively Parallel Computing in Economics

Massively Parallel Construction of the Cell Graph

Massively parallel differential evolution-pattern search optimization with graphics hardware acceleration: an investigation on bound constrained optimization problems

Massively Parallel Finite Element Simulator for Full-Chip STI Stress Analysis

Massively Parallel GPU Computing of Continuum Robotic Dynamics

Massively Parallel GPU Memory Compaction

Titles: 100
open PDFs: 94
packages: 25
