Papers on hgpu.org (.txt-file)
maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs
Maximal Information Coefficient Analysis
Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study
Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution
Maximum likelihood event estimation and list-mode image reconstruction on GPU hardware
Maximum mipmaps for fast, accurate, and scalable dynamic height field rendering
MaxSSmap: A GPU program for short read mapping with the maximum scoring subsequence
MC-RANSAC: A Pre-processing Model for RANSAC using Monte Carlo method implemented on a GPU
MCBooster: a library for fast Monte Carlo generation of phase-space decays on massively parallel platforms
MCS 572: Introduction to Supercomputing
MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs
MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores
md_poly: A Performance-Portable Polyhedral Compiler Based on Multi-Dimensional Homomorphisms
MDLab: A molecular dynamics simulation prototyping environment
MDR: performance model driven runtime for heterogeneous parallel platforms
Mean Shift Parallel Tracking on GPU
Measurement and Analysis of GPU-accelerated Applications with HPCToolkit
Measurements of performance of hardware and general purpose classical molecular dynamics simulation software
Measuring Bandwidth for Super Computer Workloads
Measuring the evolving Internet ecosystem with exchange points
Measuring the Impact of Configuration Parameters in CUDA Through Benchmarking
Measuring the Performance of Realtime DSP Using Pure Data and GPU
Mechanical Characterization and Performance Optimization for GPU Fan-Sink Cooling Module Assembly
Median Based Parallel Steering Kernel Regression for Image Reconstruction
Medical Image Registration using OpenCL
MEDINA: MECCA Development in Accelerators – KPP Fortran to CUDA source-to-source Preprocessor
Medium-Grained Functions Mapping using Modern GPUs
Medusa: A Parallel Graph Processing System on Graphics Processors
Medusa: Simplified Graph Processing on GPUs
Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores
MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
Megakernels Considered Harmful: Wavefront Path Tracing on GPUs
Megapixel Topology Optimization on a Graphics Processing Unit
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Melia: A MapReduce Framework on OpenCL-based FPGAs
MELT - a Translated Domain Specific Language Embedded in the GCC Compiler
MemcachedGPU: Scaling-up Scale-out Key-value Stores
Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU
Memory Bandwidth and Latency in HPC: System Requirements and Performance Impact
Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes
Memory Efficient Mixed-Precision Optimizers
Memory Interference and Performance Prediction in GPU-Accelerated Heterogeneous Systems
Memory layout in GPU implementation of lattice Boltzmann method for sparse 3D geometries
Memory Optimization for Deep Networks
Memory Saving Discrete Fourier Transform on GPUs
Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs
Memory-efficient Adaptive Subdivision for Software Rendering on the GPU
Memory-Efficient Implementation of DenseNets
Memory-Efficient Object-Oriented Programming on GPUs
Memory-Efficient Single-Pass GPU Rendering of Multi-fragment Effects
Memory-level and Thread-level Parallelism Aware GPU Architecture Performance Analytical Model
Memory-Scalable GPU Spatial Hierarchy Construction
Merge or Separate? Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms
Merge: a programming model for heterogeneous multi-core systems
Mersenne Twister Random Number Generation on FPGA, CPU and GPU
Mesh deformations in X3D via CUDA with freeform deformation lattices
Mesh Independent Loop Fusion for Unstructured Mesh Applications
Mesh mutation in programmable graphics hardware
Meshfree/GFEM in hardware-efficiency prospective
Message passing for GPGPU clusters: CudaMPI
Message Passing Interface support for the runtime adaptive multi-processor system-on-chip RAMPSoC
Message passing on data-parallel architectures
Meta Networks for Neural Style Transfer
Meta-Programming and Auto-Tuning in the Search for High Performance GPU Code
Meta-programming and Multi-stage Programming for GPGPUs
Meta-simulation of large WSN on multi-core computers
MetaBinG: Using GPUs to Accelerate Metagenomic Sequence Classification
MetaCL – A Model-Based Approach to Programming Heterogeneous Architectures Using OpenCL
MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators and Its Application to the Generation of Parametric CUDA Kernels
MetaMorph: A Library Framework for Interoperable Kernels on Multi- and Many-core Clusters
Metamorphic Testing for (Graphics) Compilers
Method for simulation of coastal terrain on GPU
Methodology of control and supervision of web connected mobile robots with CUDA technology application
Methods and Metrics for Fair Server Assessment under Real-Time Financial Workloads
Methods for Accelerating Machine Learning in High Performance Computing
Methods for GPU Acceleration of Big Data Applications
Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures
MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring
MGPUSim: Enabling Multi-GPU Performance Modeling and Optimization
MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures
MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC)
Microarchitectural Performance Characterization of Irregular GPU Kernels
Microbenchmarks for GPU characteristics: the occupancy roofline and the pipeline model
Microbranching in mode-I fracture using large scale simulations of amorphous and perturbed lattice models
Microlensing Observations Rapid Search for Exoplanets: MORSE code for GPUs
Micropolygon ray tracing with defocus and motion blur
MIDeA: a multi-parallel intrusion detection architecture
Migrating CUDA to oneAPI: A Smith-Waterman Case Study
Migrating from OpenGL ES to Vulkan
Migrating real-time depth image-based rendering from traditional to next-gen GPGPU
MILC Code Performance on High End CPU and GPU Supercomputer Clusters
MILC staggered conjugate gradient performance on Intel KNL
MILJS: Brand New JavaScript Libraries for Matrix Calculation and Machine Learning
MiMatrix: A Massively Distributed Deep Learning Framework on a Petascale High-density Heterogeneous Cluster
Titles: 100
open PDFs: 90
packages: 29