high performance computing on graphics processing units: hgpu.org


Machine Learning for CUDA+MPI Design Rules

Carl Pearson, Aurya Javeed, Karen Devine

Sandia National Laboratories, Albuquerque, NM, USA

arXiv:2203.02530 [cs.PF], (4 Mar 2022)

DOI:10.48550/arXiv.2203.02530

@article{pearson2022machine,
  title={Machine Learning for CUDA+MPI Design Rules},
  author={Pearson, Carl and Javeed, Aurya and Devine, Karen},
  journal={arXiv preprint arXiv:2203.02530},
  year={2022}
}


We present a new strategy for automatically exploring the design space of key CUDA+MPI programs and providing design rules that discriminate slow from fast implementations. In such programs, the order of operations (e.g., GPU kernels, MPI communication) and the assignment of operations to resources (e.g., GPU streams) make the space of possible designs enormous. Systems experts have the task of redesigning and reoptimizing these programs to effectively utilize each new platform. This work provides a prototype tool to reduce that burden. In our approach, a directed acyclic graph of CUDA and MPI operations defines the design space for the program. Monte Carlo tree search discovers regions of the design space that have a large impact on the program's performance. A sequence-to-vector transformation defines features for each explored implementation, and each implementation is assigned a class label according to its relative performance. A decision tree is trained on the features and labels to produce design rules for each class; these rules can be used by systems experts to guide their implementations. We demonstrate our strategy using a key kernel from scientific computing, sparse matrix-vector multiplication, on a platform with multiple MPI ranks and GPU streams.
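The pipeline described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' tool, and several pieces are stand-ins chosen to keep the example short: the operation names and dependencies in `DEPS` are hypothetical, exhaustive enumeration of orderings replaces Monte Carlo tree search, `toy_cost` is a made-up cost model rather than a measurement, the pairwise "issued before" encoding is only one possible sequence-to-vector transformation, and a single-feature split stands in for a full decision tree.

```python
from itertools import combinations

# Hypothetical operations for one rank of a distributed SpMV.
# Names and dependencies are illustrative, not taken from the paper.
DEPS = {
    "pack": [],                   # GPU kernel: gather entries to send
    "irecv": [],                  # post a non-blocking receive
    "isend": ["pack"],            # send the packed buffer
    "local_spmv": [],             # GPU kernel on locally owned data
    "wait": ["isend", "irecv"],   # complete the communication
    "remote_spmv": ["wait"],      # GPU kernel on received data
}

def topological_orders(deps):
    """Yield every ordering of the operations that respects the DAG."""
    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for op in sorted(remaining):
            if all(d in prefix for d in deps[op]):
                yield from extend(prefix + [op], remaining - {op})
    yield from extend([], frozenset(deps))

# Assumed sequence-to-vector encoding: one 0/1 feature per pair (a, b),
# set to 1 when a is issued before b.
PAIRS = list(combinations(sorted(DEPS), 2))

def to_features(order):
    pos = {op: i for i, op in enumerate(order)}
    return [int(pos[a] < pos[b]) for a, b in PAIRS]

def toy_cost(order):
    """Made-up cost, NOT a measurement: pretend communication overlaps
    with local_spmv only when isend is issued first."""
    pos = {op: i for i, op in enumerate(order)}
    return 1.0 if pos["isend"] < pos["local_spmv"] else 2.0

def best_stump(X, y):
    """Depth-one decision tree: pick the feature and polarity that best
    separate 'fast' from 'slow'. Returns (index, fast_value, accuracy)."""
    best = (None, None, 0.0)
    for j in range(len(X[0])):
        for fast_value in (1, 0):
            correct = sum(
                (row[j] == fast_value) == (label == "fast")
                for row, label in zip(X, y)
            )
            acc = correct / len(y)
            if acc > best[2]:
                best = (j, fast_value, acc)
    return best

orders = list(topological_orders(DEPS))
X = [to_features(o) for o in orders]
costs = [toy_cost(o) for o in orders]

# Class label from relative performance: within 1.5x of the best is "fast".
fastest = min(costs)
y = ["fast" if c <= 1.5 * fastest else "slow" for c in costs]

j, fast_value, acc = best_stump(X, y)
a, b = PAIRS[j]
first, second = (a, b) if fast_value == 1 else (b, a)
print(f"rule: fast if {first} is issued before {second} (accuracy {acc:.2f})")
```

Because the toy cost depends on a single ordering constraint, one split is enough to recover it as a rule. With measured timings and a design space that also includes stream assignment, a deeper tree and a guided search would be needed, which is the setting the abstract addresses.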

Tags: Computer science, CUDA, Machine learning, Monte Carlo simulation, MPI, nVidia, nVidia A100

March 20, 2022 by hgpu
