Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem
Oak Ridge National Laboratory, Oak Ridge, TN, USA
arXiv:2509.21039 [cs.DC] (25 Sep 2025)
@misc{godoy2025mojo,
  title={Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem},
  author={William F. Godoy and Tatiana Melnichenko and Pedro Valero-Lara and Wael Elwasif and Philip Fackler and Rafael Ferreira Da Silva and Keita Teranishi and Jeffrey S. Vetter},
  year={2025},
  eprint={2509.21039},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}
We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language built on LLVM’s Multi-Level Intermediate Representation (MLIR) compiler infrastructure, Mojo aims to close performance and productivity gaps by combining Python interoperability with CUDA-like syntax for compile-time portable GPU programming. We target four scientific workloads: a seven-point stencil (memory-bound), BabelStream (memory-bound), miniBUDE (compute-bound), and Hartree-Fock (compute-bound with atomic operations); and compare their performance against vendor baselines on NVIDIA H100 and AMD MI300A GPUs. We show that Mojo’s performance is competitive with CUDA and HIP for memory-bound kernels, whereas gaps remain on AMD GPUs for atomic operations and, on both AMD and NVIDIA GPUs, for fast-math compute-bound kernels. Although Mojo’s programming model is still fairly low-level and carries a learning curve, it can close significant gaps in the fragmented Python ecosystem at the convergence of scientific computing and AI.
September 28, 2025 by hgpu