high performance computing on graphics processing units: hgpu.org

Seamless acceleration of Fortran intrinsics via AMD AI engines

Nick Brown, Gabriel Rodríguez Canal

EPCC at the University of Edinburgh, Edinburgh, UK

arXiv:2502.10254 [cs.DC]

DOI:10.48550/arXiv.2502.10254

Source code package: An MLIR-based toolchain for AMD AI Engine-enabled devices

A major challenge facing the HPC community is how to continue delivering the performance demanded by scientific programmers whilst meeting an increased emphasis on sustainable operations. Specialised architectures, such as FPGAs and AMD’s AI Engines (AIEs), have been demonstrated to provide significant energy efficiency advantages. However, programming these architectures effectively requires significant expertise and investment of time, which is a major blocker. Fortran is the lingua franca of scientific computing, and in this paper we explore automatically accelerating Fortran intrinsics via the AIEs in AMD’s Ryzen AI CPU. Leveraging the open source Flang compiler and the MLIR ecosystem, we describe an approach that lowers the MLIR linear algebra dialect to AMD’s AIE dialects, and demonstrate that for suitable workloads the AIEs can provide significant performance advantages over the CPU without requiring any code modifications by the programmer.

Tags: AI, AMD, Computer science, Fortran, Linear Algebra, Package, Performance

February 24, 2025 by hgpu
