high performance computing on graphics processing units: hgpu.org


CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud

MBZUAI

arXiv:2505.16968 [cs.AR], (22 May 2025)

DOI:10.48550/arXiv.2505.16968

@misc{heakl2025cassnvidiaamdtranspilation,
  title={CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark},
  author={Ahmed Heakl and Sarim Hashmi and Gustavo Bertolo Stahl and Seung Hun Eddie Han and Salman Khan and Abdulrahman Mahmoud},
  year={2025},
  eprint={2505.16968},
  archivePrefix={arXiv},
  primaryClass={cs.AR},
  url={https://arxiv.org/abs/2505.16968}
}


Source codes

Package: CASS: Cuda-Amd aSSembly


We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA<->HIP) and assembly-level (Nvidia SASS<->AMD RDNA3) translation. The dataset comprises 70k verified code pairs across host and device, addressing a critical gap in low-level GPU code portability. Leveraging this resource, we train the CASS family of domain-specific language models, achieving 95% source translation accuracy and 37.5% assembly translation accuracy, substantially outperforming commercial baselines such as GPT-4o, Claude, and Hipify. Our generated code matches native performance in over 85% of test cases, preserving runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16 GPU domains with ground-truth execution. All data, models, and evaluation tools are released as open source to foster progress in GPU compiler tooling, binary compatibility, and LLM-guided hardware translation. Dataset and benchmark are available.
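To make the source-level (CUDA to HIP) direction concrete, the sketch below shows the kind of rule-based renaming that such a translation starts from. It is an illustration only: it is neither the CASS models nor Hipify's actual implementation, and the mapping table, the function name `cuda_to_hip`, and the sample snippet are assumptions chosen for this example. It relies only on the fact that HIP exposes many runtime calls under `hip`-prefixed names that mirror their CUDA counterparts.

```python
import re

# Illustrative subset of CUDA-to-HIP renames (assumption: chosen for this
# example; real tools cover far more of the API surface).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Word boundaries ensure only whole identifiers are rewritten, so
# cudaMemcpy does not match inside cudaMemcpyHostToDevice.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def cuda_to_hip(source: str) -> str:
    """Rename known CUDA identifiers to their HIP equivalents (hypothetical helper)."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


# Sample host code; the kernel launch line is left untouched by the renaming.
cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
scale<<<blocks, threads>>>(d_x, n);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

hip_snippet = cuda_to_hip(cuda_snippet)
print(hip_snippet)
```

A table of renames like this has no counterpart at the assembly level, where Nvidia SASS and AMD RDNA3 differ in instruction sets and register conventions rather than in identifier names; that is the setting in which the abstract reports the lower 37.5% accuracy.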

Tags: AI, AMD Radeon RX 7900 XT, ATI, Computer science, CUDA, HIP, Machine learning, nVidia, nVidia A100, OpenCL, Package, Programming Languages, PTX

May 25, 2025 by hgpu



