high performance computing on graphics processing units: hgpu.org


Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++

Bin Lei, Caiwen Ding, Le Chen, Pei-Hung Lin, Chunhua Liao

Dept. Computer Science and Engineering, University of Connecticut, Storrs, USA

arXiv:2307.07686 [cs.SE], (18 Sep 2023)

DOI:10.48550/arXiv.2307.07686

@misc{lei2023creating,
  title={Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++},
  author={Bin Lei and Caiwen Ding and Le Chen and Pei-Hung Lin and Chunhua Liao},
  year={2023},
  eprint={2307.07686},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}


In this study, we present a novel dataset for training machine learning models to translate between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is built from a range of representative open-source OpenMP benchmarks and refined using a meticulous code similarity test. We assess its effectiveness with both quantitative (CodeBLEU) and qualitative (human evaluation) methods, and show that it significantly improves the translation capabilities of large language models (LLMs). Specifically, models without prior coding knowledge saw a 5.1-fold increase in their CodeBLEU scores, while models with some coding familiarity saw a 9.9-fold increase. The best model fine-tuned on our dataset outperforms GPT-4 and reaches human-level accuracy. This work underscores the potential of the dataset to advance code translation for high-performance computing. The dataset is accessible.
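To give a rough sense of what a token-overlap score measures when a model's C++ translation is compared against a reference, here is a minimal Python sketch of clipped n-gram precision, the basic ingredient of BLEU-style metrics. It is not the paper's evaluation code: CodeBLEU additionally combines weighted n-gram, syntax-tree, and data-flow matching, all omitted here. The `tokenize` helper, the function names, and both code snippets are hypothetical and do not come from the dataset.

```python
import re
from collections import Counter


def tokenize(code):
    # Crude lexer (an assumption for illustration): identifiers, integer
    # literals, or any single non-whitespace symbol.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def ngram_counts(tokens, n):
    # Multiset of all contiguous n-token windows.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    # Fraction of candidate n-grams that also occur in the reference,
    # with each n-gram's credit capped at its count in the reference.
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical reference translation and model output (not from the dataset).
reference = """
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < n; i++) sum += a[i];
"""
candidate = """
#pragma omp parallel for
for (int i = 0; i < n; i++) sum += a[i];
"""

for n in (1, 2):
    print(n, round(clipped_precision(candidate, reference, n), 3))
```

In this example the candidate drops the `reduction` clause, which introduces a data race on `sum`, yet its token overlap with the reference stays high. Gaps of this kind are one reason a purely lexical score is usually paired with structural matching and human evaluation.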

Tags: Benchmarking, Computer science, CUDA, Fortran, Machine learning, nVidia, nVidia RTX A6000, OpenMP, Package

November 19, 2023 by hgpu
