
Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code

Bin Lei, Caiwen Ding, Le Chen, Pei-Hung Lin, Chunhua Liao
Dept. Computer Science and Engineering, University of Connecticut, Storrs, USA
arXiv:2307.07686 [cs.SE], (15 Jul 2023)

@misc{lei2023creating,
   title={Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code},
   author={Bin Lei and Caiwen Ding and Le Chen and Pei-Hung Lin and Chunhua Liao},
   year={2023},
   eprint={2307.07686},
   archivePrefix={arXiv},
   primaryClass={cs.SE}
}

In this study, we present a novel dataset for training machine learning models that translate between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is first refined using a meticulous code similarity test. The effectiveness of the dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We demonstrate that this dataset can significantly improve the translation capabilities of large language models, with improvements by a factor of 5.1 for models with no prior coding knowledge and by a factor of 9.9 for models with some coding familiarity. Our work highlights the potential of this dataset to advance the field of code translation for high-performance computing.
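To give a rough sense of what a quantitative, overlap-based score measures, the sketch below computes a clipped n-gram precision between a candidate translation and a reference. This is only a simplified stand-in: CodeBLEU as used in the study also weighs syntax-tree and data-flow matches, which are not modeled here. The function names `tokenize` and `ngram_precision` and the regex tokenizer are assumptions made for illustration, not part of the paper.

```python
import re
from collections import Counter


def tokenize(code):
    # Hypothetical tokenizer: identifiers, integer runs, and any other
    # single non-whitespace character become separate tokens.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def ngram_precision(candidate, reference, n=2):
    # Fraction of candidate n-grams that also occur in the reference,
    # with counts clipped to the reference counts (as in BLEU).
    cand = tokenize(candidate)
    ref = tokenize(reference)
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total


# Illustrative strings only; they are not taken from the dataset.
reference = "#pragma omp parallel for\nfor (int i = 0; i < n; i++) a[i] = b[i] + c[i];"
candidate = "#pragma omp parallel for\nfor (int j = 0; j < n; j++) a[j] = b[j] + c[j];"
score = ngram_precision(candidate, reference, n=2)
```

A purely token-based score such as this penalizes harmless renaming (here `i` versus `j`), which is one reason a metric that also considers syntax and data flow, combined with human evaluation, is a better fit for judging code translation.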
