Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code
Dept. Computer Science and Engineering, University of Connecticut, Storrs, USA
arXiv:2307.07686 [cs.SE] (15 Jul 2023)
@misc{lei2023creating,
title={Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code},
author={Bin Lei and Caiwen Ding and Le Chen and Pei-Hung Lin and Chunhua Liao},
year={2023},
eprint={2307.07686},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
In this study, we present a novel dataset for training machine learning models to translate between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is first refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We demonstrate how this dataset can significantly improve the translation capabilities of large-scale language models, with improvements of 5.1× for models with no prior coding knowledge and 9.9× for models with some coding familiarity. Our work highlights the potential of this dataset to advance the field of code translation for high-performance computing.
July 24, 2023 by hgpu