HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis
The University of Texas at Austin
arXiv:2302.10977 [cs.AR] (17 Feb 2023)
@misc{wei2023hlsdataset,
  doi       = {10.48550/arXiv.2302.10977},
  url       = {https://arxiv.org/abs/2302.10977},
  author    = {Wei, Zhigang and Arora, Aman and John, Lizy K.},
  keywords  = {Hardware Architecture (cs.AR), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title     = {{HLSDataset}: Open-Source Dataset for {ML}-Assisted {FPGA} Design using High Level Synthesis},
  publisher = {arXiv},
  year      = {2023},
  copyright = {Creative Commons Attribution Non Commercial No Derivatives 4.0 International}
}
Machine Learning (ML) has been widely adopted in design-space exploration using high-level synthesis (HLS) to provide better and faster performance, resource, and power estimation at very early stages of FPGA-based design. To make accurate predictions, high-quality, large-volume datasets are required for training ML models. This paper presents HLSDataset, a dataset for ML-assisted FPGA design using HLS. The dataset is generated from widely used HLS C benchmarks, including Polybench, MachSuite, CHStone and Rosetta. The Verilog samples are generated with a variety of directives, including loop unrolling, loop pipelining and array partitioning, to ensure that optimized and realistic designs are covered. Nearly 9,000 Verilog samples are generated per FPGA type. To demonstrate the effectiveness of our dataset, we undertake case studies in power estimation and resource usage estimation with ML models trained on our dataset. All the code and the dataset are public in the GitHub repo. We believe that HLSDataset can save researchers valuable time by avoiding the tedious process of running tools, scripting, and parsing files to generate the dataset, enabling them to spend more time where it counts: training ML models.
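To illustrate the kind of directive-annotated HLS C kernel the samples are generated from, the sketch below applies the three directive types the abstract names (loop pipelining, loop unrolling, array partitioning) to a small matrix-vector product. The kernel and pragma placement are our own assumption, not code from the HLSDataset repository; the pragma spellings follow the Vitis HLS convention, and a standard C compiler simply ignores the unknown pragmas, so the function also runs as ordinary C.

```c
#define N 4  /* small fixed size so the directives can fully apply */

/* Hypothetical HLS kernel sketch (not taken from HLSDataset itself).
 * The pragmas follow the Vitis HLS convention; a regular C compiler
 * ignores them, so the function still behaves as plain C code. */
void matvec(const int a[N][N], const int x[N], int y[N]) {
/* Split x across registers so all N reads can occur in one cycle. */
#pragma HLS ARRAY_PARTITION variable=x complete dim=1
    for (int i = 0; i < N; i++) {
/* Start a new row each cycle (initiation interval of 1). */
#pragma HLS PIPELINE II=1
        int acc = 0;
        for (int j = 0; j < N; j++) {
/* Fully unroll the dot product so the multiplies run in parallel. */
#pragma HLS UNROLL
            acc += a[i][j] * x[j];
        }
        y[i] = acc;
    }
}
```

Sweeping such directive settings per benchmark kernel (unroll factors, partition styles, pipeline on or off) is how a single C source can expand into many distinct Verilog samples.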
February 26, 2023 by hgpu