mu-grind: A Framework for Dynamically Instrumenting HLS-Generated RTL

Parmida Vahdatniya, Amirali Sharifian, Reza Hojabr, Arrvindh Shriraman
School of Computing Sciences, Simon Fraser University
30th Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT), 2022

@article{vahdatniya2022mu,
   title={mu-grind: A Framework for Dynamically Instrumenting HLS-Generated RTL},
   author={Vahdatniya, Parmida and Sharifian, Amirali and Hojabr, Reza and Shriraman, Arrvindh},
   year={2022}
}

High-level synthesis (HLS) compilers enable the rapid creation of accelerator circuits. Unfortunately, compiler-generated RTL (H-RTL) is inconsistent in quality, hard to comprehend, and tends to be brittle [28, 41]. This paper develops a framework to help HLS compiler architects inspect and profile H-RTL. Prior state-of-the-art tools [23, 57] have predominantly focused on tracing. Tracing requires a massive amount of on-chip buffering, limits the H-RTL design size, and supports only post-mortem analysis at the end of execution. We propose mu-grind, a dynamic instrumentation framework for H-RTL. The key technique is guards: additional logic that we auto-inject into the output of HLS compilers (H-RTL). Guards perform two tasks: i) they run analysis functions on the values fed from an H-RTL signal, and ii) they patch values into the H-RTL during live execution. Guards can either be mapped onto the FPGA or be co-simulated along with the H-RTL. mu-grind can remove them once the H-RTL is finalized. Leveraging mu-grind, we create a novel tool, H-RTL checker, that precisely identifies the erring signal and cycle without any user involvement. Compared to prior art, mu-grind requires 2-10x less SRAM, supports 5x larger H-RTL circuits (up to 98% of the FPGA), and completes checks in under 24 hours (including FPGA synthesis time). We also develop two additional tools: i) H-RTL faulty, which deploys heterogeneous guards to study circuit resilience, and ii) H-RTL profiler, which creates detailed execution histograms. By avoiding traces, we reduce DRAM traffic by 200-35,000x compared to prior art.
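The two guard tasks named in the abstract (analysis on observed values, and patching values during execution) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' implementation: in the paper guards are hardware logic injected into the H-RTL, whereas here a guard is a plain Python object, and the names `Guard`, `first_mismatch`, `count`, `flip_at_cycle_2`, and the signal name `acc` are all hypothetical. The checker below simply compares a patched run with a reference run to report the earliest diverging signal and cycle.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


class Guard:
    """Toy software model of a guard on one signal (hypothetical, not mu-grind's API)."""

    def __init__(self, signal: str,
                 analyze: Callable[[int, int], None],
                 patch: Optional[Callable[[int, int], Optional[int]]] = None):
        self.signal = signal
        self.analyze = analyze   # task i: analysis function on observed values
        self.patch = patch       # task ii: optional live value patching

    def step(self, cycle: int, value: int) -> int:
        """Observe one value, then return the (possibly patched) value."""
        self.analyze(cycle, value)
        if self.patch is not None:
            patched = self.patch(cycle, value)
            if patched is not None:
                return patched
        return value


def first_mismatch(golden: Dict[str, List[int]],
                   observed: Dict[str, List[int]]) -> Optional[Tuple[str, int]]:
    """Return (signal, cycle) of the earliest divergence, or None if all match."""
    best: Optional[Tuple[str, int]] = None
    for signal, expected in golden.items():
        actual = observed.get(signal, [])
        for cycle, (e, a) in enumerate(zip(expected, actual)):
            if e != a:
                if best is None or cycle < best[1]:
                    best = (signal, cycle)
                break  # only the first mismatch per signal matters
    return best


# Profiler-style analysis: histogram of the values seen on the signal.
histogram: Counter = Counter()

def count(cycle: int, value: int) -> None:
    histogram[value] += 1

# Fault-injection-style patch: flip the low bit at cycle 2 only.
def flip_at_cycle_2(cycle: int, value: int) -> Optional[int]:
    return value ^ 1 if cycle == 2 else None

guard = Guard("acc", analyze=count, patch=flip_at_cycle_2)
stream = [3, 3, 4, 5]  # assumed fault-free values of the signal, one per cycle
observed = [guard.step(c, v) for c, v in enumerate(stream)]

print(observed)                                             # [3, 3, 5, 5]
print(first_mismatch({"acc": stream}, {"acc": observed}))   # ('acc', 2)
```

In this model the analysis function sees the value before patching, so the histogram reflects the unmodified stream. That ordering is a choice made for the sketch; the abstract does not specify it.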