high performance computing on graphics processing units: hgpu.org


mu-grind: A Framework for Dynamically Instrumenting HLS-Generated RTL

Parmida Vahdatniya, Amirali Sharifian, Reza Hojabr, Arrvindh Shriraman

School of Computing Science, Simon Fraser University

30th Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT), 2022


Package: muIR Chisel library


High-level synthesis (HLS) compilers enable the rapid creation of accelerator circuits. Unfortunately, compiler-generated RTL (H-RTL) is inconsistent in quality, hard to comprehend, and tends to be brittle [28, 41]. This paper develops a framework to help HLS compiler architects inspect and profile H-RTL. Prior state-of-the-art tools [23, 57] have predominantly focused on tracing. Tracing requires a massive amount of on-chip buffering, limits the H-RTL design size, and only supports post-mortem analysis at the end of execution. We propose mu-grind, a dynamic instrumentation framework for H-RTL. The key technique is guards: additional logic that we auto-inject into the H-RTL emitted by HLS compilers. Guards perform two tasks: i) they run analysis functions on values fed from H-RTL signals, and ii) they patch values into the H-RTL during live execution. Guards can either be mapped onto the FPGA or be co-simulated along with the H-RTL, and mu-grind can remove them once the H-RTL is finalized. Leveraging mu-grind, we create a novel tool, H-RTL checker, that precisely identifies the erring signal and cycle without any user involvement. Compared to prior art, mu-grind requires 2-10x less SRAM, supports 5x larger H-RTL circuits (up to 98% of the FPGA), and completes checks in under 24 hours (including FPGA synthesis time). We also develop two additional tools: i) H-RTL faulty, which deploys heterogeneous guards to study circuit resilience, and ii) H-RTL profiler, which creates detailed execution histograms. By avoiding traces, we reduce DRAM traffic by 200-35000x compared to prior art.
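The abstract describes guards only at a high level. As a hedged illustration, the minimal Python sketch below models guards in software, roughly as they might behave when co-simulated: a cycle-stepped toy design stands in for H-RTL, and each guard attached to a named signal may patch that signal's value and then run an analysis function on it every cycle. Everything here is an assumption for illustration. `Guard`, `simulate`, the `accumulator` design, and the three guard factories are hypothetical names, not mu-grind's interface (the released package is the muIR Chisel library). The checker's comparison against a golden reference model is likewise an assumed way of locating the erring cycle, not necessarily the paper's mechanism.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Signals = Dict[str, int]

@dataclass
class Guard:
    """Hypothetical software stand-in for a guard attached to one signal (not mu-grind's API)."""
    signal: str
    analyze: Optional[Callable[[int, int], None]] = None  # observes (cycle, value)
    patch: Optional[Callable[[int, int], int]] = None     # returns the value to write back

def simulate(step: Callable[[int, Signals], Signals], init: Signals,
             n_cycles: int, guards: List[Guard]) -> Signals:
    """Clock the design; after each cycle every guard may patch, then observe, its signal."""
    state = dict(init)
    for cycle in range(n_cycles):
        state = step(cycle, state)
        for g in guards:  # guards run in list order, so later guards see earlier patches
            if g.patch is not None:
                state[g.signal] = g.patch(cycle, state[g.signal])
            if g.analyze is not None:
                g.analyze(cycle, state[g.signal])
    return state

# Hypothetical toy stand-in for H-RTL: one register accumulating an input stream.
INPUTS = [3, 1, 4, 1, 5, 9, 2, 6]

def accumulator(cycle: int, s: Signals) -> Signals:
    return {"acc": s["acc"] + INPUTS[cycle]}

# Checker guard (assumed golden-model comparison): record every cycle that disagrees.
def make_checker(reference: List[int]) -> Tuple[Callable[[int, int], None], List[int]]:
    mismatches: List[int] = []
    def analyze(cycle: int, value: int) -> None:
        if value != reference[cycle]:
            mismatches.append(cycle)
    return analyze, mismatches

# Fault-injection guard: flip one bit of the signal on one chosen cycle.
def make_bitflip(at_cycle: int, bit: int) -> Callable[[int, int], int]:
    def patch(cycle: int, value: int) -> int:
        return value ^ (1 << bit) if cycle == at_cycle else value
    return patch

# Profiler guard: histogram of how many bits the signal's value needs each cycle.
def make_profiler() -> Tuple[Callable[[int, int], None], Counter]:
    hist: Counter = Counter()
    def analyze(cycle: int, value: int) -> None:
        hist[value.bit_length()] += 1
    return analyze, hist

# Golden reference: the expected accumulator value after each cycle.
golden = [sum(INPUTS[: c + 1]) for c in range(len(INPUTS))]

if __name__ == "__main__":
    # Clean run with only a profiler attached.
    prof, hist = make_profiler()
    simulate(accumulator, {"acc": 0}, len(INPUTS), [Guard("acc", analyze=prof)])
    print("bit-width histogram:", dict(sorted(hist.items())))

    # Faulty run: inject a bit flip at cycle 2, then let the checker localize it.
    check, mismatches = make_checker(golden)
    final = simulate(accumulator, {"acc": 0}, len(INPUTS),
                     [Guard("acc", patch=make_bitflip(2, 4)), Guard("acc", analyze=check)])
    print("first erring cycle on 'acc':", mismatches[0] if mismatches else None)
    print("final acc:", final["acc"], "expected:", golden[-1])
```

Because guards run in list order, the checker placed after the bit-flip guard observes the patched value and reports cycle 2 as the first divergence on `acc`. The profiler shows how a histogram-style analysis can share the same per-cycle hook. This model ignores the area and timing cost of guards that are mapped onto the FPGA as real logic.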

Tags: Computer science, FPGA, Heterogeneous systems, HLS, Package, Scala

December 25, 2022 by hgpu


