
Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based SpMV Implementation on an FPGA

Jannatun Naher, Clay Gloster, Shrikant S. Jadhav, Christopher C. Doss
Electrical and Computer Engineering Department, North Carolina A & T State University, Greensboro, NC, USA
North Carolina A & T State University, 2020

@article{naher2020using,
   title={Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based SpMV Implementation on an FPGA},
   author={Naher, Jannatun and Gloster, Clay and Jadhav, Shrikant S and Doss, Christopher C},
   year={2020}
}


Hardware designers use High-Level Synthesis (HLS) tools to reduce design time and complexity. OpenCL is a framework that uses HLS tools and lets the programmer write standardized C-like code for both the host and the hardware accelerators. An OpenCL program can be written using different memory access and data partitioning strategies, so the programmer must try various designs to find an optimized one. However, each design takes multiple hours to compile. A machine learning technique can estimate the characteristics of a hardware architecture without performing actual synthesis. Sparse Matrix-Vector Multiplication (SpMV) is widely used in linear algebra and in many applications. The SpMV kernel can be designed in numerous ways using OpenCL. For an SpMV implementation, the storage format is a vital factor: different memory access patterns, storage requirements, and load balancing affect the hardware architecture. This research makes two contributions. First, it uses a hybrid approach to store the sparse matrix for the SpMV kernel. Second, it uses a machine learning technique to estimate the hardware architecture for any set of OpenCL design settings without performing actual synthesis. In our implementation, the proposed storage format (a combination of ELL and CSR) uses fewer LUTs, DSPs, and RAM blocks than the ELL storage format while providing higher throughput. The random forest machine learning algorithm estimates the logic utilization and performance for ELL and the proposed storage format with reasonable accuracy. Using the hybrid format (ELL+CSR) for 65 designs, the average error is 11.43%, 19.03%, 9.09%, 5.3%, and 9.73% for LUTs, DSPs, memory bits, RAM blocks, and throughput (GFLOPs), respectively.
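
To make the storage-format discussion concrete, here is a minimal software sketch of one common way to combine ELL and CSR. The abstract does not specify how the paper splits the matrix, so the scheme here is an assumption: the first `ell_width` nonzeros of each row go into fixed-width, zero-padded ELL arrays, and any remaining nonzeros spill into a CSR structure. The names `split_hybrid`, `spmv_hybrid`, and `ell_width` are hypothetical, and the code is plain Python rather than the authors' OpenCL kernel.

```python
def split_hybrid(dense, ell_width):
    """Split a dense matrix (list of rows) into an ELL part and a CSR overflow part.

    Assumed scheme (not taken from the paper): the first `ell_width` nonzeros
    of each row are stored in ELL; the rest are stored in CSR.
    """
    ell_vals, ell_cols = [], []
    csr_vals, csr_cols, csr_ptr = [], [], [0]
    for row in dense:
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        head, tail = nonzeros[:ell_width], nonzeros[ell_width:]
        pad = ell_width - len(head)
        # ELL rows all have the same width; padding uses value 0.0 and column 0,
        # so padded slots contribute nothing to the dot product.
        ell_cols.append([j for j, _ in head] + [0] * pad)
        ell_vals.append([v for _, v in head] + [0.0] * pad)
        # Overflow entries are appended to the CSR arrays for this row.
        csr_cols.extend(j for j, _ in tail)
        csr_vals.extend(v for _, v in tail)
        csr_ptr.append(len(csr_vals))
    return (ell_vals, ell_cols), (csr_vals, csr_cols, csr_ptr)


def spmv_hybrid(ell, csr, x):
    """Compute y = A @ x from the ELL and CSR parts produced by split_hybrid."""
    ell_vals, ell_cols = ell
    csr_vals, csr_cols, csr_ptr = csr
    y = []
    for i in range(len(ell_vals)):
        # Regular, fixed-length part (ELL).
        acc = sum(v * x[j] for v, j in zip(ell_vals[i], ell_cols[i]))
        # Irregular overflow part (CSR) for the same row.
        for k in range(csr_ptr[i], csr_ptr[i + 1]):
            acc += csr_vals[k] * x[csr_cols[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[1, 0, 2, 0],
         [0, 0, 3, 0],
         [4, 5, 6, 7],
         [0, 0, 0, 0]]
    ell, csr = split_hybrid(A, ell_width=2)
    print(spmv_hybrid(ell, csr, [1, 2, 3, 4]))
```

In this sketch, a small `ell_width` keeps the padded ELL arrays compact when a few rows are much longer than the rest, because only those rows spill into CSR. Pure ELL would instead have to pad every row to the length of the longest one.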
