high performance computing on graphics processing units: hgpu.org

Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based SpMV Implementation on an FPGA

Jannatun Naher, Clay Gloster, Shrikant S. Jadhav, Christopher C. Doss

Electrical and Computer Engineering Department, North Carolina A & T State University, Greensboro, NC, USA

North Carolina A & T State University, 2020

Hardware designers use High-Level Synthesis (HLS) tools to reduce design time and design complexity. OpenCL is a framework that uses HLS tools and lets the programmer write standardized C-like code for both the host and the hardware accelerators. With OpenCL, a program can be written using different memory access and data partitioning strategies, so the programmer has to try various designs to find an optimized one. However, each design takes multiple hours to compile. The characteristics of a hardware architecture can instead be estimated with a machine learning technique, without performing actual synthesis. Sparse Matrix-Vector Multiplication (SpMV) is widely used in linear algebra and in many applications, and the SpMV kernel can be designed in numerous ways using OpenCL. For an SpMV implementation, the storage format is a vital factor, because different memory access patterns, storage requirements, and load balancing all affect the hardware architecture. This research makes two contributions. First, it uses a hybrid approach to store the sparse matrix when implementing the SpMV kernel. Second, it uses a machine learning technique to estimate the hardware characteristics of an OpenCL design for any set of design settings, without performing actual synthesis. In our implementation, compared to the ELL storage format, the proposed storage format (a combination of ELL and CSR) uses fewer LUTs, DSPs, and RAM blocks while providing higher throughput. The random forest machine learning algorithm estimates the logic utilization and performance for both ELL and the proposed storage format within a reasonable accuracy range. Using the hybrid format (ELL+CSR) for 65 designs, the average error is 11.43%, 19.03%, 9.09%, 5.3%, and 9.73% for LUTs, DSPs, memory bits, RAM blocks, and throughput (GFLOPS), respectively.
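To make the idea of a combined ELL and CSR storage format concrete, here is a minimal Python sketch. It is not the authors' OpenCL kernel, and the abstract does not say how the matrix is split between the two formats. The sketch assumes one plausible scheme: the first `ell_width` nonzeros of each row go into a fixed-width, zero-padded ELL part, and any remaining nonzeros spill into a CSR part. The names `to_hybrid`, `spmv_hybrid`, and `ell_width` are hypothetical and chosen only for illustration.

```python
def to_hybrid(dense, ell_width):
    """Split a dense matrix (list of rows) into an ELL part and a CSR overflow part.

    Assumption for illustration: each row keeps its first `ell_width`
    nonzeros in ELL; the rest of the row's nonzeros are stored in CSR.
    """
    ell_vals, ell_cols = [], []
    csr_vals, csr_cols, csr_ptr = [], [], [0]
    for row in dense:
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        head, tail = nonzeros[:ell_width], nonzeros[ell_width:]
        pad = ell_width - len(head)
        # ELL rows all have the same width; short rows are padded with
        # value 0.0 and column index 0, which contributes nothing to the sum.
        ell_cols.append([j for j, _ in head] + [0] * pad)
        ell_vals.append([v for _, v in head] + [0.0] * pad)
        # CSR stores the overflow contiguously, with one row pointer per row.
        csr_cols.extend(j for j, _ in tail)
        csr_vals.extend(v for _, v in tail)
        csr_ptr.append(len(csr_vals))
    return (ell_vals, ell_cols), (csr_vals, csr_cols, csr_ptr)


def spmv_hybrid(ell, csr, x):
    """Compute y = A @ x, where A is stored as an ELL part plus a CSR part."""
    ell_vals, ell_cols = ell
    csr_vals, csr_cols, csr_ptr = csr
    y = []
    for i in range(len(ell_vals)):
        # Regular, fixed-length loop over the ELL slots of row i.
        acc = sum(v * x[j] for v, j in zip(ell_vals[i], ell_cols[i]))
        # Irregular loop over whatever overflowed into CSR for row i.
        for k in range(csr_ptr[i], csr_ptr[i + 1]):
            acc += csr_vals[k] * x[csr_cols[k]]
        y.append(acc)
    return y
```

Under this assumed split, a small `ell_width` keeps padding low for matrices whose rows are mostly short, while the CSR part absorbs the few long rows. Whether the paper's format makes the same trade-off is not stated in the abstract.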

Tags: Computer science, Design space exploration, DSP, FPGA, Linear Algebra, Machine learning, OpenCL, Sparse matrix

April 12, 2020 by hgpu
