high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Efficient Hardware Acceleration on SoC-FPGA with OpenCL

Efficient Hardware Acceleration on SoC-FPGA with OpenCL

Susmitha Gogineni

The University of Texas at Dallas

The University of Texas at Dallas, 2017

@phdthesis{gogineni2017efficient,

title={Efficient Hardware Acceleration on SoC-FPGA using OpenCL},

author={Gogineni, Susmitha and others},

year={2017}

}

Download (PDF)

View

Source

2149

views

Field Programmable Gate Arrays (FPGAs) are taking over the conventional processors in the field of High Performance computing. With the advent of FPGA architectures and High level synthesis tools, FPGAs can now be easily used to accelerate computationally intensive applications like, e.g., AI and Cognitive computing. One of the advantages of raising the level of hardware design abstraction is that multiple configurations with unique properties (i.e. area, performance and power) can be automatically generated without the need to re-write the input description. This is not possible when using traditional low-level hardware description languages like VHDL or Verilog. This thesis deals with this important topic and accelerates multiple computationally intensive applications amiable to hardware acceleration and proposes a fast heuristic Design Space Exploration method to find dominant design solutions quickly. In particular, in this work, we developed different computationally intensive applications in OpenCL and mapped them onto a heterogeneous SoC-FPGA. A Genetic Algorithm (GA) based meta-heuristics that does automatic Design Space Exploration (DSE) on these applications was also developed as GA has shown in the past to lead to good results in multi-objective optimization problems like this one. The developed explorer automatically inserts a set of control knobs into the OpenCL behavioral description, e.g., to control how to synthesize loops (unroll or not), and to replicate Compute Units (CUs). By tuning the these control attributes with possible values, thousands of different micro-architecture configurations can be obtained. Thus, an exhaustive search is not feasible and other heuristics are needed. Each configuration is compiled using Altera OpenCL SDK tool and executed on Terasic DE1-SoC FPGA board platform to record the corresponding performance and logic utilization. In order to measure the quality of the proposed GA-based heuristic, each application is explored exhaustively (taking multiple days to finish for smaller designs) to find the dominant optimal solutions (Pareto Optimal Designs). For complex and larger designs, exploring the entire design space exhaustively is not feasible due to very large design space. The comparison is quantified by using metrics like Dominance, Average Distance from Reference Set (ADRS) and run time speed up, showing that our proposed heuristics lead to very good results at a fraction of the time of the exhaustive search.

Tags: Computer science, Design space exploration, FPGA, Heterogeneous systems, HLS, OpenCL, Thesis

May 12, 2018 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

high performance computing on graphics processing units: hgpu.org

Efficient Hardware Acceleration on SoC-FPGA with OpenCL

Your response

Recent source codes

Agentic Code Optimization via Compiler-LLM Cooperation

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

True 4-Bit Quantized CNN Training on CPU

cuFuzz: A GPU-oriented coverage-guided fuzzer for userland CUDA application

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

Most viewed papers (last 30 days)

Efficient Hardware Acceleration on SoC-FPGA with OpenCL

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)