high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Solving convex optimization problems on FPGA using OpenCL

Solving convex optimization problems on FPGA using OpenCL

Martijn Berkers

Delft University of Technology

Delft University of Technology, 2020

@article{berkers2020solving,

title={Solving convex optimization problems on FPGA using OpenCL},

author={Berkers, Martijn},

year={2020}

}

Download (PDF)

View

Source

2005

views

The application of accelerators in HPC applications has seen enormous growth in the last decade. In the field of HPC demands on throughput are steadily growing. Not all of the algorithms used have a clear HW architecture which performs the best. Our work explores the performance of different HW architectures in solving a convex optimization problem. These algorithms are a sequence of dependent operations making it an interesting use-case because parallelism is not easily found. Our work focuses on a use-case of an on machine computational model present in ASML, we explore the acceleration of a quadratic programming Active-Set algorithm on dedicated hardware. There are libraries available to do this on both the CPU and GPU, while nothing is available for the FPGA. Our work focuses on filling this gap by implementing the algorithm using a high-level abstraction parallel programming language in order to ease development for FPGA accelerators. We use the Intel FPGA SDK for OpenCL framework to evaluate the performance trade-offs involved with FPGA acceleration and compare the performance to both the CPU and GPU using library functions. To fit FPGA architecture the algorithm is converted to a dataflow algorithm to enable streaming of data between kernels. The implementation leverages the features introduced in the Intel FPGA SDK for OpenCL framework to stream data using on-chip low-latency communication between kernels. We demonstrate that such a complicated algorithm can efficiently be implemented using the OpenCL framework. Our implementation achieves competitive performance compared to optimized library function on both the CPU and GPU. The OpenCL framework allows for easy design space exploration. We have explored different optimization strategies. The execution time of the final FPGA implementation is 3.5x and 1.2x longer than the CPU and GPU respectively in double precision floating-point. If the accuracy of the FPGA implementation is reduced to single precision there is a speedup of 2.2x in execution time compared to the double precision variant. Higher throughput can be achieved by duplicating the implementation. With the current size of the algorithm, two additional copies are possible. A handcrafted implementation could further improve the FPGA performance by manually managing local memory structures and reusing processing elements. However, significantly fewer lines of code are required, and a significant reduction in development time is achieved by using the OpenCL framework compared to traditional hardware description languages.

Tags: Algorithms, Computer science, Design space exploration, FPGA, OpenCL, Thesis

March 8, 2020 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Solving convex optimization problems on FPGA using OpenCL

Your response

Recent source codes

NVIDIA Nemotron Parse 1.1

ThunderKittens: Tile primitives for speedy kernels

Iris: AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

HipKittens: Fast and Furious AMD Kernels

Fortran xDSL dialects

mt4g: Memory Topology 4 GPUs

Falcon: GPU-Based Floating-point Adaptive Lossless Compression

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

pplx-garden: Perplexity open source garden for inference technology

LC Framework

Most viewed papers (last 30 days)

Solving convex optimization problems on FPGA using OpenCL

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)