How to distribute most efficiently a computation intensive calculation on an Android device to external compute units with an Android API
Lander Beckers, Henning Lakiere
Hasselt University, 2017
@mastersthesis{beckers2017distribute,
title={How to distribute most efficiently a computation intensive calculation on an Android device to external compute units with an Android API},
author={Beckers, Lander and Lakiere, Henning},
year={2017},
school={UHasselt}
}
Is transferring computation-intensive calculations to external compute units the next trend? This master's thesis investigates whether it is worth the effort to offload a matrix multiplication from an Android phone to a System-on-Chip (SoC), using Bluetooth or WebSocket as the communication protocol. The SoC solution used in this work is a Terasic board based on an Intel (Altera) Cyclone V, which combines a Field Programmable Gate Array (FPGA) with a dual-core ARM Cortex-A9 processor. Because the number of operations in a matrix multiplication grows rapidly with matrix size, the calculation times on the CPU and the FPGA diverge as the matrices grow. Comparing the multiplication times on Android and on the SoC, matrices larger than 1660×1660 are calculated faster on the SoC. The matrix multiplication is accelerated with an OpenCL kernel on the FPGA, driven by a host program on the processor written in C++. Experiments showed that Bluetooth's transfer rate is roughly 500 times lower than WebSocket's, so only WebSocket was retained for further investigation. Once the transfer times are taken into account, the minimum matrix size at which offloading the multiplication to the SoC saves time is 2338×2338. Although the implemented matrix multiplication only supports square matrices, future research could develop multiple kernels with different algorithms that support varying widths and heights.
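As an illustration of the acceleration path described above, the sketch below shows a naive OpenCL square-matrix-multiplication kernel driven by a minimal C++ host program. It is not the thesis's actual implementation: the kernel, the matrix size (1024), the kernel name matmul, and the device selection are illustrative assumptions, and on the Cyclone V board the pre-compiled FPGA bitstream (.aocx) would be loaded with clCreateProgramWithBinary rather than built from source.

#include <CL/cl.h>
#include <cstdio>
#include <vector>

// Naive square matrix multiplication kernel (OpenCL C). Illustrative only;
// the thesis's FPGA-optimised kernel is not reproduced here.
static const char* kKernelSrc = R"CLC(
__kernel void matmul(__global const float* A,
                     __global const float* B,
                     __global float* C,
                     const int n) {
    int row = get_global_id(0);
    int col = get_global_id(1);
    float acc = 0.0f;
    for (int k = 0; k < n; ++k)
        acc += A[row * n + k] * B[k * n + col];
    C[row * n + col] = acc;
}
)CLC";

int main() {
    const int n = 1024;                              // square matrix dimension (assumed)
    const size_t bytes = sizeof(float) * n * n;
    std::vector<float> A(n * n, 1.0f), B(n * n, 2.0f), C(n * n, 0.0f);

    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);

    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, nullptr);

    // On the Cyclone V board the pre-compiled .aocx bitstream would be loaded with
    // clCreateProgramWithBinary; building from source is fine for a CPU/GPU test.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kKernelSrc, nullptr, nullptr);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(prog, "matmul", nullptr);

    cl_mem dA = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, bytes, A.data(), nullptr);
    cl_mem dB = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, bytes, B.data(), nullptr);
    cl_mem dC = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, nullptr, nullptr);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &dA);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &dB);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &dC);
    clSetKernelArg(kernel, 3, sizeof(int), &n);

    size_t global[2] = { (size_t)n, (size_t)n };     // one work-item per output element
    clEnqueueNDRangeKernel(queue, kernel, 2, nullptr, global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(queue, dC, CL_TRUE, 0, bytes, C.data(), 0, nullptr, nullptr);

    std::printf("C[0][0] = %.1f (expected %.1f)\n", C[0], 2.0f * n);

    clReleaseMemObject(dA); clReleaseMemObject(dB); clReleaseMemObject(dC);
    clReleaseKernel(kernel); clReleaseProgram(prog);
    clReleaseCommandQueue(queue); clReleaseContext(ctx);
    return 0;
}

The host structure stays the same whether the kernel runs on a CPU, GPU, or the FPGA; only the program-creation step changes between a development setup and the Cyclone V target.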
October 21, 2017 by hgpu