## Machine Learning from Streaming Data in Heterogeneous Computing Environments

Technische Universität Berlin, 2016

With the advent of many-core general-purpose processors (CPUs), increasing the number of cores has provided some speedup for algorithms that can be parallelized. Today, distributed and parallel data processing platforms such as Apache Flink inherently make use of parallel computing. Graphics processors (GPUs), on the other hand, offer high-performance solutions for certain problems thanks to an architecture suited to massively data-parallel computation. In the last decade, GPU computing has also become popular for general-purpose applications. Despite drawbacks such as memory transfer latency, GPUs have been shown to provide substantial speedups, especially for computationally intensive problems, owing to their massively parallel computation capability. There are also heterogeneous computing frameworks such as OpenCL, which enable developers to write portable programs that run in parallel on a range of processors, including CPUs and GPUs, while providing abstractions that simplify parallel programming across different computing devices.

Streaming k-means is an unsupervised online learning algorithm adapted from batch k-means, which remains one of the most commonly used clustering algorithms due to its simplicity, efficiency and empirical success. In this thesis, we first implement a sliding-window-based streaming k-means algorithm in OpenCL and in Apache Flink, and give an overview of how the window size, the tuple size, the number of clusters and the window slide size affect system throughput on two CPUs and three GPUs. Our OpenCL application achieves higher throughput than the Flink implementation. We also show that GPUs still deliver higher throughput than many-core CPUs. However, on modern architectures the performance gap narrows between OpenCL runs in which the computationally intensive step executes on CPUs and those in which it executes on GPUs. Furthermore, modern many-core CPUs can occasionally perform competitively with GPUs, in particular when running our streaming k-means algorithm.
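The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common reading of sliding-window streaming k-means, written in plain Python rather than the thesis's OpenCL or Flink code. It assumes that the window holds the most recent `window_size` tuples, that the tuple size is the dimensionality of each point, and that after every `slide_size` new tuples the centroids are refined with a fixed number of Lloyd iterations warm-started from the previous centroids. The names `SlidingWindowKMeans`, `kmeans_step` and `nearest`, the warm start, and the initialisation from the first k points of the first full window are all hypothetical choices made for illustration.

```python
from collections import deque
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    best, best_d = 0, math.inf
    for j, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = j, d
    return best


def kmeans_step(points, centroids, iterations=10):
    """Run Lloyd iterations over the window, warm-started from `centroids`."""
    k, dim = len(centroids), len(centroids[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Assignment step: each point is handled independently of the others.
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Update step: a centroid with no assigned points keeps its old position.
        centroids = [
            [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(k)
        ]
    return centroids


class SlidingWindowKMeans:
    """Hypothetical count-based sliding window with periodic reclustering."""

    def __init__(self, k, window_size, slide_size, iterations=10):
        self.k = k
        self.window = deque(maxlen=window_size)  # oldest tuples fall out automatically
        self.slide_size = slide_size
        self.iterations = iterations
        self.centroids = None
        self._since_update = 0

    def push(self, point):
        """Add one tuple; recluster every `slide_size` arrivals once the window is full."""
        self.window.append(point)
        self._since_update += 1
        if len(self.window) == self.window.maxlen and self._since_update >= self.slide_size:
            if self.centroids is None:
                # Assumption: seed with the first k points of the first full window.
                self.centroids = [list(p) for p in list(self.window)[: self.k]]
            self.centroids = kmeans_step(list(self.window), self.centroids, self.iterations)
            self._since_update = 0
        return self.centroids


if __name__ == "__main__":
    random.seed(0)
    model = SlidingWindowKMeans(k=2, window_size=100, slide_size=20)
    for i in range(200):
        cx, cy = (0.0, 0.0) if i % 2 == 0 else (10.0, 10.0)
        model.push((cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)))
    print(sorted(model.centroids))
```

In the demo, the stream alternates between two Gaussian blobs, so the two centroids settle near (0, 0) and (10, 10). In this sketch the assignment loop touches each point independently, which makes it the natural candidate for one-work-item-per-point parallelisation in an OpenCL kernel, while the update step is a reduction over the assignments; whether the thesis splits the work this way is not stated in the abstract.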

April 11, 2017 by hgpu