Energy Efficient Parallel K-Means Clustering for an Intel Hybrid Multi-Chip Package

Matheus Souza, Lucas Maciel, Pedro Henrique Penna, Henrique Freitas
Pontifical Catholic University of Minas Gerais (PUC Minas), Belo Horizonte, Brazil
hal-02048964 (February 26, 2019)

@inproceedings{souza:hal-02048964,
   title={Energy Efficient Parallel K-Means Clustering for an Intel Hybrid Multi-Chip Package},
   author={Souza, Matheus and Maciel, Lucas and Penna, Pedro Henrique and Freitas, Henrique},
   url={https://hal.archives-ouvertes.fr/hal-02048964},
   booktitle={2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)},
   address={Lyon, France},
   publisher={IEEE},
   pages={372-379},
   year={2018},
   month={Sep},
   keywords={FPGA ; Energy Efficiency ; OpenCL ; K-means},
   pdf={https://hal.archives-ouvertes.fr/hal-02048964/file/hpml18.pdf},
   hal_id={hal-02048964},
   hal_version={v1}
}

FPGA devices have proven to be good candidates for accelerating applications from many research areas. For instance, machine learning applications such as K-Means clustering usually rely on large amounts of data, and, despite the performance offered by other architectures, FPGAs can offer better energy efficiency. With that in mind, Intel® has launched a platform that integrates a multicore processor and an FPGA in the same package, enabling low-latency, coherent, fine-grained data offload. In this paper, we present a parallel implementation of the K-Means clustering algorithm for this novel platform, written in OpenCL, and compare it against other platforms. We found that the CPU+FPGA platform was from 70.71% to 85.92% more energy efficient than the CPU-only approach, with the Standard and Tiny input sizes, respectively, and that up to 68.21% performance improvement was obtained with the Tiny input size. Furthermore, with the Standard input size, it was up to 7.2x more energy efficient than an Intel Xeon Phi, 21.5x more than a cluster of Raspberry Pi boards, and 3.8x more than the low-power MPPA-256 architecture.
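The abstract does not describe the kernel itself. For orientation only, the following is a minimal sequential sketch of the standard Lloyd-style K-Means iteration; it is not the authors' OpenCL implementation, and the function names (`assign`, `update`, `kmeans`) are hypothetical. It separates the assignment step, where every point is processed independently (the part a parallel version would typically spread across work-items), from the centroid update, which is a reduction.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step: each point picks its nearest centroid.

    Every point is handled independently, which is why this step is the
    natural candidate for data-parallel execution (assumption: one worker
    per point in a parallel version; here it is a plain loop).
    """
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[Point]:
    """Update step: recompute each centroid as the mean of its members."""
    k, dim = len(centroids), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    new_centroids: List[Point] = []
    for c in range(k):
        if counts[c] == 0:
            # Empty cluster: keep its previous centroid unchanged.
            new_centroids.append(tuple(centroids[c]))
        else:
            new_centroids.append(tuple(s / counts[c] for s in sums[c]))
    return new_centroids


def kmeans(points: Sequence[Point], initial_centroids: Sequence[Point],
           max_iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and update until labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iters):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)])` converges to centroids `[(0.0, 0.5), (10.0, 10.5)]` with labels `[0, 0, 1, 1]`.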