Energy Efficient Parallel K-Means Clustering for an Intel Hybrid Multi-Chip Package

Matheus Souza, Lucas Maciel, Pedro Henrique Penna, Henrique Freitas

Pontifical Catholic University of Minas Gerais (PUC Minas), Belo Horizonte, Brazil

hal-02048964, (February 26, 2019)

@inproceedings{souza:hal-02048964,
  title={Energy Efficient Parallel K-Means Clustering for an Intel Hybrid Multi-Chip Package},
  author={Souza, Matheus and Maciel, Lucas and Penna, Pedro Henrique and Freitas, Henrique},
  url={https://hal.archives-ouvertes.fr/hal-02048964},
  booktitle={2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)},
  address={Lyon, France},
  publisher={IEEE},
  pages={372-379},
  year={2018},
  month={Sep},
  keywords={FPGA ; Energy Efficiency ; OpenCL ; K-means},
  pdf={https://hal.archives-ouvertes.fr/hal-02048964/file/hpml18.pdf},
  hal_id={hal-02048964},
  hal_version={v1}
}

FPGA devices have proven to be good candidates for accelerating applications from different research areas. For instance, machine learning applications such as K-Means clustering usually rely on large amounts of data to be processed, and, despite the performance offered by other architectures, FPGAs can offer better energy efficiency. With that in mind, Intel® has launched a platform that integrates a multicore processor and an FPGA in the same package, enabling low-latency, coherent, fine-grained data offload. In this paper, we present a parallel implementation of the K-Means clustering algorithm for this novel platform, written in OpenCL, and compare it against other platforms. We found that the CPU+FPGA platform was between 70.71% and 85.92% more energy efficient than the CPU-only approach, with the Standard and Tiny input sizes respectively, and that a performance improvement of up to 68.21% was obtained with the Tiny input size. Furthermore, it was up to 7.2x more energy efficient than an Intel Xeon Phi, 21.5x more than a cluster of Raspberry Pi boards, and 3.8x more than the low-power MPPA-256 architecture when the Standard input size was used.
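For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sequential sketch of standard K-Means (Lloyd's algorithm) in plain Python. It is not the authors' OpenCL implementation, and the function names (`assign_points`, `update_centroids`, `kmeans`) and parameters are hypothetical, chosen only for illustration. The assignment step treats every point independently, which is the kind of data-parallel work that is commonly offloaded to an accelerator; whether the paper partitions the work this way is an assumption here.

```python
import random


def assign_points(points, centroids):
    """Assignment step: index of the nearest centroid (squared Euclidean) per point."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update_centroids(points, labels, centroids):
    """Update step: mean of each cluster; an empty cluster keeps its old centroid."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, k, max_iters=100, seed=0):
    """Illustrative Lloyd's algorithm; `points` is a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = assign_points(points, centroids)
    for _ in range(max_iters):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_points(points, centroids)
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(data, k=2)
    print(centroids, labels)
```

The two steps are kept as separate functions so that the per-point assignment loop, which dominates the cost for large inputs, is easy to identify as the candidate for parallelization.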

Tags: Clustering, Computer science, Energy-efficient computing, FPGA, Intel Xeon Phi, Machine learning, OpenCL

March 10, 2019 by hgpu
