
An Architectural Journey into RISC Architectures for HPC Workloads

Ying Hao Xu Lin
Barcelona Supercomputing Center (BSC)
Barcelona Supercomputing Center, 2019

The race to Exascale (i.e., 10^18 floating-point operations per second), together with the slow-down of Moore's law, poses unprecedented challenges to the whole High-Performance Computing (HPC) community. Computer architects, system integrators and software engineers studying programming models for handling parallelism are especially called upon at this moment. A careful observer of the current HPC market can notice that i) the dominance of a single architecture, x86, is fading; ii) as a consequence, new CPU architectures and accelerators are gaining relevance (e.g. RISC CPUs and GP-GPUs); iii) new workloads coming from Industry 4.0 and automotive (e.g. machine learning) require more and more computational resources, thus driving the development of next-generation computational systems.

This thesis explores the boundary of these three observations by evaluating the current state of the art of emerging RISC architectures in HPC (Arm and RISC-V). It studies the performance, the instantaneous power consumption and the total energy spent to reach the solution of a scientific problem on heterogeneous Systems-on-Chip (SoCs). For the evaluation, four platforms have been tested: two heterogeneous Arm platforms (CPU+GPU and CPU+FPGA), one RISC-V platform and one Open Source RISC-V core running in an FPGA. The thesis adds value in two ways:

A. The evaluation of the aforementioned platforms has been performed using a machine learning test case based on the k-means clustering algorithm, related to predictive maintenance and failure detection and provided by an industrial partner. While preparing this master's thesis, I was involved in the research activities within the collaboration between the Barcelona Supercomputing Center (BSC) and Aingura IIoT.

B. Testing the k-means algorithm on the RISC-V core required implementing a System-on-Chip that allows interaction with the core. Although the Ariane core itself is freely available online, providing peripherals for minimal I/O operations and performance counters required careful FPGA work in a hardware description language (SystemVerilog).

As expected, the more mature Arm Cortex-A57 processor outperformed the rest of the platforms, and the best RISC-V platform was shown to perform as well as the Arm Cortex-A9. Among the heterogeneous platforms, the studied CPU+GPU system achieved the best performance, but the CPU+FPGA system used less energy when considering only the active power of the execution. The document places special emphasis on the reproducibility of the experiments by explaining step by step how to set up an FPGA-based research platform using an Open Source RISC-V core and how to interact with the hardware counters defined in RISC-V in order to measure the performance.
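
For readers unfamiliar with the k-means clustering used as the test case, the following is a minimal sketch of the textbook (Lloyd's) formulation in plain Python. It is not the thesis's implementation, which is not reproduced on this page; the function name `kmeans`, its parameters and the sample data are hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iters=100):
    """Textbook Lloyd's algorithm (illustrative sketch, not the thesis code).

    points    -- list of equal-length numeric tuples
    centroids -- initial centroids, same dimensionality as the points
    Returns (final_centroids, labels).
    """
    centroids = [tuple(float(x) for x in c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point is labelled with its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(old)
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (10, 10), (10, 11)]
    final_centroids, labels = kmeans(sample, [(0, 0), (10, 10)])
    print(final_centroids, labels)
```

The assignment step dominates the cost and is independent per point, which is one reason this kind of workload is a common candidate for offloading to a GPU or FPGA.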
