## Multithreaded Dense Linear Algebra on Asymmetric Multi-core Processors

Universitat Jaume I de Castelló, Escola de Doctorat de la Universitat Jaume I

Universitat Jaume I de Castelló, 2017

@phdthesis{catalan2018multithreaded,

title={Multithreaded Dense Linear Algebra on Asymmetric Multi-core Processors},

author={Catal{\'a}n Pallar{\'e}s, Sandra and others},

year={2018},

school={Universitat Jaume I}

}

Nowadays, a large variety of scientific, industrial and engineering applications require high computational power and storage, and their demands continue to grow: to obtain more precise solutions, scientists need to build and work with more sophisticated and complex physical and mathematical models. Consequently, the capacity of new data processing systems and High Performance Computing (HPC) centers is saturated shortly after they are set up [2, 17, 44, 50]. Nonetheless, these resources are still used: scientific computation (or computational science, that is, the elaboration of mathematical models and the use of computers to analyze and solve scientific problems) is an effective tool for scientific discovery, complementary to more traditional methods based on theory and experimentation [44, 50]. Large-scale HPC systems are large energy consumers, since both their computing resources and the auxiliary systems they need to operate draw power [44, 49, 51, 74]. This consumption has a direct impact on the operational and maintenance costs of computing centers. However, electricity cost is not the only problem; in general, energy consumption turns into carbon emissions that are dangerous to the environment and public health, and the heat reduces the reliability of the hardware [51]. The situation requires additional measures: the Green500 list of June 2017 [1] shows that the most power-efficient HPC systems attain 14,110 MFLOPS per Watt (MFLOPS/W). A simple calculation reveals that reaching the EXAFLOPS rate with the current technology would require approximately 70.9 MW of power, at an approximate cost of 70.9 million dollars per year. Although the EXAFLOPS challenge will unleash innovative scientific discoveries, it also calls for hardware and software technologies that are more energy efficient [6, 13, 18].
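
The exascale estimate above can be reproduced with a short back-of-the-envelope sketch. Dividing the target rate (10^18 FLOPS) by the quoted efficiency gives the required power in watts, roughly 70.9 MW. The cost rate of about one million dollars per megawatt-year is not stated in the text; it is an assumption inferred from the fact that the quoted power and yearly cost share the same figure. All variable and function names are illustrative.

```python
# Back-of-the-envelope exascale power estimate (illustrative sketch).
EXAFLOPS = 1e18                      # target sustained rate, in FLOPS
EFFICIENCY_MFLOPS_PER_WATT = 14_110  # Green500 June 2017 figure quoted in the text

# Assumption: about 1 million USD per megawatt-year, the rate implied
# by the text's matching power and cost figures (not a measured price).
USD_PER_MW_YEAR = 1e6


def exascale_power_mw(target_flops, mflops_per_watt):
    """Return the power (in MW) needed to sustain target_flops."""
    flops_per_watt = mflops_per_watt * 1e6
    watts = target_flops / flops_per_watt
    return watts / 1e6


power_mw = exascale_power_mw(EXAFLOPS, EFFICIENCY_MFLOPS_PER_WATT)
annual_cost_musd = power_mw * USD_PER_MW_YEAR / 1e6

print(f"{power_mw:.1f} MW, about {annual_cost_musd:.1f} million USD per year")
# prints "70.9 MW, about 70.9 million USD per year"
```
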
Pressure from HPC centers has forced hardware manufacturers to improve the energy efficiency of their designs: the Central Processing Unit (CPU), memory and disks (three of the largest energy consumers in computing systems, the remaining ones being the interconnection network and the power supply) integrate energy-saving strategies based on transitions to low-power states or on the dynamic reduction of voltage and frequency (Dynamic Voltage and Frequency Scaling, or DVFS). On the other hand, software systems, communication libraries and, especially, computational libraries and application codes running in HPC centers have been, in general, oblivious to energy consumption. The Top500 [4] list is a good example. Computers in this list are ranked by the sustained performance, in floating-point arithmetic operations per second (FLOPS), that they attain on the Linpack benchmark, which basically consists of the solution of a large dense linear system. However, the numerical method behind this test, the LU factorization, is far from representative of the real performance attained by most scientific codes [18]. Even though energy awareness is a mature topic in other segments, the development of energy-aware solutions for HPC applications, which optimize both execution time and energy consumption, is still in its early stages, despite the huge benefits that it may produce [6, 51]. The HPC community is now aware of energy costs, as the creation of the Green500 [1] list demonstrated. In response to this situation, the general objective of this thesis is the study, design, development and analysis of experimental energy-aware solutions for the execution of scientific and engineering numerical applications on low-power architectures, more specifically asymmetric platforms.
With the aim of demonstrating the benefits of these contributions, we selected diverse dense linear algebra operations that arise in very different areas, such as image processing, molecular dynamics simulation, and big data analytics, among others.

March 10, 2018 by hgpu