Energy Efficient Computing on Multi-core Processors: Vectorization and Compression Techniques
Norwegian University of Science and Technology, Faculty of Information Technology and Electrical Engineering, Department of Computer Science
Norwegian University of Science and Technology, 2018
@phdthesis{al2018energy,
  title={Energy Efficient Computing on Multi-core Processors: Vectorization and Compression Techniques},
  author={Al Hasib, Abdullah},
  year={2018},
  school={NTNU}
}
Over the past few years, energy consumption has become the main limiting factor for computing in general. This has led CPU vendors to promote parallel computing across multiple cores without significantly increasing the thermal design power of the processor. However, extracting maximum performance and energy efficiency from multi-core and many-core platforms requires that applications exploit the existing and emerging architectural features effectively. This thesis studies several such technologies to identify their potential for achieving high performance and energy efficiency for a set of Smart Grid applications on Intel multi-core and many-core platforms.

The first part of the thesis explores the energy-efficiency impact of different multi-core programming techniques for a selected set of benchmarks and Smart Grid applications on Intel Sandy Bridge and Haswell multi-core processors. These techniques include thread-level parallelism using OpenMP, task-based parallelism using OmpSs, data parallelism using SIMD (Single Instruction, Multiple Data) instruction sets, code optimizations, and the use of existing optimized math libraries. In our initial case studies, SIMD vectorization proves very effective at delivering both high performance and energy efficiency. However, it can also put pressure on the available memory bandwidth for applications such as the Powel time-series kernel, leading to under-utilization of the computing resources and thus energy-inefficient execution.

In the second part of this research, we investigate opportunities for improving the performance of SIMD vectorization for memory-bound applications using SIMD data compression, SIMD software prefetching, SIMD shuffling, code blocking, and other code-transformation techniques. The key idea is to reduce data movement across the memory hierarchy by using otherwise idle CPU time. We show that integrating data compression is feasible on Intel multi-core platforms, provided the compression and decompression can be done in a reasonable time. We present a comprehensive discussion of the SIMD compression techniques and the code transformations required to achieve efficient SIMD computation for memory- and cache-bound applications, using the Powel time-series kernel as a demonstrator application.

Finally, we perform a feasibility study of the SIMD optimization and compression techniques in other application domains using the k-means clustering algorithm and the full-search motion-estimation algorithm, and we extend our experiments to the Intel many-core architecture using the Intel Xeon Phi coprocessor.
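To illustrate the core idea of combining SIMD vectorization with in-memory compression, the sketch below sums a time series that is kept in memory as 16-bit fixed-point samples and is widened to 32-bit floats only inside the vector registers, roughly halving the memory traffic compared with streaming 32-bit floats. This is a minimal illustration, not code from the thesis: the 16-bit fixed-point layout, the scale factor, and the sum kernel are assumptions chosen for brevity, and the actual Powel time-series kernel and compression scheme differ.

```c
/* Minimal sketch (illustrative, not from the thesis): keep the time series
 * compressed as 16-bit fixed point in memory, decompress on the fly with
 * AVX2 and accumulate in 32-bit float registers.
 * Compile with e.g.: gcc -O2 -mavx2 sum_compressed.c */
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sum n samples stored as 16-bit fixed point, scaled back to float. */
static float sum_compressed(const int16_t *samples, size_t n, float scale)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Load 8 x 16-bit samples: 16 bytes instead of 32. */
        __m128i packed = _mm_loadu_si128((const __m128i *)(samples + i));
        /* Decompress in registers: sign-extend to 32-bit, convert to float. */
        __m256i wide = _mm256_cvtepi16_epi32(packed);
        __m256 vals  = _mm256_cvtepi32_ps(wide);
        acc = _mm256_add_ps(acc, vals);
    }
    /* Horizontal reduction of the vector accumulator. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; ++k)
        sum += lanes[k];
    /* Scalar tail for the remaining samples. */
    for (; i < n; ++i)
        sum += (float)samples[i];
    return sum * scale;
}

int main(void)
{
    int16_t data[1000];
    for (int i = 0; i < 1000; ++i)
        data[i] = (int16_t)(i % 100);
    printf("sum = %f\n", sum_compressed(data, 1000, 0.01f));
    return 0;
}
```

The decompression work happens entirely in registers, so for a memory-bound kernel it can ride on CPU cycles that would otherwise be spent waiting on memory, which is the trade-off the thesis investigates.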
June 20, 2018 by hgpu