Energy Efficiency Studies of Mont Blanc Applications

Mads Holden

Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science, Norwegian University of Science and Technology

Norwegian University of Science and Technology, 2013

In this thesis, the performance and energy efficiency of four implementations of matrix multiplication, written in OmpSs and OpenCL, are tested and evaluated. The benchmarking is done on an Intel Ivy Bridge Core i7 3770K. The results are evaluated and discussed with regard to different optimization configurations, such as vectorization and multi-threading. Energy measurements are taken using PAPI, which in turn reads the Running Average Power Limit (RAPL) interface of the Intel processor. Performance is reported in MFLOPS, while energy efficiency is compared using MFLOPS/W, power draw in watts, the energy-delay product, and the energy-delay-squared product. The OpenCL versions are compared with and without vectorization. One of the OmpSs applications is measured with regard to both vectorization and number of threads. The last OmpSs version uses the BLAS implementation ATLAS, which is already vectorized, so it is compared only across thread counts. SSE and AVX vectorization are shown to significantly improve performance while drawing little to no extra power for all implementations. Multi-threading also gives higher performance, but it consumes more energy. With ATLAS, running eight threads was shown to use more energy while performing worse. The OmpSs version using ATLAS was both the fastest and the most energy efficient, peaking at 125 GFLOPS and 2.7 GFLOPS/W while running with four threads and using AVX.
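The metrics named in the abstract can be derived from three quantities per run: the operation count, the elapsed time, and the energy consumed. The following is a minimal sketch of that arithmetic, not the thesis's own code. It assumes square n x n matrices and the conventional count of 2n^3 floating-point operations per multiplication; the abstract does not state which counting convention was used. The `Run` record and the function names are hypothetical, and the energy value is assumed to come from an external reading such as RAPL.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run (hypothetical record, not from the thesis)."""
    n: int          # matrix dimension; square n x n matrices are assumed
    seconds: float  # elapsed wall-clock time of the multiplication
    joules: float   # energy consumed over the same interval


def matmul_flops(n: int) -> float:
    # Conventional operation count for dense matrix multiplication:
    # n^3 multiplications plus roughly n^3 additions (an assumption here).
    return 2.0 * n ** 3


def metrics(run: Run) -> dict:
    mflops = matmul_flops(run.n) / run.seconds / 1e6
    watts = run.joules / run.seconds  # average power over the run
    return {
        "mflops": mflops,
        "watts": watts,
        "mflops_per_watt": mflops / watts,
        "edp": run.joules * run.seconds,        # energy-delay product (J*s)
        "ed2p": run.joules * run.seconds ** 2,  # energy-delay-squared (J*s^2)
    }


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the thesis.
    example = Run(n=1000, seconds=2.0, joules=100.0)
    for name, value in metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Higher MFLOPS/W is better, while lower values are better for the two delay products. Squaring the delay weights execution time more heavily than energy, which is why the two products can rank configurations differently.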

Tags: ARM, Computer science, Energy-efficient computing, Matrix multiplication, OpenCL, Package, Thesis

October 21, 2013 by hgpu
