D5.5.2 – Architectural Techniques to exploit SLACK & ACCURACY trade-offs
@article{keramidasd5,
title={D5.5.2 -- Architectural Techniques to exploit SLACK \& ACCURACY trade-offs},
author={Keramidas, Georgios and Juurlink, Ben and Stamoulis, Iakovos}
}
In this work we (a) explore memory slack in state-of-the-art many-core CPUs and GPUs, (b) present techniques to eliminate slack, and (c) explore the architectural parameters that improve power efficiency. Dynamic Voltage-Frequency Scaling (DVFS) is one of the most beneficial techniques for improving the power efficiency of CPUs. However, the end of Dennard scaling, in which the available voltage range shrinks as technology advances, is threatening the effectiveness of DVFS. This restriction is already common in GPUs today and will become a severe limitation for many-cores in the near future. In this report we analyse the impact of core DVFS at different memory frequencies on state-of-the-art GPUs. Because of limitations imposed by either the programming models or the hardware itself, we could not apply DVFS on embedded low-power GPUs. Therefore we shift our attention to general-purpose multi-cores and demonstrate significant energy benefits from our proposed execution scheme. For the GPU evaluation we use the NVIDIA CUDA toolkit and some custom micro-benchmarks. Our analysis shows that DVFS can give a significant energy benefit on architectures with restricted memory bandwidth, such as embedded or mobile GPUs (although this is restricted to simulated runs only, due to limitations). Finally, our work (a) proposes and evaluates a novel execution scheme for general-purpose many-cores, and (b) investigates an intriguing future direction, revealing that the energy inefficiencies of GPUs are related not to memory slack but to the mechanisms used to hide slack, which seem to compromise application locality.
September 6, 2013 by hgpu