A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures

hgpu.org » Applications » Computer science » A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures

A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures

Shuaiwen Song, Chunyi Su, Barry Rountree, Kirk W. Cameron

Virginia Tech, Blacksburg, VA

27th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2013

BibTeX

Download (PDF)

View

Source

3150

views

Emergent heterogeneous systems must be optimized for both power and performance at exascale. Massive parallelism combined with complex memory hierarchies form a barrier to efficient application and architecture design. These challenges are exacerbated with GPUs as parallelism increases orders of magnitude and power consumption can easily double. Models have been proposed to isolate power and performance bottlenecks and identify their root causes. However, no current models combine simplicity, accuracy, and support for emergent GPU architectures (e.g. NVIDIA Fermi). We combine hardware performance counter data with machine learning and advanced analytics to model power-performance efficiency for modern GPU-based systems. Our performance counter based approach is simpler than previous approaches and does not require detailed understanding of the underlying architecture. The resulting model is accurate for predicting power (within 2.1%) and performance (within 6.7%) for application kernels on modern GPUs. Our model can identify power-performance bottlenecks and their root causes for various complex computation and memory access patterns (e.g. global, shared, texture). We measure the accuracy of our power and performance models on a NVIDIA Fermi C2075 GPU for more than a dozen CUDA applications. We show our power model is more accurate and robust than the best available GPU power models – multiple linear regression models MLR and MLR+. We demonstrate how to use our models to identify power-performance bottlenecks and suggest optimization strategies for high-performance codes such as GEM, a biomolecular electrostatic analysis application. We verify our power-performance model is accurate on clusters of NVIDIA Fermi M2090s and useful for suggesting optimal runtime configurations on the Keeneland supercomputer at Georgia Tech.

Tags: Biomolecules, Computer science, CUDA, Electrostatics, Energy-efficient computing, GPU cluster, Heterogeneous systems, Machine learning, nVidia, Tesla C2075, Tesla M2090

February 18, 2013 by hgpu

Rating: 2.5/5. From 2 votes.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org