
Convergence and Scalarization for Data-Parallel Architectures

Yunsup Lee, Ronny Krashinsky, Vinod Grover, Stephen W. Keckler, Krste Asanovic
University of California at Berkeley
International Symposium on Code Generation and Optimization (CGO-2013), 2013

@inproceedings{lee2013convergence,
   title={Convergence and Scalarization for Data-Parallel Architectures},
   author={Lee, Yunsup and Krashinsky, Ronny and Grover, Vinod and Keckler, Stephen W. and Asanovic, Krste},
   booktitle={International Symposium on Code Generation and Optimization (CGO)},
   year={2013}
}

Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data parallelism in application kernels expressed as threaded code. One drawback of this approach compared to conventional vector architectures is redundant execution of instructions that are common across multiple threads, resulting in energy inefficiency due to excess instruction dispatch, register file accesses, and memory operations. This paper proposes to alleviate these overheads while retaining the threaded programming model by automatically detecting the scalar operations and factoring them out of the parallel code. We have developed a scalarizing compiler that employs convergence and variance analyses to statically identify values and instructions that are invariant across multiple threads. Our compiler algorithms are effective at identifying convergent execution even in programs with arbitrary control flow, identifying two-thirds of the opportunity captured by a dynamic oracle. The compile-time analysis leads to a reduction in instructions dispatched by 29%, register file reads and writes by 31%, memory address counts by 47%, and data access counts by 38%.
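To illustrate the kind of opportunity the paper's convergence and variance analyses target, consider a small CUDA kernel (a hypothetical sketch, not code from the paper): kernel parameters and any values computed only from them are identical across all threads, so a scalarizing compiler could hold them in scalar registers and execute their defining instructions once per warp, while only the per-thread index and the loads and stores it addresses remain varying.

// Illustrative sketch (assumed example, not from the paper): `alpha`, `row`,
// `n`, and the derived address term `base` are uniform across every thread,
// making them candidates for scalarization; `col` is thread-varying.
#include <cstdio>

__global__ void saxpy_row(const float* __restrict__ x,
                          float* __restrict__ y,
                          float alpha, int row, int n) {
    int base = row * n;                                // thread-invariant
    int col  = blockIdx.x * blockDim.x + threadIdx.x;  // thread-varying
    if (col < n)
        y[base + col] = alpha * x[base + col] + y[base + col];
}

int main() {
    const int n = 1024;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy_row<<<(n + 255) / 256, 256>>>(x, y, 3.0f, 0, n);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}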