Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications

I-Jui Sung, John A. Stratton, Wen-Mei W. Hwu
Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) 2010, Vienna, Austria, September 11-15, 2010

@inproceedings{Sung:2010:DLT:1854273.1854336,
   author    = {Sung, I-Jui and Stratton, John A. and Hwu, Wen-Mei W.},
   title     = {Data layout transformation exploiting memory-level parallelism in structured grid many-core applications},
   booktitle = {Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques},
   series    = {PACT '10},
   year      = {2010},
   isbn      = {978-1-4503-0178-7},
   location  = {Vienna, Austria},
   pages     = {513--522},
   numpages  = {10},
   url       = {http://doi.acm.org/10.1145/1854273.1854336},
   doi       = {10.1145/1854273.1854336},
   acmid     = {1854336},
   publisher = {ACM},
   address   = {New York, NY, USA},
   keywords  = {GPU, data layout transformation, parallel programming}
}

We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures using a dense, regular grid as the primary data structure. Fluid dynamics and heat distribution, which both solve partial differential equations on a discretized representation of space, are representative of many important structured grid applications. Using the information available through variable-length array syntax, standardized in C99 and other modern languages, we have enabled automatic data layout transformations for structured grid codes with dynamically allocated arrays. We also present how a tool can guide these transformations to statically choose a good layout given a model of the memory system, using a modern GPU as an example. A transformed layout that distributes concurrent memory requests among parallel memory system components provides substantial speedup for structured grid applications by improving their achieved memory-level parallelism. Even with the overhead of more complex address calculations, we observe up to 560% performance increases over the language-defined layout, and a 7% performance gain in the worst case, in which the language-defined layout and access pattern are already well-vectorizable by the underlying hardware.
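The paper's transformation is automatic and model-driven; the sketch below is only a loose, hand-written illustration of the underlying idea, not the authors' tool or their chosen layout. It shows a 5-point stencil kernel in CUDA whose indexing is routed through a hypothetical tiled_index() function instead of the language-defined row-major layout, so that a thread block's concurrent loads are spread over more memory partitions. The names tiled_index, row_major, jacobi_tiled and the constants NX, NY, TILE are all assumptions made for this sketch.

// Minimal sketch (assumed, not from the paper): a 2D Jacobi-style stencil
// whose language-defined row-major indexing is replaced by a hypothetical
// tiled layout, so concurrent requests spread across DRAM partitions.
#include <cstdio>
#include <cuda_runtime.h>

#define NX   1024          // assumed grid width
#define NY   1024          // assumed grid height
#define TILE 64             // assumed tile width; a tuning parameter

// Language-defined row-major layout, shown for contrast.
__host__ __device__ inline size_t row_major(int x, int y) {
    return (size_t)y * NX + x;
}

// Hypothetical tiled layout: each TILE x TILE block is stored contiguously,
// and the blocks themselves are laid out in row-major order of blocks.
__host__ __device__ inline size_t tiled_index(int x, int y) {
    int bx = x / TILE, by = y / TILE;   // block coordinates
    int ox = x % TILE, oy = y % TILE;   // offsets within the block
    size_t blocks_per_row = NX / TILE;
    return ((size_t)(by * blocks_per_row + bx)) * TILE * TILE
           + (size_t)oy * TILE + ox;
}

// 5-point Jacobi step reading and writing through the transformed layout.
__global__ void jacobi_tiled(const float* in, float* out) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x > 0 && x < NX - 1 && y > 0 && y < NY - 1) {
        out[tiled_index(x, y)] = 0.25f * (in[tiled_index(x - 1, y)] +
                                          in[tiled_index(x + 1, y)] +
                                          in[tiled_index(x, y - 1)] +
                                          in[tiled_index(x, y + 1)]);
    }
}

int main() {
    size_t bytes = (size_t)NX * NY * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 0, bytes);

    dim3 block(16, 16);
    dim3 grid(NX / block.x, NY / block.y);
    jacobi_tiled<<<grid, block>>>(d_in, d_out);
    cudaDeviceSynchronize();
    printf("jacobi step done\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

In the paper the remapping is derived by the compiler from VLA-style array declarations and a model of the GPU memory system, rather than written by hand as above; the extra address arithmetic is the "more complex address calculation" overhead the abstract refers to.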
