Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications

I-Jui Sung, John A. Stratton, Wen-Mei W. Hwu
Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) 2010, Vienna, Austria, September 11-15, 2010


   author={Sung, I-Jui and Stratton, John A. and Hwu, Wen-Mei W.},
   title={Data layout transformation exploiting memory-level parallelism in structured grid many-core applications},
   booktitle={Proceedings of the 19th international conference on Parallel architectures and compilation techniques},
   series={PACT ’10},
   location={Vienna, Austria},
   address={New York, NY, USA},
   keywords={GPU, data layout transformation, parallel programming,}





We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures that use a dense, regular grid as the primary data structure. Fluid dynamics and heat distribution, which both solve partial differential equations on a discretized representation of space, are representative of many important structured grid applications. Using the information available through variable-length array syntax, standardized in C99 and other modern languages, we have enabled automatic data layout transformations for structured grid codes with dynamically allocated arrays. We also show how a tool can guide these transformations to statically choose a good layout given a model of the memory system, using a modern GPU as an example. A transformed layout that distributes concurrent memory requests among parallel memory system components provides substantial speedup for structured grid applications by improving their achieved memory-level parallelism. Even with the overhead of more complex address calculations, we observe performance increases of up to 560% over the language-defined layout, and a 7% performance gain in the worst case, in which the language-defined layout and access pattern are already well-vectorizable by the underlying hardware.
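To make the idea of a layout transformation concrete, the following is a minimal sketch in C99, not the paper's tool or the layout it selects. It assumes a hypothetical grid of `nz × ny × nx` cells with `ne` values per cell, and shows only the simplest kind of transformation, a permutation of array dimensions; the function names are illustrative. The variable-length array parameter types carry the dimension information, so the compiler generates the address arithmetic for either layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: nz x ny x nx cells, ne values per cell. */

/* Flat index under the language-defined layout of
 * float grid[nz][ny][nx][ne]: the ne values of one cell are contiguous,
 * so the same value of neighboring x cells is ne elements apart. */
size_t index_default(size_t ny, size_t nx, size_t ne,
                     size_t z, size_t y, size_t x, size_t e)
{
    assert(y < ny && x < nx && e < ne);
    return ((z * ny + y) * nx + x) * ne + e;
}

/* Flat index under a permuted layout equivalent to
 * float grid[ne][nz][ny][nx]: the same value of neighboring x cells
 * is now adjacent in memory. */
size_t index_transformed(size_t nz, size_t ny, size_t nx,
                         size_t z, size_t y, size_t x, size_t e)
{
    assert(z < nz && y < ny && x < nx);
    return ((e * nz + z) * ny + y) * nx + x;
}

/* Copy a grid from the language-defined layout into the permuted one.
 * The C99 variable-length array parameters let both arrays be indexed
 * with ordinary subscripts even though the sizes are runtime values. */
void transform_layout(size_t nz, size_t ny, size_t nx, size_t ne,
                      float src[nz][ny][nx][ne],
                      float dst[ne][nz][ny][nx])
{
    for (size_t z = 0; z < nz; z++)
        for (size_t y = 0; y < ny; y++)
            for (size_t x = 0; x < nx; x++)
                for (size_t e = 0; e < ne; e++)
                    dst[e][z][y][x] = src[z][y][x][e];
}
```

In this sketch, threads that each handle one cell along `x` and read the same per-cell value would touch addresses `ne` elements apart in the language-defined layout and consecutive addresses in the permuted one. Which layout actually spreads requests best across a given memory system depends on that system's organization, which is the question the abstract's memory-system model is meant to answer.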