Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs

Jiayuan Meng, Kevin Skadron

Department of Computer Science, University of Virginia

In ICS ’09: Proceedings of the 23rd international conference on Supercomputing (2009), pp. 256-265.

DOI:10.1145/1542275.1542313

@conference{meng2009performance,
  title={Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs},
  author={Meng, J. and Skadron, K.},
  booktitle={Proceedings of the 23rd international conference on Supercomputing},
  pages={256--265},
  year={2009},
  organization={ACM}
}

Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture, there are usually halo regions that need to be updated and exchanged among different processing elements (PEs). In addition, synchronization is often used to signal the completion of halo exchanges. Both communication and synchronization may incur significant overhead on parallel architectures with shared memory. This is especially true in the case of graphics processors (GPUs), which do not preserve the state of the per-core L1 storage across global synchronizations. To reduce these overheads, ghost zones can be created to replicate stencil operations, reducing communication and synchronization costs at the expense of redundantly computing some values on multiple PEs. However, the selection of the optimal ghost zone size depends on the characteristics of both the architecture and the application, and it has only been studied for message-passing systems in a grid environment. To automate this process on shared memory systems, we establish a performance model using NVIDIA’s Tesla architecture as a case study and propose a framework that uses the performance model to automatically select the ghost zone size that performs best and generate appropriate code. The modeling is validated by four diverse ISL applications, for which the predicted ghost zone configurations are able to achieve a speedup no less than 98% of the optimal speedup.
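The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' framework or performance model; it is a toy 1D, 3-point averaging stencil in plain Python, and every name in it (`step`, `run_reference`, `run_tiled`, `tile`, `ghost`) is hypothetical. Each tile copies `ghost` extra cells on each side, advances that many iterations locally without exchanging halos, and then writes back only its own interior. A wider ghost zone means fewer exchange rounds but more redundantly computed cells.

```python
def step(u):
    """One 3-point averaging step; the two end cells are held fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def run_reference(u, iterations):
    """Untiled baseline: apply the stencil to the whole array each iteration."""
    for _ in range(iterations):
        u = step(u)
    return u


def run_tiled(u, iterations, tile, ghost):
    """Tiled version with a ghost zone of width `ghost` (assumed >= 1).

    Each round, every tile loads up to `k` extra cells per side and runs
    `k` local steps. Stale values creep inward one cell per step from a
    cut edge, so after `k` steps only the ghost cells are invalid and the
    tile interior matches the untiled result.
    """
    n = len(u)
    done = 0
    while done < iterations:
        k = min(ghost, iterations - done)   # steps taken before the next exchange
        new = list(u)
        for start in range(0, n, tile):
            end = min(start + tile, n)
            lo = max(start - k, 0)          # ghost zone clipped at the array edge
            hi = min(end + k, n)
            local = u[lo:hi]
            for _ in range(k):
                local = step(local)         # redundant work inside the ghost zone
            new[start:end] = local[start - lo:end - lo]
        u = new                             # stands in for a global synchronization
        done += k
    return u
```

With `ghost=1` the tiled version exchanges halos after every iteration; with `ghost=3` it exchanges only every third iteration. Both produce the same values as `run_reference`, which is the property a ghost zone scheme has to preserve while the best width is being chosen.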

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 280, Programming techniques

November 23, 2010 by hgpu
