Register packing for cyclic reduction: a case study

Andrew Davidson, John D. Owens
University of California, Davis
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, 2011

@inproceedings{davidson2011register,
   title={Register packing for cyclic reduction: a case study},
   author={Davidson, A. and Owens, J.D.},
   booktitle={Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units},
   pages={4},
   year={2011},
   organization={ACM}
}

We generalize a method for avoiding GPU shared-memory communication when dealing with a down-sweep pattern. We apply this generalization to Cyclic Reduction, a tridiagonal solver with this pattern. Previously, Cyclic Reduction performed poorly compared to other tridiagonal solvers on the GPU because of shared-memory bandwidth bottlenecks and poor step efficiency. We address this problem by applying our down-sweep shared-memory communication-reducing methodology. Our re-mapping also allows Cyclic Reduction to solve larger systems directly in a virtual block. By using our generalized mapping, we improve Cyclic Reduction’s performance on a GPU by a factor of 3-4.5x over the original CR implementation, making it 1.5-3x faster than other GPU tridiagonal solvers.
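
For readers unfamiliar with Cyclic Reduction, the following is a minimal serial sketch of the textbook algorithm in Python. It is not the authors' register-packed GPU implementation; it only illustrates the two phases (forward reduction, then backward substitution) whose data-access pattern the abstract refers to. The function name `cyclic_reduction` and the restriction to systems of size n = 2^k - 1 are assumptions made for illustration.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system with textbook cyclic reduction (serial sketch).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[-1] == 0. Assumes len(b) == 2**k - 1 (illustrative
    restriction) and that no pivot b[i] becomes zero, e.g. a diagonally
    dominant matrix.
    """
    n = len(b)
    if n < 1 or (n + 1) & n != 0:
        raise ValueError("this sketch assumes n = 2**k - 1")

    # Pad to 1-based indexing so that x[0] and x[n+1] act as zero boundaries.
    a = [0.0] + [float(v) for v in a] + [0.0]
    b = [1.0] + [float(v) for v in b] + [1.0]
    c = [0.0] + [float(v) for v in c] + [0.0]
    d = [0.0] + [float(v) for v in d] + [0.0]
    x = [0.0] * (n + 2)

    # Forward reduction: at each level, rows at multiples of 2*s eliminate
    # their neighbours at distance s, halving the number of active rows.
    s = 1
    while 2 * s <= n:
        for i in range(2 * s, n + 1, 2 * s):
            alpha = a[i] / b[i - s]
            gamma = c[i] / b[i + s]
            b[i] = b[i] - alpha * c[i - s] - gamma * a[i + s]
            d[i] = d[i] - alpha * d[i - s] - gamma * d[i + s]
            a[i] = -alpha * a[i - s]
            c[i] = -gamma * c[i + s]
        s *= 2

    # Backward substitution: solve the single remaining middle row first,
    # then fill in rows at odd multiples of s as the stride shrinks.
    while s >= 1:
        for i in range(s, n + 1, 2 * s):
            x[i] = (d[i] - a[i] * x[i - s] - c[i] * x[i + s]) / b[i]
        s //= 2

    return x[1:n + 1]
```

Each level of either phase touches rows that are a power-of-two stride apart, which is why a GPU version has to exchange neighbouring rows between threads at every step.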

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
