Static GPU threads and an improved scan algorithm
Research Group Programming Languages / Methodologies, University of Kassel
EURO-PAR 2010 Parallel Processing Workshops, Lecture Notes in Computer Science, 2011, Volume 6586/2011, 373-380
@inproceedings{breitbart2011static,
title={Static GPU threads and an improved scan algorithm},
author={Breitbart, J.},
booktitle={Euro-Par 2010 Parallel Processing Workshops},
pages={373--380},
year={2011},
publisher={Springer}
}
Current GPU programming systems automatically distribute work across all GPU processors based on a set of fixed assumptions, e.g. that all tasks are independent of each other. We show that automatic distribution limits algorithmic design, and demonstrate that manual work distribution adds hardly any overhead. Our Scan+ algorithm is an improved scan that relies on manual work distribution. It uses global barriers and task interleaving to provide almost twice the performance of Apple’s reference implementation [1].
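To make the idea of manual work distribution with global barriers concrete, here is a minimal CPU-side sketch in Python. It is not the paper's Scan+ algorithm and does not model task interleaving. It only assumes a fixed pool of workers, each statically assigned one contiguous chunk, with `threading.Barrier` standing in for a GPU-wide barrier. The names `static_scan` and `num_workers` are hypothetical and chosen for illustration.

```python
import threading


def static_scan(data, num_workers=4):
    """Inclusive prefix sum computed by a fixed set of workers.

    Each worker owns one contiguous chunk for the whole computation
    (static distribution); phases are separated by a global barrier.
    """
    n = len(data)
    out = [0] * n
    chunk_sums = [0] * num_workers   # per-worker partial sums
    offsets = [0] * num_workers      # exclusive scan of chunk_sums
    barrier = threading.Barrier(num_workers)
    chunk = (n + num_workers - 1) // num_workers  # ceiling division

    def worker(wid):
        lo = min(wid * chunk, n)
        hi = min(lo + chunk, n)

        # Phase 1: reduce the worker's own chunk.
        chunk_sums[wid] = sum(data[lo:hi])
        barrier.wait()  # all partial sums are now visible

        # Phase 2: a single worker scans the (short) list of partial sums.
        if wid == 0:
            running = 0
            for i in range(num_workers):
                offsets[i] = running
                running += chunk_sums[i]
        barrier.wait()  # all offsets are now visible

        # Phase 3: scan the chunk again, starting from the worker's offset.
        running = offsets[wid]
        for i in range(lo, hi):
            running += data[i]
            out[i] = running

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(static_scan([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], num_workers=4))
```

The point of the sketch is the structure: because the number of workers is fixed and every worker reaches both barriers, dependent phases can run inside one launch instead of being split into independent tasks.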
October 6, 2011 by hgpu