Data-driven versus Topology-driven Irregular Computations on GPUs

Rupesh Nasre, Martin Burtscher, Keshav Pingali
The University of Texas, Austin, Texas, USA
27th IEEE International Parallel & Distributed Processing Symposium, 2013


   title={Data-driven versus Topology-driven Irregular Computations on GPUs},

   author={Nasre, Rupesh and Burtscher, Martin and Pingali, Keshav},

   booktitle={27th IEEE International Parallel & Distributed Processing Symposium},



Download Download (PDF)   View View   Source Source   



Irregular algorithms are algorithms with complex main data structures such as directed and undirected graphs, trees, etc. A useful abstraction for many irregular algorithms is its operator formulation in which the algorithm is viewed as the iterated application of an operator to certain nodes, called active nodes, in the graph. Each operator application, called an activity, usually touches only a small part of the overall graph, so nonoverlapping activities can be performed in parallel. In topologydriven implementations, all nodes are assumed to be active so the operator is applied everywhere in the graph even if there is no work to do at some nodes. In contrast, in data-driven implementations the operator is applied only to nodes at which there might be work to do. Multicore implementations of irregular algorithms are usually data-driven because current multicores only support small numbers of threads and work-efficiency is important. Conversely, many irregular GPU implementations use a topology-driven approach because work inefficiency can be counterbalanced by the large number of GPU threads. In this paper, we study data-driven and topology-driven implementations of six important graph algorithms on GPUs. Our goal is to understand the tradeoffs between these implementations and how to optimize them. We find that data-driven versions are generally faster and scale better despite the cost of maintaining a worklist. However, topology-driven versions can be superior when certain algorithmic properties are exploited to optimize the implementation. These results led us to devise hybrid approaches that combine the two techniques and outperform both of them.
Rating: 2.5. From 1 vote.
Please wait...

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: