
GPU-Based Heuristic Solver for Linear Sum Assignment Problems Under Real-time Constraints

Roberto Roverso, Amgad Naiem, Mohammed El-Beltagy, Sameh El-Ansary
Peerialism Inc., Sweden
arXiv:1106.5694v1 [math.OC] (28 Jun 2011)

@article{2011arXiv1106.5694R,
   author={{Roverso}, R. and {Naiem}, A. and {El-Beltagy}, M. and {El-Ansary}, S.},
   title={{GPU-Based Heuristic Solver for Linear Sum Assignment Problems Under Real-time Constraints}},
   journal={ArXiv e-prints},
   archivePrefix={arXiv},
   eprint={1106.5694},
   primaryClass={math.OC},
   keywords={Mathematics – Optimization and Control, Computer Science – Distributed, Parallel, and Cluster Computing, Computer Science – Mathematical Software, Computer Science – Performance, G.1.6},
   year={2011},
   month={jun},
   adsurl={http://adsabs.harvard.edu/abs/2011arXiv1106.5694R},
   adsnote={Provided by the SAO/NASA Astrophysics Data System}
}


In this paper we modify a fast heuristic solver for the Linear Sum Assignment Problem (LSAP) for use on Graphics Processing Units (GPUs). The motivating scenario is an industrial P2P live-streaming application moderated by a central node, which periodically solves LSAP instances to assign peers to one another. The central node must handle LSAP instances involving thousands of peers as close to real time as possible. Our findings are generic enough to be applied in other contexts. Our main result is a parallel GPU version, written in CUDA, of a heuristic algorithm called Deep Greedy Switching (DGS). DGS sacrifices absolute optimality in favor of low computation time and was designed as an alternative to classical LSAP solvers such as the Hungarian method and the auction algorithm. The contribution of the paper is threefold. First, we describe the trial-and-error process we went through, in the hope that our experience will benefit adopters of GPU programming for similar problems. Second, we show the modifications needed to parallelize the DGS algorithm. Third, we show the performance gains of our approach over both a sequential CPU-based implementation of DGS and a parallel GPU-based implementation of the auction algorithm.
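An LSAP instance with an n×n cost matrix C asks for a one-to-one assignment of rows to columns (a permutation σ) that minimizes the sum of C[i][σ(i)]. To make the "optimality versus speed" trade-off concrete, the following is a minimal sequential Python sketch of a greedy start followed by pairwise switching, written in the spirit of greedy-plus-switching heuristics. It is an assumption-laden illustration, not the authors' DGS algorithm and not a CUDA implementation; the function names (greedy_initial, pairwise_switching, brute_force_optimum) and the cost matrix are hypothetical. The brute-force optimum is included only to show what the heuristic may give up on tiny instances.

```python
from itertools import permutations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def assignment_cost(cost: Matrix, assign: Sequence[int]) -> float:
    """Total cost when row i is assigned to column assign[i]."""
    return sum(cost[i][j] for i, j in enumerate(assign))


def greedy_initial(cost: Matrix) -> List[int]:
    """Hypothetical greedy start: each row, in index order, takes its
    cheapest still-unassigned column (ties go to the lowest column index)."""
    n = len(cost)
    taken = [False] * n
    assign: List[int] = []
    for i in range(n):
        j = min((c for c in range(n) if not taken[c]), key=lambda c: cost[i][c])
        taken[j] = True
        assign.append(j)
    return assign


def pairwise_switching(cost: Matrix, assign: Sequence[int], eps: float = 1e-12) -> List[int]:
    """Hypothetical switching phase (not the paper's DGS): swap the columns of
    two rows whenever that strictly lowers total cost, and repeat full sweeps
    until no single pairwise swap helps (a local optimum)."""
    result = list(assign)  # copy so the caller's assignment is not mutated
    n = len(result)
    improved = True
    while improved:
        improved = False
        for a in range(n):
            for b in range(a + 1, n):
                ja, jb = result[a], result[b]
                # Change in total cost if rows a and b exchange columns.
                delta = (cost[a][jb] + cost[b][ja]) - (cost[a][ja] + cost[b][jb])
                if delta < -eps:
                    result[a], result[b] = jb, ja
                    improved = True
    return result


def brute_force_optimum(cost: Matrix) -> float:
    """Exact LSAP optimum by enumerating all n! permutations (tiny n only)."""
    n = len(cost)
    return min(assignment_cost(cost, p) for p in permutations(range(n)))


if __name__ == "__main__":
    cost = [[1, 2, 9], [2, 10, 9], [9, 9, 1]]
    start = greedy_initial(cost)
    local = pairwise_switching(cost, start)
    print(assignment_cost(cost, start), assignment_cost(cost, local), brute_force_optimum(cost))
    # prints: 19 5 5
```

On this toy matrix the greedy start costs 19 and switching reaches 5, which happens to match the exact optimum; in general a swap-based local search only guarantees that no single pairwise exchange improves the assignment. For a fixed assignment, the swap deltas of all row pairs can be evaluated independently of one another, which hints at why switching-style heuristics are attractive GPU candidates, although how the paper actually maps DGS onto CUDA is not reproduced here.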
