GPU-Based Heuristic Solver for Linear Sum Assignment Problems Under Real-time Constraints

Roberto Roverso, Amgad Naiem, Mohammed El-Beltagy, Sameh El-Ansary

Peerialism Inc., Sweden

arXiv:1106.5694v1 [math.OC] (28 Jun 2011)

@article{2011arXiv1106.5694R,
  author={{Roverso}, R. and {Naiem}, A. and {El-Beltagy}, M. and {El-Ansary}, S.},
  title={GPU-Based Heuristic Solver for Linear Sum Assignment Problems Under Real-time Constraints},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1106.5694},
  primaryClass={math.OC},
  keywords={Mathematics – Optimization and Control, Computer Science – Distributed, Parallel, and Cluster Computing, Computer Science – Mathematical Software, Computer Science – Performance, G.1.6},
  year={2011},
  month={jun},
  adsurl={http://adsabs.harvard.edu/abs/2011arXiv1106.5694R},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

In this paper we adapt a fast heuristic solver for the Linear Sum Assignment Problem (LSAP) to run on Graphics Processing Units (GPUs). The motivating scenario is an industrial P2P live-streaming application moderated by a central node that periodically solves LSAP instances to assign peers to one another. The central node must handle LSAP instances involving thousands of peers as close to real time as possible. Our findings are generic enough to be applied in other contexts. Our main result is a parallel GPU version of a heuristic algorithm called Deep Greedy Switching (DGS), implemented in CUDA. DGS sacrifices absolute optimality in favor of low computation time and was designed as an alternative to classical LSAP solvers such as the Hungarian and auction methods. The contribution of the paper is threefold. First, we present the process of trial and error we went through, in the hope that our experience will benefit adopters of GPU programming for similar problems. Second, we show the modifications needed to parallelize the DGS algorithm. Third, we show the performance gains of our approach compared to both a sequential CPU-based implementation of DGS and a parallel GPU-based implementation of the auction algorithm.
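To make the trade-off between speed and optimality concrete, here is a minimal sequential Python sketch of the general idea behind greedy-then-switch heuristics for LSAP. It is not the authors' DGS algorithm or their CUDA code: it assumes the minimization form of LSAP on a square cost matrix, builds a greedy starting assignment, and then repeatedly applies the single best pairwise swap until no swap lowers the total cost. All function names (`greedy_assignment`, `improve_by_swaps`, `brute_force_optimum`) and the example matrix are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def total_cost(cost, assign):
    """Sum of cost[i][assign[i]] over all rows i."""
    return sum(cost[i][j] for i, j in enumerate(assign))


def greedy_assignment(cost):
    """Greedy start: scan cells from cheapest to most expensive and
    take a cell whenever both its row and its column are still free."""
    n = len(cost)
    cells = sorted((cost[i][j], i, j) for i in range(n) for j in range(n))
    assign = [-1] * n
    used_cols = set()
    for _, i, j in cells:
        if assign[i] == -1 and j not in used_cols:
            assign[i] = j
            used_cols.add(j)
    return assign


def improve_by_swaps(cost, assign):
    """Local search: apply the best cost-reducing pairwise swap until
    none remains. The result is a local optimum, not necessarily global."""
    assign = list(assign)
    n = len(assign)
    while True:
        best_gain, best_pair = 0, None
        for a in range(n):
            for b in range(a + 1, n):
                old = cost[a][assign[a]] + cost[b][assign[b]]
                new = cost[a][assign[b]] + cost[b][assign[a]]
                if old - new > best_gain:
                    best_gain, best_pair = old - new, (a, b)
        if best_pair is None:
            return assign
        a, b = best_pair
        assign[a], assign[b] = assign[b], assign[a]


def brute_force_optimum(cost):
    """Exact optimum by enumeration; only feasible for tiny instances."""
    n = len(cost)
    return min(total_cost(cost, p) for p in permutations(range(n)))


# Hypothetical 3x3 instance: rows are agents, columns are tasks.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]

start = greedy_assignment(cost)
improved = improve_by_swaps(cost, start)
print(start, total_cost(cost, start))        # [0, 1, 2] 6
print(improved, total_cost(cost, improved))  # [1, 0, 2] 5
print(brute_force_optimum(cost))             # 5
```

On this small instance the swap phase happens to reach the exact optimum, but in general a local search of this kind only guarantees that no single swap improves the result. Each candidate swap is evaluated independently of the others, which is the kind of structure that lends itself to parallel evaluation on a GPU.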

Tags: CUDA, Mathematical Software, Mathematics, nVidia, nVidia GeForce GTX 295, Optimization, Performance

June 29, 2011 by hgpu
