A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches
Department of Computer Science, Texas State University-San Marcos, San Marcos, TX 78666, USA
27th IEEE International Parallel & Distributed Processing Symposium, 2013
@inproceedings{burtscher2013scalable,
title={A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches},
author={Burtscher, Martin and Rabeti, Hassan},
booktitle={27th IEEE International Parallel \& Distributed Processing Symposium},
year={2013}
}
This paper describes and evaluates a highly scalable framework for running iterative local searches on heterogeneous HPC platforms. The user only needs to provide serial CPU or single-GPU code that implements a simple interface. The framework then executes this code in parallel using MPI between compute nodes and OpenMP and multi-GPU support within nodes. It handles all parallelization aspects, seed distribution and program termination, and it regularly records the currently best solution. We evaluate our framework on three supercomputers using a heuristic iterative hill-climbing TSP solver as well as a search for good finite-state machines. The framework scales to 2048 nodes (32,768 cores) on Ranger with less than a 5% drop in efficiency, searches over 12.2 trillion TSP tours per second on Stampede using 1024 nodes, and evaluates over 21.5 trillion FSM transitions per second using 256 CPUs and 384 GPUs on Keeneland.
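The division of labor described in the abstract (user-supplied serial search code, framework-side seed distribution and best-solution tracking) can be illustrated with a minimal Python sketch. This is not the paper's framework or its interface: the names `hill_climb_tsp`, `tour_length`, and `run_searches` are hypothetical, a thread pool stands in for the MPI/OpenMP/GPU layers, and a plain 2-opt hill climber is assumed as the local search.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def tour_length(tour, cities):
    """Length of the closed tour visiting the cities in the given order."""
    n = len(tour)
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % n]])
               for i in range(n))


def hill_climb_tsp(seed, cities):
    """Hypothetical user-side code: one serial, seeded local search.

    Starts from a random tour derived from the seed and applies 2-opt
    segment reversals until no reversal shortens the tour.
    """
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)
    best = tour_length(tour, cities)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                # Reverse the segment tour[i..j] and keep it if shorter.
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                cost = tour_length(candidate, cities)
                if cost < best - 1e-12:
                    tour, best = candidate, cost
                    improved = True
    return best, tour


def run_searches(search, cities, num_seeds, workers=4):
    """Hypothetical framework-side stand-in.

    Hands one seed to each search invocation, runs the invocations on a
    worker pool, and keeps the best (lowest-cost) result seen so far.
    """
    best_cost, best_tour = math.inf, None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda s: search(s, cities), range(num_seeds))
        for cost, tour in results:
            if cost < best_cost:
                best_cost, best_tour = cost, tour
    return best_cost, best_tour


if __name__ == "__main__":
    demo_rng = random.Random(42)
    demo_cities = [(demo_rng.random(), demo_rng.random()) for _ in range(12)]
    cost, tour = run_searches(hill_climb_tsp, demo_cities, num_seeds=16)
    print(f"best tour length: {cost:.4f}")
    print("tour:", tour)
```

Because each seeded search is independent, the only shared state is the running best solution. That independence is what makes this class of search straightforward to spread across many workers. Python threads do not provide real CPU parallelism here; they only mark where the parallel layers would sit.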
March 12, 2013 by hgpu