Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi

Ali Mirsoleimani, Aske Plaat, Jos Vermaseren, Jaap van den Herik

Leiden Centre of Data Science, Leiden University, The Netherlands

arXiv:1409.4297 [cs.PF], (15 Sep 2014)

@article{2014arXiv1409.4297M,
  author={{Mirsoleimani}, A. and {Plaat}, A. and {Vermaseren}, J. and {van den Herik}, J.},
  title={{Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi}},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1409.4297},
  primaryClass={cs.PF},
  keywords={Computer Science – Performance},
  year={2014},
  month={sep},
  adsurl={http://adsabs.harvard.edu/abs/2014arXiv1409.4297M},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

In 2013 Intel introduced the Xeon Phi, a new parallel co-processor board. The Xeon Phi is a cache-coherent many-core shared memory architecture claiming CPU-like versatility, programmability, high performance, and power efficiency. The first published micro-benchmark studies indicate that many of Intel's claims appear to be true. The current paper is the first study on the Phi of a complex artificial intelligence application. It uses an open source Monte Carlo Tree Search (MCTS) application for playing tournament quality Go (an East Asian board game). We report the first speedup figures for up to 240 parallel threads on a real machine, allowing a direct comparison to previous simulation studies. After a substantial amount of work, we observed that performance scales well up to 32 threads, largely confirming previous simulation results of this Go program, although the performance surprisingly deteriorates between 32 and 240 threads. Furthermore, we report (1) unexpected performance anomalies between the Xeon Phi and Xeon CPU for small problem sizes and small numbers of threads, and (2) that performance is sensitive to scheduling choices. Achieving good performance on the Xeon Phi for complex programs is not straightforward; it requires a deep understanding of (1) search patterns, (2) scheduling, and (3) the architecture and its many cores and caches. In practice, the Xeon Phi is less straightforward to program than Intel originally envisioned.
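
The speedup figures mentioned in the abstract can be read with the usual definitions: speedup is the throughput with n threads divided by the single-thread throughput, and parallel efficiency is speedup divided by n. The sketch below assumes throughput is measured in playouts per second, which is an assumption made here for illustration. The function names and all numbers are hypothetical and are not taken from the paper; they only mimic the general shape described (good scaling up to 32 threads, deterioration at 240).

```python
def speedup(throughput_n: float, throughput_1: float) -> float:
    """Speedup of an n-thread run relative to the single-thread baseline."""
    if throughput_1 <= 0:
        raise ValueError("baseline throughput must be positive")
    return throughput_n / throughput_1


def efficiency(throughput_n: float, throughput_1: float, n_threads: int) -> float:
    """Parallel efficiency: speedup divided by the number of threads."""
    if n_threads <= 0:
        raise ValueError("thread count must be positive")
    return speedup(throughput_n, throughput_1) / n_threads


# Hypothetical playouts-per-second measurements, NOT results from the paper.
# Keys are thread counts, values are measured throughput.
measurements = {1: 1000.0, 8: 7600.0, 32: 27200.0, 240: 21600.0}

baseline = measurements[1]
for n_threads, playouts_per_sec in sorted(measurements.items()):
    s = speedup(playouts_per_sec, baseline)
    e = efficiency(playouts_per_sec, baseline, n_threads)
    print(f"{n_threads:>3} threads: speedup {s:6.2f}, efficiency {e:5.2f}")
```

With numbers of this shape, efficiency stays high up to 32 threads and then collapses, which is the kind of pattern that points at scheduling, contention, or cache effects rather than at the search algorithm alone.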

Tags: Artificial intelligence, Benchmarking, Computer science, Intel Xeon Phi, Performance

September 16, 2014 by hgpu
