Benchmarking performance of a hybrid Xeon/Xeon Phi system for parallel computation of similarity measures between large vectors


Pawel Czarnul

Gdansk University of Technology

International Journal of Parallel Programming, 2016

@article{czarnul2017benchmarking,
  title={Benchmarking performance of a hybrid Xeon/Xeon Phi system for parallel computation of similarity measures between large vectors},
  author={Czarnul, Pawe{\l}},
  journal={Int. J. Parallel Progr. (2017)}
}


The paper deals with the parallelization of computing similarity measures between large vectors. Such computations are key components of many applications. Rather than optimizing the algorithm for specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability in a hybrid multicore CPU + Xeon Phi system. Hybrid systems that combine multicore CPUs with many-core compute devices such as the Intel Xeon Phi allow such computations to be parallelized and vectorized, but they require proper load balancing and optimization techniques. The proposed implementation uses C/OpenMP with the offload mode to Xeon Phi cards. Several results are presented: execution times for various partitioning parameters, such as the batch size of vectors being compared, and the impact of dynamic batch size adjustment and of overlapping computations and communication. Execution times are reported both for comparing all pairs of vectors and for the case in which similarity measures are checked against a predefined threshold. The latter makes load balancing more difficult and is used as a benchmark for the proposed optimizations. Results are presented for the native mode on an Intel Xeon Phi, for the CPU only, and for the CPU + offload mode on a hybrid system with 2 Intel Xeons with 20 physical cores and 40 logical processors and 2 Intel Xeon Phis with a total of 120 physical cores and 480 logical processors.
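The general scheme described in the abstract (compare all pairs of vectors, hand out work in batches, balance load dynamically) can be illustrated with a minimal C/OpenMP sketch. This is not the paper's implementation: it runs on the host CPU only and omits the Xeon Phi offload and the overlapping of communication with computation. The dot product used as the similarity measure, the `>=` threshold test, and the names `similarity`, `count_similar_pairs`, and `batch` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder similarity measure (an assumption): the plain dot product.
 * The paper assumes a general scheme and does not fix a specific measure. */
static double similarity(const double *a, const double *b, size_t dim)
{
    double sum = 0.0;
    /* Ask the compiler to vectorize the inner loop. */
    #pragma omp simd reduction(+:sum)
    for (size_t k = 0; k < dim; k++)
        sum += a[k] * b[k];
    return sum;
}

/* Count the pairs (i, j), i < j, whose similarity is >= threshold.
 * vectors: n vectors of dim doubles each, stored row by row.
 * batch:   number of rows handed to a thread at a time
 *          (a hypothetical tuning parameter, not taken from the paper). */
static long count_similar_pairs(const double *vectors, size_t n, size_t dim,
                                double threshold, int batch)
{
    assert(vectors != NULL && dim > 0 && batch > 0);
    long count = 0;

    /* Row i is compared with n - 1 - i later rows, so the work per row is
     * uneven. Dynamic scheduling lets idle threads pick up the next batch. */
    #pragma omp parallel for schedule(dynamic, batch) reduction(+:count)
    for (long i = 0; i < (long)n; i++) {
        const double *vi = vectors + (size_t)i * dim;
        for (size_t j = (size_t)i + 1; j < n; j++) {
            if (similarity(vi, vectors + j * dim, dim) >= threshold)
                count++;
        }
    }
    return count;
}
```

Calling `count_similar_pairs(v, n, dim, threshold, batch)` on a row-major array `v` returns the number of matching pairs. With an OpenMP-enabled compiler (for example `gcc -fopenmp`), the outer loop runs in parallel; without the flag the pragmas are ignored and the code runs serially. Smaller `batch` values tend to balance load better at the cost of more scheduling overhead, which is the kind of trade-off the paper's batch-size experiments examine.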

Tags: Benchmarking, Computer science, Intel Xeon Phi, OpenMP, Performance

November 16, 2016 by hgpu

