Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures

hgpu.org » Programming » Algorithms » Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures

Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures

Varun Nagpal

Universita della Svizzera Italiana

arXiv:1603.02655 [cs.DC], (8 Mar 2016)

@article{nagpal2016study,

title={Study and evaluation of an Irregular Graph Algorithm on Multicore and GPU Processor Architectures},

author={Nagpal, Varun},

year={2016},

month={mar},

archivePrefix={"arXiv"},

primaryClass={cs.DC}

}

Download (PDF)

View

Source

2446

views

One area of Computing applications which poses significant challenge of performance scalability on Chip Multiprocessors(CMP’s) are Irregular applications. Such applications have very little computation and unpredictable memory access patterns making them memory-bound in contrast to compute-bound applications. Since the gap between processor and memory performance continues to exist, difficulty to hide and decrease this gap is one of the important factors which results in poor performance of these applications on CMP’s. The goal of this thesis is to overcome many such challenges posed during performance acceleration of an irregular graph algorithm called Triad Census. We accelerated the Triad Census algorithm on two significantly different Chip Multiprocessors: Dual-socket Intel Xeon Multicore (8 hardware threads per socket) and 240-processor core NVIDIA Tesla C1060 GPGPU(128 hardware threads per core). The experimental results obtained on Intel Multicore Xeon system shows performance speedups (w.r.t baseline sequential) of maximum 56x , average 33x and minimum 8.3x for real world graph data sets. On NVIDIA Tesla C1060 GPGPU, we were able to match almost equally the Multicore results – 58.4x maximum, 32.8x average and 4.2x minimum speedups w.r.t baseline sequential. In terms of raw performance, for the graph data set called Patents network, our results on Intel Xeon Multicore(16 hw threads) were 1.27x times faster than previous results on Cray XMT(16 hw threads) while results achieved on GPGPU were comparatively slower(0.72x). To the best of our knowledge, this algorithm has only been accelerated on supercomputer class computer named Cray XMT and no work exists that demonstrates performance evaluation and comparison of this algorithm on relatively lower-cost Multicore and GPGPU based platforms.

Tags: Algorithms, Computer science, CUDA, Graph theory, nVidia, Performance, Tesla C1060, Thesis

March 10, 2016 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org