Partitioning streaming parallelism for multi-cores: a machine learning based approach

Zheng Wang, Michael F. P. O’Boyle

University of Edinburgh, Edinburgh, United Kingdom

In Proceedings of the 19th international conference on Parallel architectures and compilation techniques (2010), PACT ’10, pp. 307-318

DOI:10.1145/1854273.1854313

@conference{wang2010partitioning,
  title={Partitioning streaming parallelism for multi-cores: a machine learning based approach},
  author={Wang, Z. and O'Boyle, M.F.P.},
  booktitle={Proceedings of the 19th international conference on Parallel architectures and compilation techniques},
  pages={307--318},
  year={2010},
  organization={ACM}
}

Stream-based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and the underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned off-line. Using the predictor, we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared it against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90x speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77x performance improvement. By porting our approach to an 8-core platform, we obtain a 1.8x improvement over the StreamIt default scheme, demonstrating the portability of our approach.
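The predict-then-search idea in the abstract can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the nearest-neighbour predictor, the two-number feature vectors, the random generation of contiguous partitions, and all function names are hypothetical. The sketch only shows the general shape of the idea: predict what a good partition should look like from off-line training data, then pick the generated candidate closest to that prediction without running any program.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_ideal_structure(program_features, training_set):
    """Assumed predictor: 1-nearest-neighbour over off-line training data.

    training_set is a list of (program_features, ideal_partition_features).
    """
    _, ideal = min(training_set, key=lambda pair: distance(pair[0], program_features))
    return ideal


def partition_features(partition):
    """Hypothetical structural summary: (number of parts, load imbalance).

    Assumes every filter cost is positive, so the mean load is non-zero.
    """
    loads = [sum(part) for part in partition]
    mean_load = sum(loads) / len(loads)
    return (float(len(partition)), max(loads) / mean_load)


def random_partition(filter_costs, rng, max_parts):
    """Cut a pipeline of filters into contiguous groups at random points."""
    n_parts = rng.randint(1, min(max_parts, len(filter_costs)))
    cuts = sorted(rng.sample(range(1, len(filter_costs)), n_parts - 1))
    bounds = [0] + cuts + [len(filter_costs)]
    return [filter_costs[i:j] for i, j in zip(bounds, bounds[1:])]


def search(filter_costs, ideal, n_candidates=1000, max_parts=8, seed=0):
    """Pick the candidate whose features are closest to the prediction.

    No candidate is compiled or executed; only feature vectors are compared.
    """
    rng = random.Random(seed)
    candidates = (
        random_partition(filter_costs, rng, max_parts) for _ in range(n_candidates)
    )
    return min(candidates, key=lambda p: distance(partition_features(p), ideal))


# Made-up training data: (program features) -> (ideal partition features).
training_set = [((8.0, 1.0), (4.0, 1.0)), ((3.0, 5.0), (2.0, 1.2))]
ideal = predict_ideal_structure((7.5, 1.2), training_set)
best = search([1.0] * 8, ideal, n_candidates=200, max_parts=4)
```

In this toy setting, `best` is a list of contiguous filter groups that together cover the whole pipeline; a real system would use richer program and partition features and a model trained on measured executions.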

Tags: Computer science, Optimization, Programming techniques

February 8, 2011 by hgpu
