Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study
Parallel and Distributed Systems Group, Delft University of Technology, Delft, the Netherlands
ICT.Open, 2011
@inproceedings{fang2012maximize,
author={Jianbin Fang and Ana Lucia Varbanescu and Henk Sips},
title={Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study},
booktitle={Proceedings of ICT.Open 2011},
year={2011},
month={November},
note={(an extension to the FGC’11 paper)},
location={Veldhoven, the Netherlands},
url={http://www.pds.ewi.tudelft.nl/fileadmin/pds/homepages/fang/papers/asci2k11_fang.pdf},
topic={Parallel Programming},
group={PDS}
}
In this paper, we analyze the trade-offs encountered when minimizing the total execution time of rake-based applications on GPUs. We use clustering data streams as a case study and present a rake-based implementation of it that uses memory more efficiently. To maximize performance across different problem sizes and architectures, we propose a model-based auto-tuning solution. Experimental results show that our fully optimized implementation runs 2.1x and 1.4x faster than the native OpenCL implementation on the NVIDIA GTX480 and the AMD HD5870, respectively; it also achieves a 1.4x to 3.3x speedup over the original CUDA implementation on the GTX480.
April 18, 2012 by hgpu