Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study

Jianbin Fang, Ana Lucia Varbanescu, Henk Sips
Parallel and Distributed Systems Group, Delft University of Technology, Delft, the Netherlands
ICT.Open, 2011

   author={Jianbin Fang and Ana Lucia Varbanescu and Henk Sips},
   title={Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study},
   booktitle={Proceedings of ICT.Open 2011},
   note={(an extension to the FGC’11 paper)},
   location={Veldhoven, the Netherlands},
   topic={Parallel Programming},



In this paper, we analyze the trade-offs encountered when minimizing the total execution time of rake-based applications on GPUs. We use clustering data streams as a case study and present a rake-based implementation of it that uses memory more efficiently. To maximize performance across different problem sizes and architectures, we propose a model-based auto-tuning solution. Experimental results show that our fully optimized implementation runs 2.1x and 1.4x faster than the native OpenCL implementation on the NVIDIA GTX480 and AMD HD5870, respectively; it also achieves a 1.4x to 3.3x speedup over the original CUDA implementation on the GTX480.
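The abstract does not spell out the tuning model, so the following is only a minimal sketch of what model-based auto-tuning of a rake size could look like, not the authors' method. Everything in it is an assumption made for illustration: the `Device` fields, the toy cost model in `predicted_time`, the local-memory constraint in `fits`, and the default candidate list. The idea it shows is general: predict the run time for each candidate number of elements per thread (the rake size), discard candidates that do not fit in on-chip memory, and pick the cheapest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    # Hypothetical tuning inputs, not measured values from the paper.
    local_mem_bytes: int         # on-chip memory available per work-group
    max_concurrent_threads: int  # threads the device can run at once


def ceil_div(a, b):
    return -(-a // b)


def predicted_time(n_elements, rake, threads_per_group, device,
                   elem_cost=1.0, group_overhead=50.0):
    """Toy cost model (an assumption): serial work per wave of threads
    plus a fixed scheduling overhead per work-group."""
    n_threads = ceil_div(n_elements, rake)
    n_groups = ceil_div(n_threads, threads_per_group)
    waves = ceil_div(n_threads, device.max_concurrent_threads)
    return waves * rake * elem_cost + n_groups * group_overhead


def fits(rake, threads_per_group, bytes_per_elem, device):
    """A work-group stages rake * threads_per_group elements on chip."""
    return rake * threads_per_group * bytes_per_elem <= device.local_mem_bytes


def tune_rake(n_elements, device, candidates=(1, 2, 4, 8, 16, 32),
              threads_per_group=64, bytes_per_elem=4):
    """Return the feasible rake size with the lowest predicted time."""
    feasible = [r for r in candidates
                if fits(r, threads_per_group, bytes_per_elem, device)]
    if not feasible:
        raise ValueError("no candidate rake size fits in local memory")
    return min(feasible,
               key=lambda r: predicted_time(n_elements, r,
                                            threads_per_group, device))


if __name__ == "__main__":
    dev = Device(local_mem_bytes=4096, max_concurrent_threads=1024)
    for n in (64, 1_000_000):
        print(n, tune_rake(n, dev))
```

Under this toy model, a large input favors the largest rake size that still fits in local memory, while a tiny input favors a rake size of 1. A real tuner would replace `predicted_time` with a model calibrated to the target GPU and problem size.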