Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units
Academy of Economic Studies, Bucharest, Romania
Informatica Economica Vol. 16 No. 3, 2012
@article{lungu2012optimization,
title={Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units},
author={LUNGU, I. and PETROSANU, D.M. and PIRJAN, A.},
journal={Informatica Economica},
volume={16},
number={3},
year={2012}
}
In this paper, we research, analyze and develop optimization solutions for the parallel reduction function on graphics processing units (GPUs) that implement the Compute Unified Device Architecture (CUDA), a modern approach to improving the performance of data processing applications and algorithms, many of which use the reduction function in their computational steps. After designing the function and its algorithmic steps in CUDA, we progressively developed and implemented optimization solutions for it. To test and evaluate the solutions’ efficiency, we developed a custom-tailored benchmark suite. We analyzed the experimental results with respect to: the execution time and bandwidth of graphics processing units covering the main CUDA architectures (Tesla GT200, Fermi GF100, Kepler GK104) compared with a central processing unit; the influence of the data type; and the influence of the binary operator.
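For readers unfamiliar with the reduction function named in the abstract, the following is a minimal CPU-side sketch in C++ of a tree-structured reduction, parameterized by data type and binary operator (the two factors the abstract says were benchmarked). It is not the authors' CUDA implementation. The name `tree_reduce`, its signature, and the sequential-addressing pattern are assumptions chosen for illustration. On a GPU, each iteration of the inner loop would typically be handled by a separate thread.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not from the paper): combines all elements of `data`
// with `op` in a tree pattern. Assumes `op` is associative and commutative,
// because operands are paired out of their original order.
template <typename T, typename BinaryOp>
T tree_reduce(std::vector<T> data, T identity, BinaryOp op) {
    // `identity` is only returned for empty input.
    if (data.empty()) return identity;

    // Each pass roughly halves the number of active elements.
    for (std::size_t n = data.size(); n > 1; n = (n + 1) / 2) {
        const std::size_t half = (n + 1) / 2;
        // These iterations are independent of one another, which is what
        // makes the pass parallelizable (one thread per pair on a GPU).
        for (std::size_t i = 0; i + half < n; ++i) {
            data[i] = op(data[i], data[i + half]);
        }
    }
    return data[0];
}
```

For example, `tree_reduce(std::vector<int>{1, 2, 3, 4, 5}, 0, std::plus<int>())` sums the five values in three passes instead of four sequential additions. Passing a different element type or operator (such as a maximum) changes only the template arguments.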
October 14, 2012 by hgpu