High Performance Data Mining Using R on Heterogeneous Platforms
Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011
@inproceedings{kumar2011high,
title={High Performance Data Mining Using R on Heterogeneous Platforms},
author={Kumar, P. and Ozisikyilmaz, B. and Liao, W.K. and Memik, G. and Choudhary, A.},
booktitle={Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on},
pages={1720--1729},
year={2011},
organization={IEEE}
}
The exponential increase in the generation and collection of data has led us into a new era of data analysis and information extraction. Conventional systems based on general-purpose processors are unable to keep pace with the heavy computational requirements of data mining techniques. High performance co-processors like GPUs and FPGAs have the potential to handle large computational workloads. In this paper, we present a scalable framework aimed at providing a platform for developing and using high performance data mining applications on heterogeneous platforms. The framework incorporates a software infrastructure and a library of high performance kernels. Furthermore, it includes a variety of optimizations which increase the throughput of applications. The framework spans multiple technologies, including R, GPUs, multi-core CPUs, MPI, and parallel netCDF, harnessing their capabilities for high-performance computations. This paper also introduces the concept of interleaving GPU kernels from multiple applications, which provides a significant performance gain. Thus, in comparison to other tools available for data mining, our framework provides an easy-to-use and scalable environment both for application development and execution. The framework is available as a software package which can be easily integrated into the R programming environment.
November 15, 2011 by hgpu