Empirical analysis of a parallel data mining algorithm on a graphic processor
Turku University of Applied Sciences
Turku University of Applied Sciences, 2012
@article{berdin2012empirical,
title={Empirical analysis of a parallel data mining algorithm on a graphic processor},
author={Berdin, D. and others},
year={2012},
publisher={Turun ammattikorkeakoulu}
}
In this thesis, we analyze in an empirical way a different approach of the algorithm SPAM (Sequential PAttern Mining using A Bitmap Representation) made by J. Ayres, J. Gehrke, T. Yiu and J. Flannick from Cornell University, exploiting GPUs. SPAM is a novel approach for FSM (Frequent Sequence Mining) where the algorithm is not looking for a single item during the mining process but for a set of items that customers have taken in their transactions. Basically we can think of Spam as a supermarket where the customers have the "Fidelity card" and that the supermarket can track the transactions. What Spam does is to calculate which are the most bought items per customer. But the question is: Why graphic cards? The answer is pretty easy: graphic cards can benefit of a rich amount of data parallelism allowing many arithmetic operations to be safely performed on program data structures in a simultaneous manner. In our tests we noticed that performing a massive amount of multiplications (or any other kind of operation) with the CPU took at least the 60% more rather than the GPU. Just with this simple case, we could understand the potential of GPUs.
October 20, 2012 by hgpu