Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

Ping Li, Anshumali Shrivastava, Christian Konig
Dept. of Statistical Science, Cornell University, Ithaca, NY 14853
arXiv:1108.3072v1 [cs.LG] (15 Aug 2011)

@article{2011arXiv1108.3072L,
   author={{Li}, P. and {Shrivastava}, A. and {Konig}, C.},
   title={{Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)}},
   journal={ArXiv e-prints},
   archivePrefix={arXiv},
   eprint={1108.3072},
   primaryClass={cs.LG},
   keywords={Computer Science – Learning, Statistics – Methodology, Statistics – Machine Learning},
   year={2011},
   month={aug},
   adsurl={http://adsabs.harvard.edu/abs/2011arXiv1108.3072L},
   adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

We generated a 200 GB dataset with 10^9 features to test our recent b-bit minwise hashing algorithms for training very large-scale logistic regression and SVM. The results confirm our prior work: compared with the VW hashing algorithm (which has the same variance as random projections), b-bit minwise hashing is substantially more accurate at the same storage. For example, with merely 30 hashed values per data point, b-bit minwise hashing can achieve accuracies similar to those of VW with 2^14 hashed values per data point.
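
To make the "30 hashed values per data point" concrete, the following is a minimal Python sketch of the general b-bit minwise hashing idea, not the authors' implementation. It rests on several assumptions that are not in the abstract: random permutations are approximated by simple universal hash functions of the form (a*x + c) mod p, the bit width b = 8 is an arbitrary illustrative choice, and the names `make_hash_params`, `bbit_minhash`, and `expand_to_sparse_indices` are hypothetical. Each data point is treated as a set of nonzero feature indices; for each of k hash functions, only the lowest b bits of the minimum hash value are kept, and the k values are then expanded into positions of a sparse binary vector of length k * 2^b that a linear classifier could consume.

```python
import random

# Assumption: a Mersenne prime larger than any feature index (here up to ~10^9).
PRIME = (1 << 61) - 1


def make_hash_params(k, seed=0):
    """Draw k (a, c) pairs for hash functions h(x) = (a*x + c) mod PRIME.

    These stand in for k random permutations of the feature space.
    """
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def bbit_minhash(feature_ids, params, b):
    """Return k values, each the lowest b bits of one minimum hash value.

    feature_ids must be a non-empty collection of non-negative integers.
    """
    mask = (1 << b) - 1
    sketch = []
    for a, c in params:
        min_val = min((a * x + c) % PRIME for x in feature_ids)
        sketch.append(min_val & mask)  # keep only the lowest b bits
    return sketch


def expand_to_sparse_indices(sketch, b):
    """Map the j-th b-bit value v to index j * 2^b + v.

    The result lists the positions of the ones in a binary vector of
    length k * 2^b (exactly one nonzero per block of 2^b entries).
    """
    width = 1 << b
    return [j * width + v for j, v in enumerate(sketch)]


# Illustrative data point: a small set of nonzero feature indices.
params = make_hash_params(k=30, seed=42)
doc = {3, 17, 1_000_003, 999_999_937}
sketch = bbit_minhash(doc, params, b=8)
indices = expand_to_sparse_indices(sketch, b=8)
print(len(sketch), "hashed values,", 30 * 8, "bits of storage")
```

Under these assumptions each data point is stored in k * b = 240 bits regardless of how many of the 10^9 original features are nonzero, which is the kind of storage budget the comparison with VW refers to.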