hgpu.org » Programming » Algorithms » Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)
Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)
Ping Li, Anshumali Shrivastava, Christian Konig
Dept. of Statistical Science, Cornell University, Ithaca, NY 14853
We generated a dataset of 200 GB with 10^9 features, to test our recent b-bit minwise hashing algorithms for training very large-scale logistic regression and SVM. The results confirm our prior work that, compared with the VW hashing algorithm (which has the same variance as random projections), b-bit minwise hashing is substantially more accurate at the same storage. For example, with merely 30 hashed values per data point, b-bit minwise hashing can achieve similar accuracies as VW with 2^14 hashed values per data point.