The Anatomy of High-Performance 2D Similarity Calculations
Department of Computer Science, Stanford University, Stanford, California 94305, United States
Journal of Chemical Information and Modeling, 2011, 51 (9), pp 2345-2351
@article{haque2011anatomy,
title={The Anatomy of High-Performance 2D Similarity Calculations},
author={Haque, I.S. and Pande, V.S. and Walters, W.P.},
journal={Journal of Chemical Information and Modeling},
volume={51},
number={9},
pages={2345--2351},
year={2011},
publisher={ACS Publications}
}
Similarity measures based on the comparison of dense bit vectors of two-dimensional chemical features are a dominant method in chemical informatics. For large-scale problems, including compound selection and machine learning, computing the intersection between two dense bit vectors is the overwhelming bottleneck. We describe efficient implementations of this primitive as well as example applications using features of modern CPUs that allow 20-40x performance increases relative to typical code. Specifically, we describe fast methods for population count on modern x86 processors and cache-efficient matrix traversal and leader clustering algorithms that alleviate memory bandwidth bottlenecks in similarity matrix construction and clustering. The speed of our 2D comparison primitives is within a small factor of that obtained on GPUs and does not require specialized hardware.
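The primitive named in the abstract, the intersection of two bit vectors, amounts to a bitwise AND followed by a population count. The sketch below is only an illustration of that idea and not the authors' optimized x86 implementation. It assumes fingerprints are stored as Python integers, uses the Tanimoto coefficient as an example similarity measure (the abstract does not name one), and assumes two empty fingerprints score 0.0. The function names and sample fingerprints are hypothetical.

```python
def popcount(x: int) -> int:
    """Count the set bits of a non-negative integer (portable, not fast)."""
    return bin(x).count("1")


def intersection_size(a: int, b: int) -> int:
    """Number of bits set in both fingerprints: popcount of the bitwise AND."""
    return popcount(a & b)


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two bit-vector fingerprints.

    Assumption: two all-zero fingerprints are given a similarity of 0.0.
    """
    common = intersection_size(a, b)
    # |A | B| = |A| + |B| - |A & B|, so only popcounts are needed.
    union = popcount(a) + popcount(b) - common
    return common / union if union else 0.0


# Hypothetical 8-bit fingerprints, for illustration only.
fp_a = 0b10110100
fp_b = 0b10010110

shared_bits = intersection_size(fp_a, fp_b)
similarity = tanimoto(fp_a, fp_b)
print(shared_bits, similarity)
```

In a similarity matrix, the per-fingerprint popcounts can be computed once and reused, so each pair costs only one AND and one population count. This is why the speed of that primitive dominates the total cost.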
October 16, 2011 by hgpu