Massive parallelization of serial inference algorithms for a complex generalized linear model
Department of Biomathematics, University of California, Los Angeles, CA, USA
arXiv:1208.0945v1 [stat.CO] (4 Aug 2012)
@article{2012arXiv1208.0945S,
author={{Suchard}, M.~A. and {Simpson}, S.~E. and {Zorych}, I. and {Ryan}, P. and {Madigan}, D.},
title={{Massive parallelization of serial inference algorithms for a complex generalized linear model}},
journal={ArXiv e-prints},
archivePrefix={arXiv},
eprint={1208.0945},
primaryClass={stat.CO},
keywords={Statistics – Computation, Mathematics – Optimization and Control},
year={2012},
month={aug},
adsurl={http://adsabs.harvard.edu/abs/2012arXiv1208.0945S},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety.
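To make the idea of cyclic coordinate descent concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and not their model. As a simplifying assumption it fits an ordinary (unconditioned) logistic regression with a normal prior of variance `sigma2` on each coefficient, and it replaces the exact second derivative with the upper bound p(1 - p) <= 1/4 so that no coordinate step can increase the objective. The names `fit_ccd`, `neg_log_posterior` and `sigmoid` are hypothetical and chosen for illustration only.

```python
import math


def sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def neg_log_posterior(beta, X, y, sigma2):
    # Logistic negative log-likelihood plus a normal (ridge) prior penalty.
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        # Stable evaluation of log(1 + exp(eta)).
        total += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    return total + sum(b * b for b in beta) / (2.0 * sigma2)


def fit_ccd(X, y, sigma2=1.0, max_sweeps=1000, tol=1e-10):
    # Cyclic coordinate descent: update one coefficient at a time,
    # cycling through all coefficients until the steps become negligible.
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    eta = [0.0] * n  # cached linear predictors x_i . beta
    # Assumption: bound the curvature with p(1 - p) <= 1/4 rather than
    # computing the exact one-dimensional second derivative.
    curv = [sum(X[i][j] ** 2 for i in range(n)) / 4.0 + 1.0 / sigma2
            for j in range(p)]
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(p):
            # Gradient for coordinate j: a sum over observations plus the prior term.
            grad = sum(X[i][j] * (sigmoid(eta[i]) - y[i]) for i in range(n))
            grad += beta[j] / sigma2
            delta = -grad / curv[j]
            beta[j] += delta
            # Keep the cached linear predictors in sync with the new beta[j].
            for i in range(n):
                eta[i] += delta * X[i][j]
            largest = max(largest, abs(delta))
        if largest < tol:
            break
    return beta
```

Calling `fit_ccd(X, y, sigma2=1.0)` with `X` as a list of feature rows and `y` as a list of 0/1 outcomes returns the coefficient list. In this sketch the per-coordinate gradient and the update of `eta` are both loops over independent observations, which is the kind of work that could plausibly be spread across many threads on parallel hardware. The coordinate updates themselves stay serial.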
August 9, 2012 by hgpu