Data Mining and Machine Learning in Astronomy
Herzberg Institute of Astrophysics, National Research Council, 5017 West Saanich Road, Victoria, BC V9E 2E7, Canada
International Journal of Modern Physics D, Volume 19, Issue 07, pp. 1049-1106 (2010)
@article{2010IJMPD..19.1049B,
author={{Ball}, N.~M. and {Brunner}, R.~J.},
title={{Data Mining and Machine Learning in Astronomy}},
journal={International Journal of Modern Physics D},
archivePrefix={arXiv},
eprint={0906.2173},
primaryClass={astro-ph.IM},
keywords={Data mining, machine learning, knowledge discovery in databases, astroinformatics, astrostatistics, Virtual Observatory},
year={2010},
volume={19},
pages={1049-1106},
doi={10.1142/S0218271810017160},
adsurl={http://adsabs.harvard.edu/abs/2010IJMPD..19.1049B},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
We review the current state of data mining and machine learning in astronomy. ‘Data Mining’ can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
February 27, 2011 by hgpu