Efficient and Scalable Parallel Zonal Statistics on Large-Scale Species Occurrence Data on GPUs
Department of Computer Science, City College of New York, New York, NY, 10031
City College of New York, Technical Report, 2014
@techreport{zhang2014efficient,
  title={Efficient and Scalable Parallel Zonal Statistics on Large-Scale Species Occurrence Data on GPUs},
  author={Zhang, Jianting and You, Simin},
  institution={City College of New York},
  year={2014}
}
How species are distributed on the Earth is one of the fundamental questions at the intersection of environmental sciences, geosciences and biological sciences. With worldwide data contributions, more than 375 million species occurrence records for nearly 1.5 million species have been deposited in the Global Biodiversity Information Facility (GBIF) data portal. The sheer volume of point and polygon data, together with the computation-intensive point-in-polygon tests that zonal statistics for biodiversity studies require, imposes significant technical challenges. In this study, we significantly extend our previous work on parallel-primitives-based spatial joins on commodity Graphics Processing Units (GPUs) and develop new efficient and scalable techniques that enable parallel zonal statistics on the GBIF data entirely on GPUs with limited memory capacity. Experimental results show that an end-to-end response time under 100 seconds can be achieved for zonal statistics on the 375+ million species records over 15+ thousand global eco-regions with 4+ million vertices on a single Nvidia Quadro 6000 GPU device. This performance, which is several orders of magnitude faster than reference serial implementations using traditional open-source geospatial techniques, not only demonstrates the potential of GPU computing for large-scale geospatial processing but also makes interactive, query-driven visual exploration of global biodiversity data possible.
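To make the core operation concrete, the following is a minimal serial sketch of zonal statistics built on point-in-polygon tests. It is not the GPU technique developed in the report; it only illustrates the kind of per-point, per-polygon work that a serial reference implementation performs. The function names (`point_in_polygon`, `zonal_counts`), the choice of a ray-casting test, the bounding-box pre-filter, and the use of a record count per zone as the statistic are all assumptions made for illustration. Each zone is assumed to be a single ring of vertices without holes.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting (even-odd) test. `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_counts(points, zones):
    """Count occurrence records per zone.

    points: iterable of (x, y, species_id) tuples (hypothetical record layout).
    zones:  dict mapping zone_id -> ring of (x, y) vertices.
    """
    # Precompute bounding boxes so most point/zone pairs are rejected cheaply.
    boxes = {}
    for zone_id, ring in zones.items():
        xs = [vx for vx, _ in ring]
        ys = [vy for _, vy in ring]
        boxes[zone_id] = (min(xs), min(ys), max(xs), max(ys))

    counts = Counter()
    for x, y, _species_id in points:
        for zone_id, ring in zones.items():
            xmin, ymin, xmax, ymax = boxes[zone_id]
            if xmin <= x <= xmax and ymin <= y <= ymax:
                if point_in_polygon(x, y, ring):
                    counts[zone_id] += 1
    return counts
```

The nested loop does work proportional to the number of points times the number of zones, with each full test additionally proportional to the zone's vertex count. At the scale quoted above (hundreds of millions of points, thousands of zones, millions of vertices) this cost is what motivates spatial filtering and parallel execution.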
March 6, 2014 by hgpu