Parallel Zonal Summations of Large-Scale Species Occurrence Data on Hybrid CPU-GPU Systems
Department of Computer Science, The City College of New York, 138th Street at Convent Avenue, New York, NY 10031
City College of New York, Technical report, 2013
@techreport{zhang2013parallel,
title={Parallel Zonal Summations of Large-Scale Species Occurrence Data on Hybrid CPU-GPU Systems},
author={Zhang, Jianting and You, Simin},
institution={City College of New York},
year={2013}
}
How species are distributed on the Earth has long been one of the fundamental questions in biogeography and ecology. Through worldwide data contributions, more than 375 million species occurrence records for nearly 1.5 million species have been deposited in the Global Biodiversity Information Facility (GBIF) data portal. The sheer volume of point and polygon data, together with the computation-intensive point-in-polygon tests that zonal summations for biodiversity studies require, poses significant technical challenges. In this study, we have developed a set of data-parallel designs for point-in-polygon-test-based spatial joins for zonal summations by identifying the inherent fine-grained data parallelism in point and polygon indexing, spatial filtering, and spatial refinement. We have implemented our designs on both multi-core CPUs and many-core Graphics Processing Units (GPUs). The fine-grained data-parallel designs also allow the two commodity parallel hardware architectures to interoperate, achieving the desired high efficiency while overcoming their respective limitations. Experimental results show that an end-to-end response time of under 100 seconds can be achieved for zonal summations of the 375+ million species records over 15+ thousand global eco-regions with 4+ million vertices in a personal computing environment. Our experiments also show that near-real-time response times can be achieved for subsets of species whose occurrence data fit in GPU memory, which opens the possibility of query-driven visual exploration of global biodiversity patterns based on zonal summations or similar geospatial operations.
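The filter-then-refine spatial join behind a zonal summation can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the authors' data-parallel CPU/GPU implementation. It assumes each polygon is a single vertex ring without holes and uses a hypothetical uniform grid (cell size `cell`) over polygon bounding boxes for spatial filtering. Candidate pairs are then refined with the standard even-odd ray-crossing test. The names `point_in_polygon`, `build_grid_index` and `zonal_counts` are illustrative only.

```python
from collections import defaultdict
from math import floor


def point_in_polygon(x, y, ring):
    """Even-odd (ray-crossing) test: cast a ray toward +x and count edge crossings.

    Assumes `ring` is a simple polygon given as a list of (x, y) vertices with no holes.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x-coordinate where the edge crosses that line (yi != yj here)
            x_cross = xj + (y - yj) * (xi - xj) / (yi - yj)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def bbox(ring):
    """Minimum bounding rectangle (MBR) of a ring: (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def build_grid_index(polygons, cell):
    """Polygon indexing (hypothetical scheme): map each grid cell to the
    polygons whose MBRs overlap it."""
    index = defaultdict(list)
    for pid, ring in enumerate(polygons):
        x0, y0, x1, y1 = bbox(ring)
        for cx in range(floor(x0 / cell), floor(x1 / cell) + 1):
            for cy in range(floor(y0 / cell), floor(y1 / cell) + 1):
                index[(cx, cy)].append(pid)
    return index


def zonal_counts(points, polygons, cell=1.0):
    """Zonal summation as a per-polygon count of the points that fall inside it."""
    index = build_grid_index(polygons, cell)
    counts = [0] * len(polygons)
    for x, y in points:
        # Spatial filtering: only polygons whose MBR overlaps this point's cell
        for pid in index.get((floor(x / cell), floor(y / cell)), ()):
            # Spatial refinement: exact point-in-polygon test
            if point_in_polygon(x, y, polygons[pid]):
                counts[pid] += 1
    return counts


if __name__ == "__main__":
    # Two hypothetical unit-square "eco-regions" and a handful of occurrence points
    polygons = [
        [(0, 0), (1, 0), (1, 1), (0, 1)],
        [(2, 0), (3, 0), (3, 1), (2, 1)],
    ]
    points = [(0.5, 0.5), (0.2, 0.9), (2.5, 0.5), (5.0, 5.0)]
    print(zonal_counts(points, polygons))  # [2, 1]
```

Each (point, candidate polygon) test in the inner loop is independent of the others. This is the kind of fine-grained data parallelism the abstract refers to, and such tests could be mapped to GPU threads or CPU cores rather than looped over serially. The grid filter never discards a polygon that actually contains the point, because any point inside a polygon also lies inside that polygon's MBR.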
May 19, 2013 by hgpu