Query Optimization in Heterogeneous CPU/GPU Environment for Time Series Databases

Piotr Przymus
University of Warsaw, Faculty of Mathematics, Informatics, and Mechanics
University of Warsaw, 2014



In recent years, the processing and exploration of time series have attracted considerable interest. Growing data volumes and the need for efficient processing have pushed research in new directions, including hardware-based solutions. Graphics Processing Units (GPUs) have many more applications than rendering images: they are also used in general-purpose computing to solve problems that benefit from massively parallel processing. Numerous reports confirm the effectiveness of GPUs in scientific and industrial applications. However, several issues related to using a GPU as a database coprocessor must be considered.

First, all computations on the GPU are preceded by time-consuming memory transfers. In this thesis we present a study of lossless lightweight compression algorithms in the context of GPU computations and time series database systems. We discuss the algorithms, their applications, and the details of their implementation on the GPU. We analyse their influence on data processing efficiency, taking into account both data transfer time and decompression time. Moreover, we propose a data-adaptive compression planner based on these algorithms, which uses a hierarchy of multiple compression algorithms to further reduce the data size.

Second, some tasks are poorly suited to the GPU or fit it only partially, which may be related to the size or type of the task. We elaborate on a heterogeneous CPU/GPU computation environment and an optimization method that seeks an equilibrium between these two computation platforms. This method is based on a heuristic search for bi-objective optimal execution plans. The underlying model mimics a commodity market, in which devices are producers and queries are consumers. The value of the resources of the computing devices is governed by the laws of supply and demand. Our model of the optimization criteria allows finding solutions to heterogeneous query processing problems for which existing methods have been ineffective.
Furthermore, it offers lower time complexity and higher accuracy than other methods.

The dissertation also discusses an example application of time series databases: the analysis of zebra mussel (Dreissena polymorpha) behaviour based on observations of changes in the gap between the valves, collected as a time series. We propose a new algorithm, based on wavelets and kernel methods, that detects relevant events in the collected data. This algorithm allows us to extract elementary behaviour events from the observations. Moreover, we propose an efficient framework for automatic classification that separates control conditions from stressful conditions. Since zebra mussels are well-known bioindicators, this is an important step towards the creation of an advanced environmental biomonitoring system.
