GPU-based JSON data processing using structural indexes
Department of Mathematics and Computer Science, Database Group, Eindhoven University of Technology
Eindhoven University of Technology, 2021
@article{vlaswinkel2021gpu,
title={GPU-based JSON data processing using structural indexes},
author={Vlaswinkel, KR Koen},
year={2021}
}
Large and growing amounts of data are generated and stored every day. Big data is often processed by different software systems, which require a common data interchange format. JavaScript Object Notation (JSON) is one of the most popular data interchange formats and is widely used in web and data-intensive applications. Unfortunately, parsing and processing JSON data is often a bottleneck in data processing pipelines owing to its poor performance. To improve JSON data processing performance, recent work has proposed auxiliary data structures such as structural indexes, as well as speculative data access. All such existing techniques use CPU-based sequential processing, although some recent systems use Single Instruction Multiple Data (SIMD) instructions to accelerate and partly parallelize JSON data access. While SIMD-based parallelization can bring significant speedups over sequential approaches, modern hardware offers even more parallelism through massively parallel Graphics Processing Units (GPUs). In recent years, GPUs have been employed in multiple data processing domains and have been shown to speed up various data processing tasks. However, no prior work has explored using GPUs to massively parallelize JSON data processing. This thesis explores the use of GPUs for parsing and querying large JSON documents from high-level dynamic languages. To our knowledge, we present the first GPU-based JSON query evaluation engine, which speeds up query execution compared to state-of-the-art JSON parsers and query engines. We show how existing sequential techniques can be adapted to run on the GPU and how novel techniques can be used to evaluate queries in parallel on a GPU. We compare our implementation to existing state-of-the-art JSON parsers and query engines.
Because our implementation uses GraalVM to expose the GPU to languages such as JavaScript and Python, it is usable from high-level scripting languages, so we also compare it to JSON query engines available on Node.js.
May 15, 2022 by hgpu