
Data-driven Performance Optimization for Data-intensive Applications

Jie Liu
University of California, Merced
University of California, 2024

@phdthesis{liu2024data,
  title  = {Data-driven Performance Optimization for Data-intensive Applications},
  author = {Liu, Jie},
  year   = {2024},
  school = {UC Merced}
}


Data-intensive applications have attracted considerable attention from researchers in the information sciences and from enterprises, as they have enabled breakthroughs across scientific fields and deliver substantial value to businesses. With the rapid growth of newly generated data, researchers have begun to leverage the knowledge hidden in these large volumes of data to optimize the performance of data-intensive applications. However, data-driven approaches to such performance optimization remain underexplored. This thesis focuses on data-driven performance optimization for data-intensive applications.

We first study auto-labeling of data on mobile devices. Labeling data accurately and efficiently on a mobile device is critical for training machine learning models on such devices. Auto-labeling is challenging because data is generated incrementally and newly arriving data may carry unknown labels; moreover, the rich hardware heterogeneity of mobile devices makes it difficult to execute the auto-labeling workload efficiently. We introduce Flame, an auto-labeling system that can label dynamically generated data with unknown labels. Flame includes an execution engine that efficiently schedules and executes auto-labeling workloads on heterogeneous mobile processors. Evaluating Flame with six datasets on two mobile devices, we show that its labeling accuracy is 11.8%, 16.1%, 18.5%, and 25.2% higher than a state-of-the-art labeling method, transfer learning, semi-supervised learning, and boosting, respectively. Flame is also energy efficient, consuming only 328.65 mJ and 414.84 mJ to label 500 data instances on a Samsung S9 and a Google Pixel 2, respectively. Furthermore, running Flame on a mobile device adds only about 0.75 ms of frame latency, which is imperceptible to users.

Second, we explore cardinality estimation in database systems, a fundamental and critical problem in databases. Many recently proposed deep learning-based estimators have achieved promising results, but they struggle to provide accurate estimates for complex queries because they do not capture real inter-column and inter-table correlations, and none of them reports uncertainty information about its estimates. We present a learned join cardinality estimator that learns the correlations across all columns and all tables in the database and attaches uncertainty information to each estimate. Among all studied learned estimators, our results are promising: (1) our estimator has the smallest model size; (2) it has the fastest inference speed; (3) compared with the state-of-the-art estimator, it provides 10× faster inference and 1.3× to 6.7× smaller estimation errors for complex queries; (4) to the best of our knowledge, it is the first estimator that incorporates uncertainty information for cardinality estimation into a deep learning model.
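To make the role of uncertainty concrete, the sketch below shows one common way to attach an uncertainty signal to a learned cardinality estimator: train a small ensemble of regressors on (query-feature, log-cardinality) pairs and report the spread of their predictions alongside the mean. The featurization, hyperparameters, and toy data are illustrative assumptions, not the design used in this thesis.

# Minimal sketch: ensemble-based uncertainty for a learned cardinality estimator.
# All features, hyperparameters, and data below are illustrative placeholders,
# not the estimator described in the thesis.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy "query features" (e.g., encoded predicates and joins) and log-cardinalities.
X = rng.normal(size=(2000, 16))
y = X @ rng.normal(size=16) + rng.normal(scale=0.1, size=2000)  # log(cardinality)

# Train an ensemble of small regressors on bootstrap resamples of the workload.
ensemble = []
for seed in range(5):
    idx = rng.integers(0, len(X), size=len(X))
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300, random_state=seed)
    model.fit(X[idx], y[idx])
    ensemble.append(model)

def estimate(x):
    """Return (estimated cardinality, uncertainty) for one query feature vector."""
    preds = np.array([m.predict(x.reshape(1, -1))[0] for m in ensemble])
    return float(np.exp(preds.mean())), float(preds.std())  # spread = uncertainty

card, unc = estimate(X[0])
print(f"estimated cardinality: {card:.1f}, log-space uncertainty: {unc:.3f}")

A large spread flags estimates that a query optimizer should treat with caution, which is the kind of per-estimate signal the abstract refers to as uncertainty information.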
Furthermore, we study the data loading problem in large-scale distributed training. The resource-hungry and time-consuming process of training deep neural networks (DNNs) can be accelerated by optimizing and/or scaling computation on accelerators such as GPUs; however, the loading and pre-processing of training samples then often emerges as a new bottleneck. This data loading process engages a complex pipeline that extends from sampling training data on external storage to delivering those data to GPUs, and that comprises not only expensive I/O operations but also decoding, shuffling, batching, augmentation, and other operations. We propose a holistic approach to data loading that addresses three challenges not sufficiently addressed by other methods: I/O load imbalance among the GPUs on a node; rigid resource allocation to the data loading and pre-processing steps, which leads to idle resources and bottlenecks; and the limited efficiency of prefetching-based caching strategies, which evict training samples needed soon in favor of samples needed later. We first present a study of the key bottlenecks observed as training samples flow through the data loading and pre-processing pipeline. We then describe Lobster, a data loading runtime that uses performance modeling and advanced heuristics to combine flexible thread management with optimized eviction for distributed caching, thereby mitigating I/O overheads and load imbalances. Experiments with a range of models and datasets show that Lobster reduces both I/O overheads and end-to-end training times by up to 1.5× compared with state-of-the-art approaches.

Finally, we study cardinality estimation for string predicates in database systems, a notoriously challenging problem. We present ArbiLIKE, a deep learning-based cardinality estimator for arbitrary LIKE predicates. ArbiLIKE uses a cardinality-aware embedding technique to encode LIKE predicates into feature vectors, and it incorporates an innovative sequence model that captures the semantic information of different substrings to enhance estimation accuracy. ArbiLIKE can handle LIKE predicates with any combination of wildcards ("%", "_"). Empirical evaluations show that ArbiLIKE achieves promising accuracy, with estimation errors up to 165.1× smaller than those of eight baselines, including state-of-the-art methods. As a generic estimator, ArbiLIKE reduces errors by 1.4× to 93.1× for LIKE predicates with multiple wildcards compared with existing techniques. To the best of our knowledge, ArbiLIKE is the first deep learning-based estimator capable of handling arbitrary LIKE predicates.
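ArbiLIKE is described as encoding LIKE predicates into feature vectors that a sequence model then maps to cardinalities. The snippet below illustrates only the general idea of turning an arbitrary LIKE pattern into a fixed-size vector, here with hashed character n-grams of the literal substrings plus simple wildcard-structure features; it is an illustrative stand-in, not ArbiLIKE's cardinality-aware embedding or its sequence model.

# Illustrative featurization of a SQL LIKE pattern into a fixed-size vector.
# This hashed n-gram scheme is a stand-in, not ArbiLIKE's embedding technique.
import re
import zlib
import numpy as np

def featurize_like(pattern: str, dim: int = 64, n: int = 3) -> np.ndarray:
    """Encode a LIKE pattern such as '%data_base%' into a (dim + 3)-sized vector."""
    vec = np.zeros(dim + 3, dtype=np.float32)
    literals = [s for s in re.split(r"[%_]", pattern) if s]
    # Structural features: counts of '%' and '_' wildcards, total literal length.
    vec[dim + 0] = pattern.count("%")
    vec[dim + 1] = pattern.count("_")
    vec[dim + 2] = sum(len(s) for s in literals)
    # Bag of hashed character n-grams over the literal substrings.
    for lit in literals:
        padded = f"^{lit}$"  # mark substring boundaries
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            vec[zlib.crc32(gram.encode()) % dim] += 1.0
    return vec

print(featurize_like("%data_base%").shape)  # -> (67,)

A downstream regressor (or, as in ArbiLIKE, a sequence model over the substrings) would consume such vectors to predict the number of matching rows.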
