high performance computing on graphics processing units: hgpu.org

Advanced Concurrency Control Algorithm Design and GPU System Support for High Performance In-Memory Data Management

Yuan Yuan

The Ohio State University

The Ohio State University, 2016

@phdthesis{yuan2016advanced,
  title={Advanced Concurrency Control Algorithm Design and GPU System Support for High Performance In-Memory Data Management},
  author={Yuan, Yuan},
  year={2016},
  school={The Ohio State University}
}

The design and implementation of data management systems have been significantly affected by application demands and hardware advancements. On one hand, with the emergence of various new applications, the traditional one-size-fits-all data management system has evolved into domain-specific systems optimized for each application (e.g., OLTP, OLAP, streaming). On the other hand, with increasing memory capacity and advances in multi-core CPUs and massively parallel co-processors (e.g., GPUs), the performance bottleneck of data management systems has shifted from I/O to memory accesses, which has led to a constructive redesign of data management systems for memory-resident data. Although many in-memory systems have been developed to deliver much better performance than disk-based systems, they all face the challenge of how to maximize system performance through massive parallelism. In this Ph.D. dissertation, we explore how to design high-performance in-memory data management systems for massively parallel processors. We have identified three critical issues in in-memory data processing.

First, the Optimistic Concurrency Control (OCC) method has been commonly used in in-memory databases to ensure transaction serializability. Although OCC can achieve high performance at low contention, it causes a large number of unnecessary transaction aborts at high contention, which wastes system resources and significantly degrades database throughput. To solve this problem, we propose a new concurrency control method named Balanced Concurrency Control (BCC) that aborts transactions more accurately while maintaining OCC’s merits at low contention.

Second, we study how to use massively parallel GPU co-processors to improve the performance of in-memory analytical systems. Existing work has demonstrated the GPU’s performance advantage over the CPU on simple analytical operations (e.g., join), but it is unclear how to optimize complex queries given the various optimizations available. To address this issue, we comprehensively examine analytical query behaviors on GPUs and design a new GPU in-memory analytical system that efficiently executes complex analytical workloads.

Third, we investigate how to use GPUs to accelerate various analytical applications on production-level distributed in-memory data processing systems. Most existing GPU work adopts a GPU-centric design, which completely redesigns a system for GPUs without considering the performance of CPU operations. It is unclear how much a CPU-optimized, distributed in-memory data processing system can benefit from GPUs. To answer this question, we use Apache Spark as a platform and design Spark-GPU, which addresses a set of real-world challenges caused by the mismatches between Spark and GPUs.

Our research includes both algorithm design and system design and implementation in the form of open source software.
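The claim that OCC aborts transactions unnecessarily under contention can be illustrated with a toy validator. The sketch below is a generic, textbook-style backward validation, not the thesis's BCC algorithm or any real database's code; the names `Transaction`, `OccValidator`, and `demo` are hypothetical and exist only for illustration. It assumes a transaction must abort whenever an item in its read set was overwritten by a transaction that committed after it started.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    # Hypothetical toy transaction: only tracks which keys it read and wrote.
    name: str
    start_ts: int
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


class OccValidator:
    """Toy backward validation (an assumption, not the thesis's method):
    abort if any item read was overwritten by a transaction that
    committed after this transaction started."""

    def __init__(self):
        self.clock = 0
        self.committed = []  # list of (commit_ts, write_set)

    def begin(self, name):
        return Transaction(name, start_ts=self.clock)

    def try_commit(self, txn):
        for commit_ts, writes in self.committed:
            if commit_ts > txn.start_ts and writes & txn.read_set:
                return False  # validation failed: abort
        self.clock += 1
        self.committed.append((self.clock, frozenset(txn.write_set)))
        return True


def demo():
    db = OccValidator()
    t1 = db.begin("T1")
    t2 = db.begin("T2")
    t1.read_set.add("x")       # T1 reads the old value of x
    t2.write_set.add("x")      # T2 overwrites x
    t2_ok = db.try_commit(t2)  # T2 commits first
    t1.write_set.add("y")      # T1 writes an unrelated item y
    t1_ok = db.try_commit(t1)  # T1 fails validation
    return t2_ok, t1_ok
```

Here `demo()` returns `(True, False)`: T1 is aborted even though the serial order "T1 then T2" explains the execution, since T1 saw only the old value of `x` and T2 never read `y`. Aborts of this kind are the sort of waste the abstract attributes to OCC at high contention; how BCC avoids them is described in the thesis itself, not here.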

Tags: Algorithms, ATI, ATI Radeon HD 7970, Computer science, CUDA, Databases, nVidia, nVidia GeForce GTX 480, nVidia GeForce GTX 580, nVidia GeForce GTX 680, OpenCL, Thesis

February 7, 2017 by hgpu
