Designing Fast Architecture Sensitive Tree Search on Modern Multi-Core/Many-Core Processors

Changkyu Kim, Jatin Chhugani, Nadathur Satish, Eric Sedlar, Anthony D. Nguyen, Tim Kaldewey, Victor W. Lee, Scott A. Brandt, Pradeep Dubey

Intel Corporation

ACM Transactions on Database Systems, Volume 36, Issue 4, 2011

@article{kim2011designing,
  title={Designing Fast Architecture Sensitive Tree Search on Modern Multi-Core/Many-Core Processors},
  author={Kim, Changkyu and Chhugani, Jatin and Satish, Nadathur and Sedlar, Eric and Nguyen, Anthony D. and Kaldewey, Tim and Lee, Victor W. and Brandt, Scott A. and Dubey, Pradeep},
  journal={ACM Transactions on Database Systems},
  volume={36},
  number={4},
  year={2011}
}

In-memory tree-structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join and aggregation. However, unlike other primitives, tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal. In this paper, we present FAST, an extremely fast architecture-sensitive layout of the index tree. FAST is a binary tree logically organized to optimize for architecture features like page size, cache line size, and Single Instruction Multiple Data (SIMD) width of the underlying hardware. FAST eliminates the impact of memory latency, and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second for large trees of 64M elements, with even better results on smaller trees. These are 5X (CPU) and 1.7X (GPU) faster than the best previously reported performance on the same architectures. We also evaluated FAST on the Intel(R) Many Integrated Core architecture (Intel(R) MIC), showing a speedup of 2.4X – 3X over CPU and 1.8X – 4.4X over GPU. FAST supports efficient bulk updates by rebuilding index trees in less than 0.1 seconds for datasets as large as 64M keys, and it naturally integrates compression techniques, overcoming the memory bandwidth bottleneck and achieving a 6X performance improvement over uncompressed index search for large keys on CPUs.
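To make the idea of an architecture-sensitive layout more concrete, here is a minimal Python sketch of a single level of blocking. It is an illustration under stated assumptions and is not the paper's implementation. The abstract does not spell out the layout, so the sketch assumes that a depth-2 binary subtree (3 keys, 4 children) is stored contiguously as one block, and that blocks are stored in breadth-first order. The three scalar comparisons per block stand in for what a single SIMD comparison could do on real hardware. The names `build_blocked` and `search_blocked` are hypothetical, and the page-level and cache-line-level blocking mentioned in the abstract are not modeled.

```python
import bisect
import math

FANOUT = 4                   # assumed: a depth-2 binary subtree has 4 children
KEYS_PER_BLOCK = FANOUT - 1  # ... and 3 keys, stored contiguously


def build_blocked(sorted_keys):
    """Rearrange sorted keys into 3-key blocks stored in breadth-first order."""
    n = len(sorted_keys)
    capacity = KEYS_PER_BLOCK
    while capacity < n:                      # capacities are 3, 15, 63, ...
        capacity = capacity * FANOUT + KEYS_PER_BLOCK
    # Pad with +inf so the tree is complete; padding never compares below a query.
    padded = list(sorted_keys) + [math.inf] * (capacity - n)
    layout = [None] * capacity

    def fill(block, lo, size):
        if size == 0:
            return
        child_size = (size - KEYS_PER_BLOCK) // FANOUT
        s0 = lo + child_size                 # left separator
        s1 = s0 + child_size + 1             # middle separator (subtree root)
        s2 = s1 + child_size + 1             # right separator
        # Within a block, keys are kept in binary-tree order: root, left, right.
        layout[3 * block:3 * block + 3] = [padded[s1], padded[s0], padded[s2]]
        for c, start in enumerate((lo, s0 + 1, s1 + 1, s2 + 1)):
            fill(FANOUT * block + 1 + c, start, child_size)

    fill(0, 0, capacity)
    return layout


def search_blocked(layout, query):
    """Return the number of keys strictly less than query (a lower-bound rank)."""
    block, rank, subtree = 0, 0, len(layout)
    while subtree > 0:
        child_size = (subtree - KEYS_PER_BLOCK) // FANOUT
        root, left, right = layout[3 * block:3 * block + 3]
        # Scalar stand-in for one SIMD compare: count separators below the query.
        child = (query > left) + (query > root) + (query > right)
        rank += child * (child_size + 1)     # skip whole subtrees to the left
        block = FANOUT * block + 1 + child
        subtree = child_size
    return rank


keys = list(range(0, 200, 2))
layout = build_blocked(keys)
print(search_blocked(layout, 25), bisect.bisect_left(keys, 25))
```

The point of the sketch is that each step of the traversal touches three adjacent keys rather than three keys scattered across memory, which is the kind of locality a blocked layout aims for. A real implementation would choose block sizes from the hardware's SIMD width, cache line size, and page size.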

Tags: Compression, Computer science, CUDA, Databases, nVidia, nVidia GeForce GTX 280, Search, Tesla C2050

December 25, 2011 by hgpu
