9859

Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture

Jiong He, Mian Lu, Bingsheng He
Nanyang Technological University
arXiv:1307.1955 [cs.DC], (8 Jul 2013)
@article{2013arXiv1307.1955H,

   author={He}, J. and {Lu}, M. and {He}, B.},

   title={"{Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture}"},

   journal={ArXiv e-prints},

   archivePrefix={"arXiv"},

   eprint={1307.1955},

   primaryClass={"cs.DC"},

   keywords={Computer Science – Distributed, Parallel, and Cluster Computing},

   year={2013},

   month={jul},

   adsurl={http://adsabs.harvard.edu/abs/2013arXiv1307.1955H},

   adsnote={Provided by the SAO/NASA Astrophysics Data System}

}

Download Download (PDF)   View View   Source Source   

418

views

Query co-processing on graphics processors (GPUs) has become an effective means to improve the performance of main memory databases. However, the relatively low bandwidth and high latency of the PCI-e bus are usually bottleneck issues for co-processing. Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the GPU integrated into a single chip. That opens up new opportunities for optimizing query co-processing. In this paper, we experimentally revisit hash joins, one of the most important join algorithms for main memory databases, on a coupled CPU-GPU architecture. Particularly, we study the fine-grained co-processing mechanisms on hash joins with and without partitioning. The co-processing outlines an interesting design space. We extend existing cost models to automatically guide decisions on the design space. Our experimental results on a recent AMD APU show that (1) the coupled architecture enables fine-grained co-processing and cache reuses, which are inefficient on discrete CPU-GPU architectures; (2) the cost model can automatically guide the design and tuning knobs in the design space; (3) fine-grained co-processing achieves up to 53%, 35% and 28% performance improvement over CPU-only, GPU-only and conventional CPU-GPU co-processing, respectively. We believe that the insights and implications from this study are initial yet important for further research on query co-processing on coupled CPU-GPU architectures.
VN:F [1.9.22_1171]
Rating: 3.0/5 (2 votes cast)
Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture, 3.0 out of 5 based on 2 ratings

* * *

* * *

Like us on Facebook

HGPU group

149 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1236 peoples are following HGPU @twitter

Featured events

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: