high performance computing on graphics processing units: hgpu.org

MemcachedGPU: Scaling-up Scale-out Key-value Stores

Tayler H. Hetherington, Mike O’Connor, Tor M. Aamodt

The University of British Columbia

ACM Symposium on Cloud Computing (SoCC 2015), 2015

This paper tackles the challenges of obtaining more efficient data center computing while maintaining low latency, low cost, programmability, and the potential for workload consolidation. We introduce GNoM, a software framework enabling energy-efficient, latency- and bandwidth-optimized UDP network and application processing on GPUs. GNoM handles the data movement and task management to facilitate the development of high-throughput UDP network services on GPUs. We use GNoM to develop MemcachedGPU, an accelerated key-value store, and evaluate the full system on contemporary hardware. MemcachedGPU achieves ~10 GbE line-rate processing of ~13 million requests per second (MRPS) while delivering an efficiency of 62 thousand RPS per Watt (KRPS/W) on a high-performance GPU and 84.8 KRPS/W on a low-power GPU. This closely matches the throughput of an optimized FPGA implementation while providing up to 79% of the energy-efficiency on the low-power GPU. Additionally, the low-power GPU can potentially improve cost-efficiency (KRPS/$) by up to 17% over a state-of-the-art CPU implementation. At 8 MRPS, MemcachedGPU achieves a 95th-percentile round-trip time (RTT) latency under 300 µs on both GPUs. An offline limit study on the low-power GPU suggests that MemcachedGPU may continue scaling throughput and energy-efficiency up to 28.5 MRPS and 127 KRPS/W, respectively.
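
The throughput and efficiency figures in the abstract can be related with simple arithmetic. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and it assumes that the 62 KRPS/W figure was measured at roughly the quoted ~13 MRPS and that the ~10 GbE link is fully used by requests. Both inputs are approximate in the abstract, so the results are rough estimates only.

```python
def implied_power_watts(throughput_mrps, efficiency_krps_per_watt):
    """Power implied by a throughput (MRPS) and an efficiency (KRPS/W).

    Assumption: both figures describe the same operating point.
    """
    requests_per_second = throughput_mrps * 1e6
    requests_per_second_per_watt = efficiency_krps_per_watt * 1e3
    return requests_per_second / requests_per_second_per_watt


def wire_bytes_per_request(link_gbps, throughput_mrps):
    """Average bytes on the wire per request if the link is saturated.

    Assumption: the whole link bandwidth is consumed by request traffic,
    including all protocol headers and framing overhead.
    """
    bytes_per_second = link_gbps * 1e9 / 8
    requests_per_second = throughput_mrps * 1e6
    return bytes_per_second / requests_per_second


if __name__ == "__main__":
    # Approximate figures quoted in the abstract for the high-performance GPU.
    power = implied_power_watts(13, 62)
    size = wire_bytes_per_request(10, 13)
    print(f"implied power: {power:.0f} W")
    print(f"wire bytes per request: {size:.0f} B")
```

Under these assumptions, the high-performance GPU configuration would draw on the order of 210 W, and each request would occupy slightly under 100 bytes on the wire, which is consistent with a workload dominated by small requests. The abstract does not say whether the power figure covers the whole system or only the GPU, so the first estimate should be read with that caveat.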

Tags: Computer science, CUDA, Databases, Energy-efficient computing, nVidia, nVidia GeForce GTX 750 Ti, Package, Tesla K20, Tesla K40

August 27, 2015 by hgpu
