
GPU-based Private Information Retrieval for On-Device Machine Learning Inference

Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, Edward Suh
Meta AI
arXiv:2301.10904 [cs.CR] (26 Jan 2023)

@misc{lam2023gpupir,
   doi={10.48550/ARXIV.2301.10904},
   url={https://arxiv.org/abs/2301.10904},
   author={Lam, Maximilian and Johnson, Jeff and Xiong, Wenjie and Maeng, Kiwan and Gupta, Udit and Rhu, Minsoo and Lee, Hsien-Hsin S. and Reddi, Vijay Janapa and Wei, Gu-Yeon and Brooks, David and Suh, Edward},
   keywords={Cryptography and Security (cs.CR), Distributed, Parallel, and Cluster Computing (cs.DC), Machine Learning (cs.LG), FOS: Computer and information sciences},
   title={GPU-based Private Information Retrieval for On-Device Machine Learning Inference},
   publisher={arXiv},
   year={2023},
   copyright={Creative Commons Attribution 4.0 International}
}

On-device machine learning (ML) inference can enable the use of private user data on user devices without remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information during on-device ML inference. As off-the-shelf PIR algorithms are usually too computationally intensive to directly use for latency-sensitive inference tasks, we 1) develop a novel algorithm for accelerating PIR on GPUs, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20x over an optimized CPU PIR implementation, and our co-design techniques obtain over 5x additional throughput improvement at fixed model quality. Together, on various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second — a >100x throughput improvement over a naively implemented system — while maintaining model accuracy, and limiting inference communication and response latency to within 300KB and <100ms respectively.
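To make the retrieval idea concrete, the following is a toy, illustrative sketch (not the paper's scheme) of private embedding lookup using a simple two-server, XOR-based PIR: the client splits a one-hot selection vector into two random-looking shares, each non-colluding server XORs together the table rows its share selects, and the client XORs the two answers to recover the embedding row without either server learning the queried index. The paper's actual system instead accelerates cryptographic PIR on GPUs and co-designs it with the downstream model; all names and parameters below are hypothetical.

import numpy as np

# Toy two-server PIR sketch (illustration only; not the paper's GPU/HE scheme).
rng = np.random.default_rng(0)

# Public embedding table, replicated on two non-colluding servers.
NUM_ROWS, DIM = 1024, 64
table = rng.integers(0, 2**16, size=(NUM_ROWS, DIM), dtype=np.uint16)

def client_make_queries(index, num_rows):
    # Split a one-hot selection vector into two XOR shares;
    # each share on its own is a uniformly random bit vector.
    onehot = np.zeros(num_rows, dtype=np.uint8)
    onehot[index] = 1
    share_a = rng.integers(0, 2, size=num_rows, dtype=np.uint8)
    share_b = share_a ^ onehot  # share_a XOR share_b == onehot
    return share_a, share_b

def server_answer(share, table):
    # Each server XORs together the rows selected by its share.
    answer = np.zeros(table.shape[1], dtype=table.dtype)
    for row, bit in zip(table, share):
        if bit:
            answer ^= row
    return answer

def client_reconstruct(ans_a, ans_b):
    # Rows selected by both shares cancel; only the queried row remains.
    return ans_a ^ ans_b

index = 42
q_a, q_b = client_make_queries(index, NUM_ROWS)
embedding = client_reconstruct(server_answer(q_a, table), server_answer(q_b, table))
assert np.array_equal(embedding, table[index])

In this toy variant the per-query cost is a full pass over the table on each server, which is exactly the kind of computational burden the paper targets with GPU acceleration and with co-designing the PIR parameters against the ML application's accuracy and latency budgets.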