Dropbear: Machine Learning Marketplaces made Trustworthy with Byzantine Model Agreement

Alex Shamis, Peter Pietzuch, Antoine Delignat-Lavaud, Andrew Paverd, Manuel Costa
Microsoft Research
arXiv:2205.15757 [cs.DC], (31 May 2022)




Keywords: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); FOS: Computer and information sciences
License: arXiv.org perpetual, non-exclusive license





Marketplaces for machine learning (ML) models are emerging as a way for organizations to monetize models. They allow model owners to retain control over hosted models by using cloud resources to execute ML inference requests for a fee, preserving model confidentiality. Clients that rely on hosted models require trustworthy inference results, even when models are managed by third parties. While the resilience and robustness of inference results can be improved by combining multiple independent models, such support is unavailable in today’s marketplaces. We describe Dropbear, the first ML model marketplace that provides clients with strong integrity guarantees by combining results from multiple models in a trustworthy fashion. Dropbear replicates inference computation across a model group, which consists of multiple cloud-based GPU nodes belonging to different model owners. Clients receive inference certificates that prove agreement using a Byzantine consensus protocol, even under model heterogeneity and concurrent model updates. To improve performance, Dropbear batches inference and consensus operations separately: it first performs the inference computation across a model group, before ordering requests and model updates. Despite its strong integrity guarantees, Dropbear’s performance matches that of state-of-the-art ML inference systems: deployed across 3 cloud sites, it handles 800 requests/s with ImageNet models.
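To make the idea of agreement across a model group concrete, here is a minimal Python sketch. It is not Dropbear's protocol: the abstract does not specify the agreement rule, so the sketch assumes the textbook Byzantine sizing of n >= 3f + 1 models with a quorum of 2f + 1 matching answers. The names `agree` and `Certificate` are hypothetical, and the sketch omits signatures, request ordering, model updates, and networking entirely.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Certificate:
    """Hypothetical stand-in for an inference certificate (no cryptography)."""
    label: str                   # the agreed inference result
    supporters: Tuple[int, ...]  # indices of models that returned `label`
    quorum: int                  # number of matching answers that was required


def agree(predictions: Sequence[str], f: int) -> Optional[Certificate]:
    """Return a certificate if at least 2f + 1 predictions match, else None.

    Assumption: the group has n >= 3f + 1 models, of which at most f are
    faulty. This is the classic BFT quorum rule, used here for illustration.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    n = len(predictions)
    if n < 3 * f + 1:
        raise ValueError(f"need at least {3 * f + 1} models to tolerate {f} faults")

    quorum = 2 * f + 1
    # Find the most common answer and how many models returned it.
    label, count = Counter(predictions).most_common(1)[0]
    if count < quorum:
        return None  # no answer has enough support

    supporters = tuple(i for i, p in enumerate(predictions) if p == label)
    return Certificate(label=label, supporters=supporters, quorum=quorum)
```

For example, with four models and f = 1, `agree(["cat", "cat", "dog", "cat"], f=1)` yields a certificate for "cat" supported by models 0, 1, and 3, while `agree(["cat", "dog", "cat", "bird"], f=1)` returns `None` because no answer reaches the quorum of three.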
