cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs

Tao Lu, Chengkun Wei, Ruijing Yu, Yi Chen, Li Wang, Chaochao Chen, Zeke Wang, and Wenzhi Chen
Zhejiang University
Cryptology ePrint Archive 2022/1321, (5 Oct 2022)


   title={cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs},

   author={Lu, Tao and Wei, Chengkun and Yu, Ruijing and Chen, Yi and Wang, Li and Chen, Chaochao and Wang, Zeke and Chen, Wenzhi},

   journal={Cryptology ePrint Archive},



Download Download (PDF)   View View   Source Source   



Zero-knowledge proof (ZKP) is a critical cryptographic protocol, and it has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, ZKP has a high overhead on its proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM) on elliptic curves. Currently, several GPU-accelerated implementations of ZKP have been developed to improve its performance. However, these existing GPU designs do not fully unleash the potential of GPUs. Therefore, this paper presents cuZK, an efficient GPU implementation of ZKP with the following three optimizations to achieve higher performance. First, we propose a new parallel MSM algorithm and deploy it in cuZK. This MSM algorithm is well adapted to the high parallelism provided by GPUs, and it achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation, which is lightly disregarded by other existing GPU designs. Indeed, along with our self-designed MSM parallel scheme and well-studied NTT parallel scheme, cuZK achieves the parallelization of all operations in the proof generation step. Third, cuZK reduces the latency overhead caused by CPU-GPU data transfer (DT) by 1) reducing redundant data transfer and 2) overlapping data transfer and device computation with the multi-streaming technique. We design a series of evaluation schemes for cuZK. The evaluation results show that our MSM module provides over 2.08x (up to 2.63x) speedup versus the state-of-the-art GPU implementation. cuZK achieves over 2.65x (up to 4.47x) speedup on standard benchmarks and 2.18x speedup on a GPU-accelerated cryptocurrency application, Filecoin.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: