Solving Discrete Logarithms in Smooth-Order Groups with CUDA
Cheriton School of Computer Science, University of Waterloo, Waterloo ON Canada N2L 3G1
Cheriton School of Computer Science, University of Waterloo, Technical report CACR 2012-02, 2012
@article{henry2012solving,
title={Solving Discrete Logarithms in Smooth-Order Groups with CUDA},
author={Henry, R. and Goldberg, I.},
year={2012}
}
This paper chronicles our experiences using CUDA to implement a parallelized variant of Pollard’s rho algorithm to solve discrete logarithms in groups with cryptographically large moduli but smooth order using commodity GPUs. We first discuss some key design constraints imposed by modern GPU architectures and the CUDA framework, and then explain how we were able to implement efficient arbitrary-precision modular multiplication within these constraints. Our implementation can execute roughly 50.6 million 768-bit modular multiplications per second – or a whopping 840 million 192-bit modular multiplications per second – on a single Nvidia Tesla M2050 GPU card, which is a notable improvement over all previous results on comparable hardware. We leverage this fast modular multiplication in our implementation of the parallel rho algorithm, which can solve discrete logarithms modulo a 1536-bit RSA number with a 2^55-smooth totient in less than two minutes. We conclude the paper by discussing implications to discrete logarithm-based cryptosystems, and by pointing out how efficient implementations of parallel rho (or related algorithms) lead to trapdoor discrete logarithm groups; we also point out two potential cryptographic applications for the latter. Our code is written in C for CUDA and PTX; it is open source and freely available for download online.
January 29, 2012 by hgpu