Efficient GPU Implementation of Multi-Precision Integer Division
University of Copenhagen, Faculty of Science
University of Copenhagen, 2025
Efficient arithmetic on multi-precision integers is a cornerstone of many scientific and cryptographic applications that require computations on integers that exceed the native sizes supported by modern processors. While GPU-efficient addition and multiplication has been well explored, division has been subject to less attention due to its greater algorithmic complexity. This thesis attempts to bridge this gap by implementing a GPU-efficient division, that works on integers up to 250.000 bits in size which fit in a single cuda block, exploiting the temporal data reuse of fast scratchpad memory. The algorithm is based on the Newton-inspired method for computing the reciprocal of the divisor presented by Watt in [33], which performs exact division entirely within the integer domain. Our main product is an efficient implementation in CUDA, although not outperforming the popular CGBN library, it demonstrates promising scalability results. Moreover, to our knowledge, we are the first to implement a parallel division capable of operating on inputs larger than 215 bits. Finally, we implement a Futhark version to explore the practical aspects of using a high-level functional language, and conclude that current compiler
July 6, 2025 by hgpu
Your response
You must be logged in to post a comment.