high performance computing on graphics processing units: hgpu.org

Efficient GPU Implementation of Multi-Precision Integer Division

Aske N. Raahauge, Martin B. Marchioro, Marc I. Løvenskjold

University of Copenhagen, Faculty of Science

University of Copenhagen, 2025

Efficient arithmetic on multi-precision integers is a cornerstone of many scientific and cryptographic applications that require computations on integers exceeding the native sizes supported by modern processors. While GPU-efficient addition and multiplication have been well explored, division has received less attention due to its greater algorithmic complexity. This thesis attempts to bridge this gap by implementing a GPU-efficient division that works on integers of up to 250,000 bits, which fit in a single CUDA block, exploiting the temporal data reuse of fast scratchpad memory. The algorithm is based on the Newton-inspired method for computing the reciprocal of the divisor presented by Watt in [33], which performs exact division entirely within the integer domain. Our main product is an efficient implementation in CUDA; although it does not outperform the popular CGBN library, it demonstrates promising scalability results. Moreover, to our knowledge, we are the first to implement a parallel division capable of operating on inputs larger than 2^15 bits. Finally, we implement a Futhark version to explore the practical aspects of using a high-level functional language, and conclude that current compiler
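To make the idea of division via a Newton-style reciprocal concrete, the following is a minimal sequential sketch in Python. It is not Watt's algorithm from [33] and not the thesis's CUDA or Futhark code; it is a textbook fixed-point Newton iteration, chosen here only for illustration. The function name `newton_divide`, the scaling exponent `k`, and the final correction loop are assumptions of this sketch.

```python
def newton_divide(u, v):
    """Return (q, r) with u == q * v + r and 0 <= r < v.

    Illustrative sketch only: approximates floor(2**k / v) with an
    integer-only Newton iteration, then multiplies and corrects.
    """
    if u < 0 or v <= 0:
        raise ValueError("expects u >= 0 and v > 0")

    # Fixed-point scale: 2**k is strictly larger than both operands.
    k = max(u.bit_length(), v.bit_length()) + 1
    one = 1 << k

    # Initial guess from below: a power of two with x <= 2**k / v.
    x = 1 << (k - v.bit_length())

    # Newton step for the reciprocal, x <- x + x * (2**k - v * x) / 2**k,
    # with the division by 2**k done as a right shift (rounds down).
    # Starting from below keeps x <= 2**k / v, so every step is >= 0.
    while True:
        step = (x * (one - v * x)) >> k
        if step == 0:
            break
        x += step

    # Quotient estimate; it never exceeds the true quotient because
    # x never exceeds 2**k / v.
    q = (u * x) >> k
    r = u - q * v

    # Correct the small remaining underestimate.
    while r >= v:
        q += 1
        r -= v
    return q, r
```

For example, `newton_divide(100, 7)` can be compared against Python's built-in `divmod(100, 7)`. The sketch relies on Python's arbitrary-precision integers; a GPU implementation would instead have to parallelize the underlying multi-precision multiplications and carries across threads, which this sketch does not attempt to show.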

Tags: Computer science, CUDA, Extended precision, Futhark, nVidia, nVidia A100, Package, Thesis

July 6, 2025 by hgpu

