high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs

Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs

Fabio Jose Goncalves Correia

Universidade do Minho

Universidade do Minho, 2014

@article{correia2014assessing,

title={Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs},

author={Correia, F{‘a}bio Jos{‘e} Gon{c{c}}alves},

year={2014}

}

Download (PDF)

View

Source

2413

views

Lattice-based cryptography has been a hot topic in the past decade, since it is believed that lattice-based cryptosystems are immune against attacks operated by quantum computers. The security of this type of cryptography is based on the hardness of algorithms that solve lattice-based problems, namely the Shortest Vector Problem (SVP). Therefore, it is important to assess the performance of such algorithms on High Performance Computing (HPC) systems. This dissertation compares a wide range of algorithms that solve the SVP, the SVP-solvers, namely, the Voronoi cell-based algorithm and two enumeration-based solvers, SE++ and ENUM. We show that different techniques and optimizations used to significantly improve the performance of ENUM can also be applied to other enumeration algorithms, namely the extreme pruning technique and the optimization that avoids symmetric branches of the enumeration tree. We present the first practical results of the Voronoi cell-based algorithm and compare its performance to the mentioned enumeration algorithms. The Voronoi cell-based algorithm performed considerably worse, although it displays potential for parallelization. The optimization that avoids the computation of symmetric branches improved the performance of SE++ by almost 50%, thus outperforming ENUM by a factor of 3%, on the average. Parallel versions of the enumeration algorithms were implemented on a shared memory system based on a dual (8+8)-core device, which, in some instances, scale super-linearly for up to 8 threads and linearly for 16 threads. The parallel versions of the enumeration with extreme pruning algorithms were parallelized with MPI and achieved speedups of up to 12.96x with 16 processes with ENUM on a lattice in dimension 74. We show that an efficient parallel implementation of ENUM can be integrated into BKZ, a lattice basis reduction algorithm, to parallelize it efficiently, since for high block-sizes almost all of the execution time of BKZ is spent on ENUM. This implementation of BKZ achieves speedups of up to 13.72x for a lattice in dimension 60, reduced with block-size 50. We also compared the quality of the output bases to other BKZ implementations, namely AC_BKZ, an implementation developed in colaboration with Thomas Arnreich, and G_BKZ_FP, an implementation publicly available in the NTL library. Our implementation showed to compute the bases with better quality, in the general case. Finally, a novel parallel approach for the enumeration with extreme pruning is proposed. This approach promises to significantly improve the performance of the enumeration with extreme pruning, since much higher number of cores can be used to achieve higher speedups.

Tags: Algorithms, Computer science, CUDA, MPI, nVidia, Quantum computing, Security, Thesis, Voronoi diagram

September 8, 2015 by hgpu

Rating: 1.8/5. From 3 votes.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs

Your response

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)

Assessing the hardness of SVP algorithms in the presence of CPUs and GPUs

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)