high performance computing on graphics processing units: hgpu.org

Parallel Implementation of Lightweight Secure Hash Algorithm on CPU and GPU Environments

Hojin Choi, SeongJun Choi, SeogChung Seo

Department of Financial Information Security, Kookmin University, Seoul 02707, Republic of Korea

Electronics, 13(5), 896, 2024

DOI:10.3390/electronics13050896

Package: KISA version of Lightweight Secure Hash

Cryptographic hash functions are widely used in applications such as message authentication codes, cryptographic random number generators, digital signatures, key derivation functions, and post-quantum algorithms. They play a vital role in establishing secure communication between servers and clients, and servers often need to compute a large number of hashes simultaneously to serve connected clients smoothly. In this paper, we present highly optimized parallel implementations of Lightweight Secure Hash (LSH), a hash algorithm developed in Korea, for server-side environments. To optimize LSH performance, we leverage two parallel architectures: AVX-512 on high-end CPUs and NVIDIA GPUs. Specifically, we introduce a word-level parallel processing design suited to the AVX-512 instruction set and a data-parallel processing design suited to the NVIDIA CUDA platform. In the former approach, we parallelize the core functions of LSH using AVX-512 registers and instructions; this implementation achieves a performance improvement of up to 50.37% over the latest LSH AVX-2 implementation. In the latter approach, we optimize the core operation of LSH with CUDA PTX assembly and apply a coalesced memory access pattern. Furthermore, we determine the optimal block/thread configuration and number of CUDA streams for the RTX 2080 Ti and RTX 3090. Consequently, on the RTX 3090, our optimized CUDA implementation achieves about a 180.62% performance improvement over the initial port of LSH to the CUDA platform. To the best of our knowledge, this is the first work on optimizing LSH with AVX-512 and NVIDIA GPUs. The proposed implementation methodologies can be used alone or together in a server environment to maximize the throughput of LSH computation.
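The abstract mentions a coalesced memory access pattern for the data-parallel GPU design but does not spell out the layout. As a rough illustration only, the host-side C++ sketch below shows one common way to prepare data for coalesced access when each GPU thread hashes its own message: the input is interleaved so that word `w` of every message is stored contiguously, letting adjacent threads read adjacent addresses. The function names, the 32-bit word type, and the layout itself are assumptions made for this example, not the authors' implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: position of word `w` of message `m` when each
// message's words are stored contiguously (message-major layout).
inline std::size_t message_major_index(std::size_t m, std::size_t w,
                                       std::size_t words_per_msg) {
    return m * words_per_msg + w;
}

// Hypothetical helper: position of the same word in an interleaved
// (word-major) layout. For a fixed `w`, consecutive messages `m` map to
// consecutive addresses, which is the pattern a GPU can coalesce when
// thread `m` processes message `m`.
inline std::size_t word_major_index(std::size_t m, std::size_t w,
                                    std::size_t num_msgs) {
    return w * num_msgs + m;
}

// Rearranges a batch of equally sized messages from message-major to
// word-major order. Assumes in.size() == num_msgs * words_per_msg.
std::vector<std::uint32_t> interleave_for_coalescing(
        const std::vector<std::uint32_t>& in,
        std::size_t num_msgs, std::size_t words_per_msg) {
    assert(in.size() == num_msgs * words_per_msg);
    std::vector<std::uint32_t> out(in.size());
    for (std::size_t m = 0; m < num_msgs; ++m) {
        for (std::size_t w = 0; w < words_per_msg; ++w) {
            out[word_major_index(m, w, num_msgs)] =
                in[message_major_index(m, w, words_per_msg)];
        }
    }
    return out;
}
```

In a kernel built on this layout, thread `m` would load its `w`-th word from `word_major_index(m, w, num_msgs)`, so the threads of a warp touch one contiguous region per load instead of locations separated by a full message length.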

Tags: Computer science, CUDA, Hashing, nVidia, nVidia GeForce RTX 2080 Ti, nVidia GeForce RTX 3090, Package, PTX, Security

March 10, 2024 by hgpu
