Implementation of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University
arXiv:1204.0334v1 [cs.IT] (2 Apr 2012)
@article{2012arXiv1204.0334Z,
author={{Zhao}, Y. and {Lau}, F.~C.~M.},
title={{Implementation of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs}},
journal={ArXiv e-prints},
archivePrefix={arXiv},
eprint={1204.0334},
primaryClass={cs.IT},
keywords={Computer Science – Information Theory, Computer Science – Distributed, Parallel, and Cluster Computing},
year={2012},
month={apr},
adsurl={http://adsabs.harvard.edu/abs/2012arXiv1204.0334Z},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
With the belief propagation (BP) decoding algorithm, low-density parity-check (LDPC) codes can achieve near-Shannon-limit performance. LDPC codes can attain bit error rates (BERs) as low as $10^{-15}$ even at a small bit-energy-to-noise-power-spectral-density ratio ($E_{b}/N_{0}$). To evaluate the error performance of LDPC codes, simulators running on central processing units (CPUs) are commonly used. However, the time taken to evaluate LDPC codes with very good error performance is excessive. For example, assuming 30 iterations are used in the decoder, our simulation results have shown that it takes a modern CPU more than 7 days to arrive at a BER of $10^{-6}$ for a code of length 18360. In this paper, efficient LDPC block-code decoders/simulators that run on graphics processing units (GPUs) are proposed. Both the standard BP decoding algorithm and the layered decoding algorithm are used. We also implement a decoder for LDPC convolutional codes (LDPCCCs). The LDPCCC is derived from a pre-designed quasi-cyclic LDPC block code with good error performance. Compared to a decoder based on a randomly constructed LDPCCC, the complexity of the proposed LDPCCC decoder is reduced owing to the periodicity of the derived LDPCCC and the properties of the quasi-cyclic structure. By optimizing the data structures of the messages used in the decoding process, both the read and write processes can be performed in a highly parallel manner by the GPUs. In addition, a thread hierarchy that avoids thread divergence is deployed to maximize the efficiency of parallel execution. With a large number of GPU cores performing the simple computations simultaneously, our GPU-based LDPC decoder can achieve a speed-up of hundreds of times over a CPU-based simulator.
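To make the decoding loop mentioned in the abstract concrete, here is a minimal CPU-side sketch of iterative message passing on a parity-check matrix. It is not the authors' GPU implementation: it uses the min-sum approximation of BP, a flooding schedule, plain Python lists, and a toy (7,4) Hamming parity-check matrix. The names `min_sum_decode` and `H_HAMMING` are hypothetical, chosen only for illustration. The per-check and per-bit loops are independent of one another within an iteration, which is the kind of work a GPU decoder would spread across threads.

```python
# Illustrative sketch only: min-sum approximation of BP with a flooding
# schedule, written for clarity rather than speed. All names are hypothetical.

# Toy parity-check matrix of the (7,4) Hamming code (an assumption for the demo).
H_HAMMING = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def min_sum_decode(H, llr, max_iters=30):
    """Decode channel LLRs (positive means bit 0) and return hard decisions."""
    n = len(H[0])
    # Indices of the bits that take part in each parity check.
    checks = [[j for j in range(n) if row[j]] for row in H]
    # Check-to-variable messages, initialised to zero.
    c2v = [{j: 0.0 for j in row} for row in checks]
    hard = [0 if x >= 0 else 1 for x in llr]

    for _ in range(max_iters):
        # Posterior LLR per bit: channel value plus all incoming check messages.
        total = list(llr)
        for i, row in enumerate(checks):
            for j in row:
                total[j] += c2v[i][j]
        hard = [0 if t >= 0 else 1 for t in total]

        # Stop early once every parity check is satisfied.
        if all(sum(hard[j] for j in row) % 2 == 0 for row in checks):
            break

        # Check-node update: each outgoing message combines the signs and the
        # minimum magnitude of the other incoming variable-to-check messages.
        new_c2v = []
        for i, row in enumerate(checks):
            v2c = {j: total[j] - c2v[i][j] for j in row}
            out = {}
            for j in row:
                others = [v2c[k] for k in row if k != j]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                out[j] = sign * min(abs(x) for x in others)
            new_c2v.append(out)
        c2v = new_c2v

    return hard
```

For example, calling `min_sum_decode(H_HAMMING, [2.0, 2.0, -0.5, 2.0, 2.0, 2.0, 2.0])` passes in the all-zero codeword with one weakly wrong bit, and the decoder recovers the all-zero word after one round of message updates.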
April 4, 2012 by hgpu