Massively parallel implementation of cyclic LDPC codes on a general purpose graphics processing unit

Hyunwoo Ji, Junho Cho, Wonyong Sung

Sch. of Electr. Eng., Seoul Nat. Univ., Seoul, South Korea

IEEE Workshop on Signal Processing Systems, 2009. SiPS 2009

DOI:10.1109/SIPS.2009.5336268

Simulating low-density parity-check (LDPC) codes frequently takes several days, which makes general purpose graphics processing units (GPGPUs) a very promising platform. However, GPGPUs are designed for compute-intensive applications and are not optimized for data caching or control management. In LDPC decoding, the parity check matrix H must be accessed at every node update, and H is often larger than the GPU on-chip memory, especially when the code length is long or the weight is high. In this work, the parity check matrix of cyclic or quasi-cyclic LDPC codes is greatly compressed by exploiting the periodic property of the matrix. The experiments use Nvidia's Compute Unified Device Architecture (CUDA). With the (1057, 813) and (4161, 3431) projective geometry (PG)-LDPC codes, the proposed method runs more than twice as fast as reference implementations that do not exploit the cyclic property of the parity check matrices.
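The compression idea can be illustrated with a small sketch. This is not the authors' CUDA implementation, whose data layout the abstract does not describe; it is a minimal Python illustration assuming a square circulant H in which each row is the previous row cyclically shifted right by one position. Under that assumption only the column positions of the ones in the first row need to be stored, and every other row is derived by modular arithmetic. The names `circulant_row`, `syndrome`, and `dense_circulant`, and the toy 7 x 7 matrix built from the positions {0, 1, 3}, are hypothetical and chosen only for illustration.

```python
def circulant_row(base_positions, row, n):
    """Column indices of the ones in `row` of an n x n circulant matrix.

    Assumption: row r is the first row cyclically shifted right by r.
    """
    return sorted((p + row) % n for p in base_positions)


def syndrome(base_positions, word, n):
    """Evaluate all n parity checks (H * word mod 2) without storing H."""
    return [
        sum(word[(p + r) % n] for p in base_positions) % 2
        for r in range(n)
    ]


def dense_circulant(base_positions, n):
    """Build the full n x n matrix, only to compare against the compact form."""
    rows = []
    for r in range(n):
        cols = set(circulant_row(base_positions, r, n))
        rows.append([1 if c in cols else 0 for c in range(n)])
    return rows


# Toy example (hypothetical values): 7 x 7 circulant with ones at 0, 1, 3.
N = 7
BASE = [0, 1, 3]
H = dense_circulant(BASE, N)

word = [1, 0, 1, 1, 0, 0, 0]
dense_result = [sum(h * w for h, w in zip(row, word)) % 2 for row in H]
compact_result = syndrome(BASE, word, N)
```

In this toy case the compact form stores 3 integers instead of 49 matrix entries, and `compact_result` matches `dense_result`. The saving grows with the code length, since the stored data scales with the row weight rather than with the number of rows, which is what lets the matrix description fit in small on-chip memory.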

Tags: Computer science, CUDA, Error recovery, nVidia

April 14, 2011 by hgpu
