In this paper, we present a high-throughput, low-latency LDPC (low-density parity-check) decoder implementation on GPUs (graphics processing units). Existing GPU-based LDPC decoder implementations suffer from low throughput and long latency, which prevent them from being used in practical SDR (software-defined radio) systems. To overcome this problem, we present optimization techniques for a parallel LDPC decoder on modern GPU architectures, including algorithm optimization, fully coalesced memory access, asynchronous data transfer, and multi-stream concurrent kernel execution. Experimental results demonstrate that the proposed LDPC decoder achieves a peak throughput of 316 Mbps (at 10 iterations) on a single GPU. The decoding latency, which is much lower than that of the state of the art, ranges from 0.207 ms to 1.266 ms for throughput requirements ranging from 62.5 Mbps to 304.16 Mbps. When using four GPUs concurrently, we achieve an aggregate peak throughput of 1.25 Gbps (at 10 iterations).
(Guohui Wang, Michael Wu, Bei Yin, and Joseph R. Cavallaro: “High Throughput Low Latency LDPC Decoding on GPU for SDR Systems”, 1st IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec. 2013.)
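
The figures quoted in the abstract can be related to one another with some simple arithmetic. The sketch below is not taken from the paper; it only reuses the reported numbers. It makes two assumptions: that 1 Mbps means 10^6 bits per second, and that the product of throughput and latency is a reasonable rough proxy for the amount of data in flight at a given operating point. The function names `scaling_efficiency` and `bits_in_flight` are hypothetical and chosen here for illustration.

```python
# Figures quoted in the abstract (all at 10 decoding iterations).
SINGLE_GPU_PEAK_MBPS = 316.0
FOUR_GPU_PEAK_MBPS = 1250.0  # 1.25 Gbps
NUM_GPUS = 4

# (throughput in Mbps, latency in ms) at the two ends of the reported range.
OPERATING_POINTS = [(62.5, 0.207), (304.16, 1.266)]


def scaling_efficiency(aggregate_mbps, single_mbps, num_gpus):
    """Return aggregate throughput as a fraction of ideal linear scaling."""
    return aggregate_mbps / (single_mbps * num_gpus)


def bits_in_flight(throughput_mbps, latency_ms):
    """Return the throughput-latency product in bits.

    Assumes 1 Mbps = 1e6 bit/s, so Mbps * ms = 1e6 * 1e-3 = 1e3 bits.
    """
    return throughput_mbps * latency_ms * 1e3


if __name__ == "__main__":
    eff = scaling_efficiency(FOUR_GPU_PEAK_MBPS, SINGLE_GPU_PEAK_MBPS, NUM_GPUS)
    print(f"Four-GPU scaling efficiency: {eff:.1%}")
    for mbps, ms in OPERATING_POINTS:
        kbit = bits_in_flight(mbps, ms) / 1e3
        print(f"{mbps} Mbps at {ms} ms latency: about {kbit:.1f} kbit in flight")
```

Under these assumptions, the four-GPU result corresponds to roughly 98.9% of ideal linear scaling from the single-GPU peak, and the data in flight grows from roughly 12.9 kbit at the low-latency end to roughly 385 kbit at the high-throughput end. This is consistent with the usual trade-off in which larger batches raise throughput at the cost of latency, although the abstract itself does not state the batch sizes used.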