# Some Bits on Error Correcting Codes
[Prasanna Sethuraman](https://medium.com/@prasannasethuraman?source=post_page---byline--6b699cab854d---------------------------------------)
12 min read · Mar 27, 2022
Pardon the pun in the title, dear reader. In this time of virus and war, a man must have his fun when he can. And where does a communication systems engineer get his fun? When he has a “wow” moment when working on the building blocks making digital communication possible. That someone who has spent two decades practicing design and development of wireless communication systems can still be surprised by new insights does say quite a bit about the depth of communications theory fundamentals and the human spirit for continuous learning.
What I am about to narrate is some observations on error correcting codes, which form the backbone of reliable communication. We add extra bits to the data in such a way that they introduce a structure a decoder can exploit to recover any information bits received in error. Ever since Hamming invented the first error correcting code in the 1940s, a lot of brilliant people have looked into various ways of adding structure that makes efficient decoding possible.
# Linear Block Codes and Minimum Distance
Linear block codes with K information bits and N-K parity bits produce N-bit codewords, and encoding can be written as multiplying the K×1 message vector by an N×K generator matrix. Each column of this generator matrix is then a codeword (since the message vector can be one-hot), but so are linear combinations of these columns (since the message can be any K×1 binary vector). There are 2ᴷ possible messages, and for the mapping to be unique there need to be 2ᴷ codewords. For this to happen, all K columns of the generator matrix have to be linearly independent. Given such a matrix, we can perform Gaussian elimination to get a generator matrix of the form \[I, P\]ᵀ, where I is a K×K identity matrix and P is an (N-K)×K parity matrix; this form can be used for systematic encoding.
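To make the \[I, P\]ᵀ form concrete, here is a minimal sketch of systematic encoding in Python, using the classic Hamming(7,4) code as an assumed example (the particular P below is one common choice, not something fixed by the text):

```python
# Systematic encoding with an [I, P]^T generator matrix, Hamming(7,4) example.
K, N = 4, 7
# Parity part P ((N-K) x K): each row says which message bits feed that parity bit.
P = [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 1, 1]]
# Full N x K generator matrix: K x K identity on top, P below.
G = [[1 if i == j else 0 for j in range(K)] for i in range(K)] + P

def encode(m):
    # Codeword = G * m, with all arithmetic modulo 2 (GF(2)).
    return [sum(g * b for g, b in zip(row, m)) % 2 for row in G]

codeword = encode([1, 0, 1, 1])
print(codeword)  # -> [1, 0, 1, 1, 0, 1, 0]
```

The first K bits of the codeword reproduce the message itself, which is exactly what "systematic" means; the last N-K bits are the parity bits computed from P.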
For maximum likelihood decoding, we want these codewords to be as far apart as possible in the discrete N-dimensional vector space they live in. The probability of mapping the received bits to a wrong codeword then depends on the distance between the correct codeword and the wrong codeword. We can upper bound this probability of error using the minimum distance, *dmin*, over all pairs of codewords (which, for [linear block codes](https://en.wikipedia.org/wiki/Block_code#The_distance_d), is equivalent to finding the codeword with minimum weight, since that is the distance from the all-zero codeword). What we also want is to pick the smallest possible N for a given K and still have distance *dmin* between codewords. This means picking codewords in the discrete N-dimensional space such that each has a discrete sphere of radius *dmin* within which no other codeword exists. If we can find how many such discrete spheres can be packed into the volume of a discrete N-dimensional space, we know the largest K for a given N. Sphere packing is not an easy problem, but we can compute upper bounds (Singleton, Hamming, Plotkin, Elias) and lower bounds (Gilbert-Varshamov). While these bounds say what K and N are possible for good codes, they say nothing about how to find these codewords!
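The minimum-weight shortcut for linear codes is easy to check by brute force; a small sketch, again using Hamming(7,4) as an assumed example:

```python
from itertools import product

# For a linear code, dmin equals the minimum Hamming weight over all
# non-zero codewords, so we enumerate the 2^K - 1 non-zero messages.
G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
     [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]   # N x K, systematic [I, P]^T

def encode(m):
    return [sum(g * b for g, b in zip(row, m)) % 2 for row in G]

dmin = min(sum(encode(m)) for m in product([0, 1], repeat=4) if any(m))
print(dmin)  # -> 3: Hamming(7,4) can correct any single bit error
```

Brute force is fine at K=4 (15 codewords); for realistic K this search is exponential, which is precisely why the bounds above matter.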
In his seminal work, “A Mathematical Theory of Communication”, Shannon showed that every communication channel has a capacity, and that this capacity can always be achieved by long random error correction codes. This means we can pick long random codes and expect most of them to be “good”, meaning a large enough *dmin* for a given N and K. But how do we decode such a random code?
# Cyclic Codes and Galois Fields
To have a practical decoder, codewords need to have a structure. For example, we can have cyclic codes where every codeword is a cyclic shift of another codeword. If we represent the codeword as a polynomial (for example, the codeword 1101 is 1 + x + x³), then all cyclic shifts are equivalent to multiplying this codeword by powers of x and reducing modulo xᴺ+1. The encoding can then be written as m(x)g(x), where m(x) is a polynomial of degree at most K-1 and g(x) is a polynomial of degree N-K.
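A quick sketch of why multiplying by x modulo xᴺ+1 is a cyclic shift (coefficients stored lowest power first; the specific codeword is just an illustration):

```python
# Over GF(2), multiplying a codeword polynomial by x and reducing modulo
# x^N + 1 is exactly a cyclic shift of its coefficient vector.
def mul_by_x_mod(c):
    """c[i] is the coefficient of x^i; returns x*c(x) mod (x^N + 1)."""
    # x * c(x) shifts every coefficient up one power; the x^N term wraps
    # around to x^0 because x^N = 1 (mod x^N + 1).
    return [c[-1]] + c[:-1]

c = [1, 1, 0, 1, 0, 0, 0]   # 1 + x + x^3, embedded in N = 7
print(mul_by_x_mod(c))      # -> [0, 1, 1, 0, 1, 0, 0], i.e. x + x^2 + x^4
```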
When there are polynomials, the first question to ask is: what can we say about their roots? A polynomial p(x) with coefficients from a field (a set that is closed under addition, subtraction, multiplication and division) might or might not have one of the field elements as a root. If no field element satisfies p(x) = 0, then p(x) has no roots in that field; for polynomials of degree 2 or 3, such as the ones we use here, this is the same as p(x) being irreducible over that field. For example, x²+1=0 has no root in the field of real numbers, but has its roots in the field of complex numbers. If a polynomial has no roots in the base field, where can we find its roots?
Every polynomial has roots in the field of complex numbers, which is algebraically closed, but we are interested in fields with a finite number of elements, since we want to encode a finite block of bits. The answer came from the genius of Galois. As an example, see that 1 + x + x³ does not have a root in the field {0,1} with modulo 2 addition (neither 0 nor 1 solves 1 + x + x³=0). But its root, α (which means 1+α+α³ = 0, and therefore α³=α+1), generates an extension field that consists of 2³ elements {0,α⁰,α¹,α²,α³,α⁴,α⁵,α⁶}. What about α⁷? We have α⁷= α³×α³×α = (α+1)(α+1)α = (α²+1)α = α³+α = α+1+α = 1 (remember 1+1=0 in modulo 2 addition). We can do a similar exercise and see that all other powers of α from 1 to 6 are distinct, and can be uniquely mapped to corresponding 3-element vectors: (0🠆\[0,0,0\], 1🠆\[1,0,0\], α🠆\[0,1,0\], α²🠆\[0,0,1\], α³🠆\[1,1,0\], α⁴🠆\[0,1,1\], α⁵🠆\[1,1,1\], α⁶🠆\[1,0,1\]), and the mapping is decided by the polynomial 1 + x + x³ we picked. This is great, but what is α, you ask? It is an abstract quantity that generates an extension field, given a polynomial that is irreducible in the base field. Therein lies the genius of Galois, who invented this whole new branch of mathematics at just 19 years of age. In his honor, the field {0,1} with modulo 2 arithmetic is written as GF(2) and the extension field as GF(2ⁿ).
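The construction of GF(2³) above can be replayed in a few lines; the only rule needed is the reduction α³ = α + 1 from the chosen polynomial:

```python
# Generate the non-zero elements of GF(2^3) from the irreducible
# polynomial 1 + x + x^3, i.e. using alpha^3 = alpha + 1 to reduce.
def gf8_powers():
    elems = []
    a = [1, 0, 0]                 # alpha^0 = 1, as the vector [1, 0, 0]
    for _ in range(7):
        elems.append(a[:])
        # Multiply by alpha: shift coefficients up; if an alpha^3 term
        # appears, replace it with alpha + 1 (since alpha^3 = alpha + 1).
        carry = a[2]
        a = [carry, a[0] ^ carry, a[1]]
    return elems

powers = gf8_powers()
print(powers[3])                        # -> [1, 1, 0], i.e. alpha^3 = 1 + alpha
print(len({tuple(p) for p in powers}))  # -> 7: all non-zero elements are distinct
```

The list reproduces exactly the mapping in the text, and multiplying once more shows α⁷ wrapping back to 1.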
# Berlekamp-Massey Decoding
Going back to our cyclic codes, we have c(x) = m(x)g(x), where g(x) is a polynomial of degree N-K. If we pick g(x) such that *d* of its roots, α\[m\] (subscripts are a pain to do in Medium, hence the square brackets), are in GF(2ᴺ), then those roots are also roots of c(x), which means c(α\[m\])=0. If the α\[m\] are chosen to be contiguous powers of α, that is α\[m\]=αᵐ, then we can show that c(x) must have at least *d* non-zero coefficients (see <https://en.wikipedia.org/wiki/BCH_code#Properties>). This makes the minimum distance of the code at least *d*.
Decoding this code also relies on the fact that c(α\[m\])=0: if the received codeword is r(x) = c(x) + e(x), then we get the *d* syndromes as Sₘ = e(α\[m\]). Let us denote the locations of the non-zero coefficients of e(x) by J: e(x) = e\[J0\]xᴶ⁰+e\[J1\]xᴶ¹+e\[J2\]xᴶ² if we limit ourselves to 3 errors, and we can say e\[Ji\]=1 since we are still looking at binary codes. For each α\[m\], this gives us Sₘ = (α\[m\])ᴶ⁰+(α\[m\])ᴶ¹+(α\[m\])ᴶ², but in the previous paragraph we decided to use contiguous powers of α, that is α\[m\]=αᵐ, and therefore we have power sums of the form (αᵐ)ᴶ⁰+(αᵐ)ᴶ¹+(αᵐ)ᴶ²=(αᴶ⁰)ᵐ+(αᴶ¹)ᵐ+(αᴶ²)ᵐ for m=0,1,…,*d-1*.
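A sketch of these power-sum syndromes for an assumed pair of error locations, with GF(2³) elements stored as 3-bit integers (the table is the same α⁰…α⁶ mapping derived earlier):

```python
# Syndromes as power sums: S_m = sum_i (alpha^{J_i})^m, in GF(2^3) built
# from 1 + x + x^3. Bit k of each entry is the coefficient of alpha^k.
ALPHA = [0b001, 0b010, 0b100, 0b011, 0b110, 0b111, 0b101]  # alpha^0 .. alpha^6

def syndrome(J, m):
    s = 0
    for j in J:
        s ^= ALPHA[(j * m) % 7]   # addition in GF(2^3) is bitwise XOR
    return s

J = [1, 4]   # assumed example: binary errors at positions 1 and 4
# S_0 = 0 here because the two unit errors cancel (1 + 1 = 0 in GF(2)).
print([syndrome(J, m) for m in range(4)])   # -> [0, 4, 6, 4]
```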
The task is then to find e(x) from its evaluations S\[m\]. The trick is to find a polynomial that has 1/αᴶ as its roots, where, as mentioned above, {J} indicates the non-zero coefficient locations in e(x). We call this the error locator polynomial, since its roots contain information about the error locations. The error locator polynomial is of the form Λ(x)=Π(αᴶx+1), but since we don’t know the error locations {J} yet, let us replace {αᴶ} with another unknown variable {X}, and if we expand, we get:
> Λ(x)=Π(αᴶx+1)=Π(X\[i\]x+1)
> \= (xX\[0\]+1)(xX\[1\]+1)(xX\[2\]+1)
> \=(x²X\[0\]X\[1\]+x(X\[0\]+X\[1\])+1)(xX\[2\]+1)
> \=1+(X\[0\]+X\[1\]+X\[2\])x +(X\[0\]X\[1\]+X\[0\]X\[2\]+X\[1\]X\[2\])x² + (X\[0\]X\[1\]X\[2\])x³
> \=Λ\[0\]+Λ\[1\]x+Λ\[2\]x²+Λ\[3\]x³
We have limited ourselves to 3 errors in this case. We then see: Λ\[0\]=1, Λ\[1\]=(X\[0\]+X\[1\]+X\[2\]), Λ\[2\]=(X\[0\]X\[1\] + X\[0\]X\[2\] + X\[1\]X\[2\]) and Λ\[3\]=(X\[0\]X\[1\]X\[2\]). These are called the elementary symmetric polynomials. The syndromes S\[m\] =Σᵢ(X\[i\])ᵐ are called the power sums.
Let us take Λ\[2\] now: it has all the cross terms of degree 2. But we see:
> Λ\[1\]S\[1\]
> \=(X\[0\]+X\[1\]+X\[2\])(X\[0\]+X\[1\]+X\[2\])
> \=(X\[0\]²+X\[0\]X\[1\]+X\[0\]X\[2\] + X\[0\]X\[1\]+X\[1\]²+X\[1\]X\[2\] + X\[0\]X\[2\]+X\[1\]X\[2\]+X\[2\]²)
> \= 2(X\[0\]X\[1\]+X\[0\]X\[2\]+X\[1\]X\[2\])+ (X\[0\]²+X\[1\]²+X\[2\]²)
> \=2Λ\[2\]+S\[2\]
Rearranging, 2Λ\[2\]=Λ\[1\]S\[1\]+S\[2\] (remember that our computations are in a field of characteristic 2, so there is no difference between a+b and a-b).
For Λ\[3\], we can do similar computations to find that 3Λ\[3\] = Λ\[2\]S\[1\]+Λ\[1\]S\[2\]+S\[3\] (anyone notice the similarity to convolution here?). The computation itself is cumbersome, but Newton already did this for us 400 years ago and generalized it into [Newton’s identities](https://en.wikipedia.org/wiki/Newton%27s_identities). We now have a system of equations that relates the unknown quantities Λ\[m\] to the syndromes S\[m\] that we can compute. We can collect and arrange all the syndromes into a matrix **S**, and find the solution to the matrix vector equation **S**Λ=0. Solving for Λ gets us Λ(x), and we can find the roots of Λ(x), by brute force search if necessary, to find the αᴶ, which give us our error locations {J}. Remember that this required us to use contiguous powers of α as roots of our generator polynomial, but once we did that, we have an elegant way to find the error locations. Solving this linear system directly is Peterson’s method; the famous Berlekamp-Massey algorithm, the standard way to decode BCH and Reed-Solomon codes, finds Λ(x) by solving the same equations iteratively and efficiently.
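Newton’s identities are easy to sanity-check numerically; over the integers they carry alternating signs, which vanish modulo 2 to give the forms above. The roots below are arbitrary sample values:

```python
# Verify Newton's identities relating the elementary symmetric
# polynomials L[k] to the power sums S[k], for three sample roots.
X = [2, 3, 5]
L1 = X[0] + X[1] + X[2]                          # Lambda[1]
L2 = X[0]*X[1] + X[0]*X[2] + X[1]*X[2]           # Lambda[2]
L3 = X[0]*X[1]*X[2]                              # Lambda[3]
S = [sum(x**m for x in X) for m in range(4)]     # power sums S[0..3]

# Integer forms with signs; modulo 2 the signs drop, matching the text:
# 2*Lambda[2] = Lambda[1]S[1] + S[2], 3*Lambda[3] = Lambda[2]S[1] + Lambda[1]S[2] + S[3]
assert 2 * L2 == L1 * S[1] - S[2]
assert 3 * L3 == L2 * S[1] - L1 * S[2] + S[3]
print("Newton's identities hold")
```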
BCH and Reed-Solomon codes are very similar, with one difference: the generator matrix of Reed-Solomon codes has coefficients from GF(2ᴺ), while the generator matrix of BCH codes can be restricted to coefficients from GF(2) due to the construction based on minimal polynomials. Let us not go much further into the rabbit hole; there are excellent books on this topic. And at some point, we have to actually switch from using English to using math to communicate ideas — Medium (or an internet article meant for casual reading) is definitely not the place for that.
# Modern Codes: Turbo, LDPC and Polar
As the reader can already see from this article, coding theory draws heavily from several fields of mathematics, and is therefore a delight to anyone who is mathematically inclined. But there are also other classes of error correction codes that draw from computer science concepts, building tree/trellis/graph structures to enable efficient decoding. Convolutional codes allow for a trellis representation of the coded bits, and LDPC codes rely on a bipartite graph structure for their message passing decoder. In addition, these codes allow the use of soft information, without having to force the received symbols to be elements of a finite field. When we quantize the received signal to bits, we lose this reliability information, and it is now well known that using the reliability information significantly improves decoding performance.
Turbo codes were the first to employ two decoders that iteratively improve the reliability information, the log likelihood ratio — LLR (intrinsic and extrinsic), but the idea of using two decoders is not new. Forney introduced the idea of serial concatenation of two codes, where the resulting concatenated code has a minimum distance that is the product of the minimum distances of the constituent codes. The Voyager mission (not Star Trek’s Voyager!) used a Reed-Solomon outer code and a convolutional inner code to achieve significant coding gains, the best among codes known at that time.
In 2009, Polar codes were invented by Arıkan and shown to be capacity achieving for discrete memoryless channels. If we transmit N bits and the channel does not introduce a dependency between one bit and the next, we can consider this as transmitting over N parallel channels. We can then combine bits in such a way that we redistribute the probability of error across the N channels, reducing it for some and increasing it for others. We are effectively polarizing the channels into “good” and “bad”, hence the name Polar codes! This construction has a simple encoder and a low complexity successive cancellation decoder, and showed better performance than other codes widely in use at shorter code lengths. Polar codes have now been included in the 3GPP 5G New Radio standard as the error correction scheme for control channels.
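The channel combining step can be sketched as multiplying by the Kronecker power of the 2×2 kernel F = \[\[1,0\],\[1,1\]\]; a minimal recursion, without the frozen-bit selection and rate matching that a real 5G NR encoder adds:

```python
# Polar channel combining: multiply the input vector by F^(kron n) over
# GF(2), where F = [[1, 0], [1, 1]], via the butterfly recursion.
def polar_transform(u):
    N = len(u)           # N must be a power of two
    if N == 1:
        return u[:]
    half = N // 2
    # Combine halves: first half becomes u_i XOR u_{i+N/2}, then each
    # half is transformed recursively.
    left = [u[i] ^ u[i + half] for i in range(half)]
    return polar_transform(left) + polar_transform(u[half:])

print(polar_transform([0, 0, 0, 1]))   # -> [1, 1, 1, 1]: the last input
                                       # touches every output bit
```

In the full scheme, information bits are placed on the inputs that polarize into "good" channels and the rest are frozen to zero.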
With such richness and diversity of coding techniques available to us, we need to decide which one to use. Is there one ring (not F\[x\]) to rule them all here? One coding scheme that outperforms every other in all scenarios?
# Comparison of Error Correcting Codes
Shannon’s work on capacity allows us to set a limit on the energy per bit over noise required for reliable transmission. In additive white Gaussian noise, the capacity is C = B×log₂(1+SNR) bits/second, and this equation lets us estimate the signal to noise ratio, SNR, required to achieve a specific bit rate R for a given bandwidth B. The signal power is the energy per bit Eb multiplied by the number of bits sent per second R, and the noise power is the noise power spectral density No integrated over the used bandwidth B. The spectral efficiency η=R/B then becomes η=log₂(1+(Eb×R)/(No×B))=log₂(1+(Eb/No)×η). As η tends to 0, Eb/No becomes -1.59 dB at the limit.
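Solving η=log₂(1+(Eb/No)×η) for Eb/No gives Eb/No = (2^η − 1)/η, which this short sketch (my illustration) evaluates to show the limit approaching ln 2 ≈ −1.59 dB as η shrinks:

```python
import math

def ebno_limit_db(eta):
    """Minimum Eb/No in dB for reliable transmission at spectral
    efficiency eta, from eta = log2(1 + (Eb/No)*eta), i.e.
    Eb/No = (2**eta - 1) / eta."""
    return 10 * math.log10((2 ** eta - 1) / eta)

# As eta -> 0, (2**eta - 1)/eta -> ln 2, i.e. 10*log10(ln 2) ≈ -1.59 dB
for eta in (2.0, 1.0, 0.1, 0.001):
    print(f"eta = {eta:5}: Eb/No >= {ebno_limit_db(eta):6.2f} dB")
```

At η = 1 (rate 1/2 coding on BPSK, for instance) the limit is exactly 0 dB, and higher spectral efficiencies demand progressively more energy per bit.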
When evaluating error correction codes, we can ignore all the other parts of the system and just look at the Eb/No required for a very low probability of error for different codes of the same code rate. For a given coding scheme, we can compare the (coded) Eb/No needed to achieve 1% block error rate to the (uncoded) Eb/No that achieves 1% block error rate without any coding. The difference gives us the coding gain, and we can then see which coding schemes have the best coding gain for a given code rate.
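The bookkeeping is simple enough to write down (the Eb/No numbers below are purely illustrative placeholders, not simulation results from this article):

```python
def coding_gain_db(uncoded_ebno_db, coded_ebno_db):
    """Coding gain: the Eb/No (dB) saved by the code at the same
    target block error rate, e.g. 1% BLER."""
    return uncoded_ebno_db - coded_ebno_db

# Illustrative only: if an uncoded system needs 10.5 dB for 1% BLER
# and the coded system reaches the same BLER at 3.5 dB, the code
# buys us 7 dB of coding gain.
print(coding_gain_db(10.5, 3.5))  # → 7.0
```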
And I happened to do just that recently. I simulated the block error rate for the serially concatenated Reed-Solomon (N=63, K=55) and convolutional (R=1/2, constraint length 7) code. For comparison, I also simulated the quasi-cyclic rate 1/2 LDPC (N=648, K=324) code used in the WLAN 802.11n/ac/ax physical layer, and the Polar code (N=1024, rate 1/2) used in 5G NR. With the incredible engineers at MathWorks building the required encoders and decoders into the latest versions of MATLAB, it is all of half a day’s work to do this comparison.
Since the block sizes are different, I used a different number of information bits for each coding scheme. For example, LDPC needs 324 bits to encode one block, while the RS-CC concatenated code requires 330 bits (55 symbols from GF(2⁶), at 6 bits per symbol), and the Polar code requires 512 bits.
From the result below, we see that the 7 dB coding gain we get from RS-CC is outdone by the 9.5 dB coding gain from LDPC and the almost 11 dB coding gain from Polar codes!
BLER comparison for serially concatenated RS+CC, LDPC and Polar codes — unpunctured
This isn’t a fair comparison, however. We need to use the same number of information bits, and if we pick a number lower than what the code is designed for, we need to puncture the parity bits to maintain the code rate. Let us use a 20 byte (160 bit) information length for our next comparison.
For Reed-Solomon codes, there is no easy way to puncture, so we settle for a lower code rate. Here, 160 bits is 27 symbols, and the RS encoder adds 8 parity symbols, so the code rate becomes 27/35=0.77, which is then encoded by the rate 1/2 CC; the effective rate is therefore 0.39 instead of the required rate 1/2. For LDPC codes, to keep rate 1/2 we need 320 coded bits, so we need to puncture 324−160=164 parity bits. This means we lose more than half the parity bits the code is designed to add. For Polar codes, the “rate-matching” is part of the Polar encoder specified by 5G NR.
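The rate arithmetic above can be checked in a few lines (a sketch following the text; m=6 bits per GF(2⁶) symbol and 8 RS parity symbols as stated):

```python
from math import ceil

info_bits = 160          # 20 bytes of information
m = 6                    # bits per RS symbol over GF(2^6)
rs_parity_symbols = 8    # parity symbols added by the RS encoder

rs_info_symbols = ceil(info_bits / m)                  # 27 symbols
rs_rate = rs_info_symbols / (rs_info_symbols + rs_parity_symbols)
cc_rate = 1 / 2                                        # inner CC rate

print(f"RS rate    : {rs_rate:.2f}")                   # 27/35 ≈ 0.77
print(f"RS+CC rate : {rs_rate * cc_rate:.2f}")         # ≈ 0.39, not 0.50
```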
The performance of the punctured codes is shown in the simulation result below. We see that the Polar code loses about 1 dB of coding gain, while the LDPC loses close to 2 dB! We see better performance from RS+CC because the effective code rate is now lower. The performance gap between RS+CC and the LDPC is now just below 1 dB!
Performance of punctured Polar, LDPC with RS+CC concatenated code
For the RS+CC simulations so far, I used a Viterbi decoder that takes hard decisions from the demodulator, while the LDPC and Polar codes use the demodulator’s LLR output, which again makes the comparison unfair. The following simulation result shows the performance of the RS+CC code when a soft-input Viterbi decoder is used: the concatenated code is now much better than the punctured LDPC and almost matches the performance of the rate-matched Polar code! And that, my dear reader, is the “wow” moment.
Performance comparison of RS+CC with soft-input Viterbi decoder with punctured LDPC and Polar codes
[Written by Prasanna Sethuraman](https://medium.com/@prasannasethuraman)