ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.3 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://www.jameswhanlon.com/error-correcting-codes.html |
| Last Crawled | 2026-04-07 18:40:17 (7 days ago) |
| First Indexed | 2020-05-11 00:51:04 (5 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Error-correcting codes |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | Error correcting codes (ECCs) are used in computer and communication systems to
improve resiliency to bit flips caused by permanent hardware faults or by
transient conditions, such as neutron particles from cosmic rays, that are known
generally as soft errors. This note describes the principles of Hamming codes
that underpin ECC schemes, how ECC codes are constructed (focusing on
single-error correction and double-error detection), and how they are implemented.
ECCs work by adding redundant bits to be stored or transported with
data. The bits are encoded as a function of the data in such a way that it is
possible to detect erroneous bit flips and to correct them. The ratio of the
number of data bits to the total number of bits encoded is called the code
rate, with a rate of 1 corresponding to an impossible encoding with no overhead.
Simple ECCs
Parity coding adds a single bit that indicates whether the number of set
bits in the data is odd or even. When the data and parity bit are accessed or
received, the parity can be recomputed and compared. This is sufficient to
detect any odd number of bit flips but not to correct them. For applications
where the error rate is low, so that only single bit flips are likely and
double bit flips are rare enough to be ignored, parity error detection is
sufficient and desirable due to its low overhead (just a single bit) and
simple implementation.
Repetition coding simply repeats each data bit a fixed number of times. When
the encoded data is received, if the repeated copies of any bit are not identical,
an error has occurred. With a repetition of two, single-bit errors can be
detected but not corrected. With a repetition of three, single bit flips can be
corrected by determining each data bit as the majority value in each triple,
but double bit flips within a triple are undetectable and will cause an erroneous correction.
Repetition codes are simple to implement but have a high overhead.
Hamming codes
Hamming codes are an efficient family of codes using additional redundant bits to
detect up to two-bit errors and correct single-bit errors (technically, they are
linear error-correcting codes). In them, check bits are added to data bits to
form a codeword, and the codeword is valid only when the check bits have been
generated from the data bits, according to the Hamming code. The check bits are
chosen so that there is a guaranteed minimum Hamming distance (the number of
positions in which bits differ) between any two valid codewords.
When valid codewords have a Hamming distance of two, any single bit flip will
invalidate the word and allow the error to be detected. For example, the valid
codewords 00 and 11 are separated for single bit flips by the invalid codewords
01 and 10. If either of the invalid words is obtained, an error
has occurred, but neither can be associated with a single valid codeword. Two bit
flips are undetectable since they always map to a valid codeword. Note that
parity encoding is an example of a distance-two Hamming code.
00 < Valid codeword
|
10 < Invalid codeword (obtained by exactly 1 bit flip)
|
11 < Valid codeword
With Hamming distance three, any single bit flip in a valid codeword makes an
invalid one, and the invalid codeword is Hamming distance one from exactly one
valid codeword. Using this, the valid codeword can be restored, enabling single
error correction. Any two bit flips map to an invalid codeword, which would
cause correction to the wrong valid codeword.
000 < Valid codeword
|
001
|
011
|
111 < Valid codeword
With Hamming distance four, two bit flips move a valid codeword to a word that
is Hamming distance two from the original and at least distance two from every
other valid codeword, allowing detection of two flips but not correction.
Single bit flips can be corrected as they were for distance
three. Distance-four codes are widely used in computing, where it is often the
case that single errors are frequent, double errors are rare and triple errors
occur so rarely they can be ignored. These codes are referred to as "SECDED ECC"
(single error correction, double error detection).
0000 < Valid codeword
|
0001
|
0011 < Two bit flips from either codeword.
|
0111
|
1111 < Valid codeword
Double errors can be corrected with a distance-five code, and a distance-six
code additionally enables the detection of triple errors. In general, if a
Hamming code can detect $d$ errors, it must have a minimum distance of $d+1$ so
there is no way $d$ errors can change one valid codeword into another one. If a
code can correct $d$ errors, it must have a minimum distance of $2d+1$ so that
the originating codeword is always the closest one. The following table
summarises Hamming codes.
Distance  Max bits corrected  Max bits detected  Capability
2         0                   1                  Single error detection (eg parity code)
3         1                   1                  Single error correction (eg triple repetition code)
4         1                   2                  Single error correction, double error detection (a "SECDED" code)
5         2                   2                  Double error correction
6         2                   3                  Double error correction, triple error detection
Creating a Hamming code
A codeword includes the data bits and check bits. Each check bit corresponds to
a subset of the data bits and it is set when the parity of those data bits is
odd. To obtain a code with a particular Hamming distance, the number of check
bits and their mapping to data bits must be chosen carefully.
To build a single-error correcting (SEC) code that requires Hamming distance
three between valid codewords, it is necessary for:
- The mapping of each data bit to check bits to be unique.
- Each data bit to map to at least two check bits.
To see why this works, consider two distinct codewords that necessarily
must have different data bits. If the data bits differ by:
- 1 bit, at least two check bits are flipped, giving a total of at least three
different bits.
- 2 bits, these will cause at least one flip in the check bits, since no two
data bits share the same check-bit mapping and so the XOR of their two check-bit
patterns is non-zero. This also gives a total of at least three different bits as required.
- 3 bits, this is already sufficient to give a Hamming distance of three.
To build a SECDED code that requires Hamming distance of four between valid
codewords, it is necessary for:
- The mapping of each data bit to check bits to be unique.
- Each data bit to map to at least three check bits.
- Each check-bit pattern to have an odd number of bits set.
Following a similar argument, consider two distinct codewords whose data bits differ by:
- 1 bit: this flips at least three check bits, giving a total of at least four different bits.
- 2 bits: these flip check bits in two patterns, and since any two distinct
odd-weight patterns must differ in at least two positions, the result is at least two
flipped check bits, giving a total of at least four different bits. For example:
Check bits: 0 1 2 3
data[a]     x x x
data[b]       x x x
----------- -------
Flips       x     x
Check bits: 0 1 2 3 4
data[a]     x x x
data[b]     x x x x x
----------- ---------
Flips             x x
- 3 bits: these flip check bits in three patterns, and this time it is possible to
overlap odd-weight patterns in such a way that a minimum of one check bit is flipped
(the XOR of three odd-weight patterns has odd weight, so it cannot be zero).
For example:
Check bits: 0 1 2 3 4
data[a]     x x x
data[b]         x x x
data[c]     x x   x
----------- ---------
Flips               x
Check bits: 0 1 2 3 4
data[a]     x x x x x
data[b]     x x x
data[c]         x x x
----------- ---------
Flips           x
- 4 bits: this is already sufficient to provide a Hamming distance of four.
An example SEC code for eight data bits with four parity bits:
Check bits: 0 1 2 3
data[0]     x x x
data[1]       x x x
data[2]     x   x x
data[3]     x x   x
data[4]     x x
data[5]       x x
data[6]         x x
data[7]     x     x
An example SECDED code for eight data bits with five parity bits:
Check bits: 0 1 2 3 4
data[0]     x x x
data[1]     x x   x
data[2]     x   x x
data[3]       x x x
data[4]     x x     x
data[5]     x   x   x
data[6]       x x   x
data[7]     x     x x
Note that mappings of data bits to check bits can be chosen flexibly, provided
they maintain the rules that set the Hamming distance. This flexibility is
useful when implementing ECC to reduce the cost of calculating the check bits.
In contrast, many descriptions of ECC that I have found in textbooks and on
Wikipedia describe a specific encoding that does not acknowledge this freedom.
The encoding they describe allows the syndrome to be interpreted as the bit
index of the single bit error, by having check bit $i$ cover the codeword
positions whose index has bit $i$ set.
Additionally, they specify that parity bits are positioned in the codeword at
power-of-two positions, for no apparent benefit.
Implementing ECC
Given data bits and check bits, and a mapping of data bits to check bits, ECC
encoding works by calculating the check bits from the data bits, then combining
data bits and check bits to form the codeword. Decoding works by taking the
data bits from a codeword, recalculating the check bits, then calculating the
bitwise XOR between the original check bits and the recalculated ones. This
value is called the syndrome. By inspecting the number of bits set in the
syndrome, it is possible to determine whether there has been an error,
whether it is correctable, and how to correct it.
Using the SEC check-bit encoding above, creating a codeword from data[7:0],
the check bits are calculated as follows (using Verilog syntax):
assign check_word[0] = data[0] ^ data[2] ^ data[3] ^ data[4] ^ data[7];
assign check_word[1] = data[0] ^ data[1] ^ data[3] ^ data[4] ^ data[5];
assign check_word[2] = data[0] ^ data[1] ^ data[2] ^ data[5] ^ data[6];
assign check_word[3] = data[1] ^ data[2] ^ data[3] ^ data[6] ^ data[7];
And the codeword is formed by concatenating the check bits and data:
assign codeword = {check_word[3:0], data[7:0]};
Decoding a codeword splits it into the check word and data bits, recomputes
the check bits and calculates the syndrome:
assign {old_check_word, old_data} = codeword;
assign new_check_word[0] = ...;
assign new_check_word[1] = ...;
assign new_check_word[2] = ...;
assign new_check_word[3] = ...;
assign syndrome = new_check_word ^ old_check_word;
When single bit errors occur, the syndrome will have the bit pattern
corresponding to a particular data bit, so a correction can be applied by
creating a mask to flip the bit in that position:
// Assumes syndrome is declared [3:0], so the rightmost literal digit is
// check bit 0 (eg 4'b0111 means check bits 0, 1 and 2 differ).
unique case (syndrome)
  4'b0111: correction = 1 << 0;
  4'b1110: correction = 1 << 1;
  4'b1101: correction = 1 << 2;
  4'b1011: correction = 1 << 3;
  4'b0011: correction = 1 << 4;
  4'b0110: correction = 1 << 5;
  4'b1100: correction = 1 << 6;
  4'b1001: correction = 1 << 7;
  default: correction = 0;
endcase
And using it to generate the corrected data:
assign corrected_data = old_data ^ correction;
The value of the syndrome can be further inspected to signal what action has
been taken. If the syndrome:
- Is equal to zero, no error occurred.
- Has one bit set, a check bit was flipped and this can be ignored.
- Has a value matching a pattern (three bits set, or two bits set in adjacent positions, including the wrap-around pair 4'b1001), a correctable error occurred.
- Has a value not matching a pattern (two bits set in non-adjacent positions, 4'b1010 or 4'b0101, or four bits set), a multi-bit uncorrectable error occurred.
The above SECDED check-bit encoding can be implemented in a similar way, but
since it uses only three-bit patterns, mapping syndromes to correction masks
can be done with three-input AND gates:
// Reverse case: each item is a three-input AND of syndrome bits.
// Assumes the correction is only used when exactly three syndrome bits are set.
unique case (1'b1)
  syndrome[0] && syndrome[1] && syndrome[2]: correction = 1 << 0;
  syndrome[0] && syndrome[1] && syndrome[3]: correction = 1 << 1;
  syndrome[0] && syndrome[2] && syndrome[3]: correction = 1 << 2;
  syndrome[1] && syndrome[2] && syndrome[3]: correction = 1 << 3;
  syndrome[0] && syndrome[1] && syndrome[4]: correction = 1 << 4;
  syndrome[0] && syndrome[2] && syndrome[4]: correction = 1 << 5;
  syndrome[1] && syndrome[2] && syndrome[4]: correction = 1 << 6;
  syndrome[0] && syndrome[3] && syndrome[4]: correction = 1 << 7;
  default: correction = 0;
endcase
A syndrome with one bit set (a flipped check bit) or with three bits set that
match a pattern (a flipped data bit) is correctable. Any other non-zero
syndrome, in particular one with two or four bits set, signals an uncorrectable
multi-bit error.
References / further reading
- Error correction code, Wikipedia.
- Hamming code, Wikipedia.
- ECC memory, Wikipedia.
- Error detecting and error correcting codes (PDF), R. W. Hamming, in The Bell System Technical Journal, vol. 29, no. 2, pp. 147-160, April 1950.
- Constructing an Error Correcting Code (PDF), Andrew E. Phelps, University of Wisconsin, Madison, November 2006.
Please get in touch (mail @ this domain) with any
comments, corrections or suggestions. |
| Markdown | [James W. Hanlon](https://www.jameswhanlon.com/)
# Error-correcting codes
02 May 2020
[computing](https://www.jameswhanlon.com/tag/computing.html)
Error correcting codes (ECCs) are used in computer and communication systems to improve resiliency to bit flips caused by permanent hardware faults or by transient conditions, such as neutron particles from cosmic rays, that are known generally as [soft errors](https://en.wikipedia.org/wiki/Soft_error). This note describes the principles of Hamming codes that underpin ECC schemes, how ECC codes are constructed (focusing on single-error correction and double-error detection), and how they are implemented.
ECCs work by adding redundant bits to be stored or transported with data. The bits are encoded as a function of the data in such a way that it is possible to detect erroneous bit flips and to correct them. The ratio of the number of data bits to the total number of bits encoded is called the *code rate*, with a rate of 1 corresponding to an impossible encoding with no overhead.
## Simple ECCs
**Parity coding** adds a single bit that indicates whether the number of set bits in the data is odd or even. When the data and parity bit are accessed or received, the parity can be recomputed and compared. This is sufficient to detect any odd number of bit flips but not to correct them. For applications where the error rate is low, so that only single bit flips are likely and double bit flips are rare enough to be ignored, parity error detection is sufficient and desirable due to its low overhead (just a single bit) and simple implementation.
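As a rough illustration of the parity check described above, here is a small Python model (software rather than hardware; the names `parity_bit` and `parity_ok` are made up for this sketch and do not come from the original note):

```python
def parity_bit(data: int) -> int:
    """Return 1 when an odd number of bits are set in `data`, else 0."""
    return bin(data).count("1") & 1


def parity_ok(data: int, stored_parity: int) -> bool:
    """Recompute the parity of `data` and compare it with the stored bit."""
    return parity_bit(data) == stored_parity


word = 0b10110010                  # four bits set
stored = parity_bit(word)
single_flip = word ^ 0b00000100    # one bit flipped in storage or transit
double_flip = word ^ 0b00010100    # two bits flipped

print(parity_ok(word, stored))         # clean word passes the check
print(parity_ok(single_flip, stored))  # a single flip fails the check
print(parity_ok(double_flip, stored))  # a double flip passes unnoticed
```

The last line shows the limitation: an even number of flips leaves the parity unchanged, so it goes undetected.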
**Repetition coding** simply repeats each data bit a fixed number of times. When the encoded data is received, if the repeated copies of any bit are not identical, an error has occurred. With a repetition of two, single-bit errors can be detected but not corrected. With a repetition of three, single bit flips can be corrected by determining each data bit as the majority value in each triple, but double bit flips within a triple are undetectable and will cause an erroneous correction. Repetition codes are simple to implement but have a high overhead.
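A minimal Python sketch of triple repetition with majority voting follows. The helper names `encode_repetition` and `decode_repetition` are hypothetical, and the decoder assumes an odd repetition count so that a majority always exists:

```python
def encode_repetition(bits, n=3):
    """Repeat every data bit n times."""
    return [b for b in bits for _ in range(n)]


def decode_repetition(coded, n=3):
    """Take the majority value of each group of n received bits (n odd)."""
    groups = [coded[i:i + n] for i in range(0, len(coded), n)]
    return [1 if 2 * sum(g) > n else 0 for g in groups]


data = [1, 0, 1]
coded = encode_repetition(data)

one_flip = list(coded)
one_flip[4] ^= 1                   # single flip inside the second triple

two_flips = list(coded)
two_flips[3] ^= 1                  # two flips inside the same triple
two_flips[4] ^= 1

print(decode_repetition(one_flip) == data)   # single flip is corrected
print(decode_repetition(two_flips) == data)  # double flip is miscorrected
```

The overhead is visible in the sizes: three data bits become nine stored bits, a code rate of one third.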
## Hamming codes
Hamming codes are an efficient family of codes using additional redundant bits to detect up to two-bit errors and correct single-bit errors (technically, they are *[linear error-correcting codes](https://en.wikipedia.org/wiki/Linear_code)*). In them, *check bits* are added to data bits to form a *codeword*, and the codeword is *valid* only when the check bits have been generated from the data bits, according to the Hamming code. The check bits are chosen so that there is a guaranteed minimum *Hamming distance* (the number of positions in which bits differ) between any two valid codewords.
When valid codewords have a Hamming distance of two, any single bit flip will invalidate the word and allow the error to be detected. For example, the valid codewords `00` and `11` are separated for single bit flips by the invalid codewords `01` and `10`. If either of the invalid words is obtained, an error has occurred, but neither can be associated with a single valid codeword. Two bit flips are undetectable since they always map to a valid codeword. Note that parity encoding is an example of a distance-two Hamming code.
```
00 < Valid codeword
|
10 < Invalid codeword (obtained by exactly 1 bit flip)
|
11 < Valid codeword
```
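To make the notion of distance concrete, the following Python sketch computes the Hamming distance between two words and the minimum distance of a small set of codewords. The function names and the example code sets are illustrative assumptions, not part of the original note:

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def minimum_distance(codewords) -> int:
    """Smallest distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


# Two data bits plus one even-parity bit: every valid codeword has even weight.
parity_code = [0b000, 0b011, 0b101, 0b110]
# One data bit repeated three times.
repetition_code = [0b000, 0b111]

print(hamming_distance(0b00, 0b11))
print(minimum_distance(parity_code))
print(minimum_distance(repetition_code))
```

The parity code comes out at distance two (detection only) and the triple repetition code at distance three (single-error correction).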
With Hamming distance three, any single bit flip in a valid codeword makes an invalid one, and the invalid codeword is Hamming distance one from exactly one valid codeword. Using this, the valid codeword can be restored, enabling single error correction. Any two bit flips map to an invalid codeword, which would cause correction to the wrong valid codeword.
```
000 < Valid codeword
|
001
|
011
|
111 < Valid codeword
```
With Hamming distance four, two bit flips move a valid codeword to a word that is Hamming distance two from the original and at least distance two from every other valid codeword, allowing detection of two flips but not correction. Single bit flips can be corrected as they were for distance three. Distance-four codes are widely used in computing, where it is often the case that single errors are frequent, double errors are rare and triple errors occur so rarely they can be ignored. These codes are referred to as "SECDED ECC" (single error correction, double error detection).
```
0000 < Valid codeword
|
0001
|
0011 < Two bit flips from either codeword.
|
0111
|
1111 < Valid codeword
```
Double errors can be corrected with a distance-five code, and a distance-six code additionally enables the detection of triple errors. In general, if a Hamming code can detect $d$ errors, it must have a minimum distance of $d+1$ so there is no way $d$ errors can change one valid codeword into another one. If a code can correct $d$ errors, it must have a minimum distance of $2d+1$ so that the originating codeword is always the closest one. The following table summarises Hamming codes.
| Distance | Max bits corrected | Max bits detected | Capability |
|---|---|---|---|
| 2 | 0 | 1 | Single error detection (eg parity code) |
| 3 | 1 | 1 | Single error correction (eg triple repetition code) |
| 4 | 1 | 2 | Single error correction, double error detection (a "SECDED" code) |
| 5 | 2 | 2 | Double error correction |
| 6 | 2 | 3 | Double error correction, triple error detection |
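The rows of the table can be reproduced from the minimum distance alone. The sketch below assumes the usual reading of the table, namely that the code corrects as many errors as it can and uses the remaining distance for detection. The function name `capability` is made up for this example:

```python
def capability(distance: int):
    """Return (max bits corrected, max bits detected) for a minimum distance.

    Correcting t errors needs distance >= 2t + 1; the distance left over,
    distance - 1 - t, is the number of errors that can still be detected.
    """
    corrected = (distance - 1) // 2
    detected = distance - 1 - corrected
    return corrected, detected


for d in range(2, 7):
    print(d, capability(d))
```

A design that gives up correction entirely can instead detect up to `distance - 1` errors, which is the $d+1$ rule stated in the text.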
## Creating a Hamming code
A codeword includes the data bits and check bits. Each check bit corresponds to a subset of the data bits and it is set when the parity of those data bits is odd. To obtain a code with a particular Hamming distance, the number of check bits and their mapping to data bits must be chosen carefully.
To build a single-error correcting (SEC) code that requires Hamming distance three between valid codewords, it is necessary for:
- The mapping of each data bit to check bits to be unique.
- Each data bit to map to at least two check bits.
To see why this works, consider two distinct codewords that necessarily must have different data bits. If the data bits differ by:
- **1 bit**, at least two check bits are flipped, giving a total of three different bits.
- **2 bits**, these will cause at least one flip in the check bits since any two data bits cannot share the same check-bit mapping (ie by taking the XOR of the two check bit patterns). This also gives a total of three different bits as required.
- **3 bits**, this is already sufficient to give a Hamming distance of three.
To build a SECDED code that requires Hamming distance of four between valid codewords, it is necessary for:
- The mapping of each data bit to check bits to be unique.
- Each data bit to map to at least three check bits.
- Each check-bit pattern to have an odd number of bits set.
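Following a similar argument, consider two distinct codewords whose data bits differ by:
- **1 bit**: this flips at least three check bits, giving a total of at least four different bits.
- **2 bits**: these flip check bits in two patterns, and since any two distinct odd-weight patterns must differ in at least two positions, the result is at least two flipped check bits, giving a total of at least four different bits. For example:
```
Check bits: 0 1 2 3
data[a]     x x x
data[b]       x x x
----------- -------
Flips       x     x

Check bits: 0 1 2 3 4
data[a]     x x x
data[b]     x x x x x
----------- ---------
Flips             x x
```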
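- **3 bits**: these flip check bits in three patterns, and this time it is possible to overlap odd-weight patterns in such a way that a minimum of one check bit is flipped (the XOR of three odd-weight patterns has odd weight, so it cannot be zero). For example:
```
Check bits: 0 1 2 3 4
data[a]     x x x
data[b]         x x x
data[c]     x x   x
----------- ---------
Flips               x

Check bits: 0 1 2 3 4
data[a]     x x x x x
data[b]     x x x
data[c]         x x x
----------- ---------
Flips           x
```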
- **4 bits** is already sufficient to provide a Hamming distance of four.
An example SEC code for eight data bits with four parity bits:
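```
Check bits: 0 1 2 3
data[0]     x x x
data[1]       x x x
data[2]     x   x x
data[3]     x x   x
data[4]     x x
data[5]       x x
data[6]         x x
data[7]     x     x
```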
An example SECDED code for eight data bits with five parity bits:
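```
Check bits: 0 1 2 3 4
data[0]     x x x
data[1]     x x   x
data[2]     x   x x
data[3]       x x x
data[4]     x x     x
data[5]     x   x   x
data[6]       x x   x
data[7]     x     x x
```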
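The distance claims for the two example codes can be checked by brute force. The Python sketch below transcribes the SEC and SECDED mappings as lists of check-bit indices per data bit (the transcription, the names `SEC_MAP` and `SECDED_MAP`, and the codeword layout with check bits above the data bits are assumptions of this example), encodes all 256 data values, and measures the smallest pairwise distance:

```python
from itertools import combinations

# Check bits covered by data[0]..data[7] in each example code.
SEC_MAP = [(0, 1, 2), (1, 2, 3), (0, 2, 3), (0, 1, 3),
           (0, 1), (1, 2), (2, 3), (0, 3)]
SECDED_MAP = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3),
              (0, 1, 4), (0, 2, 4), (1, 2, 4), (0, 3, 4)]


def encode(data: int, mapping) -> int:
    """Form a codeword with the check bits placed above the 8 data bits."""
    check = 0
    for i, checks in enumerate(mapping):
        if (data >> i) & 1:
            for c in checks:
                check ^= 1 << c      # each set data bit toggles its check bits
    return (check << 8) | data


def minimum_distance(mapping) -> int:
    """Smallest Hamming distance between any two valid codewords."""
    words = [encode(d, mapping) for d in range(256)]
    return min(bin(a ^ b).count("1") for a, b in combinations(words, 2))


print(minimum_distance(SEC_MAP))
print(minimum_distance(SECDED_MAP))
```

Swapping in a different mapping is a quick way to test whether a candidate check-bit assignment still meets the rules that set the distance.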
Note that mappings of data bits to check bits can be chosen flexibly, provided they maintain the rules that set the Hamming distance. This flexibility is useful when implementing ECC to reduce the cost of calculating the check bits. In contrast, many descriptions of ECC that I have found in textbooks and on [Wikipedia](https://en.wikipedia.org/wiki/Hamming_code) describe a specific encoding that does not acknowledge this freedom. The encoding they describe allows the syndrome to be interpreted as the bit index of the single bit error, by having check bit $i$ cover the codeword positions whose index has bit $i$ set. Additionally, they specify that parity bits are positioned in the codeword at power-of-two positions, for no apparent benefit.
## Implementing ECC
Given data bits and check bits, and a mapping of data bits to check bits, ECC encoding works by calculating the check bits from the data bits, then combining data bits and check bits to form the codeword. Decoding works by taking the data bits from a codeword, recalculating the check bits, then calculating the bitwise XOR between the original check bits and the recalculated ones. This value is called the *syndrome*. By inspecting the number of bits set in the syndrome, it is possible to determine whether there has been an error, whether it is correctable, and how to correct it.
Using the SEC check-bit encoding above, creating a codeword from `data[7:0]`, the check bits are calculated as follows (using Verilog syntax):
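```
assign check_word[0] = data[0] ^ data[2] ^ data[3] ^ data[4] ^ data[7];
assign check_word[1] = data[0] ^ data[1] ^ data[3] ^ data[4] ^ data[5];
assign check_word[2] = data[0] ^ data[1] ^ data[2] ^ data[5] ^ data[6];
assign check_word[3] = data[1] ^ data[2] ^ data[3] ^ data[6] ^ data[7];
```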
And the codeword formed by concatenating the check bits and data:
```
assign codeword = {check_word[3:0], data[7:0]};
```
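Decoding a codeword splits it into the check word and data bits, recomputes the check bits and calculates the syndrome:
```
assign {old_check_word, old_data} = codeword;
assign new_check_word[0] = ...;
assign new_check_word[1] = ...;
assign new_check_word[2] = ...;
assign new_check_word[3] = ...;
assign syndrome = new_check_word ^ old_check_word;
```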
When single bit errors occur, the syndrome will have the bit pattern corresponding to a particular data bit, so a correction can be applied by creating a mask to flip the bit in that position:
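```
// Assumes syndrome is declared [3:0], so the rightmost literal digit is
// check bit 0 (eg 4'b0111 means check bits 0, 1 and 2 differ).
unique case (syndrome)
  4'b0111: correction = 1 << 0;
  4'b1110: correction = 1 << 1;
  4'b1101: correction = 1 << 2;
  4'b1011: correction = 1 << 3;
  4'b0011: correction = 1 << 4;
  4'b0110: correction = 1 << 5;
  4'b1100: correction = 1 << 6;
  4'b1001: correction = 1 << 7;
  default: correction = 0;
endcase
```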
And using it to generate the corrected data:
```
assign corrected_data = old_data ^ correction;
```
The value of the syndrome can be further inspected to signal what action has been taken. If the syndrome:
- Is equal to zero, no error occurred.
- Has one bit set, a check bit was flipped and this can be ignored.
- Has a value matching a pattern (three bits set, or two bits set in adjacent positions, including the wrap-around pair `4'b1001`), a correctable error occurred.
- Has a value not matching a pattern (two bits set in non-adjacent positions, `4'b1010` or `4'b0101`, or four bits set), a multi-bit uncorrectable error occurred.
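The SEC encode and decode steps, together with the syndrome rules listed above, can be modelled in a few lines of Python. This is a behavioural sketch rather than the hardware implementation: the mapping is transcribed from the SEC example, and the helper names and status strings are invented for the illustration.

```python
# Check bits covered by data[0]..data[7] (assumed transcription of the
# SEC example mapping), converted to 4-bit syndrome patterns.
SEC_MAP = [(0, 1, 2), (1, 2, 3), (0, 2, 3), (0, 1, 3),
           (0, 1), (1, 2), (2, 3), (0, 3)]
PATTERNS = [sum(1 << c for c in checks) for checks in SEC_MAP]


def check_word(data: int) -> int:
    """XOR together the patterns of every set data bit."""
    word = 0
    for i, pattern in enumerate(PATTERNS):
        if (data >> i) & 1:
            word ^= pattern
    return word


def encode(data: int) -> int:
    """Codeword layout {check_word[3:0], data[7:0]}."""
    return (check_word(data) << 8) | data


def decode(codeword: int):
    """Return (data, status) after applying the syndrome rules."""
    old_check, old_data = codeword >> 8, codeword & 0xFF
    syndrome = check_word(old_data) ^ old_check
    if syndrome == 0:
        return old_data, "no error"
    if bin(syndrome).count("1") == 1:
        return old_data, "check bit flipped"
    if syndrome in PATTERNS:
        # The matching pattern identifies which data bit to flip back.
        return old_data ^ (1 << PATTERNS.index(syndrome)), "corrected"
    return old_data, "uncorrectable"


codeword = encode(0x5A)
print(decode(codeword))                  # clean codeword
print(decode(codeword ^ (1 << 3)))       # data bit 3 flipped
print(decode(codeword ^ (1 << 9)))       # check bit 1 flipped
print(decode(codeword ^ 0b01010000))     # data bits 4 and 6 flipped
```

Note that some double errors, such as flips of data bits 0 and 1, produce a syndrome that matches another data bit's pattern and are silently miscorrected, which is the limitation the SECDED code addresses.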
The above SECDED check-bit encoding can be implemented in a similar way, but since it uses only three-bit patterns, mapping syndromes to correction masks can be done with three-input AND gates:
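```
// Reverse case: each item is a three-input AND of syndrome bits.
// Assumes the correction is only used when exactly three syndrome bits are set.
unique case (1'b1)
  syndrome[0] && syndrome[1] && syndrome[2]: correction = 1 << 0;
  syndrome[0] && syndrome[1] && syndrome[3]: correction = 1 << 1;
  syndrome[0] && syndrome[2] && syndrome[3]: correction = 1 << 2;
  syndrome[1] && syndrome[2] && syndrome[3]: correction = 1 << 3;
  syndrome[0] && syndrome[1] && syndrome[4]: correction = 1 << 4;
  syndrome[0] && syndrome[2] && syndrome[4]: correction = 1 << 5;
  syndrome[1] && syndrome[2] && syndrome[4]: correction = 1 << 6;
  syndrome[0] && syndrome[3] && syndrome[4]: correction = 1 << 7;
  default: correction = 0;
endcase
```
A syndrome with one bit set (a flipped check bit) or with three bits set that match a pattern (a flipped data bit) is correctable. Any other non-zero syndrome, in particular one with two or four bits set, signals an uncorrectable multi-bit error.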
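A behavioural Python model of a SECDED decoder is sketched below. It assumes the classification is made on the parity of the syndrome weight: because every data-bit pattern and every single check bit has odd weight, one error gives an odd-weight syndrome and two errors give an even, non-zero one. The mapping is transcribed from the SECDED example and the names are hypothetical.

```python
# Check bits covered by data[0]..data[7] (assumed transcription of the
# SECDED example mapping), converted to 5-bit syndrome patterns.
SECDED_MAP = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3),
              (0, 1, 4), (0, 2, 4), (1, 2, 4), (0, 3, 4)]
PATTERNS = [sum(1 << c for c in checks) for checks in SECDED_MAP]


def check_word(data: int) -> int:
    """XOR together the patterns of every set data bit."""
    word = 0
    for i, pattern in enumerate(PATTERNS):
        if (data >> i) & 1:
            word ^= pattern
    return word


def encode(data: int) -> int:
    """Codeword layout {check_word[4:0], data[7:0]}."""
    return (check_word(data) << 8) | data


def decode(codeword: int):
    """Return (data, status), classifying by syndrome weight."""
    old_check, old_data = codeword >> 8, codeword & 0xFF
    syndrome = check_word(old_data) ^ old_check
    weight = bin(syndrome).count("1")
    if weight == 0:
        return old_data, "no error"
    if weight % 2 == 0:
        return old_data, "double error detected"
    if weight == 1:
        return old_data, "check bit flipped"
    if syndrome in PATTERNS:
        return old_data ^ (1 << PATTERNS.index(syndrome)), "corrected"
    return old_data, "uncorrectable"   # odd weight but no matching pattern


codeword = encode(0xC3)
print(decode(codeword ^ (1 << 6)))               # single data-bit flip
print(decode(codeword ^ (1 << 0) ^ (1 << 1)))    # double data-bit flip
```

Unlike the SEC model, every double flip across the 13 codeword bits is reported rather than miscorrected.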
## References / further reading
- [Error correction code](https://en.wikipedia.org/wiki/Error_correction_code), Wikipedia.
- [Hamming code](https://en.wikipedia.org/wiki/Hamming_code), Wikipedia.
- [ECC memory](https://en.wikipedia.org/wiki/ECC_memory), Wikipedia.
- [Error detecting and error correcting codes (PDF)](https://signallake.com/innovation/hamming.pdf), R. W. Hamming, in The Bell System Technical Journal, vol. 29, no. 2, pp. 147-160, April 1950.
- [Constructing an Error Correcting Code (PDF)](http://pages.cs.wisc.edu/~david/courses/cs552/S12/handouts/ecc-phelps.pdf), Andrew E. Phelps, University of Wisconsin, Madison, November 2006.
Please get in touch (mail @ this domain) with any comments, corrections or suggestions.
***
Unless otherwise noted, all content is freely available under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
The views expressed on this website are the author's personal views and should not be attributed to any other person, including that of their employer. |
| Shard | 171 (laksa) |
| Root Hash | 5485458964568390171 |
| Unparsed URL | com,jameswhanlon!www,/error-correcting-codes.html s443 |