ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | FAIL | download_stamp > now() - 6 MONTH | 8 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | http://www.eecs.umich.edu/courses/eecs373.w05/lecture/errorcode.html |
| Last Crawled | 2025-08-13 10:06:05 (8 months ago) |
| First Indexed | not set |
| HTTP Status Code | 200 |
| Meta Title | Error detecting and correcting codes |
| Meta Description | null |
| Meta Canonical | null |
| Markdown | # Error detecting and correcting codes
*(Notes for EECS 373, Winter 2005)*
Data can be corrupted in transmission or storage by a variety of undesirable phenomena, such as radio interference, electrical noise, power surges, bad spots on disks or tapes, or scratches or dirt on CD or DVD media. It is useful to have a way to detect (and sometimes correct) such data corruption.
Errors come in several forms. The most common situation is that a bit in a stream of data gets flipped (a 0 becomes a 1 or a 1 becomes a 0). It is also possible for a bit to get deleted, or for an extra bit to be inserted. In some situations, burst errors occur, where several successive bits are affected.
## Parity bit
We can detect single errors with a **parity bit**. The parity bit is computed as the exclusive-OR (even parity) or exclusive-NOR (odd parity) of all of the other bits in the word. Thus, the resulting word with a parity bit will always have an even (for even parity) or odd (for odd parity) number of 1 bits in it. If a single bit is flipped in transmission or storage, the received data will have the wrong parity, so we will know something bad has happened.
Note that we can't tell which bit was corrupted (or if it was just the parity bit that was corrupted). Double errors go undetected, triple errors get detected, quadruple errors don't, etc. Random garbage has a 50% probability of being accepted as valid.
Overhead is small; if we put a parity bit on each byte, add 1 bit for each 8, so data transmitted or stored grows by 12.5%. Larger words reduce the overhead: 16 bit words: 6.25%, 32 bit words: 3.125%, 64 bit words: 1.5625%.
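As a minimal sketch (Python, not part of the original notes), here is how an even parity bit can be computed with XOR counting and used to detect a single flipped bit:

```python
def even_parity_bit(word: int) -> int:
    """Exclusive-OR of all bits: 1 iff the word has an odd number of 1 bits."""
    return bin(word).count("1") % 2

data = 0b1011001                                  # 7 data bits
codeword = (data << 1) | even_parity_bit(data)    # append the parity bit

# Every valid codeword now has an even number of 1 bits.
assert bin(codeword).count("1") % 2 == 0

corrupted = codeword ^ 0b0010000                  # flip a single bit
# The parity check fails, so the error is detected (but not located).
assert bin(corrupted).count("1") % 2 == 1
```

Note that flipping any two bits of `corrupted` back and forth restores even parity, which is why double errors go undetected.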
Original data plus correction bits form a **codeword**. The codeword, generally larger than the original data, is used as the representation for that data for transmission or storage purposes. An ordered pair notation is often used, (c,d) represents a codeword of c bits encoding a data word of d bits.
## Error correcting codes
What if just detecting errors isn't enough? What if we want to find and fix the bad data?
### Brute force repetition
Can repeat each bit three times: 00011011 becomes 000 000 000 111 111 000 111 111. Any single bit error can be corrected; just take a majority vote on each group of three. Double errors within a group will still corrupt the data. Overhead is large; 8 bits became 24, a 200% increase in data size.
Can extend to correct even more errors; repeat each bit 5 times to correct up to 2 errors per group, but even more overhead.
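The encode/majority-vote scheme above can be sketched in a few lines of Python (an illustration, not from the original notes):

```python
def encode_repeat3(bits):
    """Repeat each bit three times: 0 -> 000, 1 -> 111."""
    return [b for b in bits for _ in range(3)]

def decode_repeat3(coded):
    """Majority vote over each group of three fixes any single flip per group."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

data = [0, 0, 0, 1, 1, 0, 1, 1]       # 00011011 from the text
coded = encode_repeat3(data)          # 24 bits: a 200% increase in size
coded[4] ^= 1                         # flip one bit inside the second group
assert decode_repeat3(coded) == data  # the single error is corrected
```

Flipping two bits in the same group would outvote the original value, which is the double-error failure the text describes.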
### More efficient approaches to single error correction
Just repeating the bits is fairly inefficient. We could do better if we could have a compact way to figure out which bit got flipped (if any). As the number of bits in a word gets large, things are going to get very complicated very fast. We need some systematic way to handle things.
### Hamming distance
A key issue in designing any error correcting code is making sure that any two valid codewords are sufficiently dissimilar so that corruption of a single bit (or possibly a small number of bits) does not turn one valid code word into another. To measure the distance between two codewords, we just count the number of bits that differ between them. If we are doing this in hardware or software, we can just XOR the two codewords and count the number of 1 bits in the result. This count is called the **Hamming distance** (Hamming, 1950).
The key significance of the Hamming distance is that if two codewords have a Hamming distance of *d* between them, then it would take *d* single bit errors to turn one of them into the other.
For a set of multiple codewords, the Hamming distance of the set is the minimum distance between any pair of its members.
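The XOR-and-count computation described above, and the minimum over all pairs, can be sketched as follows (Python, added for illustration):

```python
from itertools import combinations

def hamming_distance(a: int, b: int) -> int:
    """XOR the two codewords and count the 1 bits in the result."""
    return bin(a ^ b).count("1")

# Distance between two individual codewords: 1011 and 1001 differ in one bit.
d = hamming_distance(0b1011, 0b1001)           # 1

# Hamming distance of a set: the minimum over all pairs of its members.
codewords = [0b000, 0b011, 0b101, 0b110]       # 3-bit words with even parity
set_distance = min(hamming_distance(x, y)
                   for x, y in combinations(codewords, 2))  # 2
```

A set distance of 2 matches the parity-bit discussion above: one flip always leaves the set, but two flips can land on another valid codeword.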
### Minimum Hamming distance for error detection
To design a code that can detect *d* single bit errors, the minimum Hamming distance for the set of codewords must be *d* + 1 (or more). That way, no combination of *d* single-bit errors could turn one valid codeword into some other valid codeword.
### Minimum Hamming distance for error correction
To design a code that can correct *d* single bit errors, a minimum distance of 2*d* + 1 is required. That puts the valid codewords so far apart that even after bit errors in *d* of the bits, it is still less than half the distance to another valid codeword, so the receiver will be able to determine what the correct starting codeword was.
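Both bounds can be checked numerically on the triple-repetition code from earlier (a sketch in Python, not part of the original notes):

```python
from itertools import combinations

def min_distance(codewords):
    """Minimum Hamming distance over all pairs in the set."""
    return min(bin(a ^ b).count("1") for a, b in combinations(codewords, 2))

# The triple-repetition code for one data bit has codewords {000, 111}.
d_min = min_distance([0b000, 0b111])  # 3

detects = d_min - 1          # detecting d errors requires d + 1 <= d_min
corrects = (d_min - 1) // 2  # correcting d errors requires 2d + 1 <= d_min
# d_min = 3, so this code detects up to 2 errors and corrects 1,
# exactly as the majority-vote argument showed.
```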
### Practical Hamming implementation
Here are a few useful references (all PDF format):
- [Error Correction with Hamming Codes](http://www.eecs.umich.edu/courses/eecs373.w05/lecture/hamming1.pdf)
- [Calculating the Hamming Code](http://www.eecs.umich.edu/courses/eecs373.w05/lecture/hamming2.pdf)
- [Applying Hamming Code to blocks of data](http://www.eecs.umich.edu/courses/eecs373.w05/lecture/hamming3.pdf) |
| Readable Markdown | null |
| Shard | 11 (laksa) |
| Root Hash | 8403072594226036211 |
| Unparsed URL | edu,umich!eecs,www,/courses/eecs373.w05/lecture/errorcode.html h80 |