ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.8 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://www.cs.fsu.edu/~burmeste/slideshow/chapter12/12-3.html |
| Last Crawled | 2026-03-24 05:57:16 (24 days ago) |
| First Indexed | 2019-07-09 23:50:02 (6 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Hash Functions |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | Good hash functions should aim at the assumption of simple uniform hashing (SUH): each key is equally likely to hash into any of the $m$ slots. Draw keys $k$ from $U$ with probability $P(k)$, so
SUH: $\sum_{k : h(k) = j} P(k) = 1/m$, $j = 0, \dots, m-1$,
thus all hash values are equally likely.
This is hard to use as the $P(k)$'s are not usually known (even approximately). If the $k$'s are $U[0,1)$ and independent then $h(k) = \lfloor k m \rfloor$ ($m$ fixed) satisfies SUH.
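The claim about $h(k) = \lfloor k m \rfloor$ can be checked empirically. The sketch below is only an illustration: the function names, the seed, the key count, and the choice $m = 8$ are assumptions, not part of the slides.

```python
import math
import random
from collections import Counter

def floor_hash(k: float, m: int) -> int:
    """h(k) = floor(k * m) for a key k drawn from [0, 1)."""
    return math.floor(k * m)

def slot_counts(n_keys: int, m: int, seed: int = 0) -> Counter:
    """Hash n_keys independent uniform keys and count how many land in each slot."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(floor_hash(rng.random(), m) for _ in range(n_keys))

counts = slot_counts(n_keys=100_000, m=8)
for slot in sorted(counts):
    # Each slot should hold roughly n_keys / m = 12,500 keys.
    print(slot, counts[slot])
```

With uniform, independent keys every slot count should sit close to `n_keys / m`, which is what SUH asks for.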
Hash functions are usually chosen to be as "independent" as possible of patterns that may exist in the $k$'s. Note: SUH is powerful, but often hash functions need to do "better than" just SUH. Sometimes "mixing" and "separation" are required, so that two close keys $k_1$ and $k_2$ are far apart in their hash values.
Aside: Why such a "separating" hash function? Cryptographic verification of signatures.
My Will (file) → hash → digest → sign → certificate
What if my brother could change my will slightly (add the word "not") and not change the digest? Bad news for me!
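To make the "separation" requirement concrete, the sketch below hashes two nearly identical texts and counts how many digest bits differ. SHA-256 from Python's `hashlib` is used only as a stand-in for a cryptographic hash; the slides do not name a specific function, and the example sentences are invented.

```python
import hashlib

def digest_bits(text: str) -> int:
    """SHA-256 digest of the text, viewed as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Number of bit positions in which the two digests differ."""
    return bin(digest_bits(a) ^ digest_bits(b)).count("1")

original = "My brother shall inherit the house."
altered = "My brother shall not inherit the house."  # one added word

# For a well-mixing hash, roughly half of the 256 bits are expected to change.
print(differing_bits(original, altered))
```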
Even though keys may not be integers, let's consider them to be integers.
Hash Functions
$h(k) = k \bmod m$
What are good choices for $m$?
$m \ne 2^p$
$m \ne 2^p \pm c$ (close to a power of 2)
$m \ne 10^p$ (especially with decimal keys)
$m$ = prime is a good choice.
Note: If your $U$ is well known, you could try to experimentally optimize $m$. This is called the division method since:
$k \bmod m = k - m \lfloor k/m \rfloor$
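A minimal sketch of the division method follows. The sample keys and the moduli 100, 97, and 16 are illustrative choices, not values from the slides.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m

def division_hash_floor(k: int, m: int) -> int:
    """Same value written as k - m * floor(k / m), the identity that names the method."""
    return k - m * (k // m)

keys = [100, 200, 300, 400, 1234]  # decimal keys that share their low-order digits

print([division_hash(k, 100) for k in keys])  # m = 10^2 -> [0, 0, 0, 0, 34]
print([division_hash(k, 97) for k in keys])   # prime m  -> [3, 6, 9, 12, 70]
```

With $m = 10^2$ only the last two decimal digits of a key matter, so four of the five keys collide; the prime modulus spreads them out. Likewise, $m = 2^p$ keeps only the low $p$ bits of the key.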
Multiplication Method
$h(k) = \lfloor m \, (k A \bmod 1) \rfloor$
$A$ is a real number, $0 < A < 1$
$m$ is usually an integer ($2^p$)
$k A \bmod 1 = k A - \lfloor k A \rfloor$
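Example: $m = 2^p$, $k \times \lfloor A \, 2^w \rfloor = 2^w r_1 + r_0$
$2^p (2^w r_1 + r_0) = 2^{w+p} r_1 + r_0 \, 2^p$, the $p$ m.s.b.s here are $h(k)$
[Diagram: the $w$-bit key $k$ times the $w$-bit value $\lfloor A \, 2^w \rfloor$ gives a product with high word $r_1$ and low word $r_0$; the top $p$ bits of $r_0$ are marked.]
Multiplying by $2^p$ shifts $r_0$ by $p$ bits (creates an integer out of the $p$ m.s.b.s of $r_0$). What choices of $A$ are best? A: $A$ should be irrational. What irrationals are the most irrational?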
A: $A = a + \cfrac{1}{b + \cfrac{1}{c + \cfrac{1}{c + \cdots}}}$ with repeating continued fractions (solutions to quadratic equations).
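The multiplication method with $m = 2^p$ can be sketched using integer arithmetic only. The word size $w = 32$ and the choice $A = (\sqrt{5} - 1)/2$ are assumptions made for this illustration; that $A$ is a quadratic irrational whose continued fraction repeats (all partial quotients equal to 1), which fits the advice above, but the slides do not fix a particular value.

```python
import math

W = 32  # assumed word size in bits
# floor(A * 2^w) for the assumed constant A = (sqrt(5) - 1) / 2
A_SCALED = math.floor((math.sqrt(5) - 1) / 2 * 2**W)

def multiplication_hash(k: int, p: int, w: int = W, a_scaled: int = A_SCALED) -> int:
    """Return the p most significant bits of r0, where
    k * floor(A * 2^w) = r1 * 2^w + r0. Requires 0 <= k < 2^w and 1 <= p <= w."""
    r0 = (k * a_scaled) & ((1 << w) - 1)  # low-order w-bit word of the product
    return r0 >> (w - p)                  # keep only the top p bits of r0

print(A_SCALED)                         # 2654435769
print(multiplication_hash(123456, 14))  # 67, a slot in a table of size 2^14
```

Taking the top $p$ bits of $r_0$ is the same as multiplying $r_0$ by $2^p$ and reading off the bits that spill past position $w$.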
Universal Hashing:
With a fixed hash function there are keys that will hash poorly. We previously solved bad worst-case behavior by randomizing into average cases: choose your hash function randomly! This is called Universal Hashing. To do this we must construct a family of hash functions to choose from; $H$ is such a family.
Each $h \in H$ is a function $h : U \to \{0, \dots, m-1\}$. $H$ is universal if, for each pair of distinct keys $x, y \in U$, the number of $h \in H$ such that $h(x) = h(y)$ is $\lvert H \rvert / m$.
This means that a randomly chosen $h \in H$ will give $h(x) = h(y)$ (a collision) with probability $1/m$. This means that on the average (with regard to the functions in $H$) we get SUH.
Theorem: Let $h \in H$. We hash $n$ keys into a table of size $m$, $n \le m$. Then the expected number of collisions for a key $x$ is less than one.
Proof: $c_{yz} = 1$ if $h(y) = h(z)$, $0$ otherwise.
$E[c_{yz}] = 1/m$ (because $h$ was chosen randomly).
$c_x$ = total number of collisions with $x$ in $T$ of size $m$ with $n$ keys.
$E[c_x] = \sum_{y \in T} E[c_{xy}] = (n-1)(1/m)$
assumptions: $y \ne x$ and $(n-1)(1/m) < 1$ since $n \le m$.
How can we design a universal class $H$?
Example: $\lvert T \rvert = m$, $m$ prime. $x = \langle x_0, x_1, x_2, \dots, x_r \rangle$ bytes, with the maximum byte value less than $m$.
$a = \langle a_0, a_1, a_2, \dots, a_r \rangle$, with each $a_i$ randomly chosen from $\{0, 1, 2, \dots, m-1\}$.
$h_a(x) = \sum_{i=0}^{r} a_i x_i \pmod m$
$H = \bigcup_a \{h_a\}$ has $m^{r+1}$ members.
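A sketch of drawing one member of this family at random is given below. The helper names, the prime $m = 257$ (chosen so that every byte value is below $m$), and the seed are assumptions for illustration.

```python
import random
from typing import Callable, Sequence

def hash_with(a: Sequence[int], x: Sequence[int], m: int) -> int:
    """h_a(x) = sum_i a_i * x_i (mod m) for equal-length coefficient and key vectors."""
    if len(a) != len(x):
        raise ValueError("key must have exactly r + 1 components")
    return sum(ai * xi for ai, xi in zip(a, x)) % m

def random_member(m: int, r: int, rng: random.Random) -> Callable[[Sequence[int]], int]:
    """Pick a_0..a_r uniformly from {0, ..., m-1} and return the function h_a."""
    a = [rng.randrange(m) for _ in range(r + 1)]
    return lambda x: hash_with(a, x, m)

h = random_member(m=257, r=3, rng=random.Random(42))
print(h(bytes([10, 20, 30, 40])))  # some slot in 0..256; depends on the random a
```

The coefficients are drawn once, when the table is created, and the same $h_a$ is then used for every key.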
Theorem:
The class H is a universal class.
Proof: Consider distinct keys $x, y$; we can assume $x_0 \ne y_0$. With $\{a_1, \dots, a_r\}$ given, writing down $h(x) = h(y)$ for $a_0$ gives
$a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod m$
which has only one $a_0$ that solves it. Since $m$ is prime:
$a_0 \equiv -\left( \sum_{i=1}^{r} a_i (x_i - y_i) \right) (x_0 - y_0)^{-1} \pmod m$
This means $a_0$ can be found to cause a collision each time. There are thus $m^r$ different collisions here, one for each of the $m^r$ choices of $\langle a_1, a_2, a_3, \dots, a_r \rangle$.
Since there are $m^{r+1}$ vectors $\langle a_0, a_1, a_2, \dots, a_r \rangle$, $x$ and $y$ collide with probability $m^r / m^{r+1} = 1/m$ $\Rightarrow$ $H$ is universal.
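For small parameters the counting argument can be confirmed by brute force. The sketch below enumerates every coefficient vector; the values $m = 5$ and the two sample keys are arbitrary illustrative choices.

```python
from itertools import product

def h(a, x, m):
    """h_a(x) = sum_i a_i * x_i (mod m)."""
    return sum(ai * xi for ai, xi in zip(a, x)) % m

def colliding_vectors(x, y, m):
    """Count the coefficient vectors a in {0..m-1}^(r+1) with h_a(x) == h_a(y)."""
    return sum(
        1
        for a in product(range(m), repeat=len(x))
        if h(a, x, m) == h(a, y, m)
    )

m = 5                          # prime table size
x, y = (1, 2, 3), (4, 2, 0)    # two distinct keys with components below m
print(colliding_vectors(x, y, m), m ** (len(x) - 1))  # 25 25
```

Out of $5^3 = 125$ vectors exactly $5^2 = 25$ cause a collision, i.e. a fraction $1/m$.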
An aside on modular inversion: $a^{-1} \pmod m$ is the integer that solves:
$a \, a^{-1} \equiv 1 \pmod m$
For $a$ to have an inverse (mod $m$) it must be that $\gcd(a, m) = 1$, i.e. $a$ and $m$ have no common factor.
One computes $\gcd(a, m)$ via the Euclidean algorithm (we will analyze this later this term). A variant called the Extended Euclidean Algorithm, given $a, m$, produces $x, y$ with $\gcd(a, m) = x a + y m$. If $\gcd(a, m) = 1$ then $x = a^{-1}$!
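One common iterative formulation of the Extended Euclidean Algorithm is sketched below; the function names are illustrative, and the slides do not prescribe this particular form.

```python
def extended_gcd(a: int, m: int):
    """Return (g, x, y) with g = gcd(a, m) = x * a + y * m."""
    old_r, r = a, m
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Invariant: old_r = old_x * a + old_y * m and r = x * a + y * m.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def mod_inverse(a: int, m: int) -> int:
    """Inverse of a modulo m; raises ValueError when gcd(a, m) != 1."""
    g, x, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return x % m  # normalize x into the range 0..m-1

print(mod_inverse(3, 7))    # 5, since 3 * 5 = 15 = 2 * 7 + 1
print(mod_inverse(10, 17))  # 12, since 10 * 12 = 120 = 7 * 17 + 1
```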
Method 2: If $m$ is prime, then $a^{m-1} \equiv 1 \pmod m$ for any $a$ not divisible by $m$. (This fact is the basis for probabilistic primality testing.) Thus $a \, a^{m-2} \equiv 1 \pmod m$ and $a^{-1} \equiv a^{m-2} \pmod m$. If modular multiplication is $\Theta(1)$, then what is the cost of modular exponentiation?
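In the examples below the exponent is written in binary; s denotes a squaring step and s-m a square-multiply step.
$a^3 = a^{11_2} = (a^2) \, a$ (s-m)
$a^4 = a^{100_2} = (a^2)^2$ (s, s)
$a^5 = a^{101_2} = (a^2)^2 \, a$ (s, s-m)
$a^{10110_2} = a^{22} = (((a^2)^2 \, a)^2 \, a)^2$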
So starting from the next-to-most-significant bit and working towards the least significant bit of the exponent: when you come to a '0' $\Rightarrow$ square, and when you come to a '1' $\Rightarrow$ square-multiply.
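The bit-scanning rule can be written out as follows. This is a sketch with illustrative function names; it assumes the exponent is at least 1, and the inverse helper assumes a prime modulus of at least 3.

```python
def mod_pow(a: int, e: int, m: int) -> int:
    """Compute a^e mod m by scanning the bits of e from most to least significant.
    Assumes e >= 1."""
    bits = bin(e)[2:]        # binary digits of e, most significant first
    result = a % m           # the leading 1 bit contributes a^1
    for bit in bits[1:]:     # start at the next-to-most-significant bit
        result = (result * result) % m   # every bit: square
        if bit == "1":
            result = (result * a) % m    # a 1 bit: also multiply by a
    return result

def fermat_inverse(a: int, m: int) -> int:
    """a^(m-2) mod m, the inverse of a when m is prime and a is not a multiple of m."""
    return mod_pow(a, m - 2, m)

print(mod_pow(2, 22, 1_000_003) == pow(2, 22, 1_000_003))  # True
print(fermat_inverse(3, 7))                                # 5
```

The loop runs once per bit of the exponent after the first, so it performs at most two modular multiplications per bit.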
Why does this work?
Induction:
$a^{1_2} = a$
$a^{10_2} = a^2$
$a^{11_2} = a^3$
Assume $a^p$ is correct. Then
$q = 2p + 1 \Rightarrow$ square-multiply
$q = 2p \Rightarrow$ square
Cost: $\Theta(\lg p)$ operations.
Note: this is the "giant-step" algorithm. |
| Markdown | | | | | |
| [](https://www.cs.fsu.edu/~burmeste/slideshow/chapter12/12-2.html) | [](https://www.cs.fsu.edu/~burmeste/slideshow/chapter12/12-4.html) | [](https://www.cs.fsu.edu/~burmeste/slideshow/chapter12/12-3.html#Top_of_Page) | **Hash Tables** - 4 of 5 | |
| Readable Markdown | null |
| Shard | 19 (laksa) |
| Root Hash | 2399862591257072619 |
| Unparsed URL | edu,fsu!cs,www,/~burmeste/slideshow/chapter12/12-3.html s443 |