Hash function - Wikipedia
Source: https://en.wikipedia.org/wiki/Hash_function (crawled 2026-04-22; first indexed 2013-08-08)
"Hashlink" redirects here. For the Haxe virtual machine, see HashLink.
This article is about a computer programming construct. For other meanings of "hash" and "hashing", see Hash (disambiguation).
[Figure: A hash function that maps names to integers from 0 to 15. There is a collision between keys "John Smith" and "Sandra Dee".]
A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output.[1] The values returned by a hash function are called hash values, hash codes, (hash/message) digests,[2] or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter-storage addressing.
Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval, and they require storage space only fractionally greater than the total space required for the data or records themselves. Unlike lists or trees, hashing provides near-constant access time, and it uses much less storage than indexing all possible keys directly, especially when keys are large or variable in length.
Use of hash functions relies on statistical properties of key and function interaction: worst-case behavior is intolerably bad but rare, and average-case behavior can be nearly optimal (minimal collision).[3]: 527
Hash functions are related to (and often confused with) checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and ciphers. Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently. The hash function differs from these concepts mainly in terms of data integrity. Hash tables may use non-cryptographic hash functions, while cryptographic hash functions are used in cybersecurity to secure sensitive data such as passwords.
In a hash table, a hash function takes a key as an input, which is associated with a datum or record and used to identify it to the data storage and retrieval application. The keys may be fixed-length, like an integer, or variable-length, like a name. In some cases, the key is the datum itself. The output is a hash code used to index a hash table holding the data or records, or pointers to them.
A hash function may be considered to perform three functions:
- Convert variable-length keys into fixed-length (usually machine-word-length or less) values, by folding them by words or other units using a parity-preserving operator like ADD or XOR,
- Scramble the bits of the key so that the resulting values are uniformly distributed over the keyspace, and
- Map the key values into ones less than or equal to the size of the table.
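The three steps can be sketched for string keys. Everything here (the rotate-and-XOR fold, the mixing constants, the final modulo) is an illustrative assumption, not a standard algorithm:

```c
#include <stdint.h>
#include <stddef.h>

uint64_t toy_hash(const unsigned char *key, size_t len, uint64_t table_size) {
    /* 1. Fold variable-length input into one word (rotate-and-XOR). */
    uint64_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = ((h << 8) | (h >> 56)) ^ key[i];
    /* 2. Scramble the bits (mixer constants borrowed from common finalizers). */
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    /* 3. Map the value into the table range. */
    return h % table_size;
}
```

The same input always yields the same index, and the result is always less than the table size.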
A good hash function satisfies two basic properties: it should be very fast to compute, and it should minimize duplication of output values (collisions). Hash functions rely on generating favorable probability distributions for their effectiveness, reducing access time to nearly constant. High table loading factors, pathological key sets, and poorly designed hash functions can result in access times approaching linear in the number of items in the table. Hash functions can be designed to give the best worst-case performance,[Notes 1] good performance under high table loading factors, and in special cases, perfect (collisionless) mapping of keys into hash codes. Implementation is based on parity-preserving bit operations (XOR and ADD), multiply, or divide. A necessary adjunct to the hash function is a collision-resolution method that employs an auxiliary data structure like linked lists, or systematic probing of the table to find an empty slot.
Hash functions are used in conjunction with hash tables to store and retrieve data items or data records. The hash function translates the key associated with each datum or record into a hash code, which is used to index the hash table. When an item is to be added to the table, the hash code may index an empty slot (also called a bucket), in which case the item is added to the table there. If the hash code indexes a full slot, then some kind of collision resolution is required: the new item may be omitted (not added to the table), or replace the old item, or be added to the table in some other location by a specified procedure. That procedure depends on the structure of the hash table. In chained hashing, each slot is the head of a linked list or chain, and items that collide at the slot are added to the chain. Chains may be kept in random order and searched linearly, or in serial order, or as a self-ordering list by frequency to speed up access. In open address hashing, the table is probed starting from the occupied slot in a specified manner, usually by linear probing, quadratic probing, or double hashing until an open slot is located or the entire table is probed (overflow). Searching for the item follows the same procedure until the item is located, an open slot is found, or the entire table has been searched (item not in table).
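A minimal open-addressing sketch with linear probing; the fixed toy table size and the restriction to nonzero integer keys are assumptions for the example:

```c
#include <stdint.h>

#define TABLE_SIZE 16               /* toy size; 0 marks an empty slot */
static uint32_t slots[TABLE_SIZE];  /* keys must therefore be nonzero  */

/* Insert with linear probing: start at the hashed slot and step
   forward until an empty slot is found. Returns 0 on overflow. */
int lp_insert(uint32_t key) {
    uint32_t i = key % TABLE_SIZE;
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        uint32_t j = (i + probes) % TABLE_SIZE;
        if (slots[j] == 0 || slots[j] == key) { slots[j] = key; return 1; }
    }
    return 0;
}

/* Search follows the same probe sequence until the key or an empty slot. */
int lp_contains(uint32_t key) {
    uint32_t i = key % TABLE_SIZE;
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        uint32_t j = (i + probes) % TABLE_SIZE;
        if (slots[j] == key) return 1;
        if (slots[j] == 0) return 0;
    }
    return 0;
}
```

Keys 5 and 21 collide at slot 5 of a 16-slot table; the second lands in the next free slot, and both remain findable.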
Hash functions are also used to build caches for large data sets stored in slow media. A cache is generally simpler than a hashed search table, since any collision can be resolved by discarding or writing back the older of the two colliding items.[4]
Hash functions are an essential ingredient of the Bloom filter, a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.
A special case of hashing is known as geometric hashing or the grid method. In these applications, the set of all inputs is some sort of metric space, and the hashing function can be interpreted as a partition of that space into a grid of cells. The table is often an array with two or more indices (called a grid file, grid index, bucket grid, and similar names), and the hash function returns an index tuple. This principle is widely used in computer graphics, computational geometry, and many other disciplines, to solve many proximity problems in the plane or in three-dimensional space, such as finding closest pairs in a set of points, similar shapes in a list of shapes, similar images in an image database, and so on.
Hash tables are also used to implement associative arrays and dynamic sets.[5]
A good hash function should map the expected inputs as evenly as possible over its output range. That is, every hash value in the output range should be generated with roughly the same probability. The reason for this last requirement is that the cost of hashing-based methods goes up sharply as the number of collisions (pairs of inputs that are mapped to the same hash value) increases. If some hash values are more likely to occur than others, then a larger fraction of the lookup operations will have to search through a larger set of colliding table entries.
This criterion only requires the value to be uniformly distributed, not random in any sense. A good randomizing function is (barring computational efficiency concerns) generally a good choice as a hash function, but the converse need not be true.
Hash tables often contain only a small subset of the valid inputs. For instance, a club membership list may contain only a hundred or so member names, out of the very large set of all possible names. In these cases, the uniformity criterion should hold for almost all typical subsets of entries that may be found in the table, not just for the global set of all possible entries.
In other words, if a typical set of m records is hashed to n table slots, then the probability of a bucket receiving many more than m/n records should be vanishingly small. In particular, if m < n, then very few buckets should have more than one or two records. A small number of collisions is virtually inevitable, even if n is much larger than m (see the birthday problem).
In special cases when the keys are known in advance and the key set is static, a hash function can be found that achieves absolute (or collisionless) uniformity. Such a hash function is said to be perfect. There is no algorithmic way of constructing such a function; searching for one is a factorial function of the number of keys to be mapped versus the number of table slots that they are mapped into. Finding a perfect hash function over more than a very small set of keys is usually computationally infeasible; the resulting function is likely to be more computationally complex than a standard hash function and provides only a marginal advantage over a function with good statistical properties that yields a minimum number of collisions. See universal hash function.
Testing and measurement
When testing a hash function, the uniformity of the distribution of hash values can be evaluated by the chi-squared test. This test is a goodness-of-fit measure: it is the actual distribution of items in buckets versus the expected (or uniform) distribution of items. The formula is[dubious – discuss][citation needed]

    ( Σ_{j=0}^{m−1} b_j(b_j + 1)/2 ) / ( (n/2m)(n + 2m − 1) ),

where n is the number of keys, m is the number of buckets, and b_j is the number of items in bucket j. A ratio within one confidence interval (such as 0.95 to 1.05) is indicative that the hash function evaluated has an expected uniform distribution.
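Assuming the bucket-count ratio described above, the test can be computed directly from a histogram of bucket occupancies (a sketch, not a standardized routine):

```c
/* Ratio test for uniformity: sum of b_j(b_j+1)/2 over all buckets,
   divided by (n/2m)(n+2m-1), where n is the key count and m the
   bucket count. Values near 1 suggest a uniform distribution. */
double chi2_uniformity(const unsigned *bucket_counts, unsigned m, unsigned n) {
    double sum = 0.0;
    for (unsigned j = 0; j < m; j++)
        sum += bucket_counts[j] * (bucket_counts[j] + 1.0) / 2.0;
    return sum / (((double)n / (2.0 * m)) * ((double)n + 2.0 * m - 1.0));
}
```

For a perfectly even spill of 1000 keys into 10 buckets the ratio comes out just under 1.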
Hash functions can have some technical properties that make it more likely that they will have a uniform distribution when applied. One is the strict avalanche criterion: whenever a single input bit is complemented, each of the output bits changes with a 50% probability. The reason for this property is that selected subsets of the keyspace may have low variability. For the output to be uniformly distributed, a low amount of variability, even one bit, should translate into a high amount of variability (i.e. distribution over the tablespace) in the output. Each bit should change with a probability of 50% because, if some bits are reluctant to change, then the keys become clustered around those values. If the bits want to change too readily, then the mapping is approaching a fixed XOR function of a single bit. Standard tests for this property have been described in the literature.[6] The relevance of the criterion to a multiplicative hash function is assessed here.[7]
In data storage and retrieval applications, the use of a hash function is a trade-off between search time and data storage space. If search time were unbounded, then a very compact unordered linear list would be the best medium; if storage space were unbounded, then a randomly accessible structure indexable by the key-value would be very large and very sparse, but very fast. A hash function takes a finite amount of time to map a potentially large keyspace to a feasible amount of storage space searchable in a bounded amount of time regardless of the number of keys. In most applications, the hash function should be computable with minimum latency and secondarily in a minimum number of instructions.
Computational complexity varies with the number of instructions required and the latency of individual instructions: the simplest are the bitwise methods (folding), followed by the multiplicative methods, while the most complex (and slowest) are the division-based methods.
Because collisions should be infrequent, and cause a marginal delay but are otherwise harmless, it is usually preferable to choose a faster hash function over one that needs more computation but saves a few collisions.
Division-based implementations can be of particular concern because a division requires multiple cycles on nearly all processor microarchitectures. Division (modulo) by a constant can be inverted to become a multiplication by the word-size multiplicative inverse of that constant. This can be done by the programmer, or by the compiler. Division can also be reduced directly into a series of shift-subtracts and shift-adds, though minimizing the number of such operations required is a daunting problem; the number of machine-language instructions resulting may be more than a dozen and swamp the pipeline. If the microarchitecture has hardware multiply functional units, then the multiply-by-inverse is likely a better approach.
We can allow the table size n to not be a power of 2 and still not have to perform any remainder or division operation, as these computations are sometimes costly. For example, let n be significantly less than 2^b. Consider a pseudorandom number generator function P(key) that is uniform on the interval [0, 2^b − 1]. A hash function uniform on the interval [0, n − 1] is n·P(key)/2^b. We can replace the division by a (possibly faster) right bit shift: (n·P(key)) >> b.
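A sketch of this shift-based range reduction; taking b = 32 is an assumption for the example:

```c
#include <stdint.h>

/* Maps a uniform 32-bit value x into [0, n - 1] as (n * x) >> 32,
   avoiding any division or remainder operation. */
uint32_t reduce_range(uint32_t x, uint32_t n) {
    return (uint32_t)(((uint64_t)n * (uint64_t)x) >> 32);
}
```

The smallest input maps to 0 and the largest to n − 1, with the values in between spread proportionally.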
If keys are being hashed repeatedly, and the hash function is costly, then computing time can be saved by precomputing the hash codes and storing them with the keys. Matching hash codes almost certainly means that the keys are identical. This technique is used for the transposition table in game-playing programs, which stores a 64-bit hashed representation of the board position.
A universal hashing scheme is a randomized algorithm that selects a hash function h among a family of such functions, in such a way that the probability of a collision of any two distinct keys is 1/m, where m is the number of distinct hash values desired, independently of the two keys. Universal hashing ensures (in a probabilistic sense) that the hash function application will behave as well as if it were using a random function, for any distribution of the input data. It will, however, have more collisions than perfect hashing and may require more operations than a special-purpose hash function.
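One classic family of this kind is multiply-mod-prime hashing, h_{a,b}(k) = ((a·k + b) mod p) mod m. The prime and the parameter ranges below are illustrative assumptions for 32-bit keys:

```c
#include <stdint.h>

#define UH_P 4294967311ULL   /* a prime just above 2^32 */

/* a in [1, 2^32 - 1] and b in [0, p - 1] are drawn at random once per
   table; keeping a below 2^32 ensures a*key + b fits in 64 bits. */
uint32_t universal_hash(uint32_t key, uint64_t a, uint64_t b, uint32_t m) {
    uint64_t r = (a * (uint64_t)key + b) % UH_P;
    return (uint32_t)(r % m);
}
```

The fixed a and b in the usage below are purely for demonstration; the universality guarantee comes from choosing them at random.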
A hash function that allows only certain table sizes or strings only up to a certain length, or cannot accept a seed (i.e., allow double hashing), is less useful than one that does.[citation needed]
A hash function is applicable in a variety of situations. Particularly within cryptography, notable applications include:[8]
- Integrity checking: Identical hash values for different files imply equality, providing a reliable means to detect file modifications.
- Key derivation: Minor input changes result in a random-looking output alteration, known as the diffusion property. Thus, hash functions are valuable for key derivation functions.
- Message authentication codes (MACs): Through the integration of a confidential key with the input data, hash functions can generate MACs ensuring the genuineness of the data, such as in HMACs.
- Password storage: The password's hash value does not expose any password details, emphasizing the importance of securely storing hashed passwords on the server.
- Signatures: Message hashes are signed rather than the whole message.
A hash procedure must be deterministic: for a given input value, it must always generate the same hash value. In other words, it must be a function of the data to be hashed, in the mathematical sense of the term. This requirement excludes hash functions that depend on external variable parameters, such as pseudo-random number generators or the time of day. It also excludes functions that depend on the memory address of the object being hashed, because the address may change during execution (as may happen on systems that use certain methods of garbage collection), although sometimes rehashing of the item is possible.
The determinism is in the context of the reuse of the function. For example, Python's hash functions mix in a randomized seed that is generated once when the Python process starts, in addition to the input to be hashed.[9] The Python hash (SipHash) is still a valid hash function when used within a single run, but if the values are persisted (for example, written to disk), they can no longer be treated as valid hash values, since in the next run the random value might differ.
It is often desirable that the output of a hash function have fixed size (but see below). If, for example, the output is constrained to 32-bit integer values, then the hash values can be used to index into an array. Such hashing is commonly used to accelerate data searches.[10] Producing fixed-length output from variable-length input can be accomplished by breaking the input data into chunks of specific size. Hash functions used for data searches use some arithmetic expression that iteratively processes chunks of the input (such as the characters in a string) to produce the hash value.[10]
In many applications, the range of hash values may be different for each run of the program or may change along the same run (for instance, when a hash table needs to be expanded). In those situations, one needs a hash function which takes two parameters: the input data z, and the number n of allowed hash values.

A common solution is to compute a fixed hash function with a very large range (say, 0 to 2^32 − 1), divide the result by n, and use the division's remainder. If n is itself a power of 2, this can be done by bit masking and bit shifting. When this approach is used, the hash function must be chosen so that the result has fairly uniform distribution between 0 and n − 1, for any value of n that may occur in the application. Depending on the function, the remainder may be uniform only for certain values of n, e.g. odd or prime numbers.
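A sketch of both reductions: the remainder works for any n, while the mask gives the same result only when n is a power of two:

```c
#include <stdint.h>

/* General case: remainder of the wide hash value. */
uint32_t reduce_mod(uint32_t h, uint32_t n)  { return h % n; }

/* Power-of-two case: n = 2^k, so the low k bits are the index. */
uint32_t reduce_mask(uint32_t h, uint32_t n) { return h & (n - 1); }
```

For n = 1024 the two agree exactly, since masking the low 10 bits is the same as taking the value modulo 2^10.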
Variable range with minimal movement (dynamic hash function)
When the hash function is used to store values in a hash table that outlives the run of the program, and the hash table needs to be expanded or shrunk, the hash table is referred to as a dynamic hash table.
A hash function that will relocate the minimum number of records when the table is resized is desirable. What is needed is a hash function H(z, n) (where z is the key being hashed and n is the number of allowed hash values) such that H(z, n + 1) = H(z, n) with probability close to n/(n + 1). Linear hashing and spiral hashing are examples of dynamic hash functions that execute in constant time but relax the property of uniformity to achieve the minimal movement property.
Extendible hashing uses a dynamic hash function that requires space proportional to n to compute the hash function, and it becomes a function of the previous keys that have been inserted. Several algorithms that preserve the uniformity property but require time proportional to n to compute the value of H(z, n) have been invented.[clarification needed] A hash function with minimal movement is especially useful in distributed hash tables.
In some applications, the input data may contain features that are irrelevant for comparison purposes. For example, when looking up a personal name, it may be desirable to ignore the distinction between upper and lower case letters. For such data, one must use a hash function that is compatible with the data equivalence criterion being used: that is, any two inputs that are considered equivalent must yield the same hash value. This can be accomplished by normalizing the input before hashing it, as by upper-casing all letters.
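A sketch of case normalization folded into a toy string hash; the hash itself (multiply-by-31 folding) is an arbitrary stand-in, the point being that upper-casing happens before each character is mixed in:

```c
#include <stdint.h>
#include <ctype.h>

/* Upper-cases each character before folding it into the hash, so keys
   that differ only in letter case hash identically. */
uint32_t toy_string_hash(const char *s) {
    uint32_t h = 0;
    while (*s)
        h = h * 31 + (unsigned char)toupper((unsigned char)*s++);
    return h;
}
```

Two spellings of the same name now land in the same bucket.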
Hashing integer data types
There are several common algorithms for hashing integers. The method giving the best distribution is data-dependent. One of the simplest and most common methods in practice is the modulo division method.
Identity hash function
If the data to be hashed is small enough, then one can use the data itself (reinterpreted as an integer) as the hashed value. The cost of computing this identity hash function is effectively zero. This hash function is perfect, as it maps each input to a distinct hash value.
The meaning of "small enough" depends on the size of the type that is used as the hashed value. For example, in Java, the hash code is a 32-bit integer. Thus the 32-bit integer Integer and 32-bit floating-point Float objects can simply use the value directly, whereas the 64-bit integer Long and 64-bit floating-point Double cannot.
Other types of data can also use this hashing scheme. For example, when mapping character strings between upper and lower case, one can use the binary encoding of each character, interpreted as an integer, to index a table that gives the alternative form of that character ("A" for "a", "8" for "8", etc.). If each character is stored in 8 bits (as in extended ASCII[Notes 2] or ISO Latin 1), the table has only 2^8 = 256 entries; in the case of Unicode characters, the table would have 17 × 2^16 = 1,114,112 entries.
The same technique can be used to map two-letter country codes like "us" or "za" to country names (26^2 = 676 table entries), 5-digit ZIP codes like 13083 to city names (100,000 entries), etc. Invalid data values (such as the country code "xx" or the ZIP code 00000) may be left undefined in the table or mapped to some appropriate "null" value.
Trivial hash function
If the keys are uniformly or sufficiently uniformly distributed over the key space, so that the key values are essentially random, then they may be considered to be already "hashed". In this case, any number of any bits in the key may be extracted and collated as an index into the hash table. For example, a simple hash function might mask off the m least significant bits and use the result as an index into a hash table of size 2^m.
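The masking step is a single operation; the table size 2^m is implied by the mask:

```c
#include <stdint.h>

/* Keeps the m least significant bits of an (assumed already
   well-distributed) key as the index into a table of size 2^m. */
uint32_t trivial_hash(uint32_t key, unsigned m) {
    return key & ((1u << m) - 1);
}
```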
A mid-squares hash code is produced by squaring the input and extracting an appropriate number of middle digits or bits. For example, if the input is 123,456,789 and the hash table size 10,000, then squaring the key produces 15,241,578,750,190,521, so the hash code is taken as the middle 4 digits of the 17-digit number (ignoring the high digit), 8750. The mid-squares method produces a reasonable hash code if there is not a lot of leading or trailing zeros in the key. This is a variant of multiplicative hashing, but not as good because an arbitrary key is not a good multiplier.
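The worked example can be checked directly in 64-bit arithmetic; the decimal constants encode "drop the high digit, then keep the middle 4 of the remaining 16":

```c
#include <stdint.h>

/* Mid-squares in decimal digits, following the worked example:
   square the key, drop the high (17th) digit, take the middle 4. */
uint64_t mid_square_hash(uint64_t key) {
    uint64_t sq = key * key;             /* fits in 64 bits for this key */
    sq %= 10000000000000000ULL;          /* keep the low 16 digits       */
    return (sq / 1000000ULL) % 10000ULL; /* middle 4 of those 16         */
}
```

For the key 123,456,789 this reproduces the hash code 8750 from the text.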
A standard technique is to use a modulo function on the key, by selecting a divisor M which is a prime number close to the table size, so h(K) ≡ K (mod M). The table size is usually a power of 2. This gives a distribution from {0, M − 1}. This gives good results over a large number of key sets. A significant drawback of division hashing is that division requires multiple cycles on most modern architectures (including x86) and can be 10 times slower than multiplication. A second drawback is that it will not break up clustered keys. For example, the keys 123000, 456000, 789000, etc. modulo 1000 all map to the same address. This technique works well in practice because many key sets are sufficiently random already, and the probability that a key set will be cyclical by a large prime number is small.
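A sketch showing both the method and the clustering caveat; M = 1009 is an arbitrary prime chosen for the example:

```c
#include <stdint.h>

/* Division hashing: h(K) = K mod M, with M ideally a prime near the
   table size. A non-prime divisor such as 1000 leaves the clustered
   keys 123000, 456000, 789000 all in one slot. */
uint32_t div_hash(uint32_t key, uint32_t M) {
    return key % M;
}
```

With the divisor 1000 the clustered keys collide; with the prime 1009 they are separated.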
Algebraic coding is a variant of the division method of hashing which uses division by a polynomial modulo 2 instead of an integer to map n bits to m bits.[3]: 512–513 In this approach, M = 2^m, and we postulate an mth-degree polynomial Z(x) = x^m + ζ_(m−1) x^(m−1) + ⋯ + ζ_0. A key K = (k_(n−1) … k_1 k_0)_2 can be regarded as the polynomial K(x) = k_(n−1) x^(n−1) + ⋯ + k_1 x + k_0. The remainder using polynomial arithmetic modulo 2 is K(x) mod Z(x) = h_(m−1) x^(m−1) + ⋯ + h_1 x + h_0. Then h(K) = (h_(m−1) … h_1 h_0)_2. If Z(x) is constructed to have t or fewer non-zero coefficients, then keys which share fewer than t bits are guaranteed to not collide. Z is a function of k, t, and n (the last of which is a divisor of 2^k − 1) and is constructed from the finite field GF(2^k).
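The remainder K(x) mod Z(x) can be computed CRC-style by shift-and-XOR. This sketch hard-codes the degree-10 polynomial x^10 + x^8 + x^5 + x^4 + x^2 + x + 1 from the example below in the text and assumes keys of at most 15 bits:

```c
#include <stdint.h>

/* Polynomial division modulo 2: bit i of each operand is the
   coefficient of x^i. Reduces a 15-bit key to a 10-bit remainder. */
uint32_t poly_mod2_hash(uint32_t key) {
    const uint32_t Z = (1u << 10) | (1u << 8) | (1u << 5) | (1u << 4)
                     | (1u << 2) | (1u << 1) | 1u;  /* degree m = 10 */
    uint32_t r = key;
    for (int bit = 14; bit >= 10; bit--)            /* clear x^14 .. x^10 */
        if (r & (1u << bit))
            r ^= Z << (bit - 10);                   /* subtract (XOR) Z(x)*x^(bit-10) */
    return r;                                       /* 10-bit hash code */
}
```

Keys below 2^10 are their own remainder, and x^10 alone reduces to Z(x) minus its leading term.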
Knuth gives an example: taking (n, m, t) = (15, 10, 7) yields Z(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1. The derivation is as follows:
Let S be the smallest set of integers such that {1, 2, …, t} ⊆ S and (2j mod n) ∈ S for all j ∈ S.[Notes 3]
Define P(x) = ∏_(j ∈ S) (x − α^j), where α ∈ GF(2^k) and where the coefficients of P(x) are computed in this field. Then the degree of P(x) = |S|. Since α^(2j) is a root of P(x) whenever α^j is a root, it follows that the coefficients p_i of P(x) satisfy p_i^2 = p_i, so they are all 0 or 1. If R(x) = r_(n−1) x^(n−1) + ⋯ + r_1 x + r_0 is any nonzero polynomial modulo 2 with at most t nonzero coefficients, then R(x) is not a multiple of P(x) modulo 2.[Notes 4] It follows that the corresponding hash function will map keys with fewer than t bits in common to unique indices.[3]: 542–543
The usual outcome is that either n will get large, or t will get large, or both, for the scheme to be computationally feasible. Therefore, it is more suited to hardware or microcode implementation.[3]: 542–543
Unique permutation hashing
Unique permutation hashing has a guaranteed best worst-case insertion time.[11]
Multiplicative hashing
Standard multiplicative hashing uses the formula h_a(K) = ⌊(aK mod W) / (W/M)⌋, which produces a hash value in {0, …, M − 1}. The value a is an appropriately chosen value that should be relatively prime to W; it should be large,[clarification needed] and its binary representation a random mix[clarification needed] of 1s and 0s. An important practical special case occurs when W = 2^w and M = 2^m are powers of 2 and w is the machine word size. In this case, this formula becomes h_a(K) = ⌊(aK mod 2^w) / 2^(w−m)⌋. This is special because arithmetic modulo 2^w is done by default in low-level programming languages and integer division by a power of 2 is simply a right-shift, so, in C, for example, this function becomes

unsigned hash(unsigned K) {
    return (a * K) >> (w - m);
}
and for fixed m and w this translates into a single integer multiplication and right-shift, making it one of the fastest hash functions to compute.
Multiplicative hashing is susceptible to a "common mistake" that leads to poor diffusion: higher-value input bits do not affect lower-value output bits.[12] A transmutation on the input which shifts the span of retained top bits down and XORs or ADDs them to the key before the multiplication step corrects for this. The resulting function looks like:[7]

unsigned hash(unsigned K) {
    K ^= K >> (w - m);
    return (a * K) >> (w - m);
}
Fibonacci hashing is a form of multiplicative hashing in which the multiplier is 2^w/φ, where w is the machine word length and φ (phi) is the golden ratio (approximately 1.618). A property of this multiplier is that it uniformly distributes blocks of consecutive keys over the table space, with respect to any block of bits in the key. Consecutive keys within the high bits or low bits of the key (or some other field) are relatively common. The multipliers for various word lengths are:
- 16-bit: a = 0x9E37 = 40,503
- 32-bit: a = 0x9E3779B9 = 2,654,435,769
- 48-bit: a = 0x9E3779B97F4B = 173,961,102,589,771 [Notes 5]
- 64-bit: a = 0x9E3779B97F4A7C15 = 11,400,714,819,323,198,485

The multiplier should be odd, so the least significant bit of the output is invertible modulo 2^w. The last two values given above are rounded (up and down, respectively) by more than 1/2 of a least-significant bit to achieve this.
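With the 64-bit multiplier from the table, Fibonacci hashing is a single multiply and shift (the multiplication wraps modulo 2^64, as C unsigned arithmetic does by default):

```c
#include <stdint.h>

/* Fibonacci hashing: the top m bits of key * (2^64 / phi) form the
   index into a table of size 2^m. */
uint64_t fib_hash64(uint64_t key, unsigned m) {
    return (key * 0x9E3779B97F4A7C15ULL) >> (64 - m);
}
```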
Zobrist hashing, named after Albert Zobrist, is a form of tabulation hashing, which is a method for constructing universal families of hash functions by combining table lookup with XOR operations. This algorithm has proven to be very fast and of high quality for hashing purposes (especially hashing of integer-number keys).[13]
Zobrist hashing was originally introduced as a means of compactly representing chess positions in computer game-playing programs. A unique random number was assigned to represent each type of piece (six each for black and white) on each space of the board. Thus a table of 64×12 such numbers is initialized at the start of the program. The random numbers could be any length, but 64 bits was natural due to the 64 squares on the board. A position was transcribed by cycling through the pieces in a position, indexing the corresponding random numbers (vacant spaces were not included in the calculation) and XORing them together (the starting value could be 0 (the identity value for XOR) or a random seed). The resulting value was reduced by modulo, folding, or some other operation to produce a hash table index. The original Zobrist hash was stored in the table as the representation of the position.
Later, the method was extended to hashing integers by representing each byte in each of 4 possible positions in the word by a unique 32-bit random number. Thus, a table of 2^8 × 4 random numbers is constructed. A 32-bit hashed integer is transcribed by successively indexing the table with the value of each byte of the plain text integer and XORing the loaded values together (again, the starting value can be the identity value or a random seed). The natural extension to 64-bit integers is by use of a table of 2^8 × 8 64-bit random numbers.
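A sketch of this byte-wise tabulation for 32-bit keys. The tables here are filled by a small xorshift generator purely so the example is self-contained; real uses draw the entries from a strong random source:

```c
#include <stdint.h>

static uint32_t T[4][256];   /* one 256-entry table per byte position */
static int T_ready = 0;

static void init_tables(void) {
    uint32_t s = 0x12345678u;                      /* arbitrary demo seed */
    for (int p = 0; p < 4; p++)
        for (int b = 0; b < 256; b++) {
            s ^= s << 13; s ^= s >> 17; s ^= s << 5;   /* xorshift32 step */
            T[p][b] = s;
        }
    T_ready = 1;
}

/* XOR together one table entry per byte of the key. */
uint32_t tabulation_hash(uint32_t key) {
    if (!T_ready) init_tables();
    uint32_t h = 0;
    for (int p = 0; p < 4; p++)
        h ^= T[p][(key >> (8 * p)) & 0xFF];
    return h;
}
```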
This kind of function has some nice theoretical properties, one of which is called 3-tuple independence, meaning that every 3-tuple of keys is equally likely to be mapped to any 3-tuple of hash values.
Customized hash function
A hash function can be designed to exploit existing entropy in the keys. If the keys have leading or trailing zeros, or particular fields that are unused, always zero or some other constant, or generally vary little, then masking out only the volatile bits and hashing on those will provide a better and possibly faster hash function. Selected divisors or multipliers in the division and multiplicative schemes may make more uniform hash functions if the keys are cyclic or have other redundancies.
Hashing variable-length data
When the data values are long (or variable-length) character strings, such as personal names, web page addresses, or mail messages, their distribution is usually very uneven, with complicated dependencies. For example, text in any natural language has highly non-uniform distributions of characters, and character pairs, characteristic of the language. For such data, it is prudent to use a hash function that depends on all characters of the string, and that depends on each character in a different way.[clarification needed]
Simplistic hash functions may add the first and last n characters of a string along with the length, or form a word-size hash from the middle 4 characters of a string. This saves iterating over the (potentially long) string, but hash functions that do not hash on all characters of a string can readily become linear due to redundancies, clustering, or other pathologies in the key set. Such strategies may be effective as a custom hash function if the structure of the keys is such that either the middle, ends, or other fields are zero or some other invariant constant that does not differentiate the keys; then the invariant parts of the keys can be ignored.
The paradigmatic example of folding by characters is to add up the integer values of all the characters in the string. A better idea is to multiply the hash total by a constant, typically a sizable prime number, before adding in the next character, ignoring overflow. Using exclusive-or instead of addition is also a plausible alternative. The final operation would be a modulo, mask, or other function to reduce the word value to an index the size of the table. The weakness of this procedure is that information may cluster in the upper or lower bits of the bytes; this clustering will remain in the hashed result and cause more collisions than a proper randomizing hash. ASCII byte codes, for example, have an upper bit of 0, and printable strings do not use the last byte code or most of the first 32 byte codes, so the information, which uses the remaining byte codes, is clustered in the remaining bits in an unobvious manner.
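The multiply-and-add folding just described, with an illustrative prime multiplier of 31 (the multiplier and the final reduction are arbitrary choices for the sketch):

```c
#include <stdint.h>

/* Folding by characters: multiply the running total by a prime,
   add the next character (overflow wraps harmlessly), then reduce
   the word value to a table index. */
uint32_t fold_hash(const char *s, uint32_t table_size) {
    uint32_t h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % table_size;
}
```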
The classic approach, dubbed the PJW hash based on the work of Peter J. Weinberger at Bell Labs in the 1970s, was originally designed for hashing identifiers into compiler symbol tables, as given in the "Dragon Book".[14] This hash function offsets the bytes 4 bits before adding them together. When the quantity wraps, the high 4 bits are shifted out and, if non-zero, xored back into the low byte of the cumulative quantity. The result is a word-size hash code to which a modulo or other reducing operation can be applied to produce the final hash index.
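A common formulation of this scheme, shown here as a 32-bit sketch (the constants follow the usual hashpjw presentation; treat this as illustrative rather than the exact Dragon Book code):

```python
def pjw_hash(key: str) -> int:
    """PJW-style hash: shift the accumulator 4 bits before adding each
    byte; when the top 4 bits become non-zero, xor them back into the
    low bits and clear them, keeping everything in 32-bit range."""
    h = 0
    for byte in key.encode("utf-8"):
        h = ((h << 4) + byte) & 0xFFFFFFFF
        high = h & 0xF0000000          # the 4 bits that would "wrap"
        if high:
            h ^= high >> 24            # fold them back near the low byte
            h &= ~high & 0xFFFFFFFF    # and clear the top nibble
    return h
```

A modulo by the table size, as the text notes, turns this word-size code into an index.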
Today, especially with the advent of 64-bit word sizes, much more efficient variable-length string hashing by word chunks is available.
Word length folding
Modern microprocessors will allow for much faster processing if 8-bit character strings are not hashed by processing one character at a time, but by interpreting the string as an array of 32-bit or 64-bit integers and hashing/accumulating these "wide word" integer values by means of arithmetic operations (e.g. multiplication by constant and bit-shifting). The final word, which may have unoccupied byte positions, is filled with zeros or a specified randomizing value before being folded into the hash. The accumulated hash code is reduced by a final modulo or other operation to yield an index into the table.
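One way this word-chunk strategy might look (the 64-bit multiplier is an arbitrary odd constant chosen for illustration; the padding rule follows the zero-fill described above):

```python
import struct

def word_fold_hash(data: bytes, table_size: int) -> int:
    """Hash by 64-bit words rather than byte by byte: zero-pad the
    final partial word, then fold each word in with a multiply-based
    mix kept in 64-bit range."""
    mask = (1 << 64) - 1
    padded = data + b"\x00" * (-len(data) % 8)  # fill unoccupied byte positions
    h = 0
    for (word,) in struct.iter_unpack("<Q", padded):
        h = ((h ^ word) * 0x9E3779B97F4A7C15) & mask  # arbitrary odd multiplier
    return (h >> 32) % table_size  # final reduction to a table index
```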
Radix conversion hashing
Analogous to the way an ASCII or EBCDIC character string representing a decimal number is converted to a numeric quantity for computing, a variable-length string can be converted as x_{k-1}a^{k-1} + x_{k-2}a^{k-2} + ⋯ + x_1·a + x_0. This is simply a polynomial in a radix a > 1 that takes the components (x_0, x_1, ..., x_{k-1}) as the characters of the input string of length k. It can be used directly as the hash code, or a hash function applied to it to map the potentially large value to the hash table size. The value of a is usually a prime number large enough to hold the number of different characters in the character set of potential keys. Radix conversion hashing of strings minimizes the number of collisions.[15]
Available data sizes may restrict the maximum length of string that can be hashed with this method. For example, a 128-bit word will hash only a 26-character alphabetic string (ignoring case) with a radix of 29; a printable ASCII string is limited to 9 characters using radix 97 and a 64-bit word. However, alphabetic keys are usually of modest length, because keys must be stored in the hash table. Numeric character strings are usually not a problem; 64 bits can count up to 10^19, or 19 decimal digits with radix 10.
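Evaluating the polynomial with Horner's rule gives a direct implementation; here the radix 131 is an illustrative prime larger than the 128-value ASCII character set:

```python
def radix_hash(key: str, a: int = 131, table_size: int = 101) -> int:
    """Radix-conversion hash: treat the characters as digits of a
    number in radix a, evaluated by Horner's rule, then reduce the
    (potentially large) value modulo the table size."""
    value = 0
    for ch in key:
        value = value * a + ord(ch)
    return value % table_size
```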
Rolling hash
In some applications, such as substring search, one can compute a hash function h for every k-character substring of a given n-character string by advancing a window of width k characters along the string, where k is a fixed integer and n > k. The straightforward solution, which is to extract such a substring at every character position in the text and compute h separately, requires a number of operations proportional to k·n. However, with the proper choice of h, one can use the technique of rolling hash to compute all those hashes with an effort proportional to mk + n, where m is the number of occurrences of the substring.[16]
The most familiar algorithm of this type is Rabin–Karp, with best and average case performance O(n + mk) and worst case O(n·k) (in all fairness, the worst case here is gravely pathological: both the text string and substring are composed of a repeated single character, such as t = "AAAAAAAAAAA" and s = "AAA"). The hash function used for the algorithm is usually the Rabin fingerprint, designed to avoid collisions in 8-bit character strings, but other suitable hash functions are also used.
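A sketch of the rolling update using a simple polynomial hash modulo a prime (not the Rabin fingerprint proper; the radix 256 and the modulus here are illustrative assumptions):

```python
def rolling_hashes(text: str, k: int, a: int = 256, m: int = 1_000_003):
    """Hash every k-character window of text in O(n) total: compute the
    first window with Horner's rule, then update each subsequent window
    in O(1) by removing the outgoing character and adding the new one."""
    n = len(text)
    if n < k:
        return []
    a_k = pow(a, k - 1, m)  # a^(k-1) mod m, precomputed once
    h = 0
    for ch in text[:k]:
        h = (h * a + ord(ch)) % m
    hashes = [h]
    for i in range(k, n):
        h = (h - ord(text[i - k]) * a_k) % m  # drop the outgoing character
        h = (h * a + ord(text[i])) % m        # bring in the incoming one
        hashes.append(h)
    return hashes
```

Equal substrings get equal hashes, so a substring search only compares characters when a window hash matches the pattern hash.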
Analysis
Worst case results for a hash function can be assessed two ways: theoretical and practical. The theoretical worst case is the probability that all keys map to a single slot. The practical worst case is the expected longest probe sequence (hash function + collision resolution method). This analysis considers uniform hashing, that is, any key will map to any particular slot with probability 1/m, a characteristic of universal hash functions.
While Knuth worries about adversarial attack on real time systems,[24] Gonnet has shown that the probability of such a case is "ridiculously small". His representation was that the probability of k of n keys mapping to a single slot is α^k / (e^α k!), where α is the load factor, n/m.[25]
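Gonnet's expression is straightforward to evaluate; a small helper (names are mine) illustrates how quickly the probability decays:

```python
import math

def prob_k_in_slot(n: int, m: int, k: int) -> float:
    """Probability that exactly k of n keys map to one given slot under
    uniform hashing, per the Poisson form alpha^k / (e^alpha * k!),
    with load factor alpha = n/m."""
    alpha = n / m
    return alpha ** k / (math.exp(alpha) * math.factorial(k))
```

At load factor 1, the chance of 10 keys landing in one slot is already about 1e-7.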
History
The term hash offers a natural analogy with its non-technical meaning (to chop up or make a mess out of something), given how hash functions scramble their input data to derive their output.[26]: 514  In his research for the precise origin of the term, Donald Knuth notes that, while Hans Peter Luhn of IBM appears to have been the first to use the concept of a hash function in a memo dated January 1953, the term itself did not appear in published literature until the late 1960s, in Herbert Hellerman's Digital Computer System Principles, even though it was already widespread jargon by then.[26]: 547–548
See also
List of hash functions
Nearest neighbor search
Distributed hash table
Identicon
Low-discrepancy sequence
Transposition table
Notes
1. This is useful in cases where keys are devised by a malicious agent, for example in pursuit of a DoS attack.
2. Plain ASCII is a 7-bit character encoding, although it is often stored in 8-bit bytes with the highest-order bit always clear (zero). Therefore, for plain ASCII, the bytes have only 2^7 = 128 valid values, and the character translation table has only this many entries.
3. For example, for n = 15, k = 4, t = 6. [Knuth]
4. Knuth conveniently leaves the proof of this to the reader.
5. Unisys large systems.
References
1. Aggarwal, Kirti; Verma, Harsh K. (March 19, 2015). *Hash_RC6 – Variable length Hash algorithm using RC6*. 2015 International Conference on Advances in Computer Engineering and Applications (ICACEA). doi:10.1109/ICACEA.2015.7164747.
2. "hash digest". *Computer Security Resource Center - Glossary*. NIST. "message digest". *Computer Security Resource Center - Glossary*. NIST.
3. Knuth, Donald E. (1973). *The Art of Computer Programming, Vol. 3, Sorting and Searching*. Reading, MA: Addison-Wesley. Bibcode:1973acp..book.....K. ISBN 978-0-201-03803-3.
4. Stokes, Jon (2002-07-08). "Understanding CPU caching and performance". *Ars Technica*. Retrieved 2022-02-06.
5. Menezes, Alfred J.; van Oorschot, Paul C.; Vanstone, Scott A. (1996). *Handbook of Applied Cryptography*. CRC Press. ISBN 978-0849385230.
6. Castro, Julio Cesar Hernandez; et al. (3 February 2005). "The strict avalanche criterion randomness test". *Mathematics and Computers in Simulation*. 68 (1). Elsevier: 1–7. doi:10.1016/j.matcom.2004.09.001. S2CID 18086276.
7. Sharupke, Malte (16 June 2018). "Fibonacci Hashing: The Optimization that the World Forgot". *Probably Dance*.
8. Wagner, Urs; Lugrin, Thomas (2023). "Hash Functions". In Mulder, Valentin; Mermoud, Alain; Lenders, Vincent; Tellenbach, Bernhard (eds.). *Trends in Data Protection and Encryption Technologies*. Cham: Springer Nature Switzerland. pp. 21–24. doi:10.1007/978-3-031-33386-6_5. ISBN 978-3-031-33386-6.
9. "3. Data model – Python 3.6.1 documentation". docs.python.org. Retrieved 2017-03-24.
10. Sedgewick, Robert (2002). "14. Hashing". *Algorithms in Java* (3rd ed.). Addison-Wesley. ISBN 978-0201361209.
11. Dolev, Shlomi; Lahiani, Limor; Haviv, Yinnon (2013). "Unique permutation hashing". *Theoretical Computer Science*. 475: 59–65. doi:10.1016/j.tcs.2012.12.047.
12. "CS 3110 Lecture 21: Hash functions". Section "Multiplicative hashing".
13. Zobrist, Albert L. (April 1970). *A New Hashing Method with Application for Game Playing* (PDF). Tech. Rep. 88. Madison, Wisconsin: Computer Sciences Department, University of Wisconsin.
14. Aho, A.; Sethi, R.; Ullman, J. D. (1986). *Compilers: Principles, Techniques and Tools*. Reading, MA: Addison-Wesley. p. 435. ISBN 0-201-10088-6.
15. Ramakrishna, M. V.; Zobel, Justin (1997). "Performance in Practice of String Hashing Functions". *Database Systems for Advanced Applications '97*. DASFAA 1997. pp. 215–224. CiteSeerX 10.1.1.18.7520. doi:10.1142/9789812819536_0023. ISBN 981-02-3107-5. S2CID 8250194. Retrieved 2021-12-06.
16. Singh, N. B. *A Handbook of Algorithms*. N.B. Singh.
17. Breitinger, Frank (May 2014). "NIST Special Publication 800-168" (PDF). *NIST Publications*. doi:10.6028/NIST.SP.800-168. Retrieved January 11, 2023.
18. Pagani, Fabio; Dell'Amico, Matteo; Balzarotti, Davide (2018-03-13). "Beyond Precision and Recall" (PDF). *Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy*. New York, NY, USA: ACM. pp. 354–365. doi:10.1145/3176258.3176306. ISBN 9781450356329. Retrieved December 12, 2022.
19. Sarantinos, Nikolaos; Benzaïd, Chafika; Arabiat, Omar (2016). "Forensic Malware Analysis: The Value of Fuzzy Hashing Algorithms in Identifying Similarities". *2016 IEEE Trustcom/BigDataSE/ISPA* (PDF). pp. 1782–1787. doi:10.1109/TrustCom.2016.0274. ISBN 978-1-5090-3205-1. S2CID 32568938.
20. Kornblum, Jesse (2006). "Identifying almost identical files using context triggered piecewise hashing". *Digital Investigation*. 3, Supplement (September 2006): 91–97. doi:10.1016/j.diin.2006.06.015.
21. Oliver, Jonathan; Cheng, Chun; Chen, Yanggui (2013). "TLSH – A Locality Sensitive Hash" (PDF). *2013 Fourth Cybercrime and Trustworthy Computing Workshop*. IEEE. pp. 7–13. doi:10.1109/ctc.2013.9. ISBN 978-1-4799-3076-0. Retrieved December 12, 2022.
22. Buldas, Ahto; Kroonmaa, Andres; Laanoja, Risto (2013). "Keyless Signatures' Infrastructure: How to Build Global Distributed Hash-Trees". In Riis, Nielson H.; Gollmann, D. (eds.). *Secure IT Systems. NordSec 2013*. Lecture Notes in Computer Science. Vol. 8208. Berlin, Heidelberg: Springer. doi:10.1007/978-3-642-41488-6_21. ISBN 978-3-642-41487-9. "Keyless Signatures Infrastructure (KSI) is a globally distributed system for providing time-stamping and server-supported digital signature services. Global per-second hash trees are created and their root hash values published. We discuss some service quality issues that arise in practical implementation of the service and present solutions for avoiding single points of failure and guaranteeing a service with reasonable and stable delay. Guardtime AS has been operating a KSI Infrastructure for 5 years. We summarize how the KSI Infrastructure is built, and the lessons learned during the operational period of the service."
23. Klinger, Evan; Starkweather, David. "pHash.org: Home of pHash, the open source perceptual hash library". pHash.org. Retrieved 2018-07-05. "pHash is an open source software library released under the GPLv3 license that implements several perceptual hashing algorithms, and provides a C-like API to use those functions in your own programs. pHash itself is written in C++."
24. Knuth, Donald E. (1975). *The Art of Computer Programming, Vol. 3, Sorting and Searching*. Reading, MA: Addison-Wesley. p. 540.
25. Gonnet, G. (1978). *Expected Length of the Longest Probe Sequence in Hash Code Searching* (Technical report). Ontario, Canada: University of Waterloo. CS-RR-78-46.
26. Knuth, Donald E. (2000). *The Art of Computer Programming, Vol. 3, Sorting and Searching* (2nd ed., 6th printing). Boston: Addison-Wesley. ISBN 978-0-201-89685-5.
External links
- The Goulburn Hashing Function (PDF) by Mayur Patel
- Hash Function Construction for Textual and Geometrical Data Retrieval (PDF). Latest Trends on Computers, Vol. 2, pp. 483–489, CSCC Conference, Corfu, 2010
# Hash function
From Wikipedia, the free encyclopedia
Mapping arbitrary data to fixed-size values
"hashlink" redirects here. For the Haxe virtual machine, see [HashLink](https://en.wikipedia.org/wiki/HashLink "HashLink").
"Hash code" redirects here. For the programming competition, see [Hash Code (programming competition)](https://en.wikipedia.org/wiki/Hash_Code_\(programming_competition\) "Hash Code (programming competition)").
This article is about a computer programming construct. For other meanings of "hash" and "hashing", see [Hash (disambiguation)](https://en.wikipedia.org/wiki/Hash_\(disambiguation\) "Hash (disambiguation)").
[](https://en.wikipedia.org/wiki/File:Hash_table_4_1_1_0_0_1_0_LL.svg)
A hash function that maps names to integers from 0 to 15. There is a [collision](https://en.wikipedia.org/wiki/Hash_collision "Hash collision") between keys "John Smith" and "Sandra Dee".
A **hash function** is any [function](https://en.wikipedia.org/wiki/Function_\(mathematics\) "Function (mathematics)") that can be used to map [data](https://en.wikipedia.org/wiki/Data_\(computing\) "Data (computing)") of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output.[\[1\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-1) The values returned by a hash function are called *hash values*, *hash codes*, (*hash/message*) *digests*,[\[2\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-2) or simply *hashes*. The values are usually used to index a fixed-size table called a *[hash table](https://en.wikipedia.org/wiki/Hash_table "Hash table")*. Use of a hash function to index a hash table is called *hashing* or *scatter-storage addressing*.
Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a way to access data quickly and efficiently. Unlike lists or trees, it provides near-constant access time. It also uses much less storage than trying to store all possible keys directly, especially when keys are large or variable in length.
Use of hash functions relies on statistical properties of key and function interaction: worst-case behavior is intolerably bad but rare, and average-case behavior can be nearly optimal (minimal [collision](https://en.wikipedia.org/wiki/Hash_collision "Hash collision")).[\[3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-1973-3): 527
Hash functions are related to (and often confused with) [checksums](https://en.wikipedia.org/wiki/Checksums "Checksums"), [check digits](https://en.wikipedia.org/wiki/Check_digit "Check digit"), [fingerprints](https://en.wikipedia.org/wiki/Fingerprint_\(computing\) "Fingerprint (computing)"), [lossy compression](https://en.wikipedia.org/wiki/Lossy_compression "Lossy compression"), [randomization functions](https://en.wikipedia.org/wiki/Randomization_function "Randomization function"), [error-correcting codes](https://en.wikipedia.org/wiki/Error_correction_code "Error correction code"), and [ciphers](https://en.wikipedia.org/wiki/Cipher "Cipher"). Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently. The hash function differs from these concepts mainly in terms of [data integrity](https://en.wikipedia.org/wiki/Data_integrity "Data integrity"). Hash tables may use [non-cryptographic hash functions](https://en.wikipedia.org/wiki/Non-cryptographic_hash_function "Non-cryptographic hash function"), while [cryptographic hash functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function "Cryptographic hash function") are used in cybersecurity to secure sensitive data such as passwords.
## Overview
In a hash table, a hash function takes a key as an input, which is associated with a datum or record and used to identify it to the data storage and retrieval application. The keys may be fixed-length, like an integer, or variable-length, like a name. In some cases, the key is the datum itself. The output is a hash code used to index a hash table holding the data or records, or pointers to them.
A hash function may be considered to perform three functions:
- Convert variable-length keys into fixed-length (usually [machine-word](https://en.wikipedia.org/wiki/Machine_word "Machine word")\-length or less) values, by folding them by words or other units using a [parity-preserving operator](https://en.wikipedia.org/wiki/Parity_function "Parity function") like ADD or XOR,
- Scramble the bits of the key so that the resulting values are uniformly distributed over the [keyspace](https://en.wikipedia.org/wiki/Key_space_\(cryptography\) "Key space (cryptography)"), and
- Map the key values into ones less than or equal to the size of the table.
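These three steps can be sketched in Python; the byte-packing scheme and the multiplier constant below are illustrative choices, not part of any standard:

```python
def toy_hash(key: str, table_size: int) -> int:
    """Toy hash illustrating the three steps above (not production code)."""
    # Step 1: fold the variable-length key into one machine word
    # using a parity-preserving operator (XOR here).
    word = 0
    for i, b in enumerate(key.encode("utf-8")):
        word ^= b << (8 * (i % 8))          # pack bytes into a 64-bit word
    word &= 0xFFFFFFFFFFFFFFFF
    # Step 2: scramble the bits so values spread over the keyspace
    # (multiply by an arbitrary odd 64-bit constant, then fold the halves).
    word = (word * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    word ^= word >> 32
    # Step 3: map the scrambled value into a table index.
    return word % table_size
```

The final modulo step makes the function usable with any table size; a power-of-two table could use a bit mask instead.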
A good hash function satisfies two basic properties: it should be very fast to compute, and it should minimize duplication of output values ([collisions](https://en.wikipedia.org/wiki/Hash_collision "Hash collision")). Hash functions rely on generating favorable [probability distributions](https://en.wikipedia.org/wiki/Probability_distribution "Probability distribution") for their effectiveness, reducing access time to nearly constant. High table loading factors, [pathological](https://en.wikipedia.org/wiki/Pathological_\(mathematics\) "Pathological (mathematics)") key sets, and poorly designed hash functions can result in access times approaching linear in the number of items in the table. Hash functions can be designed to give the best worst-case performance,[\[Notes 1\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-4) good performance under high table loading factors, and in special cases, perfect (collisionless) mapping of keys into hash codes. Implementation is based on parity-preserving bit operations (XOR and ADD), multiply, or divide. A necessary adjunct to the hash function is a collision-resolution method that employs an auxiliary data structure like [linked lists](https://en.wikipedia.org/wiki/Linked_list "Linked list"), or systematic probing of the table to find an empty slot.
## Hash tables
Main article: [Hash table](https://en.wikipedia.org/wiki/Hash_table "Hash table")
Hash functions are used in conjunction with [hash tables](https://en.wikipedia.org/wiki/Hash_tables "Hash tables") to store and retrieve data items or data records. The hash function translates the key associated with each datum or record into a hash code, which is used to index the hash table. When an item is to be added to the table, the hash code may index an empty slot (also called a bucket), in which case the item is added to the table there. If the hash code indexes a full slot, then some kind of collision resolution is required: the new item may be omitted (not added to the table), or replace the old item, or be added to the table in some other location by a specified procedure. That procedure depends on the structure of the hash table. In *chained hashing*, each slot is the head of a linked list or chain, and items that collide at the slot are added to the chain. Chains may be kept in random order and searched linearly, or in serial order, or as a self-ordering list by frequency to speed up access. In *open address hashing*, the table is probed starting from the occupied slot in a specified manner, usually by [linear probing](https://en.wikipedia.org/wiki/Linear_probing "Linear probing"), [quadratic probing](https://en.wikipedia.org/wiki/Quadratic_probing "Quadratic probing"), or [double hashing](https://en.wikipedia.org/wiki/Double_hashing "Double hashing") until an open slot is located or the entire table is probed (overflow). Searching for the item follows the same procedure until the item is located, an open slot is found, or the entire table has been searched (item not in table).
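Chained hashing as described above can be sketched in a few lines of Python (illustrative only; the slot count and tuple layout are arbitrary choices, and Python's built-in `dict` already provides this functionality):

```python
class ChainedHashTable:
    """Toy hash table using separate chaining for collision resolution."""

    def __init__(self, num_slots: int = 8):
        self.slots = [[] for _ in range(num_slots)]   # each slot heads a chain

    def _index(self, key) -> int:
        return hash(key) % len(self.slots)            # hash code -> slot index

    def put(self, key, value) -> None:
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                              # same key: replace value
                chain[i] = (key, value)
                return
        chain.append((key, value))                    # collision: extend the chain

    def get(self, key):
        for k, v in self.slots[self._index(key)]:     # linear search of the chain
            if k == key:
                return v
        raise KeyError(key)
```

With more items than slots, several keys necessarily share a slot, and the chains absorb the collisions.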
### Specialized uses
Hash functions are also used to build [caches](https://en.wikipedia.org/wiki/Cache_\(computing\) "Cache (computing)") for large data sets stored in slow media. A cache is generally simpler than a hashed search table, since any collision can be resolved by discarding or writing back the older of the two colliding items.[\[4\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-5)
Hash functions are an essential ingredient of the [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter "Bloom filter"), a space-efficient [probabilistic](https://en.wikipedia.org/wiki/Probability "Probability") [data structure](https://en.wikipedia.org/wiki/Data_structure "Data structure") that is used to test whether an [element](https://en.wikipedia.org/wiki/Element_\(mathematics\) "Element (mathematics)") is a member of a [set](https://en.wikipedia.org/wiki/Set_\(computer_science\) "Set (computer science)").
A special case of hashing is known as [geometric hashing](https://en.wikipedia.org/wiki/Geometric_hashing "Geometric hashing") or the *grid method*. In these applications, the set of all inputs is some sort of [metric space](https://en.wikipedia.org/wiki/Metric_space "Metric space"), and the hashing function can be interpreted as a [partition](https://en.wikipedia.org/wiki/Partition_\(mathematics\) "Partition (mathematics)") of that space into a grid of *cells*. The table is often an array with two or more indices (called a *[grid file](https://en.wikipedia.org/wiki/Grid_file "Grid file")*, *grid index*, *bucket grid*, and similar names), and the hash function returns an index [tuple](https://en.wikipedia.org/wiki/Tuple "Tuple"). This principle is widely used in [computer graphics](https://en.wikipedia.org/wiki/Computer_graphics "Computer graphics"), [computational geometry](https://en.wikipedia.org/wiki/Computational_geometry "Computational geometry"), and many other disciplines, to solve many [proximity problems](https://en.wikipedia.org/wiki/Proximity_problem "Proximity problem") in the [plane](https://en.wikipedia.org/wiki/Plane_\(geometry\) "Plane (geometry)") or in [three-dimensional space](https://en.wikipedia.org/wiki/Three-dimensional_space "Three-dimensional space"), such as finding [closest pairs](https://en.wikipedia.org/wiki/Closest_pair_problem "Closest pair problem") in a set of points, similar shapes in a list of shapes, similar [images](https://en.wikipedia.org/wiki/Image_processing "Image processing") in an [image database](https://en.wikipedia.org/wiki/Image_retrieval "Image retrieval"), and so on.
Hash tables are also used to implement [associative arrays](https://en.wikipedia.org/wiki/Associative_array "Associative array") and [dynamic sets](https://en.wikipedia.org/wiki/Set_\(abstract_data_type\) "Set (abstract data type)").[\[5\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-handbook_of_applied_cryptography-6)
## Properties
### Uniformity
A good hash function should map the expected inputs as evenly as possible over its output range. That is, every hash value in the output range should be generated with roughly the same [probability](https://en.wikipedia.org/wiki/Probability "Probability"). The reason for this last requirement is that the cost of hashing-based methods goes up sharply as the number of *collisions* (pairs of inputs that are mapped to the same hash value) increases. If some hash values are more likely to occur than others, then a larger fraction of the lookup operations will have to search through a larger set of colliding table entries.
This criterion only requires the value to be *uniformly distributed*, not *random* in any sense. A good randomizing function is (barring computational efficiency concerns) generally a good choice as a hash function, but the converse need not be true.
Hash tables often contain only a small subset of the valid inputs. For instance, a club membership list may contain only a hundred or so member names, out of the very large set of all possible names. In these cases, the uniformity criterion should hold for almost all typical subsets of entries that may be found in the table, not just for the global set of all possible entries.
In other words, if a typical set of *m* records is hashed to *n* table slots, then the probability of a bucket receiving many more than *m*/*n* records should be vanishingly small. In particular, if *m* \< *n*, then very few buckets should have more than one or two records. A small number of collisions is virtually inevitable, even if *n* is much larger than *m*; see the [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem "Birthday problem").
In special cases when the keys are known in advance and the key set is static, a hash function can be found that achieves absolute (or collisionless) uniformity. Such a hash function is said to be *[perfect](https://en.wikipedia.org/wiki/Perfect_hash_function "Perfect hash function")*. There is no algorithmic way of constructing such a function: searching for one is a [factorial](https://en.wikipedia.org/wiki/Factorial "Factorial") function of the number of keys to be mapped versus the number of table slots that they are mapped into. Finding a perfect hash function over more than a very small set of keys is usually computationally infeasible; the resulting function is likely to be more computationally complex than a standard hash function and provides only a marginal advantage over a function with good statistical properties that yields a minimum number of collisions. See [universal hash function](https://en.wikipedia.org/wiki/Universal_hashing "Universal hashing").
### Testing and measurement
When testing a hash function, the uniformity of the distribution of hash values can be evaluated by the [chi-squared test](https://en.wikipedia.org/wiki/Chi-squared_test "Chi-squared test"). This test is a goodness-of-fit measure: it is the actual distribution of items in buckets versus the expected (or uniform) distribution of items. The formula is
$$\frac{\sum_{j=0}^{m-1} b_j(b_j+1)/2}{(n/2m)(n+2m-1)},$$

\[*[dubious](https://en.wikipedia.org/wiki/Wikipedia:Accuracy_dispute#Disputed_statement "Wikipedia:Accuracy dispute") – [discuss](https://en.wikipedia.org/wiki/Talk:Hash_function#Dubious "Talk:Hash function")*\]\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
where *n* is the number of keys, *m* is the number of buckets, and *b*<sub>*j*</sub> is the number of items in bucket *j*.
A ratio within one confidence interval (such as 0.95 to 1.05) indicates that the evaluated hash function has an expected uniform distribution.
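The ratio can be computed directly from the bucket counts; a sketch in Python, simulating a uniform hash by dropping random 64-bit keys into buckets:

```python
import random

def uniformity_ratio(bucket_counts, num_keys):
    """Goodness-of-fit ratio from the formula above.

    Values near 1.0 suggest the buckets are filled the way a
    uniform random hash function would fill them."""
    m, n = len(bucket_counts), num_keys
    numerator = sum(b * (b + 1) / 2 for b in bucket_counts)
    denominator = (n / (2 * m)) * (n + 2 * m - 1)
    return numerator / denominator

# Simulate hashing 100000 uniformly random keys into 1024 buckets.
random.seed(1)
m, n = 1024, 100000
counts = [0] * m
for _ in range(n):
    counts[random.getrandbits(64) % m] += 1
ratio = uniformity_ratio(counts, n)
```

A badly skewed distribution (for example, all keys in one bucket) drives the ratio far above 1.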
Hash functions can have some technical properties that make it more likely that they will have a uniform distribution when applied. One is the [strict avalanche criterion](https://en.wikipedia.org/wiki/Strict_avalanche_criterion "Strict avalanche criterion"): whenever a single input bit is complemented, each of the output bits changes with a 50% probability. The reason for this property is that selected subsets of the keyspace may have low variability. For the output to be uniformly distributed, a low amount of variability, even one bit, should translate into a high amount of variability (i.e. distribution over the tablespace) in the output. Each bit should change with a probability of 50% because, if some bits are reluctant to change, then the keys become clustered around those values. If the bits want to change too readily, then the mapping is approaching a fixed XOR function of a single bit. Standard tests for this property have been described in the literature.[\[6\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-7) The relevance of the criterion to a multiplicative hash function is assessed here.[\[7\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-fibonacci-hashing-8)
### Efficiency
In data storage and retrieval applications, the use of a hash function is a trade-off between search time and data storage space. If search time were unbounded, then a very compact unordered linear list would be the best medium; if storage space were unbounded, then a randomly accessible structure indexable by the key-value would be very large and very sparse, but very fast. A hash function takes a finite amount of time to map a potentially large keyspace to a feasible amount of storage space searchable in a bounded amount of time regardless of the number of keys. In most applications, the hash function should be computable with minimum latency and secondarily in a minimum number of instructions.
Computational complexity varies with the number of instructions required and latency of individual instructions, with the simplest being the bitwise methods (folding), followed by the multiplicative methods, and the most complex (slowest) are the division-based methods.
Because collisions should be infrequent and cause only a marginal delay, and are otherwise harmless, it is usually preferable to choose a faster hash function over one that needs more computation but saves a few collisions.
Division-based implementations can be of particular concern because a division requires multiple cycles on nearly all processor [microarchitectures](https://en.wikipedia.org/wiki/Microarchitecture "Microarchitecture"). Division ([modulo](https://en.wikipedia.org/wiki/Modulo_operation "Modulo operation")) by a constant can be inverted to become a multiplication by the word-size multiplicative-inverse of that constant. This can be done by the programmer, or by the compiler. Division can also be reduced directly into a series of shift-subtracts and shift-adds, though minimizing the number of such operations required is a daunting problem; the number of machine-language instructions resulting may be more than a dozen and swamp the pipeline. If the microarchitecture has [hardware multiply](https://en.wikipedia.org/wiki/Hardware_multiply "Hardware multiply") [functional units](https://en.wikipedia.org/wiki/Functional_unit "Functional unit"), then the multiply-by-inverse is likely a better approach.
We can allow the table size *n* to not be a power of 2 and still not have to perform any remainder or division operation, as these computations are sometimes costly. For example, let *n* be significantly less than 2<sup>*b*</sup>. Consider a [pseudorandom number generator](https://en.wikipedia.org/wiki/Pseudorandom_number_generator "Pseudorandom number generator") function *P*(key) that is uniform on the interval \[0, 2<sup>*b*</sup> − 1\]. A hash function uniform on the interval \[0, *n* − 1\] is *n* *P*(key) / 2<sup>*b*</sup>. We can replace the division by a (possibly faster) right [bit shift](https://en.wikipedia.org/wiki/Bit_shifting "Bit shifting"): *n* *P*(key) \>\> *b*.
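Assuming *b* = 64 and that the uniform value *P*(key) has already been computed, the shift trick can be sketched as:

```python
B = 64  # bit width of the pseudorandom value P(key)

def to_range(p: int, n: int) -> int:
    """Map p, uniform on [0, 2**B - 1], to [0, n - 1] without division:
    the quotient n * p / 2**B is computed as a right shift by B bits."""
    return (n * p) >> B
```

The extremes of the input range land on the extremes of the output range: `to_range(0, 10)` is 0 and `to_range(2**64 - 1, 10)` is 9.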
If keys are being hashed repeatedly, and the hash function is costly, then computing time can be saved by precomputing the hash codes and storing them with the keys. Matching hash codes almost certainly means that the keys are identical. This technique is used for the transposition table in game-playing programs, which stores a 64-bit hashed representation of the board position.
### Universality
Main article: [Universal hashing](https://en.wikipedia.org/wiki/Universal_hashing "Universal hashing")
A *universal hashing* scheme is a [randomized algorithm](https://en.wikipedia.org/wiki/Randomized_algorithm "Randomized algorithm") that selects a hash function *h* among a family of such functions, in such a way that the probability of a collision of any two distinct keys is at most 1/*m*, where *m* is the number of distinct hash values desired, independently of the two keys. Universal hashing ensures (in a probabilistic sense) that the hash [function application](https://en.wikipedia.org/wiki/Function_application "Function application") will behave as well as if it were using a random function, for any distribution of the input data. It will, however, have more collisions than perfect hashing and may require more operations than a special-purpose hash function.
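A classical family of this kind is the Carter–Wegman construction *h*(*x*) = ((*ax* + *b*) mod *p*) mod *m*, with *p* a prime larger than any key and *a*, *b* drawn at random; a sketch, with the Mersenne prime 2<sup>61</sup> − 1 chosen here purely for convenience:

```python
import random

def make_universal_hash(m: int, p: int = (1 << 61) - 1):
    """Draw h(x) = ((a*x + b) mod p) mod m at random from a universal family.

    p must be a prime exceeding every key; a and b are the random draw.
    For any fixed pair of distinct keys, the chance (over the choice of
    a and b) that they collide is roughly 1/m."""
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m
```

Each hash table instance draws its own (*a*, *b*), so no fixed key set can be pathological for every table.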
### Applicability
A hash function that allows only certain table sizes, that accepts strings only up to a certain length, or that cannot accept a seed (i.e., does not allow double hashing) is less useful than one without such restrictions.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
A hash function is applicable in a variety of situations. Particularly within cryptography, notable applications include:[\[8\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-9)
- [Integrity checking](https://en.wikipedia.org/wiki/File_verification "File verification"): Differing hash values prove that two files differ, while matching values from a collision-resistant hash make equality overwhelmingly likely, providing a reliable means to detect file modifications.
- [Key derivation](https://en.wikipedia.org/wiki/Key_derivation_function "Key derivation function"): Minor input changes result in a random-looking output alteration, known as the diffusion property. Thus, hash functions are valuable for key derivation functions.
- [Message authentication codes](https://en.wikipedia.org/wiki/Message_authentication_code "Message authentication code") (MACs): Through the integration of a confidential key with the input data, hash functions can generate MACs ensuring the genuineness of the data, such as in [HMACs](https://en.wikipedia.org/wiki/HMAC "HMAC").
- Password storage: The password's hash value does not expose any password details, emphasizing the importance of securely storing hashed passwords on the server.
- [Signatures](https://en.wikipedia.org/wiki/Digital_signature "Digital signature"): Message hashes are signed rather than the whole message.
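For example, a MAC can be computed with the `hmac` module from Python's standard library (the key and message below are made up for illustration):

```python
import hashlib
import hmac

# Sender: compute a MAC over the message with a shared secret key.
key = b"shared-secret-key"
msg = b"wire transfer: $100 to account 42"
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Receiver: recompute the tag and compare in constant time; any change
# to the message or the key produces a completely different tag.
expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
assert hmac.compare_digest(tag, expected)
```

`hmac.compare_digest` avoids leaking, via timing, how many leading characters of a forged tag were correct.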
### Deterministic
A hash procedure must be [deterministic](https://en.wikipedia.org/wiki/Deterministic_algorithm "Deterministic algorithm")âfor a given input value, it must always generate the same hash value. In other words, it must be a [function](https://en.wikipedia.org/wiki/Function_\(mathematics\) "Function (mathematics)") of the data to be hashed, in the mathematical sense of the term. This requirement excludes hash functions that depend on external variable parameters, such as [pseudo-random number generators](https://en.wikipedia.org/wiki/Pseudo-random_number_generator "Pseudo-random number generator") or the time of day. It also excludes functions that depend on the [memory address](https://en.wikipedia.org/wiki/Memory_address "Memory address") of the object being hashed, because the address may change during execution (as may happen on systems that use certain methods of [garbage collection](https://en.wikipedia.org/wiki/Garbage_collection_\(computer_science\) "Garbage collection (computer science)")), although sometimes rehashing of the item is possible.
The determinism requirement applies within the context in which the function is reused. For example, [Python](https://en.wikipedia.org/wiki/Python_\(programming_language\) "Python (programming language)") adds the feature that hash functions make use of a randomized seed that is generated once when the Python process starts, in addition to the input to be hashed.[\[9\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-10) The Python hash ([SipHash](https://en.wikipedia.org/wiki/SipHash "SipHash")) is still a valid hash function when used within a single run, but if the values are persisted (for example, written to disk), they can no longer be treated as valid hash values, since in the next run the random value might differ.
### Defined range
It is often desirable that the output of a hash function have fixed size (but see below). If, for example, the output is constrained to 32-bit integer values, then the hash values can be used to index into an array. Such hashing is commonly used to accelerate data searches.[\[10\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-algorithms_in_java-11) Producing fixed-length output from variable-length input can be accomplished by breaking the input data into chunks of specific size. Hash functions used for data searches use some arithmetic expression that iteratively processes chunks of the input (such as the characters in a string) to produce the hash value.[\[10\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-algorithms_in_java-11)
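A well-known instance of such an iterative arithmetic expression is Java's `String.hashCode`, which processes the characters with *h* = 31*h* + *c*; a Python rendering:

```python
def java_string_hash(s: str) -> int:
    """h = s[0]*31**(n-1) + s[1]*31**(n-2) + ... + s[n-1], kept in the
    signed 32-bit range (the scheme used by Java's String.hashCode)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF            # wrap to 32 bits
    return h - (1 << 32) if h >= (1 << 31) else h      # reinterpret as signed
```

For example, `java_string_hash("abc")` returns 96354, matching `"abc".hashCode()` in Java; the strings "Aa" and "BB" are a classic collision of this function.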
### Variable range
In many applications, the range of hash values may be different for each run of the program or may change along the same run (for instance, when a hash table needs to be expanded). In those situations, one needs a hash function which takes two parametersâthe input data *z*, and the number *n* of allowed hash values.
A common solution is to compute a fixed hash function with a very large range (say, 0 to 2<sup>32</sup> − 1), divide the result by *n*, and use the division's [remainder](https://en.wikipedia.org/wiki/Modulo_operation "Modulo operation"). If *n* is itself a power of 2, this can be done by [bit masking](https://en.wikipedia.org/wiki/Mask_\(computing\) "Mask (computing)") and [bit shifting](https://en.wikipedia.org/wiki/Bit_shifting "Bit shifting"). When this approach is used, the hash function must be chosen so that the result has fairly uniform distribution between 0 and *n* − 1, for any value of *n* that may occur in the application. Depending on the function, the remainder may be uniform only for certain values of *n*, e.g. [odd](https://en.wikipedia.org/wiki/Odd_number "Odd number") or [prime numbers](https://en.wikipedia.org/wiki/Prime_number "Prime number").
### Variable range with minimal movement (dynamic hash function)
When the hash function is used to store values in a hash table that outlives the run of the program, and the hash table needs to be expanded or shrunk, the hash table is referred to as a dynamic hash table.
A hash function that will relocate the minimum number of records when the table is resized is desirable. What is needed is a hash function *H*(*z*,*n*) (where *z* is the key being hashed and *n* is the number of allowed hash values) such that *H*(*z*,*n* + 1) = *H*(*z*,*n*) with probability close to *n*/(*n* + 1).
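The text does not name a specific function, but one published example with this behavior is Lamping and Veale's jump consistent hash, sketched here:

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash (Lamping & Veale, 2014): maps a 64-bit key to a
    bucket in [0, num_buckets) such that growing the table from n to n+1
    buckets moves any given key with probability only 1/(n+1)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Advance a linear congruential generator seeded by the key ...
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        # ... and jump forward to the next candidate bucket.
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b
```

When the bucket count grows from *n* to *n* + 1, a key either stays where it was or moves to the new bucket *n*; no key is shuffled between pre-existing buckets.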
[Linear hashing](https://en.wikipedia.org/wiki/Linear_hashing "Linear hashing") and [spiral hashing](https://en.wikipedia.org/wiki/Spiral_hashing "Spiral hashing") are examples of dynamic hash functions that execute in constant time but relax the property of uniformity to achieve the minimal movement property. [Extendible hashing](https://en.wikipedia.org/wiki/Extendible_hashing "Extendible hashing") uses a dynamic hash function that requires space proportional to *n* to compute the hash function, and it becomes a function of the previous keys that have been inserted. Several algorithms that preserve the uniformity property but require time proportional to *n* to compute the value of *H*(*z*,*n*) have been invented.\[*[clarification needed](https://en.wikipedia.org/wiki/Wikipedia:Please_clarify "Wikipedia:Please clarify")*\]
A hash function with minimal movement is especially useful in [distributed hash tables](https://en.wikipedia.org/wiki/Distributed_hash_table "Distributed hash table").
### Data normalization
In some applications, the input data may contain features that are irrelevant for comparison purposes. For example, when looking up a personal name, it may be desirable to ignore the distinction between upper and lower case letters. For such data, one must use a hash function that is compatible with the data [equivalence](https://en.wikipedia.org/wiki/Equivalence_relation "Equivalence relation") criterion being used: that is, any two inputs that are considered equivalent must yield the same hash value. This can be accomplished by normalizing the input before hashing it, as by upper-casing all letters.
## Hashing integer data types
There are several common algorithms for hashing integers. The method giving the best distribution is data-dependent. One of the simplest and most common methods in practice is the modulo division method.
### Identity hash function
If the data to be hashed is small enough, then one can use the data itself (reinterpreted as an integer) as the hashed value. The cost of computing this *[identity](https://en.wikipedia.org/wiki/Identity_function "Identity function")* hash function is effectively zero. This hash function is [perfect](https://en.wikipedia.org/wiki/Perfect_hash_function "Perfect hash function"), as it maps each input to a distinct hash value.
The meaning of "small enough" depends on the size of the type that is used as the hashed value. For example, in [Java](https://en.wikipedia.org/wiki/Java_\(programming_language\) "Java (programming language)"), the hash code is a 32-bit integer. Thus the 32-bit integer `Integer` and 32-bit floating-point `Float` objects can simply use the value directly, whereas the 64-bit integer `Long` and 64-bit floating-point `Double` cannot.
Other types of data can also use this hashing scheme. For example, when mapping [character strings](https://en.wikipedia.org/wiki/Character_string "Character string") between [upper and lower case](https://en.wikipedia.org/wiki/Letter_case "Letter case"), one can use the binary encoding of each character, interpreted as an integer, to index a table that gives the alternative form of that character ("A" for "a", "8" for "8", etc.). If each character is stored in 8 bits (as in [extended ASCII](https://en.wikipedia.org/wiki/Extended_ASCII "Extended ASCII")[\[Notes 2\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-12) or [ISO Latin 1](https://en.wikipedia.org/wiki/ISO_Latin_1 "ISO Latin 1")), the table has only 2<sup>8</sup> = 256 entries; in the case of [Unicode](https://en.wikipedia.org/wiki/Unicode "Unicode") characters, the table would have 17 × 2<sup>16</sup> = 1114112 entries.
The same technique can be used to map [two-letter country codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 "ISO 3166-1 alpha-2") like "us" or "za" to country names (26<sup>2</sup> = 676 table entries), 5-digit [ZIP codes](https://en.wikipedia.org/wiki/ZIP_Code "ZIP Code") like 13083 to city names (100000 entries), etc. Invalid data values (such as the country code "xx" or the ZIP code 00000) may be left undefined in the table or mapped to some appropriate "null" value.
### Trivial hash function
If the keys are uniformly or sufficiently uniformly distributed over the key space, so that the key values are essentially random, then they may be considered to be already "hashed". In this case, any number of bits may be extracted from the key and collated as an index into the hash table. For example, a simple hash function might mask off the *m* least significant bits and use the result as an index into a hash table of size 2<sup>*m*</sup>.
### Mid-squares
A mid-squares hash code is produced by squaring the input and extracting an appropriate number of middle digits or bits. For example, if the input is 123456789 and the hash table size is 10000, then squaring the key produces 15241578750190521, so the hash code is taken as the middle 4 digits of the 17-digit number (ignoring the high digit), namely 8750. The mid-squares method produces a reasonable hash code if there are not a lot of leading or trailing zeros in the key. This is a variant of multiplicative hashing, but not as good, because an arbitrary key is not a good multiplier.
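The worked example can be reproduced directly. This sketch operates on decimal digits to match the example; practical implementations usually square the key as an integer and extract middle bits instead:

```python
def mid_square(key: int, digits: int = 4) -> int:
    """Mid-squares hash: square the key and keep the middle digits.
    Drops the leading digit first when the square has an odd number of
    digits, matching the worked example in the text. Assumes the square
    has more than `digits` digits."""
    s = str(key * key)
    if len(s) % 2 == 1:
        s = s[1:]                          # ignore the high digit
    start = (len(s) - digits) // 2
    return int(s[start:start + digits])
```

`mid_square(123456789)` returns 8750, as in the example above.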
### Division hashing
A standard technique is to use a modulo function on the key, by selecting a divisor *M* which is a prime number close to the table size, so *h*(*K*) ≡ *K* (mod *M*). The table size is usually a power of 2. This gives a distribution over {0, 1, ..., *M* − 1}. This gives good results over a large number of key sets. A significant drawback of division hashing is that division requires multiple cycles on most modern architectures (including [x86](https://en.wikipedia.org/wiki/X86 "X86")) and can be 10 times slower than multiplication. A second drawback is that it will not break up clustered keys. For example, the keys 123000, 456000, 789000, etc. modulo 1000 all map to the same address. This technique works well in practice because many key sets are sufficiently random already, and the probability that a key set will be cyclical by a large prime number is small.
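The clustering drawback is easy to demonstrate; 997 is a prime near the nominal table size of 1000:

```python
# Clustered keys that share a common factor with the divisor all collide.
keys = [123000, 456000, 789000]

# Dividing by 1000 (which divides every key) maps them all to slot 0.
assert {k % 1000 for k in keys} == {0}

# A prime divisor near the table size breaks the cluster apart.
assert len({k % 997 for k in keys}) == 3
```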
### Algebraic coding
Algebraic coding is a variant of the division method of hashing which uses division by a polynomial modulo 2 instead of an integer to map *n* bits to *m* bits.[\[3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-1973-3): 512–513 In this approach, *M* = 2<sup>*m*</sup>, and we postulate an *m*th-degree polynomial *Z*(*x*) = *x*<sup>*m*</sup> + ζ<sub>*m*−1</sub>*x*<sup>*m*−1</sup> + ⋯ + ζ<sub>0</sub>. A key *K* = (*k*<sub>*n*−1</sub>…*k*<sub>1</sub>*k*<sub>0</sub>)<sub>2</sub> can be regarded as the polynomial *K*(*x*) = *k*<sub>*n*−1</sub>*x*<sup>*n*−1</sup> + ⋯ + *k*<sub>1</sub>*x* + *k*<sub>0</sub>. The remainder using polynomial arithmetic modulo 2 is *K*(*x*) mod *Z*(*x*) = *h*<sub>*m*−1</sub>*x*<sup>*m*−1</sup> + ⋯ + *h*<sub>1</sub>*x* + *h*<sub>0</sub>. Then *h*(*K*) = (*h*<sub>*m*−1</sub>…*h*<sub>1</sub>*h*<sub>0</sub>)<sub>2</sub>. If *Z*(*x*) is constructed to have *t* or fewer non-zero coefficients, then keys which differ in fewer than *t* bits are guaranteed to not collide.
*Z* is a function of *k*, *t*, and *n* (the last of which is a divisor of 2^*k* − 1) and is constructed from the [finite field](https://en.wikipedia.org/wiki/Finite_field "Finite field") GF(2^*k*). [Knuth](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") gives an example: taking (*n*,*m*,*t*) = (15,10,7) yields *Z*(*x*) = *x*^10 + *x*^8 + *x*^5 + *x*^4 + *x*^2 + *x* + 1. The derivation is as follows:
Let *S* be the smallest set of integers such that {1, 2, …, *t*} ⊆ *S* and (2*j* mod *n*) ∈ *S* for all *j* ∈ *S*.[\[Notes 3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-13)
Define *P*(*x*) = ∏_(*j*∈*S*) (*x* − α^*j*), where α ∈ GF(2^*k*) and where the coefficients of *P*(*x*) are computed in this field. Then the degree of *P*(*x*) = |*S*|. Since α^(2*j*) is a root of *P*(*x*) whenever α^*j* is a root, it follows that the coefficients *p*_*i* of *P*(*x*) satisfy *p*_*i*² = *p*_*i*, so they are all 0 or 1. If *R*(*x*) = *r*_(*n*−1)*x*^(*n*−1) + ⋯ + *r*₁*x* + *r*₀ is any nonzero polynomial modulo 2 with at most *t* nonzero coefficients, then *R*(*x*) is not a multiple of *P*(*x*) modulo 2.[\[Notes 4\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-14) It follows that the corresponding hash function will map keys that differ in fewer than *t* bits to unique indices.[\[3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-1973-3): 542–543
The usual outcome is that either *n* or *t* (or both) must be large for the scheme to be computationally feasible. Therefore, it is more suited to hardware or microcode implementation.[\[3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-1973-3): 542–543
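The polynomial remainder can be computed with CRC-style shift-and-XOR arithmetic. The following C sketch hard-codes Knuth's example *Z*(*x*) = *x*^10 + *x*^8 + *x*^5 + *x*^4 + *x*^2 + *x* + 1 (bit pattern 0x537) and is meant only to show the mechanics of division modulo 2, not an optimized implementation:

```c
#include <stdint.h>

/* Remainder of K(x) modulo Z(x) over GF(2), i.e. CRC-style polynomial
   division. Z_POLY encodes Z(x) with its leading x^m term included;
   here m = 10 and Z(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1. */
#define Z_MBITS 10
#define Z_POLY  0x537u   /* binary 10100110111 */

uint32_t poly_mod2_hash(uint32_t key, int n_bits) {
    uint32_t rem = key;
    for (int bit = n_bits - 1; bit >= Z_MBITS; bit--) {
        if (rem & (1u << bit))
            rem ^= Z_POLY << (bit - Z_MBITS);  /* XOR = subtraction mod 2 */
    }
    return rem;  /* an m-bit hash value h(K) */
}
```

Each step cancels the current highest set bit by XORing in a shifted copy of *Z*(*x*), exactly as in long division.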
### Unique permutation hashing
Unique permutation hashing has a guaranteed best worst-case insertion time.[\[11\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-15)
### Multiplicative hashing
Standard multiplicative hashing uses the formula *h*_*a*(*K*) = ⌊(*aK* mod *W*) / (*W*/*M*)⌋, which produces a hash value in {0, …, *M* − 1}. The value *a* is an appropriately chosen value that should be [relatively prime](https://en.wikipedia.org/wiki/Coprime_integers "Coprime integers") to *W*; it should be large, and its binary representation a random mix of 1s and 0s. An important practical special case occurs when *W* = 2^*w* and *M* = 2^*m* are powers of 2 and *w* is the machine [word size](https://en.wikipedia.org/wiki/Word_size "Word size"). In this case, this formula becomes *h*_*a*(*K*) = ⌊(*aK* mod 2^*w*) / 2^(*w*−*m*)⌋. This is special because arithmetic modulo 2^*w* is done by default in low-level programming languages and integer division by a power of 2 is simply a right-shift, so, in [C](https://en.wikipedia.org/wiki/C_\(programming_language\) "C (programming language)"), for example, this function becomes
```
unsigned hash(unsigned K) {
    /* a is the multiplier, w the word size in bits, m the index width */
    return (a * K) >> (w - m);
}
```
and for fixed m and w this translates into a single integer multiplication and right-shift, making it one of the fastest hash functions to compute.
Multiplicative hashing is susceptible to a "common mistake" that leads to poor diffusionâhigher-value input bits do not affect lower-value output bits.[\[12\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-16) A transmutation on the input which shifts the span of retained top bits down and XORs or ADDs them to the key before the multiplication step corrects for this. The resulting function looks like:[\[7\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-fibonacci-hashing-8)
```
unsigned hash(unsigned K) {
    K ^= K >> (w - m);            /* fold the discarded top bits down */
    return (a * K) >> (w - m);
}
```
### Fibonacci hashing
[Fibonacci](https://en.wikipedia.org/wiki/Fibonacci_number "Fibonacci number") hashing is a form of multiplicative hashing in which the multiplier is 2^*w* / φ, where *w* is the machine word length and φ (phi) is the [golden ratio](https://en.wikipedia.org/wiki/Golden_ratio "Golden ratio") (approximately 1.618). A property of this multiplier is that it distributes blocks of consecutive keys uniformly over the table space, with respect to any block of bits in the key. Consecutive keys within the high bits or low bits of the key (or some other field) are relatively common. The multipliers for various word lengths are:
- 16: *a* = 9E37₁₆ = 40503₁₀
- 32: *a* = 9E3779B9₁₆ = 2654435769₁₀
- 48: *a* = 9E3779B97F4B₁₆ = 173961102589771₁₀[\[Notes 5\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-17)
- 64: *a* = 9E3779B97F4A7C15₁₆ = 11400714819323198485₁₀
The multiplier should be odd, so the least significant bit of the output is invertible modulo 2^*w*. The last two values given above are rounded (up and down, respectively) by more than 1/2 of a least-significant bit to achieve this.
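A minimal C sketch for *w* = 64, using the 64-bit multiplier from the list above. The index width *m* = 10 (a 1024-slot table) is an arbitrary choice for illustration:

```c
#include <stdint.h>

/* Fibonacci hashing: multiply by 2^64/phi (rounded to odd) and keep
   the top m bits. FIB_M = 10 here gives an index into a 1024-slot
   table; any m <= 64 works the same way. */
#define FIB_MULT 0x9E3779B97F4A7C15ull
#define FIB_M    10

uint64_t fib_hash(uint64_t key) {
    return (key * FIB_MULT) >> (64 - FIB_M);   /* mod 2^64 is implicit */
}
```

Consecutive keys land far apart in the table because each increment of the key advances the product by roughly φ⁻¹ of the word range.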
### Zobrist hashing
Main articles: [Tabulation hashing](https://en.wikipedia.org/wiki/Tabulation_hashing "Tabulation hashing") and [Zobrist hashing](https://en.wikipedia.org/wiki/Zobrist_hashing "Zobrist hashing")
*Zobrist hashing*, named after [Albert Zobrist](https://en.wikipedia.org/wiki/Albert_Lindsey_Zobrist "Albert Lindsey Zobrist"), is a form of [tabulation hashing](https://en.wikipedia.org/wiki/Tabulation_hashing "Tabulation hashing"), which is a method for constructing universal families of hash functions by combining table lookup with XOR operations. This algorithm has proven to be very fast and of high quality for hashing purposes (especially hashing of integer-number keys).[\[13\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-18)
Zobrist hashing was originally introduced as a means of compactly representing chess positions in computer game-playing programs. A unique random number was assigned to represent each type of piece (six each for black and white) on each space of the board. Thus a table of 64×12 such numbers is initialized at the start of the program. The random numbers could be any length, but 64 bits was natural due to the 64 squares on the board. A position was transcribed by cycling through the pieces in a position, indexing the corresponding random numbers (vacant spaces were not included in the calculation) and XORing them together (the starting value could be 0 (the identity value for XOR) or a random seed). The resulting value was reduced by modulo, folding, or some other operation to produce a hash table index. The original Zobrist hash was stored in the table as the representation of the position.
Later, the method was extended to hashing integers by representing each byte in each of 4 possible positions in the word by a unique 32-bit random number. Thus, a table of 2^8×4 random numbers is constructed. A 32-bit hashed integer is transcribed by successively indexing the table with the value of each byte of the plain text integer and XORing the loaded values together (again, the starting value can be the identity value or a random seed). The natural extension to 64-bit integers is by use of a table of 2^8×8 64-bit random numbers.
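The integer-hashing variant described above can be sketched as follows. Filling the table from `rand()` is purely illustrative (a real implementation would use a stronger random source, and `rand()` may supply as few as 15 bits per call):

```c
#include <stdint.h>
#include <stdlib.h>

/* Zobrist-style tabulation hash for 32-bit integers: one 256-entry
   table of random words per byte position, combined with XOR. */
static uint32_t zobrist_table[4][256];

void zobrist_init(unsigned seed) {
    srand(seed);
    for (int pos = 0; pos < 4; pos++)
        for (int b = 0; b < 256; b++)
            zobrist_table[pos][b] =
                ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

uint32_t zobrist_hash(uint32_t key) {
    uint32_t h = 0;  /* 0 is the identity value for XOR */
    for (int pos = 0; pos < 4; pos++)
        h ^= zobrist_table[pos][(key >> (8 * pos)) & 0xFF];
    return h;
}
```

Note that changing one byte of the key XORs exactly one table entry in and one out, which is what makes incremental updates (as in the chess use) cheap.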
This kind of function has some nice theoretical properties, one of which is called *3-tuple independence*, meaning that every 3-tuple of keys is equally likely to be mapped to any 3-tuple of hash values.
### Customized hash function
A hash function can be designed to exploit existing entropy in the keys. If the keys have leading or trailing zeros, or particular fields that are unused, always zero or some other constant, or generally vary little, then masking out only the volatile bits and hashing on those will provide a better and possibly faster hash function. Selected divisors or multipliers in the division and multiplicative schemes may make more uniform hash functions if the keys are cyclic or have other redundancies.
## Hashing variable-length data
When the data values are long (or variable-length) [character strings](https://en.wikipedia.org/wiki/Character_string "Character string"), such as personal names, [web page addresses](https://en.wikipedia.org/wiki/URL "URL"), or mail messages, their distribution is usually very uneven, with complicated dependencies. For example, text in any [natural language](https://en.wikipedia.org/wiki/Natural_language "Natural language") has highly non-uniform distributions of [characters](https://en.wikipedia.org/wiki/Character_\(computing\) "Character (computing)") and [character pairs](https://en.wikipedia.org/wiki/Digraph_\(computing\) "Digraph (computing)") characteristic of the language. For such data, it is prudent to use a hash function that depends on all characters of the string, and depends on each character in a different way.
### Middle and ends
Simplistic hash functions may add the first and last *n* characters of a string along with the length, or form a word-size hash from the middle 4 characters of a string. This saves iterating over the (potentially long) string, but hash functions that do not hash on all characters of a string can readily become linear due to redundancies, clustering, or other pathologies in the key set. Such strategies may be effective as a custom hash function if the structure of the keys is such that either the middle, ends, or other fields are zero or some other invariant constant that does not differentiate the keys; then the invariant parts of the keys can be ignored.
### Character folding
The paradigmatic example of folding by characters is to add up the integer values of all the characters in the string. A better idea is to multiply the hash total by a constant, typically a sizable prime number, before adding in the next character, ignoring overflow. Using exclusive-or instead of addition is also a plausible alternative. The final operation would be a modulo, mask, or other function to reduce the word value to an index the size of the table. The weakness of this procedure is that information may cluster in the upper or lower bits of the bytes; this clustering will remain in the hashed result and cause more collisions than a proper randomizing hash. ASCII byte codes, for example, have an upper bit of 0, and printable strings do not use the last byte code or most of the first 32 byte codes, so the information, which uses the remaining byte codes, is clustered in the remaining bits in an unobvious manner.
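The multiply-and-add folding described above might look like this in C. The multiplier 31 is one common choice of small prime, not a value prescribed by the text:

```c
#include <stddef.h>

/* Character folding with a prime multiplier: h = h*31 + c per
   character, overflow ignored, reduced by the table size at the end. */
unsigned fold_hash(const char *s, unsigned table_size) {
    unsigned h = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        h = h * 31u + (unsigned char)s[i];
    return h % table_size;
}
```

Replacing the `+` with `^` gives the exclusive-or variant mentioned above; the final `%` is the reducing step that maps the word-size value to a table index.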
The classic approach, dubbed the [PJW hash](https://en.wikipedia.org/wiki/PJW_hash_function "PJW hash function") based on the work of [Peter J. Weinberger](https://en.wikipedia.org/wiki/Peter_J._Weinberger "Peter J. Weinberger") at [Bell Labs](https://en.wikipedia.org/wiki/Bell_Labs "Bell Labs") in the 1970s, was originally designed for hashing identifiers into compiler symbol tables as given in the ["Dragon Book"](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools "Compilers: Principles, Techniques, and Tools").[\[14\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-19) This hash function offsets the bytes 4 bits before adding them together. When the quantity wraps, the high 4 bits are shifted out and if non-zero, [xored](https://en.wikipedia.org/wiki/Exclusive_or "Exclusive or") back into the low byte of the cumulative quantity. The result is a word-size hash code to which a modulo or other reducing operation can be applied to produce the final hash index.
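A C rendering of the PJW scheme for 32-bit words, following the description above: each byte is shifted in by 4 bits, and whenever the top nibble fills, it is XORed back into the lower bits and cleared. This is a common formulation of the algorithm, shown here as a sketch rather than the exact Dragon Book code:

```c
#include <stdint.h>
#include <stddef.h>

/* PJW-style hash for 32-bit accumulators. */
uint32_t pjw_hash(const char *s) {
    uint32_t h = 0, high;
    for (size_t i = 0; s[i] != '\0'; i++) {
        h = (h << 4) + (unsigned char)s[i];
        high = h & 0xF0000000u;     /* top nibble about to shift out */
        if (high) {
            h ^= high >> 24;        /* fold it back into the low bits */
            h &= ~high;             /* and clear it */
        }
    }
    return h;  /* reduce with h % table_size to get the final index */
}
```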
Today, especially with the advent of 64-bit word sizes, much more efficient variable-length string hashing by word chunks is available.
### Word length folding
See also: [Universal hashing § Hashing strings](https://en.wikipedia.org/wiki/Universal_hashing#Hashing_strings "Universal hashing")
Modern microprocessors will allow for much faster processing if 8-bit character strings are not hashed by processing one character at a time, but by interpreting the string as an array of 32-bit or 64-bit integers and hashing/accumulating these "wide word" integer values by means of arithmetic operations (e.g. multiplication by constant and bit-shifting). The final word, which may have unoccupied byte positions, is filled with zeros or a specified randomizing value before being folded into the hash. The accumulated hash code is reduced by a final modulo or other operation to yield an index into the table.
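A sketch of word-length folding in C, reading the string in 8-byte chunks with `memcpy` (a portable unaligned load) and zero-filling the final partial word. The multiply-XOR accumulator and its odd constant are illustrative choices, not prescribed by the text:

```c
#include <stdint.h>
#include <string.h>

/* Hash a byte string 64 bits at a time. */
uint64_t word_fold_hash(const char *s, size_t len) {
    uint64_t h = 0;
    while (len >= 8) {
        uint64_t chunk;
        memcpy(&chunk, s, 8);       /* safe unaligned 64-bit load */
        h = (h ^ chunk) * 0x9E3779B97F4A7C15ull;
        s += 8;
        len -= 8;
    }
    if (len > 0) {                  /* final word, zero-filled */
        uint64_t chunk = 0;
        memcpy(&chunk, s, len);
        h = (h ^ chunk) * 0x9E3779B97F4A7C15ull;
    }
    return h;   /* reduce modulo the table size for an index */
}
```

Processing 8 bytes per multiply is the source of the speedup over character-at-a-time folding.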
### Radix conversion hashing
Analogous to the way an ASCII or [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC "EBCDIC") character string representing a decimal number is converted to a numeric quantity for computing, a variable-length string can be converted as *x*_(*k*−1)*a*^(*k*−1) + *x*_(*k*−2)*a*^(*k*−2) + ⋯ + *x*₁*a* + *x*₀. This is simply a polynomial in a [radix](https://en.wikipedia.org/wiki/Radix "Radix") *a* > 1 that takes the components (*x*₀, *x*₁, ..., *x*_(*k*−1)) as the characters of the input string of length *k*. It can be used directly as the hash code, or a hash function applied to it to map the potentially large value to the hash table size. The value of *a* is usually a prime number large enough to hold the number of different characters in the character set of potential keys. Radix conversion hashing of strings minimizes the number of collisions.[\[15\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-20) Available data sizes may restrict the maximum length of string that can be hashed with this method. For example, a 128-bit word will hash only a 26-character alphabetic string (ignoring case) with a radix of 29; a printable ASCII string is limited to 9 characters using radix 97 and a 64-bit word. However, alphabetic keys are usually of modest length, because keys must be stored in the hash table. Numeric character strings are usually not a problem; 64 bits can count up to 10^19, or 19 decimal digits with radix 10.
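The polynomial above is evaluated efficiently by Horner's rule. A minimal C sketch, with the radix passed in and raw byte values used as digits (a real implementation might first map each character to a small ordinal, e.g. 'a'..'z' to 1..26):

```c
#include <stdint.h>
#include <stddef.h>

/* Radix conversion by Horner's rule: treat the characters as digits
   of a number in base a. Overflows for strings longer than the word
   can represent, per the size limits discussed in the text. */
uint64_t radix_hash(const char *s, size_t len, uint64_t a) {
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++)
        v = v * a + (uint64_t)(unsigned char)s[i];
    return v;   /* apply % table_size to map to a table index */
}
```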
### Rolling hash
Main article: [Rolling hash](https://en.wikipedia.org/wiki/Rolling_hash "Rolling hash")
See also: [Linear congruential generator](https://en.wikipedia.org/wiki/Linear_congruential_generator "Linear congruential generator")
In some applications, such as [substring search](https://en.wikipedia.org/wiki/String_searching_algorithm "String searching algorithm"), one can compute a hash function *h* for every *k*-character [substring](https://en.wikipedia.org/wiki/Substring "Substring") of a given *n*-character string by advancing a window of width *k* characters along the string, where *k* is a fixed integer and *n* > *k*. The straightforward solution, which is to extract such a substring at every character position in the text and compute *h* separately, requires a number of operations proportional to *k*·*n*. However, with the proper choice of *h*, one can use the technique of rolling hash to compute all those hashes with an effort proportional to *mk* + *n*, where *m* is the number of occurrences of the substring.[\[16\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-21)
The most familiar algorithm of this type is [Rabin-Karp](https://en.wikipedia.org/wiki/Rabin-Karp "Rabin-Karp"), with best and average case performance *O*(*n* + *mk*) and worst case *O*(*n*·*k*) (in all fairness, the worst case here is gravely pathological: both the text string and substring are composed of a repeated single character, such as *t* = "AAAAAAAAAAA" and *s* = "AAA"). The hash function used for the algorithm is usually the [Rabin fingerprint](https://en.wikipedia.org/wiki/Rabin_fingerprint "Rabin fingerprint"), designed to avoid collisions in 8-bit character strings, but other suitable hash functions are also used.
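A sketch of a polynomial rolling hash of the kind Rabin-Karp uses; the base and modulus are illustrative constants, not the Rabin fingerprint itself. Sliding the window removes the outgoing character's term and appends the new one in O(1), so hashing all windows of an *n*-character text costs O(*n*):

```c
#include <stdint.h>
#include <stddef.h>

#define RH_BASE 256ull
#define RH_MOD  1000000007ull

/* Hash of the first k characters of s. */
uint64_t roll_init(const char *s, size_t k) {
    uint64_t h = 0;
    for (size_t i = 0; i < k; i++)
        h = (h * RH_BASE + (unsigned char)s[i]) % RH_MOD;
    return h;
}

/* Slide the window one character: drop `out`, append `in`.
   pow_bk must be RH_BASE^(k-1) % RH_MOD, precomputed once. */
uint64_t roll_step(uint64_t h, unsigned char out, unsigned char in,
                   uint64_t pow_bk) {
    h = (h + RH_MOD - out * pow_bk % RH_MOD) % RH_MOD;  /* remove old term */
    return (h * RH_BASE + in) % RH_MOD;                 /* append new char */
}
```

For example, for the text "abcd" with *k* = 3, one step from the hash of "abc" (dropping 'a', appending 'd', with 256² = 65536 precomputed) yields the hash of "bcd".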
### Fuzzy hash
This section is an excerpt from [Fuzzy hashing](https://en.wikipedia.org/wiki/Fuzzy_hashing "Fuzzy hashing").
[Fuzzy hashing](https://en.wikipedia.org/wiki/Fuzzy_hashing "Fuzzy hashing"), also known as similarity hashing,[\[17\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-Fuzzy_hashing_NIST.SP.800-168-22) is a technique for [detecting data that is similar](https://en.wikipedia.org/wiki/Content_similarity_detection "Content similarity detection"), but not exactly the same, as other data. This is in contrast to [cryptographic hash functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function "Cryptographic hash function"), which are designed to have significantly different hashes for even minor differences. Fuzzy hashing has been used to identify malware[\[18\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-Fuzzy_hashing_Beyond_Precision_and_Recall:_Understanding_Uses_\(and_Misuses\)_of_Similarity_Hashes_in_Binary_Analysis-23)[\[19\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-Fuzzy_hashing_Forensic_Malware_Analysis:_The_Value_of_Fuzzy_Hashing_Algorithms_in_Identifying_Similarities-24) and has potential for other applications, like [data loss prevention](https://en.wikipedia.org/wiki/Data_loss_prevention "Data loss prevention") and detecting multiple versions of code.[\[20\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-Fuzzy_hashing_ssdeep-25)[\[21\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-Fuzzy_hashing_tlsh-26)
### Perceptual hash
This section is an excerpt from [Perceptual hashing](https://en.wikipedia.org/wiki/Perceptual_hashing "Perceptual hashing").
[Perceptual hashing](https://en.wikipedia.org/wiki/Perceptual_hashing "Perceptual hashing") is the use of a [fingerprinting algorithm](https://en.wikipedia.org/wiki/Fingerprint_\(computing\) "Fingerprint (computing)") that produces a snippet, hash, or [fingerprint](https://en.wikipedia.org/wiki/Fingerprint_\(computing\) "Fingerprint (computing)") of various forms of [multimedia](https://en.wikipedia.org/wiki/Multimedia "Multimedia").[\[22\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-Perceptual_hashing_buldas13-27)[\[23\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-Perceptual_hashing_klinger-28) A perceptual hash is a type of [locality-sensitive hash](https://en.wikipedia.org/wiki/Locality-sensitive_hash "Locality-sensitive hash"): it produces analogous outputs when [features](https://en.wikipedia.org/wiki/Feature_vector "Feature vector") of the multimedia are similar. This is in contrast to [cryptographic hashing](https://en.wikipedia.org/wiki/Cryptographic_hash_function "Cryptographic hash function"), which relies on the [avalanche effect](https://en.wikipedia.org/wiki/Avalanche_effect "Avalanche effect"), whereby a small change in input value creates a drastic change in output value. Perceptual hash functions are widely used in finding cases of online [copyright infringement](https://en.wikipedia.org/wiki/Copyright_infringement "Copyright infringement") as well as in [digital forensics](https://en.wikipedia.org/wiki/Digital_Forensics_Framework "Digital Forensics Framework") because correlated hashes allow similar data to be found (for instance, with a differing [watermark](https://en.wikipedia.org/wiki/Digital_watermark "Digital watermark")).
## Analysis
Worst case results for a hash function can be assessed in two ways: theoretical and practical. The theoretical worst case is the probability that all keys map to a single slot. The practical worst case is the expected longest probe sequence (hash function + collision resolution method). This analysis considers uniform hashing, that is, any key will map to any particular slot with probability 1/*m*, a characteristic of universal hash functions.
While [Knuth](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") worries about adversarial attack on real time systems,[\[24\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-29) Gonnet has shown that the probability of such a case is "ridiculously small". His representation was that the probability of *k* of *n* keys mapping to a single slot is α^*k* / (*e*^α *k*!), where *α* is the load factor, *n*/*m*.[\[25\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-30)
## History
The term *hash* offers a natural analogy with its non-technical meaning (to chop up or make a mess out of something), given how hash functions scramble their input data to derive their output.[\[26\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-2000-31): 514 In his research for the precise origin of the term, [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") notes that, while [Hans Peter Luhn](https://en.wikipedia.org/wiki/Hans_Peter_Luhn "Hans Peter Luhn") of [IBM](https://en.wikipedia.org/wiki/IBM "IBM") appears to have been the first to use the concept of a hash function in a memo dated January 1953, the term itself did not appear in published literature until the late 1960s, in Herbert Hellerman's *Digital Computer System Principles*, even though it was already widespread jargon by then.[\[26\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-2000-31): 547–548
## See also
Look up ***[hash](https://en.wiktionary.org/wiki/hash "wiktionary:hash")*** in Wiktionary, the free dictionary.
- [List of hash functions](https://en.wikipedia.org/wiki/List_of_hash_functions "List of hash functions")
- [Nearest neighbor search](https://en.wikipedia.org/wiki/Nearest_neighbor_search "Nearest neighbor search")
- [Distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table "Distributed hash table")
- [Identicon](https://en.wikipedia.org/wiki/Identicon "Identicon")
- [Low-discrepancy sequence](https://en.wikipedia.org/wiki/Low-discrepancy_sequence "Low-discrepancy sequence")
- [Transposition table](https://en.wikipedia.org/wiki/Transposition_table "Transposition table")
## Notes
1. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-4)** This is useful in cases where keys are devised by a malicious agent, for example in pursuit of a DoS (denial-of-service) attack.
2. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-12)** Plain [ASCII](https://en.wikipedia.org/wiki/ASCII "ASCII") is a 7-bit character encoding, although it is often stored in 8-bit bytes with the highest-order bit always clear (zero). Therefore, for plain ASCII, the bytes have only 2^7 = 128 valid values, and the character translation table has only this many entries.
3. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-13)** For example, for *n* = 15, *k* = 4, *t* = 6: *S* = {1, 2, 3, 4, 5, 6, 8, 10, 12, 9}. \[Knuth\]
4. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-14)** Knuth conveniently leaves the proof of this to the reader.
5. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-17)** Unisys large systems.
## References
1. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-1)**
Aggarwal, Kirti; Verma, Harsh K. (March 19, 2015). *Hash_RC6 – Variable length Hash algorithm using RC6*. 2015 International Conference on Advances in Computer Engineering and Applications (ICACEA). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1109/ICACEA.2015.7164747](https://doi.org/10.1109%2FICACEA.2015.7164747).
2. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-2)**
- ["hash digest"](https://csrc.nist.gov/glossary/term/hash_digest). *Computer Security Resource Center - Glossary*. [NIST](https://en.wikipedia.org/wiki/NIST "NIST").
- ["message digest"](https://csrc.nist.gov/glossary/term/message_digest). *Computer Security Resource Center - Glossary*. [NIST](https://en.wikipedia.org/wiki/NIST "NIST").
3. ^ [***a***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-1973_3-0) [***b***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-1973_3-1) [***c***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-1973_3-2) [***d***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-1973_3-3)
[Knuth, Donald E.](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") (1973). *The Art of Computer Programming, Vol. 3, Sorting and Searching*. Reading, MA, United States: [Addison-Wesley](https://en.wikipedia.org/wiki/Addison-Wesley "Addison-Wesley"). [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1973acp..book.....K](https://ui.adsabs.harvard.edu/abs/1973acp..book.....K). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0-201-03803-3](https://en.wikipedia.org/wiki/Special:BookSources/978-0-201-03803-3 "Special:BookSources/978-0-201-03803-3").
4. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-5)**
Stokes, Jon (2002-07-08). ["Understanding CPU caching and performance"](https://arstechnica.com/gadgets/reviews/2002/07/caching.ars). *Ars Technica*. Retrieved 2022-02-06.
5. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-handbook_of_applied_cryptography_6-0)**
Menezes, Alfred J.; van Oorschot, Paul C.; Vanstone, Scott A. (1996). [*Handbook of Applied Cryptography*](https://archive.org/details/handbookofapplie0000mene). CRC Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0849385230](https://en.wikipedia.org/wiki/Special:BookSources/978-0849385230 "Special:BookSources/978-0849385230").
6. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-7)**
Castro, Julio Cesar Hernandez; et al. (3 February 2005). "The strict avalanche criterion randomness test". *Mathematics and Computers in Simulation*. **68** (1). [Elsevier](https://en.wikipedia.org/wiki/Elsevier "Elsevier"): 1–7. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1016/j.matcom.2004.09.001](https://doi.org/10.1016%2Fj.matcom.2004.09.001). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [18086276](https://api.semanticscholar.org/CorpusID:18086276).
7. ^ [***a***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-fibonacci-hashing_8-0) [***b***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-fibonacci-hashing_8-1)
Skarupke, Malte (16 June 2018). ["Fibonacci Hashing: The Optimization that the World Forgot"](https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/). *Probably Dance*.
8. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-9)**
Wagner, Urs; Lugrin, Thomas (2023), Mulder, Valentin; Mermoud, Alain; Lenders, Vincent; Tellenbach, Bernhard (eds.), "Hash Functions", *Trends in Data Protection and Encryption Technologies*, Cham: Springer Nature Switzerland, pp. 21–24, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1007/978-3-031-33386-6_5](https://doi.org/10.1007%2F978-3-031-33386-6_5), [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-031-33386-6](https://en.wikipedia.org/wiki/Special:BookSources/978-3-031-33386-6 "Special:BookSources/978-3-031-33386-6").
9. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-10)**
["3. Data model â Python 3.6.1 documentation"](https://docs.python.org/3/reference/datamodel.html#object.__hash__). *docs.python.org*. Retrieved 2017-03-24.
10. ^ [***a***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-algorithms_in_java_11-0) [***b***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-algorithms_in_java_11-1)
Sedgewick, Robert (2002). "14. Hashing". *Algorithms in Java* (3rd ed.). Addison Wesley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0201361209](https://en.wikipedia.org/wiki/Special:BookSources/978-0201361209 "Special:BookSources/978-0201361209").
11. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-15)**
Dolev, Shlomi; Lahiani, Limor; Haviv, Yinnon (2013). ["Unique permutation hashing"](https://doi.org/10.1016%2Fj.tcs.2012.12.047). *Theoretical Computer Science*. **475**: 59–65. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1016/j.tcs.2012.12.047](https://doi.org/10.1016%2Fj.tcs.2012.12.047).
12. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-16)**
["CS 3110 Lecture 21: Hash functions"](https://www.cs.cornell.edu/courses/cs3110/2008fa/lectures/lec21.html). Section "Multiplicative hashing".
13. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-18)**
[Zobrist, Albert L.](https://en.wikipedia.org/wiki/Albert_Lindsey_Zobrist "Albert Lindsey Zobrist") (April 1970), [*A New Hashing Method with Application for Game Playing*](https://www.cs.wisc.edu/techreports/1970/TR88.pdf) (PDF), Tech. Rep. 88, Madison, Wisconsin: Computer Sciences Department, University of Wisconsin.
14. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-19)**
[Aho, A.](https://en.wikipedia.org/wiki/Alfred_Aho "Alfred Aho"); [Sethi, R.](https://en.wikipedia.org/wiki/Ravi_Sethi "Ravi Sethi"); [Ullman, J. D.](https://en.wikipedia.org/wiki/Jeffrey_Ullman "Jeffrey Ullman") (1986). *Compilers: Principles, Techniques and Tools*. Reading, MA: [Addison-Wesley](https://en.wikipedia.org/wiki/Addison-Wesley "Addison-Wesley"). p. 435. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [0-201-10088-6](https://en.wikipedia.org/wiki/Special:BookSources/0-201-10088-6 "Special:BookSources/0-201-10088-6").
15. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-20)**
Ramakrishna, M. V.; Zobel, Justin (1997). ["Performance in Practice of String Hashing Functions"](https://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.18.7520&rep=rep1&type=pdf). *Database Systems for Advanced Applications '97*. DASFAA 1997. pp. 215–224. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10.1.1.18.7520](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.7520). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1142/9789812819536_0023](https://doi.org/10.1142%2F9789812819536_0023). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [981-02-3107-5](https://en.wikipedia.org/wiki/Special:BookSources/981-02-3107-5 "Special:BookSources/981-02-3107-5"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [8250194](https://api.semanticscholar.org/CorpusID:8250194). Retrieved 2021-12-06.
16. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-21)**
Singh, N. B. [*A Handbook of Algorithms*](https://books.google.com/books?id=ALIMEQAAQBAJ&dq=rolling+hash&pg=PT102). N.B. Singh.
17. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_NIST.SP.800-168_22-0)**
Breitinger, Frank (May 2014). ["NIST Special Publication 800-168"](https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-168.pdf) (PDF). *NIST Publications*. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.6028/NIST.SP.800-168](https://doi.org/10.6028%2FNIST.SP.800-168). Retrieved January 11, 2023.
18. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_Beyond_Precision_and_Recall:_Understanding_Uses_\(and_Misuses\)_of_Similarity_Hashes_in_Binary_Analysis_23-0)**
Pagani, Fabio; Dell'Amico, Matteo; Balzarotti, Davide (2018-03-13). ["Beyond Precision and Recall"](https://pagabuc.me/docs/codaspy18_pagani.pdf) (PDF). *Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy*. New York, NY, USA: ACM. pp. 354â365\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/3176258.3176306](https://doi.org/10.1145%2F3176258.3176306). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[9781450356329](https://en.wikipedia.org/wiki/Special:BookSources/9781450356329 "Special:BookSources/9781450356329")
. Retrieved December 12, 2022.
19. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_Forensic_Malware_Analysis:_The_Value_of_Fuzzy_Hashing_Algorithms_in_Identifying_Similarities_24-0)**
Sarantinos, Nikolaos; BenzaĂŻd, Chafika; Arabiat, Omar (2016). ["Forensic Malware Analysis: The Value of Fuzzy Hashing Algorithms in Identifying Similarities"](https://ieeexplore.ieee.org/document/7847157). [*2016 IEEE Trustcom/BigDataSE/ISPA*](http://roar.uel.ac.uk/5710/1/Forensic%20Malware%20Analysis.pdf) (PDF). pp. 1782â1787\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TrustCom.2016.0274](https://doi.org/10.1109%2FTrustCom.2016.0274). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-1-5090-3205-1](https://en.wikipedia.org/wiki/Special:BookSources/978-1-5090-3205-1 "Special:BookSources/978-1-5090-3205-1")
. [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [32568938](https://api.semanticscholar.org/CorpusID:32568938). 10.1109/TrustCom.2016.0274.
20. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_ssdeep_25-0)**
Kornblum, Jesse (2006). ["Identifying almost identical files using context triggered piecewise hashing"](https://doi.org/10.1016%2Fj.diin.2006.06.015). *Digital Investigation*. 3, Supplement (September 2006): 91â97\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.diin.2006.06.015](https://doi.org/10.1016%2Fj.diin.2006.06.015).
21. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_tlsh_26-0)**
Oliver, Jonathan; Cheng, Chun; Chen, Yanggui (2013). ["TLSH -- A Locality Sensitive Hash"](https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf) (PDF). *2013 Fourth Cybercrime and Trustworthy Computing Workshop*. IEEE. pp. 7â13\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ctc.2013.9](https://doi.org/10.1109%2Fctc.2013.9). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-1-4799-3076-0](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4799-3076-0 "Special:BookSources/978-1-4799-3076-0")
. Retrieved December 12, 2022.
22. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Perceptual_hashing_buldas13_27-0)**
Buldas, Ahto; Kroonmaa, Andres; Laanoja, Risto (2013). "Keyless Signatures' Infrastructure: How to Build Global Distributed Hash-Trees". In Riis, Nielson H.; Gollmann, D. (eds.). *Secure IT Systems. NordSec 2013*. Lecture Notes in Computer Science. Vol. 8208. Berlin, Heidelberg: Springer. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-642-41488-6\_21](https://doi.org/10.1007%2F978-3-642-41488-6_21). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-3-642-41487-9](https://en.wikipedia.org/wiki/Special:BookSources/978-3-642-41487-9 "Special:BookSources/978-3-642-41487-9")
. "Keyless Signatures Infrastructure (KSI) is a globally distributed system for providing time-stamping and server-supported digital signature services. Global per-second hash trees are created and their root hash values published. We discuss some service quality issues that arise in practical implementation of the service and present solutions for avoiding single points of failure and guaranteeing a service with reasonable and stable delay. Guardtime AS has been operating a KSI Infrastructure for 5 years. We summarize how the KSI Infrastructure is built, and the lessons learned during the operational period of the service."
23. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Perceptual_hashing_klinger_28-0)**
Klinger, Evan; Starkweather, David. ["pHash.org: Home of pHash, the open source perceptual hash library"](http://www.phash.org/). *pHash.org*. Retrieved 2018-07-05. "pHash is an open source software library released under the GPLv3 license that implements several perceptual hashing algorithms, and provides a C-like API to use those functions in your own programs. pHash itself is written in C++."
24. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-29)**
[Knuth, Donald E.](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") (1975). *The Art of Computer Programming, Vol. 3, Sorting and Searching*. Reading, MA: [Addison-Wesley](https://en.wikipedia.org/wiki/Addison-Wesley "Addison-Wesley"). p. 540.
25. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-30)**
Gonnet, G. (1978). *Expected Length of the Longest Probe Sequence in Hash Code Searching* (Technical report). Ontario, Canada: [University of Waterloo](https://en.wikipedia.org/wiki/University_of_Waterloo "University of Waterloo"). CS-RR-78-46.
26. ^ [***a***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-2000_31-0) [***b***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-2000_31-1)
[Knuth, Donald E.](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") (2000). *The Art of Computer Programming, Vol. 3, Sorting and Searching* (2. ed., 6. printing, newly updated and rev. ed.). Boston \[u.a.\]: Addison-Wesley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0-201-89685-5](https://en.wikipedia.org/wiki/Special:BookSources/978-0-201-89685-5 "Special:BookSources/978-0-201-89685-5")
.
## External links
Look up ***[hash](https://en.wiktionary.org/wiki/hash "wiktionary:hash")*** in Wiktionary, the free dictionary.
- [The Goulburn Hashing Function](http://www.sinfocol.org/archivos/2009/11/Goulburn06.pdf) ([PDF](https://en.wikipedia.org/wiki/Portable_Document_Format "Portable Document Format")) by Mayur Patel
- [Hash Function Construction for Textual and Geometrical Data Retrieval](https://dspace5.zcu.cz/bitstream/11025/11784/1/Skala_2010_Corfu-NAUN-Hash.pdf) ([PDF](https://en.wikipedia.org/wiki/Portable_Document_Format "Portable Document Format")) Latest Trends on Computers, Vol. 2, pp. 483–489, CSCC Conference, Corfu, 2010
Retrieved from "<https://en.wikipedia.org/w/index.php?title=Hash_function&oldid=1346252233>"
- This page was last edited on 30 March 2026, at 20:31 (UTC).
"hashlink" redirects here. For the Haxe virtual machine, see [HashLink](https://en.wikipedia.org/wiki/HashLink "HashLink").
This article is about a computer programming construct. For other meanings of "hash" and "hashing", see [Hash (disambiguation)](https://en.wikipedia.org/wiki/Hash_\(disambiguation\) "Hash (disambiguation)").
A hash function that maps names to integers from 0 to 15. There is a [collision](https://en.wikipedia.org/wiki/Hash_collision "Hash collision") between keys "John Smith" and "Sandra Dee".
A **hash function** is any [function](https://en.wikipedia.org/wiki/Function_\(mathematics\) "Function (mathematics)") that can be used to map [data](https://en.wikipedia.org/wiki/Data_\(computing\) "Data (computing)") of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output.[\[1\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-1) The values returned by a hash function are called *hash values*, *hash codes*, (*hash/message*) *digests*,[\[2\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-2) or simply *hashes*. The values are usually used to index a fixed-size table called a *[hash table](https://en.wikipedia.org/wiki/Hash_table "Hash table")*. Use of a hash function to index a hash table is called *hashing* or *scatter-storage addressing*.
Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a way to access data quickly and efficiently. Unlike lists or trees, it provides near-constant access time. It also uses much less storage than trying to store all possible keys directly, especially when keys are large or variable in length.
Use of hash functions relies on statistical properties of key and function interaction: worst-case behavior is intolerably bad but rare, and average-case behavior can be nearly optimal (minimal [collision](https://en.wikipedia.org/wiki/Hash_collision "Hash collision")).[\[3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-1973-3): 527
Hash functions are related to (and often confused with) [checksums](https://en.wikipedia.org/wiki/Checksums "Checksums"), [check digits](https://en.wikipedia.org/wiki/Check_digit "Check digit"), [fingerprints](https://en.wikipedia.org/wiki/Fingerprint_\(computing\) "Fingerprint (computing)"), [lossy compression](https://en.wikipedia.org/wiki/Lossy_compression "Lossy compression"), [randomization functions](https://en.wikipedia.org/wiki/Randomization_function "Randomization function"), [error-correcting codes](https://en.wikipedia.org/wiki/Error_correction_code "Error correction code"), and [ciphers](https://en.wikipedia.org/wiki/Cipher "Cipher"). Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently. The hash function differs from these concepts mainly in terms of [data integrity](https://en.wikipedia.org/wiki/Data_integrity "Data integrity"). Hash tables may use [non-cryptographic hash functions](https://en.wikipedia.org/wiki/Non-cryptographic_hash_function "Non-cryptographic hash function"), while [cryptographic hash functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function "Cryptographic hash function") are used in cybersecurity to secure sensitive data such as passwords.
In a hash table, a hash function takes a key as an input, which is associated with a datum or record and used to identify it to the data storage and retrieval application. The keys may be fixed-length, like an integer, or variable-length, like a name. In some cases, the key is the datum itself. The output is a hash code used to index a hash table holding the data or records, or pointers to them.
A hash function may be considered to perform three functions:
- Convert variable-length keys into fixed-length (usually [machine-word](https://en.wikipedia.org/wiki/Machine_word "Machine word")\-length or less) values, by folding them by words or other units using a [parity-preserving operator](https://en.wikipedia.org/wiki/Parity_function "Parity function") like ADD or XOR,
- Scramble the bits of the key so that the resulting values are uniformly distributed over the [keyspace](https://en.wikipedia.org/wiki/Key_space_\(cryptography\) "Key space (cryptography)"), and
- Map the key values into ones less than or equal to the size of the table.
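The three stages above can be sketched as a single routine. This is a minimal illustration, not a production hash; the shift and multiply constants in the scramble stage are arbitrary choices for this sketch:

```python
def hash_key(key: bytes, table_size: int) -> int:
    # Stage 1 -- fold: combine the key, word by word, with ADD (XOR also works).
    h = 0
    for i in range(0, len(key), 4):
        h = (h + int.from_bytes(key[i:i + 4], "little")) & 0xFFFFFFFF
    # Stage 2 -- scramble: mix the bits so that similar keys end up
    # spread across the keyspace (illustrative constants).
    h ^= h >> 16
    h = (h * 0x45D9F3B) & 0xFFFFFFFF
    h ^= h >> 16
    # Stage 3 -- map: reduce the scrambled value to a valid table index.
    return h % table_size
```

For a 16-slot table, `hash_key(b"John Smith", 16)` yields an index in 0..15, as in the figure above.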
A good hash function satisfies two basic properties: it should be very fast to compute, and it should minimize duplication of output values ([collisions](https://en.wikipedia.org/wiki/Hash_collision "Hash collision")). Hash functions rely on generating favorable [probability distributions](https://en.wikipedia.org/wiki/Probability_distribution "Probability distribution") for their effectiveness, reducing access time to nearly constant. High table loading factors, [pathological](https://en.wikipedia.org/wiki/Pathological_\(mathematics\) "Pathological (mathematics)") key sets, and poorly designed hash functions can result in access times approaching linear in the number of items in the table. Hash functions can be designed to give the best worst-case performance,[\[Notes 1\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-4) good performance under high table loading factors, and in special cases, perfect (collisionless) mapping of keys into hash codes. Implementation is based on parity-preserving bit operations (XOR and ADD), multiply, or divide. A necessary adjunct to the hash function is a collision-resolution method that employs an auxiliary data structure like [linked lists](https://en.wikipedia.org/wiki/Linked_list "Linked list"), or systematic probing of the table to find an empty slot.
Hash functions are used in conjunction with [hash tables](https://en.wikipedia.org/wiki/Hash_tables "Hash tables") to store and retrieve data items or data records. The hash function translates the key associated with each datum or record into a hash code, which is used to index the hash table. When an item is to be added to the table, the hash code may index an empty slot (also called a bucket), in which case the item is added to the table there. If the hash code indexes a full slot, then some kind of collision resolution is required: the new item may be omitted (not added to the table), or replace the old item, or be added to the table in some other location by a specified procedure. That procedure depends on the structure of the hash table. In *chained hashing*, each slot is the head of a linked list or chain, and items that collide at the slot are added to the chain. Chains may be kept in random order and searched linearly, or in serial order, or as a self-ordering list by frequency to speed up access. In *open address hashing*, the table is probed starting from the occupied slot in a specified manner, usually by [linear probing](https://en.wikipedia.org/wiki/Linear_probing "Linear probing"), [quadratic probing](https://en.wikipedia.org/wiki/Quadratic_probing "Quadratic probing"), or [double hashing](https://en.wikipedia.org/wiki/Double_hashing "Double hashing") until an open slot is located or the entire table is probed (overflow). Searching for the item follows the same procedure until the item is located, an open slot is found, or the entire table has been searched (item not in table).
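Chained hashing, as described above, can be sketched in a few lines: each slot heads a chain of colliding items, and an insertion either replaces an existing key or extends the chain. This is a minimal illustration using Python's built-in `hash` as the hash function:

```python
class ChainedHashTable:
    """Minimal chained hash table: each slot heads a list of (key, value) pairs."""

    def __init__(self, size=16):
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        # Map the key's hash code onto a slot index.
        return hash(key) % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: replace the old item
                chain[i] = (key, value)
                return
        chain.append((key, value))       # empty slot or collision: extend the chain

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```

Even a deliberately tiny table (forcing collisions on nearly every insert) stays correct; only the chain lengths, and hence lookup time, suffer.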
Hash functions are also used to build [caches](https://en.wikipedia.org/wiki/Cache_\(computing\) "Cache (computing)") for large data sets stored in slow media. A cache is generally simpler than a hashed search table, since any collision can be resolved by discarding or writing back the older of the two colliding items.[\[4\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-5)
Hash functions are an essential ingredient of the [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter "Bloom filter"), a space-efficient [probabilistic](https://en.wikipedia.org/wiki/Probability "Probability") [data structure](https://en.wikipedia.org/wiki/Data_structure "Data structure") that is used to test whether an [element](https://en.wikipedia.org/wiki/Element_\(mathematics\) "Element (mathematics)") is a member of a [set](https://en.wikipedia.org/wiki/Set_\(computer_science\) "Set (computer science)").
A special case of hashing is known as [geometric hashing](https://en.wikipedia.org/wiki/Geometric_hashing "Geometric hashing") or the *grid method*. In these applications, the set of all inputs is some sort of [metric space](https://en.wikipedia.org/wiki/Metric_space "Metric space"), and the hashing function can be interpreted as a [partition](https://en.wikipedia.org/wiki/Partition_\(mathematics\) "Partition (mathematics)") of that space into a grid of *cells*. The table is often an array with two or more indices (called a *[grid file](https://en.wikipedia.org/wiki/Grid_file "Grid file")*, *grid index*, *bucket grid*, and similar names), and the hash function returns an index [tuple](https://en.wikipedia.org/wiki/Tuple "Tuple"). This principle is widely used in [computer graphics](https://en.wikipedia.org/wiki/Computer_graphics "Computer graphics"), [computational geometry](https://en.wikipedia.org/wiki/Computational_geometry "Computational geometry"), and many other disciplines, to solve many [proximity problems](https://en.wikipedia.org/wiki/Proximity_problem "Proximity problem") in the [plane](https://en.wikipedia.org/wiki/Plane_\(geometry\) "Plane (geometry)") or in [three-dimensional space](https://en.wikipedia.org/wiki/Three-dimensional_space "Three-dimensional space"), such as finding [closest pairs](https://en.wikipedia.org/wiki/Closest_pair_problem "Closest pair problem") in a set of points, similar shapes in a list of shapes, similar [images](https://en.wikipedia.org/wiki/Image_processing "Image processing") in an [image database](https://en.wikipedia.org/wiki/Image_retrieval "Image retrieval"), and so on.
Hash tables are also used to implement [associative arrays](https://en.wikipedia.org/wiki/Associative_array "Associative array") and [dynamic sets](https://en.wikipedia.org/wiki/Set_\(abstract_data_type\) "Set (abstract data type)").[\[5\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-handbook_of_applied_cryptography-6)
A good hash function should map the expected inputs as evenly as possible over its output range. That is, every hash value in the output range should be generated with roughly the same [probability](https://en.wikipedia.org/wiki/Probability "Probability"). The reason for this last requirement is that the cost of hashing-based methods goes up sharply as the number of *collisions* (pairs of inputs that are mapped to the same hash value) increases. If some hash values are more likely to occur than others, then a larger fraction of the lookup operations will have to search through a larger set of colliding table entries.
This criterion only requires the value to be *uniformly distributed*, not *random* in any sense. A good randomizing function is (barring computational efficiency concerns) generally a good choice as a hash function, but the converse need not be true.
Hash tables often contain only a small subset of the valid inputs. For instance, a club membership list may contain only a hundred or so member names, out of the very large set of all possible names. In these cases, the uniformity criterion should hold for almost all typical subsets of entries that may be found in the table, not just for the global set of all possible entries.
In other words, if a typical set of *m* records is hashed to *n* table slots, then the probability of a bucket receiving many more than *m*/*n* records should be vanishingly small. In particular, if *m* \< *n*, then very few buckets should have more than one or two records. A small number of collisions is virtually inevitable, even if *n* is much larger than *m*; see the [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem "Birthday problem").
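The birthday effect can be quantified directly: the probability that *m* uniformly hashed keys are all distinct among *n* slots is the product of (*n* − *i*)/*n* for *i* from 0 to *m* − 1, so collisions appear well before the table fills. A short sketch:

```python
def collision_probability(m: int, n: int) -> float:
    """Probability that at least two of m uniformly hashed keys
    land in the same slot of an n-slot table (the birthday problem)."""
    p_all_distinct = 1.0
    for i in range(m):
        p_all_distinct *= (n - i) / n
    return 1.0 - p_all_distinct
```

For example, hashing only 100 keys into 10,000 slots already produces at least one collision with probability of roughly 0.39.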
In special cases when the keys are known in advance and the key set is static, a hash function can be found that achieves absolute (or collisionless) uniformity. Such a hash function is said to be *[perfect](https://en.wikipedia.org/wiki/Perfect_hash_function "Perfect hash function")*. There is no algorithmic way of constructing such a function: searching for one is a [factorial](https://en.wikipedia.org/wiki/Factorial "Factorial") function of the number of keys to be mapped versus the number of table slots that they are mapped into. Finding a perfect hash function over more than a very small set of keys is usually computationally infeasible; the resulting function is likely to be more computationally complex than a standard hash function and provides only a marginal advantage over a function with good statistical properties that yields a minimum number of collisions. See [universal hash function](https://en.wikipedia.org/wiki/Universal_hashing "Universal hashing").
### Testing and measurement
When testing a hash function, the uniformity of the distribution of hash values can be evaluated by the [chi-squared test](https://en.wikipedia.org/wiki/Chi-squared_test "Chi-squared test"). This test is a goodness-of-fit measure: it compares the actual distribution of items in the buckets against the expected (uniform) distribution of items. The formula is

$$\frac{\sum_{j=0}^{m-1} b_j (b_j + 1)/2}{(n/2m)(n + 2m - 1)}$$

where *n* is the number of keys, *m* is the number of buckets, and *b<sub>j</sub>* is the number of items in bucket *j*.
A ratio within one confidence interval (such as 0.95 to 1.05) is indicative that the hash function evaluated has an expected uniform distribution.
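The ratio can be computed directly from the bucket occupancy counts. A sketch (its expected value is 1 when keys are distributed uniformly at random over the buckets):

```python
def uniformity_ratio(bucket_counts):
    """Chi-squared-style uniformity ratio for a hash table:
    values near 1.0 suggest the hash distributes keys uniformly;
    values well above 1.0 indicate clustering."""
    m = len(bucket_counts)                 # number of buckets
    n = sum(bucket_counts)                 # number of keys
    observed = sum(b * (b + 1) / 2 for b in bucket_counts)
    expected = (n / (2 * m)) * (n + 2 * m - 1)
    return observed / expected
```

A perfectly even spread scores slightly below 1 (better than random), while piling every key into one bucket drives the ratio far above 1.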
Hash functions can have some technical properties that make it more likely that they will have a uniform distribution when applied. One is the [strict avalanche criterion](https://en.wikipedia.org/wiki/Strict_avalanche_criterion "Strict avalanche criterion"): whenever a single input bit is complemented, each of the output bits changes with a 50% probability. The reason for this property is that selected subsets of the keyspace may have low variability. For the output to be uniformly distributed, a low amount of variability, even one bit, should translate into a high amount of variability (i.e. distribution over the tablespace) in the output. Each bit should change with a probability of 50% because, if some bits are reluctant to change, then the keys become clustered around those values. If the bits want to change too readily, then the mapping is approaching a fixed XOR function of a single bit. Standard tests for this property have been described in the literature.[\[6\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-7) The relevance of the criterion to a multiplicative hash function is assessed here.[\[7\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-fibonacci-hashing-8)
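The strict avalanche criterion can be estimated empirically: flip one random input bit, and record how often each output bit changes. The mixer below uses experimentally derived constants from a published 32-bit mixing function (often called "lowbias32"); it is shown here only as an example subject for the measurement, not as a recommendation:

```python
import random

def avalanche_bias(hash_fn, bits=32, trials=1000, seed=1):
    """Worst per-output-bit deviation from the ideal 50% flip rate
    when a single random input bit is complemented."""
    rng = random.Random(seed)
    flips = [0] * bits
    for _ in range(trials):
        x = rng.getrandbits(bits)
        y = x ^ (1 << rng.randrange(bits))   # complement one input bit
        diff = hash_fn(x) ^ hash_fn(y)       # which output bits changed?
        for b in range(bits):
            flips[b] += (diff >> b) & 1
    return max(abs(f / trials - 0.5) for f in flips)

def mix32(x):
    # 32-bit mixing function with experimentally derived constants.
    x = ((x ^ (x >> 16)) * 0x7FEB352D) & 0xFFFFFFFF
    x = ((x ^ (x >> 15)) * 0x846CA68B) & 0xFFFFFFFF
    return x ^ (x >> 16)
```

A good mixer keeps every output bit near the 50% flip rate, whereas the identity function (no scrambling at all) flips only the one bit that was complemented, giving a bias near 0.47.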
In data storage and retrieval applications, the use of a hash function is a trade-off between search time and data storage space. If search time were unbounded, then a very compact unordered linear list would be the best medium; if storage space were unbounded, then a randomly accessible structure indexable by the key-value would be very large and very sparse, but very fast. A hash function takes a finite amount of time to map a potentially large keyspace to a feasible amount of storage space searchable in a bounded amount of time regardless of the number of keys. In most applications, the hash function should be computable with minimum latency and secondarily in a minimum number of instructions.
Computational complexity varies with the number of instructions required and the latency of individual instructions: the simplest are the bitwise methods (folding), followed by the multiplicative methods, with the division-based methods being the most complex (and slowest).
Because collisions should be infrequent, and cause a marginal delay but are otherwise harmless, it is usually preferable to choose a faster hash function over one that needs more computation but saves a few collisions.
Division-based implementations can be of particular concern because a division requires multiple cycles on nearly all processor [microarchitectures](https://en.wikipedia.org/wiki/Microarchitecture "Microarchitecture"). Division ([modulo](https://en.wikipedia.org/wiki/Modulo_operation "Modulo operation")) by a constant can be inverted to become a multiplication by the word-size multiplicative-inverse of that constant. This can be done by the programmer, or by the compiler. Division can also be reduced directly into a series of shift-subtracts and shift-adds, though minimizing the number of such operations required is a daunting problem; the number of machine-language instructions resulting may be more than a dozen and swamp the pipeline. If the microarchitecture has [hardware multiply](https://en.wikipedia.org/wiki/Hardware_multiply "Hardware multiply") [functional units](https://en.wikipedia.org/wiki/Functional_unit "Functional unit"), then the multiply-by-inverse is likely a better approach.
We can allow the table size *n* to not be a power of 2 and still not have to perform any remainder or division operation, as these computations are sometimes costly. For example, let *n* be significantly less than 2<sup>*b*</sup>. Consider a [pseudorandom number generator](https://en.wikipedia.org/wiki/Pseudorandom_number_generator "Pseudorandom number generator") function *P*(key) that is uniform on the interval \[0, 2<sup>*b*</sup> − 1\]. A hash function uniform on the interval \[0, *n* − 1\] is *n* *P*(key) / 2<sup>*b*</sup>. We can replace the division by a (possibly faster) right [bit shift](https://en.wikipedia.org/wiki/Bit_shifting "Bit shifting"): *n* *P*(key) >> *b*.
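For instance, with *b* = 32 the multiply-then-shift reduction can be sketched as follows (the function name is illustrative):

```c
#include <stdint.h>

/* Map a uniform 32-bit value x (e.g. P(key)) into [0, n-1] without
   division: form the 64-bit product n*x, then shift right by b = 32. */
static uint32_t reduce(uint32_t x, uint32_t n) {
    return (uint32_t)(((uint64_t)x * n) >> 32);
}
```

The result is proportional to `x`, so it is uniform whenever `x` is uniform on the full 32-bit range.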
If keys are being hashed repeatedly, and the hash function is costly, then computing time can be saved by precomputing the hash codes and storing them with the keys. Matching hash codes almost certainly means that the keys are identical. This technique is used for the transposition table in game-playing programs, which stores a 64-bit hashed representation of the board position.
A *universal hashing* scheme is a [randomized algorithm](https://en.wikipedia.org/wiki/Randomized_algorithm "Randomized algorithm") that selects a hash function *h* among a family of such functions, in such a way that the probability of a collision of any two distinct keys is 1/*m*, where *m* is the number of distinct hash values desiredâindependently of the two keys. Universal hashing ensures (in a probabilistic sense) that the hash [function application](https://en.wikipedia.org/wiki/Function_application "Function application") will behave as well as if it were using a random function, for any distribution of the input data. It will, however, have more collisions than perfect hashing and may require more operations than a special-purpose hash function.
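A minimal sketch of such a family, assuming the classic Carter–Wegman construction *h*(*x*) = ((*ax* + *b*) mod *p*) mod *m* with *p* prime; the chosen prime, the use of the C library `rand`, and the helper names are illustrative:

```c
#include <stdint.h>
#include <stdlib.h>

/* Universal hashing sketch: draw a and b at random once, then
   h(x) = ((a*x + b) mod p) mod m, with p prime and larger than any key. */
static const uint64_t P = 4294967311ull;   /* smallest prime above 2^32 */
static uint64_t A, B;

static void universal_init(unsigned seed) {
    srand(seed);
    A = 1 + (uint64_t)rand() % 0x7FFFFFFFu; /* a in [1, 2^31-1]: keeps a*x below 2^64 */
    B = (uint64_t)rand() % P;               /* b in [0, p-1] (range limited by RAND_MAX) */
}

static uint64_t universal_hash(uint32_t x, uint64_t m) {
    return ((A * x + B) % P) % m;
}
```

A production implementation would draw `a` uniformly from the full range [1, *p* − 1] using a better random source and wider arithmetic.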
A hash function that allows only certain table sizes or strings only up to a certain length, or cannot accept a seed (i.e. allow double hashing) is less useful than one that does.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
A hash function is applicable in a variety of situations. Particularly within cryptography, notable applications include:[\[8\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-9)
- [Integrity checking](https://en.wikipedia.org/wiki/File_verification "File verification"): Matching hash values strongly suggest that two files are identical, while differing values prove they are not, providing a reliable means to detect file modifications.
- [Key derivation](https://en.wikipedia.org/wiki/Key_derivation_function "Key derivation function"): Minor input changes result in a random-looking output alteration, known as the diffusion property. Thus, hash functions are valuable for key derivation functions.
- [Message authentication codes](https://en.wikipedia.org/wiki/Message_authentication_code "Message authentication code") (MACs): Through the integration of a confidential key with the input data, hash functions can generate MACs ensuring the genuineness of the data, such as in [HMACs](https://en.wikipedia.org/wiki/HMAC "HMAC").
- Password storage: The password's hash value does not expose any password details, emphasizing the importance of securely storing hashed passwords on the server.
- [Signatures](https://en.wikipedia.org/wiki/Digital_signature "Digital signature"): Message hashes are signed rather than the whole message.
A hash procedure must be [deterministic](https://en.wikipedia.org/wiki/Deterministic_algorithm "Deterministic algorithm")âfor a given input value, it must always generate the same hash value. In other words, it must be a [function](https://en.wikipedia.org/wiki/Function_\(mathematics\) "Function (mathematics)") of the data to be hashed, in the mathematical sense of the term. This requirement excludes hash functions that depend on external variable parameters, such as [pseudo-random number generators](https://en.wikipedia.org/wiki/Pseudo-random_number_generator "Pseudo-random number generator") or the time of day. It also excludes functions that depend on the [memory address](https://en.wikipedia.org/wiki/Memory_address "Memory address") of the object being hashed, because the address may change during execution (as may happen on systems that use certain methods of [garbage collection](https://en.wikipedia.org/wiki/Garbage_collection_\(computer_science\) "Garbage collection (computer science)")), although sometimes rehashing of the item is possible.
The determinism is in the context of the reuse of the function. For example, [Python](https://en.wikipedia.org/wiki/Python_\(programming_language\) "Python (programming language)") adds the feature that hash functions make use of a randomized seed that is generated once when the Python process starts in addition to the input to be hashed.[\[9\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-10) The Python hash ([SipHash](https://en.wikipedia.org/wiki/SipHash "SipHash")) is still a valid hash function when used within a single run, but if the values are persisted (for example, written to disk), they can no longer be treated as valid hash values, since in the next run the random value might differ.
It is often desirable that the output of a hash function have fixed size (but see below). If, for example, the output is constrained to 32-bit integer values, then the hash values can be used to index into an array. Such hashing is commonly used to accelerate data searches.[\[10\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-algorithms_in_java-11) Producing fixed-length output from variable-length input can be accomplished by breaking the input data into chunks of specific size. Hash functions used for data searches use some arithmetic expression that iteratively processes chunks of the input (such as the characters in a string) to produce the hash value.[\[10\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-algorithms_in_java-11)
In many applications, the range of hash values may be different for each run of the program or may change along the same run (for instance, when a hash table needs to be expanded). In those situations, one needs a hash function which takes two parametersâthe input data *z*, and the number *n* of allowed hash values.
A common solution is to compute a fixed hash function with a very large range (say, 0 to 2<sup>32</sup> − 1), divide the result by *n*, and use the division's [remainder](https://en.wikipedia.org/wiki/Modulo_operation "Modulo operation"). If *n* is itself a power of 2, this can be done by [bit masking](https://en.wikipedia.org/wiki/Mask_\(computing\) "Mask (computing)") and [bit shifting](https://en.wikipedia.org/wiki/Bit_shifting "Bit shifting"). When this approach is used, the hash function must be chosen so that the result has fairly uniform distribution between 0 and *n* − 1, for any value of *n* that may occur in the application. Depending on the function, the remainder may be uniform only for certain values of *n*, e.g. [odd](https://en.wikipedia.org/wiki/Odd_number "Odd number") or [prime numbers](https://en.wikipedia.org/wiki/Prime_number "Prime number").
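Both reductions can be sketched as follows (function names are illustrative):

```c
#include <stdint.h>

/* Reduce a wide hash value h into [0, n-1] for a general table size n. */
static uint32_t mod_reduce(uint32_t h, uint32_t n) {
    return h % n;           /* take the division's remainder */
}

/* Equivalent reduction when n is a power of 2: a single bit mask. */
static uint32_t mask_reduce(uint32_t h, uint32_t n) {
    return h & (n - 1);     /* valid only when n == 2^k */
}
```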
### Variable range with minimal movement (dynamic hash function)
When the hash function is used to store values in a hash table that outlives the run of the program, and the hash table needs to be expanded or shrunk, the hash table is referred to as a dynamic hash table.
A hash function that will relocate the minimum number of records when the table is resized is desirable. What is needed is a hash function *H*(*z*,*n*) (where *z* is the key being hashed and *n* is the number of allowed hash values) such that *H*(*z*,*n* + 1) = *H*(*z*,*n*) with probability close to *n*/(*n* + 1).
[Linear hashing](https://en.wikipedia.org/wiki/Linear_hashing "Linear hashing") and [spiral hashing](https://en.wikipedia.org/wiki/Spiral_hashing "Spiral hashing") are examples of dynamic hash functions that execute in constant time but relax the property of uniformity to achieve the minimal movement property. [Extendible hashing](https://en.wikipedia.org/wiki/Extendible_hashing "Extendible hashing") uses a dynamic hash function that requires space proportional to *n* to compute the hash function, and it becomes a function of the previous keys that have been inserted. Several algorithms that preserve the uniformity property but require time proportional to *n* to compute the value of *H*(*z*,*n*) have been invented.\[*[clarification needed](https://en.wikipedia.org/wiki/Wikipedia:Please_clarify "Wikipedia:Please clarify")*\]
A hash function with minimal movement is especially useful in [distributed hash tables](https://en.wikipedia.org/wiki/Distributed_hash_table "Distributed hash table").
In some applications, the input data may contain features that are irrelevant for comparison purposes. For example, when looking up a personal name, it may be desirable to ignore the distinction between upper and lower case letters. For such data, one must use a hash function that is compatible with the data [equivalence](https://en.wikipedia.org/wiki/Equivalence_relation "Equivalence relation") criterion being used: that is, any two inputs that are considered equivalent must yield the same hash value. This can be accomplished by normalizing the input before hashing it, as by upper-casing all letters.
## Hashing integer data types
There are several common algorithms for hashing integers. The method giving the best distribution is data-dependent. One of the simplest and most common methods in practice is the modulo division method.
### Identity hash function
If the data to be hashed is small enough, then one can use the data itself (reinterpreted as an integer) as the hashed value. The cost of computing this *[identity](https://en.wikipedia.org/wiki/Identity_function "Identity function")* hash function is effectively zero. This hash function is [perfect](https://en.wikipedia.org/wiki/Perfect_hash_function "Perfect hash function"), as it maps each input to a distinct hash value.
The meaning of "small enough" depends on the size of the type that is used as the hashed value. For example, in [Java](https://en.wikipedia.org/wiki/Java_\(programming_language\) "Java (programming language)"), the hash code is a 32-bit integer. Thus the 32-bit integer `Integer` and 32-bit floating-point `Float` objects can simply use the value directly, whereas the 64-bit integer `Long` and 64-bit floating-point `Double` cannot.
Other types of data can also use this hashing scheme. For example, when mapping [character strings](https://en.wikipedia.org/wiki/Character_string "Character string") between [upper and lower case](https://en.wikipedia.org/wiki/Letter_case "Letter case"), one can use the binary encoding of each character, interpreted as an integer, to index a table that gives the alternative form of that character ("A" for "a", "8" for "8", etc.). If each character is stored in 8 bits (as in [extended ASCII](https://en.wikipedia.org/wiki/Extended_ASCII "Extended ASCII")[\[Notes 2\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-12) or [ISO Latin 1](https://en.wikipedia.org/wiki/ISO_Latin_1 "ISO Latin 1")), the table has only 2<sup>8</sup> = 256 entries; in the case of [Unicode](https://en.wikipedia.org/wiki/Unicode "Unicode") characters, the table would have 17 × 2<sup>16</sup> = 1114112 entries.
The same technique can be used to map [two-letter country codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 "ISO 3166-1 alpha-2") like "us" or "za" to country names (26<sup>2</sup> = 676 table entries), 5-digit [ZIP codes](https://en.wikipedia.org/wiki/ZIP_Code "ZIP Code") like 13083 to city names (100000 entries), etc. Invalid data values (such as the country code "xx" or the ZIP code 00000) may be left undefined in the table or mapped to some appropriate "null" value.
### Trivial hash function
If the keys are uniformly or sufficiently uniformly distributed over the key space, so that the key values are essentially random, then they may be considered to be already "hashed". In this case, any number of bits in the key may be extracted and collated as an index into the hash table. For example, a simple hash function might mask off the *m* least significant bits and use the result as an index into a hash table of size 2<sup>*m*</sup>.
A mid-squares hash code is produced by squaring the input and extracting an appropriate number of middle digits or bits. For example, if the input is 123456789 and the hash table size is 10000, then squaring the key produces 15241578750190521, so the hash code is taken as the middle 4 digits of the 17-digit number (ignoring the high digit), 8750. The mid-squares method produces a reasonable hash code if the key does not have many leading or trailing zeros. This is a variant of multiplicative hashing, but not as good, because an arbitrary key is not a good multiplier.
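The worked example above can be reproduced directly; extracting the middle 4 decimal digits of the 17-digit square (ignoring the high digit) amounts to dropping the low 6 digits and keeping the next 4:

```c
#include <stdint.h>

/* Mid-squares hash on decimal digits: square the key, then extract the
   middle 4 digits as an index into a table of size 10000. */
static uint64_t mid_square(uint64_t key) {
    uint64_t sq = key * key;          /* 123456789^2 = 15241578750190521 */
    return (sq / 1000000u) % 10000u;  /* drop low 6 digits, keep next 4 */
}
```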
A standard technique is to use a modulo function on the key, by selecting a divisor *M* which is a prime number close to the table size, so *h*(*K*) ≡ *K* (mod *M*). The table size is usually a power of 2. This gives a distribution over {0, …, *M* − 1} and good results over a large number of key sets. A significant drawback of division hashing is that division requires multiple cycles on most modern architectures (including [x86](https://en.wikipedia.org/wiki/X86 "X86")) and can be 10 times slower than multiplication. A second drawback is that it will not break up clustered keys. For example, the keys 123000, 456000, 789000, etc. modulo 1000 all map to the same address. This technique works well in practice because many key sets are sufficiently random already, and the probability that a key set will be cyclical by a large prime number is small.
Algebraic coding is a variant of the division method of hashing which uses division by a polynomial modulo 2 instead of an integer to map *n* bits to *m* bits.[\[3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-1973-3): 512–513 In this approach, *M* = 2<sup>*m*</sup>, and we postulate an *m*th-degree polynomial *Z*(*x*) = *x*<sup>*m*</sup> + ζ<sub>*m*−1</sub>*x*<sup>*m*−1</sup> + ⋯ + ζ<sub>0</sub>. A key *K* = (*k*<sub>*n*−1</sub>…*k*<sub>1</sub>*k*<sub>0</sub>)<sub>2</sub> can be regarded as the polynomial *K*(*x*) = *k*<sub>*n*−1</sub>*x*<sup>*n*−1</sup> + ⋯ + *k*<sub>1</sub>*x* + *k*<sub>0</sub>. The remainder using polynomial arithmetic modulo 2 is *K*(*x*) mod *Z*(*x*) = *h*<sub>*m*−1</sub>*x*<sup>*m*−1</sup> + ⋯ + *h*<sub>1</sub>*x* + *h*<sub>0</sub>. Then *h*(*K*) = (*h*<sub>*m*−1</sub>…*h*<sub>1</sub>*h*<sub>0</sub>)<sub>2</sub>. If *Z*(*x*) is constructed to have *t* or fewer non-zero coefficients, then keys which differ in *t* or fewer bits are guaranteed to not collide.
*Z* is a function of *k*, *t*, and *n* (the last of which is a divisor of 2<sup>*k*</sup> − 1) and is constructed from the [finite field](https://en.wikipedia.org/wiki/Finite_field "Finite field") GF(2<sup>*k*</sup>). [Knuth](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") gives an example: taking (*n*,*m*,*t*) = (15,10,7) yields *Z*(*x*) = *x*<sup>10</sup> + *x*<sup>8</sup> + *x*<sup>5</sup> + *x*<sup>4</sup> + *x*<sup>2</sup> + *x* + 1. The derivation is as follows:
Let *S* be the smallest set of integers such that {1, 2, …, *t*} ⊆ *S* and (2*j* mod *n*) ∈ *S* ∀*j* ∈ *S*.[\[Notes 3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-13)
Define *P*(*x*) = ∏<sub>*j* ∈ *S*</sub> (*x* − α<sup>*j*</sup>), where α ∈ GF(2<sup>*k*</sup>) is a primitive *n*th root of unity and where the coefficients of *P*(*x*) are computed in this field. Then the degree of *P*(*x*) = \|*S*\|. Since α<sup>2*j*</sup> is a root of *P*(*x*) whenever α<sup>*j*</sup> is a root, it follows that the coefficients *p*<sub>*i*</sub> of *P*(*x*) satisfy *p*<sub>*i*</sub><sup>2</sup> = *p*<sub>*i*</sub>, so they are all 0 or 1. If *R*(*x*) = *r*<sub>*n*−1</sub>*x*<sup>*n*−1</sup> + ⋯ + *r*<sub>1</sub>*x* + *r*<sub>0</sub> is any nonzero polynomial modulo 2 with at most *t* nonzero coefficients, then *R*(*x*) is not a multiple of *P*(*x*) modulo 2.[\[Notes 4\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-14) It follows that the corresponding hash function will map keys differing in *t* or fewer bits to unique indices.[\[3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-1973-3): 542–543
The usual outcome is that either *n* will get large, or *t* will get large, or both, for the scheme to be computationally feasible. Therefore, it is more suited to hardware or microcode implementation.[\[3\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-1973-3): 542–543
### Unique permutation hashing
Unique permutation hashing has a guaranteed best worst-case insertion time.[\[11\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-15)
### Multiplicative hashing
Standard multiplicative hashing uses the formula *h*<sub>*a*</sub>(*K*) = ⌊(*aK* mod *W*) / (*W*/*M*)⌋, which produces a hash value in {0, …, *M* − 1}. The value *a* is an appropriately chosen value that should be [relatively prime](https://en.wikipedia.org/wiki/Coprime_integers "Coprime integers") to *W*; it should be large,\[*[clarification needed](https://en.wikipedia.org/wiki/Wikipedia:Please_clarify "Wikipedia:Please clarify")*\] and its binary representation a random mix\[*[clarification needed](https://en.wikipedia.org/wiki/Wikipedia:Please_clarify "Wikipedia:Please clarify")*\] of 1s and 0s. An important practical special case occurs when *W* = 2<sup>*w*</sup> and *M* = 2<sup>*m*</sup> are powers of 2 and *w* is the machine [word size](https://en.wikipedia.org/wiki/Word_size "Word size"). In this case, this formula becomes *h*<sub>*a*</sub>(*K*) = ⌊(*aK* mod 2<sup>*w*</sup>) / 2<sup>*w*−*m*</sup>⌋. This is special because arithmetic modulo 2<sup>*w*</sup> is done by default in low-level programming languages and integer division by a power of 2 is simply a right-shift, so, in [C](https://en.wikipedia.org/wiki/C_\(programming_language\) "C (programming language)"), for example, this function becomes
```
unsigned hash(unsigned K) {
    return (a * K) >> (w - m);  /* a, w, m: constants as defined above */
}
```
and for fixed m and w this translates into a single integer multiplication and right-shift, making it one of the fastest hash functions to compute.
Multiplicative hashing is susceptible to a "common mistake" that leads to poor diffusionâhigher-value input bits do not affect lower-value output bits.[\[12\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-16) A transmutation on the input which shifts the span of retained top bits down and XORs or ADDs them to the key before the multiplication step corrects for this. The resulting function looks like:[\[7\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-fibonacci-hashing-8)
```
unsigned hash(unsigned K) {
    K ^= K >> (w - m);          /* fold the retained top bits into the low bits */
    return (a * K) >> (w - m);
}
```
[Fibonacci](https://en.wikipedia.org/wiki/Fibonacci_number "Fibonacci number") hashing is a form of multiplicative hashing in which the multiplier is 2<sup>*w*</sup> / φ, where *w* is the machine word length and φ (phi) is the [golden ratio](https://en.wikipedia.org/wiki/Golden_ratio "Golden ratio") (approximately 1.618). A property of this multiplier is that it uniformly distributes over the table space blocks of consecutive keys with respect to any block of bits in the key. Consecutive keys within the high bits or low bits of the key (or some other field) are relatively common. The multipliers for various word lengths are:
- 16: *a* = 9E37<sub>16</sub> = 40503<sub>10</sub>
- 32: *a* = 9E3779B9<sub>16</sub> = 2654435769<sub>10</sub>
- 48: *a* = 9E3779B97F4B<sub>16</sub> = 173961102589771<sub>10</sub>[\[Notes 5\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-17)
- 64: *a* = 9E3779B97F4A7C15<sub>16</sub> = 11400714819323198485<sub>10</sub>
The multiplier should be odd, so the least significant bit of the output is invertible modulo 2<sup>*w*</sup>. The last two values given above are rounded (up and down, respectively) by more than 1/2 of a least-significant bit to achieve this.
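With the 64-bit multiplier above, Fibonacci hashing reduces to a single multiply and shift; this sketch assumes *w* = 64 and a table of 2<sup>*m*</sup> slots:

```c
#include <stdint.h>

/* Fibonacci hashing: multiply by 2^64/phi (rounded to an odd value)
   and keep the top m bits as the table index. */
static uint64_t fib_hash(uint64_t key, unsigned m) {
    const uint64_t a = 0x9E3779B97F4A7C15u;  /* 2^64 / golden ratio */
    return (a * key) >> (64 - m);            /* overflow gives "mod 2^64" for free */
}
```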
*Zobrist hashing*, named after [Albert Zobrist](https://en.wikipedia.org/wiki/Albert_Lindsey_Zobrist "Albert Lindsey Zobrist"), is a form of [tabulation hashing](https://en.wikipedia.org/wiki/Tabulation_hashing "Tabulation hashing"), which is a method for constructing universal families of hash functions by combining table lookup with XOR operations. This algorithm has proven to be very fast and of high quality for hashing purposes (especially hashing of integer-number keys).[\[13\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-18)
Zobrist hashing was originally introduced as a means of compactly representing chess positions in computer game-playing programs. A unique random number was assigned to represent each type of piece (six each for black and white) on each space of the board. Thus a table of 64 × 12 such numbers is initialized at the start of the program. The random numbers could be any length, but 64 bits was natural due to the 64 squares on the board. A position was transcribed by cycling through the pieces in a position, indexing the corresponding random numbers (vacant spaces were not included in the calculation) and XORing them together (the starting value could be 0 (the identity value for XOR) or a random seed). The resulting value was reduced by modulo, folding, or some other operation to produce a hash table index. The original Zobrist hash was stored in the table as the representation of the position.
Later, the method was extended to hashing integers by representing each byte in each of 4 possible positions in the word by a unique 32-bit random number. Thus, a table of 2<sup>8</sup> × 4 random numbers is constructed. A 32-bit hashed integer is transcribed by successively indexing the table with the value of each byte of the plain text integer and XORing the loaded values together (again, the starting value can be the identity value or a random seed). The natural extension to 64-bit integers is by use of a table of 2<sup>8</sup> × 8 64-bit random numbers.
This kind of function has some nice theoretical properties, one of which is called *3-tuple independence*, meaning that every 3-tuple of keys is equally likely to be mapped to any 3-tuple of hash values.
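The integer variant described above can be sketched as follows; filling the tables from the C library's `rand` is an illustrative stand-in for a proper random source:

```c
#include <stdint.h>
#include <stdlib.h>

/* Tabulation (Zobrist-style) hashing of 32-bit integers: one table of
   256 random words per byte position, combined with XOR. */
static uint32_t T[4][256];

static void tab_init(unsigned seed) {
    srand(seed);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 256; j++)
            T[i][j] = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

static uint32_t tab_hash(uint32_t key) {
    return T[0][key & 0xFF]
         ^ T[1][(key >> 8) & 0xFF]
         ^ T[2][(key >> 16) & 0xFF]
         ^ T[3][(key >> 24) & 0xFF];
}
```

The per-run randomness makes the hash valid only within a single run, as with Python's seeded hashing noted earlier.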
### Customized hash function
A hash function can be designed to exploit existing entropy in the keys. If the keys have leading or trailing zeros, or particular fields that are unused, always zero or some other constant, or generally vary little, then masking out only the volatile bits and hashing on those will provide a better and possibly faster hash function. Selected divisors or multipliers in the division and multiplicative schemes may make more uniform hash functions if the keys are cyclic or have other redundancies.
## Hashing variable-length data
When the data values are long (or variable-length) [character strings](https://en.wikipedia.org/wiki/Character_string "Character string")âsuch as personal names, [web page addresses](https://en.wikipedia.org/wiki/URL "URL"), or mail messagesâtheir distribution is usually very uneven, with complicated dependencies. For example, text in any [natural language](https://en.wikipedia.org/wiki/Natural_language "Natural language") has highly non-uniform distributions of [characters](https://en.wikipedia.org/wiki/Character_\(computing\) "Character (computing)"), and [character pairs](https://en.wikipedia.org/wiki/Digraph_\(computing\) "Digraph (computing)"), characteristic of the language. For such data, it is prudent to use a hash function that depends on all characters of the stringâand depends on each character in a different way.\[*[clarification needed](https://en.wikipedia.org/wiki/Wikipedia:Please_clarify "Wikipedia:Please clarify")*\]
Simplistic hash functions may add the first and last *n* characters of a string along with the length, or form a word-size hash from the middle 4 characters of a string. This saves iterating over the (potentially long) string, but hash functions that do not hash on all characters of a string can readily become linear due to redundancies, clustering, or other pathologies in the key set. Such strategies may be effective as a custom hash function if the structure of the keys is such that either the middle, ends, or other fields are zero or some other invariant constant that does not differentiate the keys; then the invariant parts of the keys can be ignored.
The paradigmatic example of folding by characters is to add up the integer values of all the characters in the string. A better idea is to multiply the hash total by a constant, typically a sizable prime number, before adding in the next character, ignoring overflow. Using exclusive-or instead of addition is also a plausible alternative. The final operation would be a modulo, mask, or other function to reduce the word value to an index the size of the table. The weakness of this procedure is that information may cluster in the upper or lower bits of the bytes; this clustering will remain in the hashed result and cause more collisions than a proper randomizing hash. ASCII byte codes, for example, have an upper bit of 0, and printable strings do not use the last byte code or most of the first 32 byte codes, so the information, which uses the remaining byte codes, is clustered in the remaining bits in an unobvious manner.
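A minimal sketch of the multiply-then-add variant; the prime 31 is a common but illustrative choice, and the function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Folding by characters: multiply the running total by a prime before
   adding each character, ignoring overflow, then reduce to a table index. */
static uint32_t string_hash(const char *s, uint32_t table_size) {
    uint32_t h = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        h = h * 31u + (unsigned char)s[i];  /* unsigned overflow wraps mod 2^32 */
    return h % table_size;                  /* final reduction to the table size */
}
```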
The classic approach, dubbed the [PJW hash](https://en.wikipedia.org/wiki/PJW_hash_function "PJW hash function") based on the work of [Peter J. Weinberger](https://en.wikipedia.org/wiki/Peter_J._Weinberger "Peter J. Weinberger") at [Bell Labs](https://en.wikipedia.org/wiki/Bell_Labs "Bell Labs") in the 1970s, was originally designed for hashing identifiers into compiler symbol tables as given in the ["Dragon Book"](https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools "Compilers: Principles, Techniques, and Tools").[\[14\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-19) This hash function offsets the bytes 4 bits before adding them together. When the quantity wraps, the high 4 bits are shifted out and if non-zero, [xored](https://en.wikipedia.org/wiki/Exclusive_or "Exclusive or") back into the low byte of the cumulative quantity. The result is a word-size hash code to which a modulo or other reducing operation can be applied to produce the final hash index.
Today, especially with the advent of 64-bit word sizes, much more efficient variable-length string hashing by word chunks is available.
### Word length folding
Modern microprocessors will allow for much faster processing if 8-bit character strings are not hashed by processing one character at a time, but by interpreting the string as an array of 32-bit or 64-bit integers and hashing/accumulating these "wide word" integer values by means of arithmetic operations (e.g. multiplication by constant and bit-shifting). The final word, which may have unoccupied byte positions, is filled with zeros or a specified randomizing value before being folded into the hash. The accumulated hash code is reduced by a final modulo or other operation to yield an index into the table.
### Radix conversion hashing
Analogous to the way an ASCII or [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC "EBCDIC") character string representing a decimal number is converted to a numeric quantity for computing, a variable-length string can be converted as *x*<sub>*k*−1</sub>*a*<sup>*k*−1</sup> + *x*<sub>*k*−2</sub>*a*<sup>*k*−2</sup> + ⋯ + *x*<sub>1</sub>*a* + *x*<sub>0</sub>. This is simply a polynomial in a [radix](https://en.wikipedia.org/wiki/Radix "Radix") *a* \> 1 that takes the components (*x*<sub>0</sub>, *x*<sub>1</sub>, ..., *x*<sub>*k*−1</sub>) as the characters of the input string of length *k*. It can be used directly as the hash code, or a hash function applied to it to map the potentially large value to the hash table size. The value of *a* is usually a prime number large enough to hold the number of different characters in the character set of potential keys. Radix conversion hashing of strings minimizes the number of collisions.[\[15\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-20) Available data sizes may restrict the maximum length of string that can be hashed with this method. For example, a 128-bit word will hash only a 26-character alphabetic string (ignoring case) with a radix of 29; a printable ASCII string is limited to 9 characters using radix 97 and a 64-bit word. However, alphabetic keys are usually of modest length, because keys must be stored in the hash table. Numeric character strings are usually not a problem; 64 bits can count up to 10<sup>19</sup>, or 19 decimal digits with radix 10.
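A sketch for lowercase alphabetic keys with radix *a* = 29, mapping 'a'…'z' to the digits 1…26 (names illustrative); with a 64-bit accumulator the value stays exact for keys up to about 13 characters:

```c
#include <stddef.h>
#include <stdint.h>

/* Radix-conversion hashing: treat the characters as digits of a number
   in radix 29, then reduce the result to the table size. */
static uint64_t radix_hash(const char *s, uint64_t table_size) {
    uint64_t h = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        h = h * 29u + (uint64_t)(s[i] - 'a' + 1);  /* 'a'..'z' -> 1..26 */
    return h % table_size;
}
```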
In some applications, such as [substring search](https://en.wikipedia.org/wiki/String_searching_algorithm "String searching algorithm"), one can compute a hash function *h* for every *k*\-character [substring](https://en.wikipedia.org/wiki/Substring "Substring") of a given *n*\-character string by advancing a window of width *k* characters along the string, where *k* is a fixed integer, and *n* \> *k*. The straightforward solution, which is to extract such a substring at every character position in the text and compute *h* separately, requires a number of operations proportional to *k*·*n*. However, with the proper choice of *h*, one can use the technique of rolling hash to compute all those hashes with an effort proportional to *mk* + *n* where *m* is the number of occurrences of the substring.[\[16\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-21)\[*[what is the choice of h?](https://en.wikipedia.org/wiki/Wikipedia:Cleanup "Wikipedia:Cleanup")*\]
The most familiar algorithm of this type is [Rabin–Karp](https://en.wikipedia.org/wiki/Rabin-Karp "Rabin-Karp"), with best and average case performance *O*(*n* + *mk*) and worst case *O*(*n*·*k*). The worst case here is pathological: both the text string and the substring consist of a single repeated character, such as *t* = "AAAAAAAAAAA" and *s* = "AAA". The hash function used for the algorithm is usually the [Rabin fingerprint](https://en.wikipedia.org/wiki/Rabin_fingerprint "Rabin fingerprint"), designed to avoid collisions in 8-bit character strings, but other suitable hash functions are also used.
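The rolling-hash idea can be sketched as follows. This is an illustrative polynomial rolling hash rather than the Rabin fingerprint itself; the radix `A` and prime modulus `Q` are arbitrary choices, and the window update, subtract the outgoing character's contribution, shift, add the incoming character, is what makes each slide O(1):

```python
A = 256            # radix covering 8-bit characters
Q = 1_000_003      # a prime modulus (illustrative choice)

def rabin_karp(text: str, pattern: str) -> list[int]:
    """Return all starting indices of pattern in text via a rolling hash."""
    n, k = len(text), len(pattern)
    if k == 0 or k > n:
        return []
    high = pow(A, k - 1, Q)            # weight of the window's outgoing character
    hp = ht = 0
    for i in range(k):                 # hash the pattern and the first window
        hp = (hp * A + ord(pattern[i])) % Q
        ht = (ht * A + ord(text[i])) % Q
    hits = []
    for i in range(n - k + 1):
        # On a hash match, verify by direct comparison to rule out collisions;
        # the O(k) checks occur only at the m matching positions, giving O(n + mk).
        if hp == ht and text[i:i + k] == pattern:
            hits.append(i)
        if i < n - k:                  # slide the window one character to the right
            ht = ((ht - ord(text[i]) * high) * A + ord(text[i + k])) % Q
    return hits
```

On the pathological input mentioned above, every window matches, so every position triggers the O(*k*) verification, which is exactly where the *O*(*n*·*k*) worst case comes from.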
Worst-case results for a hash function can be assessed in two ways: theoretical and practical. The theoretical worst case is the probability that all keys map to a single slot. The practical worst case is the expected length of the longest probe sequence (hash function plus collision resolution method). This analysis assumes uniform hashing, that is, that any key will map to any particular slot with probability 1/*m*, a characteristic of universal hash functions.
While [Knuth](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") worries about adversarial attack on real-time systems,[\[24\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-29) Gonnet has shown that the probability of such a case is "ridiculously small". His result was that the probability of *k* of *n* keys mapping to a single slot is α<sup>*k*</sup> / (*e*<sup>α</sup>*k*!), where *α* is the load factor, *n*/*m*.[\[25\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-30)
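Gonnet's formula is easy to evaluate directly and shows how quickly the probability collapses as *k* grows; the load factor values below are illustrative:

```python
import math

def p_k_in_slot(k: int, alpha: float) -> float:
    """Probability that k of n keys map to one slot under uniform hashing,
    per Gonnet's formula alpha**k / (e**alpha * k!), with load factor alpha = n/m."""
    return alpha**k / (math.exp(alpha) * math.factorial(k))

# Even at a half-full table (alpha = 0.5), ten keys colliding in one slot
# is vanishingly unlikely, while small clusters remain plausible.
small_cluster = p_k_in_slot(2, 0.5)
pathological = p_k_in_slot(10, 0.5)
```

For α = 0.5, the *k* = 10 value is below 10⁻⁹, which is the sense in which Gonnet calls the adversarial-looking case "ridiculously small" for random inputs.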
The term *hash* offers a natural analogy with its non-technical meaning (to chop up or make a mess out of something), given how hash functions scramble their input data to derive their output.[\[26\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-2000-31): 514 In his research into the precise origin of the term, [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") notes that, while [Hans Peter Luhn](https://en.wikipedia.org/wiki/Hans_Peter_Luhn "Hans Peter Luhn") of [IBM](https://en.wikipedia.org/wiki/IBM "IBM") appears to have been the first to use the concept of a hash function in a memo dated January 1953, the term itself did not appear in published literature until the late 1960s, in Herbert Hellerman's *Digital Computer System Principles*, even though it was already widespread jargon by then.[\[26\]](https://en.wikipedia.org/wiki/Hash_function#cite_note-knuth-2000-31): 547–548
Look up ***[hash](https://en.wiktionary.org/wiki/hash "wiktionary:hash")*** in Wiktionary, the free dictionary.
- [List of hash functions](https://en.wikipedia.org/wiki/List_of_hash_functions "List of hash functions")
- [Nearest neighbor search](https://en.wikipedia.org/wiki/Nearest_neighbor_search "Nearest neighbor search")
- [Distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table "Distributed hash table")
- [Identicon](https://en.wikipedia.org/wiki/Identicon "Identicon")
- [Low-discrepancy sequence](https://en.wikipedia.org/wiki/Low-discrepancy_sequence "Low-discrepancy sequence")
- [Transposition table](https://en.wikipedia.org/wiki/Transposition_table "Transposition table")
1. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-4)** This is useful in cases where keys are devised by a malicious agent, for example in pursuit of a denial-of-service (DoS) attack.
2. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-12)** Plain [ASCII](https://en.wikipedia.org/wiki/ASCII "ASCII") is a 7-bit character encoding, although it is often stored in 8-bit bytes with the highest-order bit always clear (zero). Therefore, for plain ASCII, the bytes have only 2<sup>7</sup> = 128 valid values, and the character translation table has only this many entries.
3. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-13)** For example, for n=15, k=4, t=6,  \[Knuth\]
4. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-14)** Knuth conveniently leaves the proof of this to the reader.
5. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-17)** Unisys large systems.
1. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-1)**
Aggarwal, Kirti; Verma, Harsh K. (March 19, 2015). *Hash\_RC6 – Variable length Hash algorithm using RC6*. 2015 International Conference on Advances in Computer Engineering and Applications (ICACEA). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICACEA.2015.7164747](https://doi.org/10.1109%2FICACEA.2015.7164747).
2. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-2)**
- ["hash digest"](https://csrc.nist.gov/glossary/term/hash_digest). *Computer Security Resource Center - Glossary*. [NIST](https://en.wikipedia.org/wiki/NIST "NIST").
- ["message digest"](https://csrc.nist.gov/glossary/term/message_digest). *Computer Security Resource Center - Glossary*. [NIST](https://en.wikipedia.org/wiki/NIST "NIST").
3. ^ [***a***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-1973_3-0) [***b***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-1973_3-1) [***c***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-1973_3-2) [***d***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-1973_3-3)
[Knuth, Donald E.](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") (1973). *The Art of Computer Programming, Vol. 3, Sorting and Searching*. Reading, MA: [Addison-Wesley](https://en.wikipedia.org/wiki/Addison-Wesley "Addison-Wesley"). [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1973acp..book.....K](https://ui.adsabs.harvard.edu/abs/1973acp..book.....K). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0-201-03803-3](https://en.wikipedia.org/wiki/Special:BookSources/978-0-201-03803-3 "Special:BookSources/978-0-201-03803-3").
4. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-5)**
Stokes, Jon (2002-07-08). ["Understanding CPU caching and performance"](https://arstechnica.com/gadgets/reviews/2002/07/caching.ars). *Ars Technica*. Retrieved 2022-02-06.
5. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-handbook_of_applied_cryptography_6-0)**
Menezes, Alfred J.; van Oorschot, Paul C.; Vanstone, Scott A. (1996). [*Handbook of Applied Cryptography*](https://archive.org/details/handbookofapplie0000mene). CRC Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0849385230](https://en.wikipedia.org/wiki/Special:BookSources/978-0849385230 "Special:BookSources/978-0849385230").
6. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-7)**
Castro, Julio Cesar Hernandez; et al. (3 February 2005). "The strict avalanche criterion randomness test". *Mathematics and Computers in Simulation*. **68** (1). [Elsevier](https://en.wikipedia.org/wiki/Elsevier "Elsevier"): 1–7. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.matcom.2004.09.001](https://doi.org/10.1016%2Fj.matcom.2004.09.001). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [18086276](https://api.semanticscholar.org/CorpusID:18086276).
7. ^ [***a***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-fibonacci-hashing_8-0) [***b***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-fibonacci-hashing_8-1)
Skarupke, Malte (16 June 2018). ["Fibonacci Hashing: The Optimization that the World Forgot"](https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/). *Probably Dance*.
8. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-9)**
Wagner, Urs; Lugrin, Thomas (2023), Mulder, Valentin; Mermoud, Alain; Lenders, Vincent; Tellenbach, Bernhard (eds.), "Hash Functions", *Trends in Data Protection and Encryption Technologies*, Cham: Springer Nature Switzerland, pp. 21–24, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-031-33386-6\_5](https://doi.org/10.1007%2F978-3-031-33386-6_5), [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-031-33386-6](https://en.wikipedia.org/wiki/Special:BookSources/978-3-031-33386-6 "Special:BookSources/978-3-031-33386-6").
9. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-10)**
["3. Data model â Python 3.6.1 documentation"](https://docs.python.org/3/reference/datamodel.html#object.__hash__). *docs.python.org*. Retrieved 2017-03-24.
10. ^ [***a***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-algorithms_in_java_11-0) [***b***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-algorithms_in_java_11-1)
Sedgewick, Robert (2002). "14. Hashing". *Algorithms in Java* (3 ed.). Addison Wesley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0201361209](https://en.wikipedia.org/wiki/Special:BookSources/978-0201361209 "Special:BookSources/978-0201361209").
11. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-15)**
Dolev, Shlomi; Lahiani, Limor; Haviv, Yinnon (2013). ["Unique permutation hashing"](https://doi.org/10.1016%2Fj.tcs.2012.12.047). *Theoretical Computer Science*. **475**: 59–65. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.tcs.2012.12.047](https://doi.org/10.1016%2Fj.tcs.2012.12.047).
12. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-16)**
["CS 3110 Lecture 21: Hash functions"](https://www.cs.cornell.edu/courses/cs3110/2008fa/lectures/lec21.html). Section "Multiplicative hashing".
13. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-18)**
[Zobrist, Albert L.](https://en.wikipedia.org/wiki/Albert_Lindsey_Zobrist "Albert Lindsey Zobrist") (April 1970), [*A New Hashing Method with Application for Game Playing*](https://www.cs.wisc.edu/techreports/1970/TR88.pdf) (PDF), Tech. Rep. 88, Madison, Wisconsin: Computer Sciences Department, University of Wisconsin.
14. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-19)**
[Aho, A.](https://en.wikipedia.org/wiki/Alfred_Aho "Alfred Aho"); [Sethi, R.](https://en.wikipedia.org/wiki/Ravi_Sethi "Ravi Sethi"); [Ullman, J. D.](https://en.wikipedia.org/wiki/Jeffrey_Ullman "Jeffrey Ullman") (1986). *Compilers: Principles, Techniques and Tools*. Reading, MA: [Addison-Wesley](https://en.wikipedia.org/wiki/Addison-Wesley "Addison-Wesley"). p. 435. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [0-201-10088-6](https://en.wikipedia.org/wiki/Special:BookSources/0-201-10088-6 "Special:BookSources/0-201-10088-6").
15. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-20)**
Ramakrishna, M. V.; Zobel, Justin (1997). ["Performance in Practice of String Hashing Functions"](https://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.18.7520&rep=rep1&type=pdf). *Database Systems for Advanced Applications '97*. DASFAA 1997. pp. 215–224. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.18.7520](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.7520). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1142/9789812819536\_0023](https://doi.org/10.1142%2F9789812819536_0023). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [981-02-3107-5](https://en.wikipedia.org/wiki/Special:BookSources/981-02-3107-5 "Special:BookSources/981-02-3107-5"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [8250194](https://api.semanticscholar.org/CorpusID:8250194). Retrieved 2021-12-06.
16. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-21)**
Singh, N. B. [*A Handbook of Algorithms*](https://books.google.com/books?id=ALIMEQAAQBAJ&dq=rolling+hash&pg=PT102). N.B. Singh.
17. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_NIST.SP.800-168_22-0)**
Breitinger, Frank (May 2014). ["NIST Special Publication 800-168"](https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-168.pdf) (PDF). *NIST Publications*. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.6028/NIST.SP.800-168](https://doi.org/10.6028%2FNIST.SP.800-168). Retrieved January 11, 2023.
18. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_Beyond_Precision_and_Recall:_Understanding_Uses_\(and_Misuses\)_of_Similarity_Hashes_in_Binary_Analysis_23-0)**
Pagani, Fabio; Dell'Amico, Matteo; Balzarotti, Davide (2018-03-13). ["Beyond Precision and Recall"](https://pagabuc.me/docs/codaspy18_pagani.pdf) (PDF). *Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy*. New York, NY, USA: ACM. pp. 354–365. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/3176258.3176306](https://doi.org/10.1145%2F3176258.3176306). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [9781450356329](https://en.wikipedia.org/wiki/Special:BookSources/9781450356329 "Special:BookSources/9781450356329"). Retrieved December 12, 2022.
19. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_Forensic_Malware_Analysis:_The_Value_of_Fuzzy_Hashing_Algorithms_in_Identifying_Similarities_24-0)**
Sarantinos, Nikolaos; Benzaïd, Chafika; Arabiat, Omar (2016). ["Forensic Malware Analysis: The Value of Fuzzy Hashing Algorithms in Identifying Similarities"](https://ieeexplore.ieee.org/document/7847157). [*2016 IEEE Trustcom/BigDataSE/ISPA*](http://roar.uel.ac.uk/5710/1/Forensic%20Malware%20Analysis.pdf) (PDF). pp. 1782–1787. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TrustCom.2016.0274](https://doi.org/10.1109%2FTrustCom.2016.0274). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-5090-3205-1](https://en.wikipedia.org/wiki/Special:BookSources/978-1-5090-3205-1 "Special:BookSources/978-1-5090-3205-1"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [32568938](https://api.semanticscholar.org/CorpusID:32568938).
20. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_ssdeep_25-0)**
Kornblum, Jesse (2006). ["Identifying almost identical files using context triggered piecewise hashing"](https://doi.org/10.1016%2Fj.diin.2006.06.015). *Digital Investigation*. 3, Supplement (September 2006): 91–97. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.diin.2006.06.015](https://doi.org/10.1016%2Fj.diin.2006.06.015).
21. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Fuzzy_hashing_tlsh_26-0)**
Oliver, Jonathan; Cheng, Chun; Chen, Yanggui (2013). ["TLSH -- A Locality Sensitive Hash"](https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf) (PDF). *2013 Fourth Cybercrime and Trustworthy Computing Workshop*. IEEE. pp. 7–13. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ctc.2013.9](https://doi.org/10.1109%2Fctc.2013.9). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4799-3076-0](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4799-3076-0 "Special:BookSources/978-1-4799-3076-0"). Retrieved December 12, 2022.
22. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Perceptual_hashing_buldas13_27-0)**
Buldas, Ahto; Kroonmaa, Andres; Laanoja, Risto (2013). "Keyless Signatures' Infrastructure: How to Build Global Distributed Hash-Trees". In Riis, Nielson H.; Gollmann, D. (eds.). *Secure IT Systems. NordSec 2013*. Lecture Notes in Computer Science. Vol. 8208. Berlin, Heidelberg: Springer. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-642-41488-6\_21](https://doi.org/10.1007%2F978-3-642-41488-6_21). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-642-41487-9](https://en.wikipedia.org/wiki/Special:BookSources/978-3-642-41487-9 "Special:BookSources/978-3-642-41487-9"). "Keyless Signatures Infrastructure (KSI) is a globally distributed system for providing time-stamping and server-supported digital signature services. Global per-second hash trees are created and their root hash values published. We discuss some service quality issues that arise in practical implementation of the service and present solutions for avoiding single points of failure and guaranteeing a service with reasonable and stable delay. Guardtime AS has been operating a KSI Infrastructure for 5 years. We summarize how the KSI Infrastructure is built, and the lessons learned during the operational period of the service."
23. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-Perceptual_hashing_klinger_28-0)**
Klinger, Evan; Starkweather, David. ["pHash.org: Home of pHash, the open source perceptual hash library"](http://www.phash.org/). *pHash.org*. Retrieved 2018-07-05. "pHash is an open source software library released under the GPLv3 license that implements several perceptual hashing algorithms, and provides a C-like API to use those functions in your own programs. pHash itself is written in C++."
24. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-29)**
[Knuth, Donald E.](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") (1975). *The Art of Computer Programming, Vol. 3, Sorting and Searching*. Reading, MA: [Addison-Wesley](https://en.wikipedia.org/wiki/Addison-Wesley "Addison-Wesley"). p. 540.
25. **[^](https://en.wikipedia.org/wiki/Hash_function#cite_ref-30)**
Gonnet, G. (1978). *Expected Length of the Longest Probe Sequence in Hash Code Searching* (Technical report). Ontario, Canada: [University of Waterloo](https://en.wikipedia.org/wiki/University_of_Waterloo "University of Waterloo"). CS-RR-78-46.
26. ^ [***a***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-2000_31-0) [***b***](https://en.wikipedia.org/wiki/Hash_function#cite_ref-knuth-2000_31-1)
[Knuth, Donald E.](https://en.wikipedia.org/wiki/Donald_Knuth "Donald Knuth") (2000). *The Art of Computer Programming, Vol. 3, Sorting and Searching* (2. ed., 6. printing, newly updated and rev. ed.). Boston \[u.a.\]: Addison-Wesley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0-201-89685-5](https://en.wikipedia.org/wiki/Special:BookSources/978-0-201-89685-5 "Special:BookSources/978-0-201-89685-5").
- [The Goulburn Hashing Function](http://www.sinfocol.org/archivos/2009/11/Goulburn06.pdf) ([PDF](https://en.wikipedia.org/wiki/Portable_Document_Format "Portable Document Format")) by Mayur Patel
- [Hash Function Construction for Textual and Geometrical Data Retrieval](https://dspace5.zcu.cz/bitstream/11025/11784/1/Skala_2010_Corfu-NAUN-Hash.pdf) ([PDF](https://en.wikipedia.org/wiki/Portable_Document_Format "Portable Document Format")) Latest Trends on Computers, Vol. 2, pp. 483–489, CSCC Conference, Corfu, 2010