ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.6 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus |
| Last Crawled | 2026-04-01 10:46:33 (17 days ago) |
| First Indexed | 2015-09-17 23:41:50 (10 years ago) |
| HTTP Status Code | 200 |
| Meta Title | language agnostic - Why should hash functions use a prime number modulus? - Stack Overflow |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | Usually a simple hash function works by taking the "component parts" of the input (characters in the case of a string), and multiplying them by the powers of some constant, and adding them together in some integer type. So for example a typical (although not especially good) hash of a string might be: (first char) + k * (second char) + k^2 * (third char) + ...
Then if a bunch of strings all having the same first char are fed in, then the results will all be the same modulo k, at least until the integer type overflows. [As an example, Java's string hashCode is eerily similar to this - it does the characters reverse order, with k=31. So you get striking relationships modulo 31 between strings that end the same way, and striking relationships modulo 2^32 between strings that are the same except near the end. This doesn't seriously mess up hashtable behaviour.] A hashtable works by taking the modulus of the hash over the number of buckets. It's important in a hashtable not to produce collisions for likely cases, since collisions reduce the efficiency of the hashtable. Now, suppose someone puts a whole bunch of values into a hashtable that have some relationship between the items, like all having the same first character. This is a fairly predictable usage pattern, I'd say, so we don't want it to produce too many collisions. It turns out that "because of the nature of maths", if the constant used in the hash, and the number of buckets, are coprime , then collisions are minimised in some common cases. If they are not coprime , then there are some fairly simple relationships between inputs for which collisions are not minimised. All the hashes come out equal modulo the common factor, which means they'll all fall into the 1/n th of the buckets which have that value modulo the common factor. You get n times as many collisions, where n is the common factor. Since n is at least 2, I'd say it's unacceptable for a fairly simple use case to generate at least twice as many collisions as normal. If some user is going to break our distribution into buckets, we want it to be a freak accident, not some simple predictable usage. Now, hashtable implementations obviously have no control over the items put into them. They can't prevent them being related. So the thing to do is to ensure that the constant and the bucket counts are coprime. 
That way you aren't relying on the "last" component alone to determine the modulus of the bucket with respect to some small common factor. As far as I know they don't have to be prime to achieve this, just coprime. But if the hash function and the hashtable are written independently, then the hashtable doesn't know how the hash function works. It might be using a constant with small factors. If you're lucky it might work completely differently and be nonlinear. If the hash is good enough, then any bucket count is just fine. But a paranoid hashtable can't assume a good hash function, so should use a prime number of buckets. Similarly a paranoid hash function should use a largeish prime constant, to reduce the chance that someone uses a number of buckets which happens to have a common factor with the constant. In practice, I think it's fairly normal to use a power of 2 as the number of buckets. This is convenient and saves having to search around or pre-select a prime number of the right magnitude. So you rely on the hash function not to use even multipliers, which is generally a safe assumption. But you can still get occasional bad hashing behaviours based on hash functions like the one above, and prime bucket count could help further. Putting about the principle that "everything has to be prime" is as far as I know a sufficient but not a necessary condition for good distribution over hashtables. It allows everybody to interoperate without needing to assume that the others have followed the same rule. [Edit: there's another, more specialized reason to use a prime number of buckets, which is if you handle collisions with linear probing. Then you calculate a stride from the hashcode, and if that stride comes out to be a factor of the bucket count then you can only do (bucket_count / stride) probes before you're back where you started. 
The case you most want to avoid is stride = 0, of course, which must be special-cased, but to avoid also special-casing bucket_count / stride equal to a small integer, you can just make the bucket_count prime and not care what the stride is provided it isn't 0.] |
| Markdown | # 
# [Why should hash functions use a prime number modulus?](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus)
Asked 16 years, 3 months ago. Modified [3 years, 5 months ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus?lastactivity "2022-05-03 10:03:19Z"). Viewed 156k times. Score: 469.
A long time ago, I bought a data structures book off the bargain table for \$1.25. In it, the explanation for a hashing function said that it should ultimately mod by a prime number because of "the nature of math".
What do you expect from a \$1.25 book?
Anyway, I've had years to think about the nature of math, and still can't figure it out.
**Is the distribution of numbers truly more even when there are a prime number of buckets?**
Or is this an old programmer's tale that everyone accepts because everybody *else* accepts it?
- [language-agnostic](https://stackoverflow.com/questions/tagged/language-agnostic "show questions tagged 'language-agnostic'")
- [data-structures](https://stackoverflow.com/questions/tagged/data-structures "show questions tagged 'data-structures'")
- [hash](https://stackoverflow.com/questions/tagged/hash "show questions tagged 'hash'")
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/ "The current license for this post: CC BY-SA 4.0")
[edited Jul 3, 2020 at 23:32](https://stackoverflow.com/posts/1145217/revisions "show all edits to this post") by [cellepo](https://stackoverflow.com/users/1357094/cellepo) (4,549 reputation; 4 gold, 43 silver, 65 bronze badges)
asked Jul 17, 2009 at 19:30 by [theschmitzer](https://stackoverflow.com/users/2167252/theschmitzer) (13k reputation; 11 gold, 43 silver, 49 bronze badges)
8 comments (5 shown):
- (score 5) Perfectly reasonable question: Why should there be a prime number of buckets? – [Draemon](https://stackoverflow.com/users/26334/draemon "34,847 reputation"), [Jul 17, 2009 at 19:35](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment963383_1145217)
- (score 3) This question appears to be off-topic because it more than likely belongs on [Computer Science](https://cs.stackexchange.com/). – [Lightness Races in Orbit](https://stackoverflow.com/users/560648/lightness-races-in-orbit "386,598 reputation"), [Jun 6, 2014 at 17:00](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment37148903_1145217)
- (score 4) [cs.stackexchange.com/a/64191/64222](http://cs.stackexchange.com/a/64191/64222) another well argued explanation. – [Volodymyr Boyko](https://stackoverflow.com/users/4625005/volodymyr-boyko "1,581 reputation"), [Mar 17, 2017 at 19:14](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment72837228_1145217)
- (score 1) related: [Why is it best to use a prime number as a mod in a hashing function?](https://cs.stackexchange.com/questions/11029/why-is-it-best-to-use-a-prime-number-as-a-mod-in-a-hashing-function/64191) and [Why does Java's hashCode() in String use 31 as a multiplier?](http://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier/299748) and [this answer](http://stackoverflow.com/questions/21436334/choosing-radix-and-modulus-prime-in-rabin-karp-rolling-hash#21436682) – [bain](https://stackoverflow.com/users/1727288/bain "2,152 reputation"), [Apr 23, 2017 at 10:56](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment74190138_1145217)
- Here's another great explanation to a somewhat related question with some startling evidentiary numbers - [quora.com/…](https://www.quora.com/Does-making-array-size-a-prime-number-help-in-hash-table-implementation-Why) – [Anjan Biswas](https://stackoverflow.com/users/1137672/anjan-biswas "8,012 reputation"), [Apr 6, 2020 at 7:15](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment108013422_1145217)
## 17 Answers
Score: 313
Usually a simple hash function works by taking the "component parts" of the input (characters in the case of a string), and multiplying them by the powers of some constant, and adding them together in some integer type. So for example a typical (although not especially good) hash of a string might be:
```
(first char) + k * (second char) + k^2 * (third char) + ...
```
Then if a bunch of strings all having the same first char are fed in, then the results will all be the same modulo k, at least until the integer type overflows.
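As a rough illustration of the formula above, here is a minimal Python sketch. The name `poly_hash` and the sample words are made up for this example, and Python's unbounded integers mean nothing overflows here, so the "same modulo k" property holds exactly.

```python
def poly_hash(s: str, k: int = 31) -> int:
    """Hypothetical helper: c0 + k*c1 + k^2*c2 + ... over the characters of s."""
    h = 0
    power = 1  # holds k**i for the i-th character
    for ch in s:
        h += power * ord(ch)
        power *= k
    return h


# Sample words (made up) that all share the first character 'a'.
words = ["a", "apple", "avocado", "anchovy"]
residues = {poly_hash(w) % 31 for w in words}

# Every term except the first is a multiple of k, so only ord('a') survives mod k.
print(residues == {ord("a") % 31})  # True
```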
\[As an example, Java's string hashCode is eerily similar to this - it takes the characters in reverse order, with k=31. So you get striking relationships modulo 31 between strings that end the same way, and striking relationships modulo 2^32 between strings that are the same except near the end. This doesn't seriously mess up hashtable behaviour.\]
A hashtable works by taking the modulus of the hash over the number of buckets.
It's important in a hashtable not to produce collisions for likely cases, since collisions reduce the efficiency of the hashtable.
Now, suppose someone puts a whole bunch of values into a hashtable that have some relationship between the items, like all having the same first character. This is a fairly predictable usage pattern, I'd say, so we don't want it to produce too many collisions.
It turns out that "because of the nature of maths", if the constant used in the hash, and the number of buckets, are [coprime](https://math.stackexchange.com/a/64015), then collisions are minimised in some common cases. If they are not [coprime](https://math.stackexchange.com/a/64015), then there are some fairly simple relationships between inputs for which collisions are not minimised. All the hashes come out equal modulo the common factor, which means they'll all fall into the 1/n th of the buckets which have that value modulo the common factor. You get n times as many collisions, where n is the common factor. Since n is at least 2, I'd say it's unacceptable for a fairly simple use case to generate at least twice as many collisions as normal. If some user is going to break our distribution into buckets, we want it to be a freak accident, not some simple predictable usage.
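The effect of a common factor can be checked with a small experiment. This is only a sketch under assumptions: `poly_hash` and `buckets_used` are hypothetical helpers, the keys are every three-letter lowercase string starting with `a`, and the (constant, bucket count) pairs are picked by hand to show a shared factor of 3, a shared factor of 16, and a coprime pair.

```python
from itertools import product
from math import gcd
from string import ascii_lowercase


def poly_hash(s: str, k: int) -> int:
    """Hypothetical helper: c0 + k*c1 + k^2*c2 + ... (no overflow in Python)."""
    h = 0
    power = 1
    for ch in s:
        h += power * ord(ch)
        power *= k
    return h


def buckets_used(k: int, bucket_count: int) -> set:
    """Buckets hit by all 3-letter keys that start with 'a'."""
    keys = ["a" + x + y for x, y in product(ascii_lowercase, repeat=2)]
    return {poly_hash(key, k) % bucket_count for key in keys}


for k, m in [(6, 9), (32, 16), (32, 17)]:
    used = buckets_used(k, m)
    g = gcd(k, m)
    # All used buckets agree modulo g, so at most m // g buckets can be hit.
    print(f"k={k} buckets={m} gcd={g} used={len(used)} of {m}")
```

With these inputs the 676 keys land in 3 of 9 buckets, 1 of 16 buckets, and all 17 of 17 buckets respectively. The reason is the one given above: every hash equals `ord('a')` modulo k, hence modulo gcd(k, bucket count), and taking the hash modulo the bucket count preserves that residue.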
Now, hashtable implementations obviously have no control over the items put into them. They can't prevent them being related. So the thing to do is to ensure that the constant and the bucket counts are coprime. That way you aren't relying on the "last" component alone to determine the modulus of the bucket with respect to some small common factor. As far as I know they don't have to be prime to achieve this, just coprime.
But if the hash function and the hashtable are written independently, then the hashtable doesn't know how the hash function works. It might be using a constant with small factors. If you're lucky it might work completely differently and be nonlinear. If the hash is good enough, then any bucket count is just fine. But a paranoid hashtable can't assume a good hash function, so should use a prime number of buckets. Similarly a paranoid hash function should use a largeish prime constant, to reduce the chance that someone uses a number of buckets which happens to have a common factor with the constant.
In practice, I think it's fairly normal to use a power of 2 as the number of buckets. This is convenient and saves having to search around or pre-select a prime number of the right magnitude. So you rely on the hash function not to use even multipliers, which is generally a safe assumption. But you can still get occasional bad hashing behaviours based on hash functions like the one above, and prime bucket count could help further.
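A small sketch of why power-of-2 bucket counts are convenient and why odd multipliers are the safe case. `bucket_index` is a hypothetical helper, not taken from any particular hashtable implementation. The only prime factor of a power of 2 is 2, so any odd constant is coprime to the bucket count, while any even constant shares a factor with it.

```python
from math import gcd


def bucket_index(h: int, bucket_count: int) -> int:
    """Hypothetical helper: index into a power-of-two table using a bit mask."""
    if bucket_count <= 0 or bucket_count & (bucket_count - 1) != 0:
        raise ValueError("bucket_count must be a power of two")
    return h & (bucket_count - 1)


n = 16
# For non-negative hashes the mask gives the same result as the modulus.
same_as_mod = all(bucket_index(h, n) == h % n for h in range(10_000))
# Odd multipliers never share a factor with a power of two; even ones always do.
odd_is_coprime = all(gcd(k, n) == 1 for k in range(1, 100, 2))
even_shares_factor = all(gcd(k, n) >= 2 for k in range(2, 100, 2))

print(same_as_mod, odd_is_coprime, even_shares_factor)  # True True True
```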
Putting about the principle that "everything has to be prime" is as far as I know a sufficient but not a necessary condition for good distribution over hashtables. It allows everybody to interoperate without needing to assume that the others have followed the same rule.
\[Edit: there's another, more specialized reason to use a prime number of buckets, which is if you handle collisions with linear probing. Then you calculate a stride from the hashcode, and if that stride comes out to be a factor of the bucket count then you can only do (bucket\_count / stride) probes before you're back where you started. The case you most want to avoid is stride = 0, of course, which must be special-cased, but to avoid also special-casing bucket\_count / stride equal to a small integer, you can just make the bucket\_count prime and not care what the stride is provided it isn't 0.\]
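The probing argument can be sketched in a few lines. Probing with a stride derived from the hash code is commonly called double hashing. The helper name `probe_sequence` and the example numbers are assumptions for illustration. More generally than the "stride is a factor" case, the number of distinct slots visited is bucket\_count / gcd(stride, bucket\_count), so a stride that merely shares a factor with the bucket count also shortens the cycle.

```python
from math import gcd


def probe_sequence(start: int, stride: int, bucket_count: int) -> list:
    """Hypothetical helper: slots visited before the probe sequence repeats."""
    visited = []
    seen = set()
    slot = start % bucket_count
    while slot not in seen:
        seen.add(slot)
        visited.append(slot)
        slot = (slot + stride) % bucket_count
    return visited


# Stride 4 divides 12: only 12 // 4 = 3 slots are ever probed.
print(probe_sequence(0, 4, 12))       # [0, 4, 8]
# Stride 8 does not divide 12 but shares the factor 4 with it: still 3 slots.
print(probe_sequence(0, 8, 12))       # [0, 8, 4]
# With a prime bucket count, any non-zero stride below it reaches every slot.
print(len(probe_sequence(0, 4, 13)))  # 13
```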
[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/ "The current license for this post: CC BY-SA 3.0")
[edited Oct 2, 2017 at 15:21](https://stackoverflow.com/posts/1147232/revisions "show all edits to this post") by [Tony](https://stackoverflow.com/users/1972597/tony) (2,451 reputation; 26 silver, 34 bronze badges)
answered Jul 18, 2009 at 10:43 by [Steve Jessop](https://stackoverflow.com/users/13005/steve-jessop) (280k reputation; 40 gold, 471 silver, 708 bronze badges)
## 11 Comments
Add a comment
[](https://stackoverflow.com/users/21499/dr-hans-peter-st%C3%B6rr)
Dr. Hans-Peter Störr
[Dr. Hans-Peter Störr](https://stackoverflow.com/users/21499/dr-hans-peter-st%C3%B6rr)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment15976194_1147232)
Just as a side note: a discussion for a sensible choice of the factor k for hashCodes is here: [stackoverflow.com/q/1835976/21499](http://stackoverflow.com/q/1835976/21499)
2012-08-16T07:30:27.1Z+00:00
1
[ordinary](https://stackoverflow.com/users/1218599/ordinary)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment29804940_1147232)
This is an awesome answer. Can you please explain this further: "So you get striking relationships modulo 31 between strings that end the same way, and striking relationships modulo 2^32 between strings that are the same except near the end. This doesn't seriously mess up hashtable behaviour." I especially don't understand the 2^32 part.
2013-11-16T07:33:43.55Z+00:00
11
[Quark](https://stackoverflow.com/users/4226142/quark)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment50431925_1147232)
Additional note to make things more clear about this: "All the hashes come out equal modulo the common factor" -\> This is because, if you consider the example hash function hash = 1st char + 2nd char\*k + ... , and take strings with the same first character, hash%k will be the same for these strings. If M is the size of the hashtable and g is the gcd of M and k, then (hash%k)%g equals hash%g (since g divides k) and hence hash%g will also be the same for these strings. Now consider (hash%M)%g, this is equal to hash%g (since g divides M). So (hash%M)%g is equal for all these strings.
2015-07-04T00:02:51.47Z+00:00
6
[bain](https://stackoverflow.com/users/1727288/bain)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment74189766_1147232)
@DanielMcLaury Joshua Bloch [explained why](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4045622) for Java - it was recommended in two popular books (K\&R, Dragon book) and performed well with low collisions on the English dictionary. It is fast (uses [Horner's method](https://en.wikipedia.org/wiki/Horner%27s_method)). Apparently even K\&R don't remember where it came from. Similar function is [Rabin fingerprint](https://en.wikipedia.org/wiki/Rabin_fingerprint) from [Rabin-Karp algorithm](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm) (1981) but K\&R (1978) predates that.
2017-04-23T10:32:25.29Z+00:00
1
[Khanna111](https://stackoverflow.com/users/3109248/khanna111)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment82757891_1147232)
@SteveJessop, please can you explain "striking relationships modulo 2^32 between strings that are the same except near the end."? Thanks.
2017-12-19T21:39:34.17Z+00:00
1
This answer is useful
41
Just to put down some thoughts gathered from the answers.
- Hashing uses modulus so any value can fit into a given range
- We want to randomize collisions
- Randomizing collisions means there is no pattern to how collisions happen; put differently, changing a small part of the input should result in a completely different hash value
- To randomize collisions, avoid using the base (`10` in decimal, `16` in hex) as the modulus, because `11 % 10 -> 1`, `21 % 10 -> 1`, `31 % 10 -> 1` shows a clear pattern in the distribution of hash values: values with the same last digit will collide
- Avoid using powers of the base (`10^2`, `10^3`, `10^n`) as the modulus because they also create a pattern: values with the same last `n` digits will collide
- Actually, avoid using anything that has factors other than itself and `1`, because it creates a pattern: multiples of a factor will be hashed into a small selection of values
- For example, `9` has `3` as a factor, thus `3`, `6`, `9`, ... `999213` will always be hashed into `0`, `3`, `6`
- `12` has `3` and `2` as factors, thus `2n` will always be hashed into `0`, `2`, `4`, `6`, `8`, `10`, and `3n` will always be hashed into `0`, `3`, `6`, `9`
- This will be a problem if the input is not evenly distributed, e.g. if many values are of the form `3n`, then we only get `1/3` of all possible hash values and collisions are frequent
- So by using a prime as the modulus, the only pattern is that multiples of the modulus will always hash into `0`; otherwise hash values are evenly spread
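
The examples in the list above are easy to check with a few lines of Python. This is only an illustration: `buckets_used` is a made-up helper, and the key range is an arbitrary choice.

```python
def buckets_used(keys, modulus):
    """Return the sorted set of hash values produced by key % modulus."""
    return sorted({key % modulus for key in keys})

multiples_of_3 = range(3, 300, 3)

print(buckets_used(multiples_of_3, 9))   # [0, 3, 6]
print(buckets_used(multiples_of_3, 12))  # [0, 3, 6, 9]
print(buckets_used(multiples_of_3, 7))   # [0, 1, 2, 3, 4, 5, 6]
```

With the composite moduli `9` and `12` the multiples of `3` are confined to a third of the values, while the prime modulus `7` spreads the same keys over every value.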
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/ "The current license for this post: CC BY-SA 4.0")
answered Dec 29, 2021 at 7:56
[dz902](https://stackoverflow.com/users/902036/dz902)
6,003 reputation, 45 silver badges, 48 bronze badges
## 2 Comments
[Hamzah Al-Qadasi](https://stackoverflow.com/users/16733101/hamzah-al-qadasi)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment128221941_70516602)
I only understand your explanation very well.
2022-06-11T19:09:48.04Z+00:00
5
[Richie Thomas](https://stackoverflow.com/users/2143275/richie-thomas)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment136855027_70516602)
What a great explanation.
2023-12-08T18:00:33.333Z+00:00
0
This answer is useful
39
The first thing you do when inserting into or retrieving from a hash table is to calculate the hashCode for the given key and then find the correct bucket by trimming the hashCode to the size of the hashTable by doing hashCode % table\_length. Here are 2 'statements' that you most probably have read somewhere:
1. If you use a power of 2 for table\_length, finding (hashCode(key) % 2^n ) is as simple and quick as (hashCode(key) & (2^n -1)). But if your function to calculate the hashCode for a given key isn't good, you are likely to suffer from clustering of many keys in a few hash buckets.
2. But if you use prime numbers for table\_length, the hashCodes calculated could map into different hash buckets even if you have a slightly stupid hashCode function.
And here is the proof.
Suppose your hashCode function results in the following hashCodes, among others: {x, 2x, 3x, 4x, 5x, 6x...}. Then all of these are going to be clustered in just m buckets, where m = table\_length/GreatestCommonFactor(table\_length, x). (It is trivial to verify/derive this.) Now you can do one of the following to avoid clustering:
- Make sure that you don't generate too many hashCodes that are multiples of another hashCode, as in {x, 2x, 3x, 4x, 5x, 6x...}. But this may be difficult if your hashTable is supposed to have millions of entries.
- Or simply make m equal to table\_length by making GreatestCommonFactor(table\_length, x) equal to 1, i.e. by making table\_length coprime with x. And if x can be just about any number, then make sure that table\_length is a prime number.
From - <http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html>
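The formula m = table\_length/GreatestCommonFactor(table\_length, x) is easy to check by brute force. The sketch below is my own illustration, not code from the linked post; `distinct_buckets` and the sample sizes are arbitrary choices.

```python
from math import gcd

def distinct_buckets(x, table_length, count=1000):
    """Count the buckets hit by the hash codes x, 2x, 3x, ..., count*x."""
    return len({(i * x) % table_length for i in range(1, count + 1)})

# The brute-force count agrees with table_length / gcd(table_length, x).
table_length = 12
for x in range(1, 25):
    assert distinct_buckets(x, table_length) == table_length // gcd(table_length, x)

print(distinct_buckets(6, 12))  # 2: only buckets 0 and 6 are used
print(distinct_buckets(6, 13))  # 13: a prime table length uses every bucket
```

With a prime table length the greatest common factor is 1 for every x that is not itself a multiple of the table length, so m equals table\_length.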
[CC BY-SA 2.5](https://creativecommons.org/licenses/by-sa/2.5/ "The current license for this post: CC BY-SA 2.5")
answered Sep 23, 2009 at 6:58
user177612
This answer is useful
16
<http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/>
Pretty clear explanation, with pictures too.
Edit: As a summary, primes are used because you have the best chance of obtaining a unique value when multiplying values by the prime number chosen and adding them all up. For example given a string, multiplying each letter value with the prime number and then adding those all up will give you its hash value.
A better question would be, why exactly the number 31?
[CC BY-SA 2.5](https://creativecommons.org/licenses/by-sa/2.5/ "The current license for this post: CC BY-SA 2.5")
[edited Jul 17, 2009 at 19:40](https://stackoverflow.com/posts/1145236/revisions "show all edits to this post")
answered Jul 17, 2009 at 19:33
[AlbertoPL](https://stackoverflow.com/users/88383/albertopl)
11.5k reputation, 5 gold badges, 51 silver badges, 73 bronze badges
## 8 Comments
[Thomas Owens](https://stackoverflow.com/users/572/thomas-owens)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment963381_1145236)
Although, I think a summary would be helpful, in case that site is ever dead, some remnant of its content will be saved here on SO.
2009-07-17T19:35:44.83Z+00:00
6
[theschmitzer](https://stackoverflow.com/users/2167252/theschmitzer)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment963651_1145236)
The article does not explain why, but says "Researchers found that using a prime of 31 gives a better distribution to the keys, and lesser no of collisions. No one knows why..." Funny, asking the same question as me in effect.
2009-07-17T20:20:02.603Z+00:00
4
[sgmoore](https://stackoverflow.com/users/125759/sgmoore)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment963662_1145236)
\> A better question would be, why exactly the number 31? If you mean why is the number 31 used, then the article you point to tells you why, i.e. because it is quick to multiply by and because tests show it is the best one to use. The other popular multiplier I have seen is 33, which lends weight to the theory that the speed issue was (at least initially) an important factor. If you mean, what is it about 31 that makes it better in the tests, then I'm afraid I don't know.
2009-07-17T20:21:54.597Z+00:00
0
[sgmoore](https://stackoverflow.com/users/125759/sgmoore)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment963859_1145236)
Exactly, so the only reason that it could have been used as a multiplier was because it was easy to multiply by. (When I say I have seen 33 used as a multiplier, I don't mean recently; this was probably decades ago, and possibly before a lot of analysis was done on hashing.)
2009-07-17T21:02:29.897Z+00:00
0
[Arnaud Bouchez](https://stackoverflow.com/users/458259/arnaud-bouchez)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment28157272_1145236)
@SteveJessop The number 31 is easily optimized by the CPU as a (x\*32)-x operation, in which `*32` is a simple bit shift, or even better an immediate address scale factor (e.g. `lea eax,eax*8; leax, eax,eax*4` on x86/x64). So `*31` is a good candidate for prime number multiplication. This was pretty much true some years ago. The latest CPU architectures have almost instant multiplication, while division is always slower...
2013-09-27T13:48:57.543Z+00:00
6
This answer is useful
11
Primes are used because you have a good chance of obtaining a unique value for a typical hash function which uses polynomials modulo P. Say you use such a hash function for strings of length \<= N, and you have a collision. That means that 2 different polynomials produce the same value modulo P. The difference of those polynomials is again a polynomial of the same degree N (or less). It has no more than N roots (this is where the nature of math shows itself, since this claim is only true for a polynomial over a field, which requires a prime modulus). So if N is much less than P, you are likely not to have a collision. After that, experiment can probably show that 37 is big enough to avoid collisions for a hash table of strings which have length 5-10, and is small enough to use for calculations.
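The root-counting argument can be tried out on a tiny case. The following Python sketch is only an illustration under my own assumptions: the names `poly_hash` and `colliding_multipliers`, the two sample strings and the moduli 257 (prime) and 256 (composite) are all picked for the demo. It counts for how many multipliers k two fixed strings collide.

```python
def poly_hash(text, k, modulus):
    """Hash text as c0 + c1*k + c2*k^2 + ... (mod modulus)."""
    value = 0
    for ch in reversed(text):
        value = (value * k + ord(ch)) % modulus  # Horner's method
    return value

def colliding_multipliers(a, b, modulus):
    """Count the multipliers k in [0, modulus) for which a and b collide."""
    return sum(poly_hash(a, k, modulus) == poly_hash(b, k, modulus)
               for k in range(modulus))

# "aA" and "aQ" differ by 16 in the second character,
# so the difference of the two polynomials is 16*k.
print(colliding_multipliers("aA", "aQ", 257))  # 1: only k = 0 is a root
print(colliding_multipliers("aA", "aQ", 256))  # 16: every multiple of 16 is a root
```

Modulo the prime 257 the degree-1 difference polynomial has a single root, as the field argument promises. Modulo 256 the same polynomial has 16 roots, so far more multipliers make the two strings collide.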
[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/ "The current license for this post: CC BY-SA 3.0")
answered Nov 26, 2013 at 1:04
[TT\_ stands with Russia](https://stackoverflow.com/users/2503111/tt-stands-with-russia)
1,962 reputation, 3 gold badges, 22 silver badges, 33 bronze badges
## 1 Comment
[TT\_ stands with Russia](https://stackoverflow.com/users/2503111/tt-stands-with-russia)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment30129975_20206663)
While the explanation seems now obvious, it got to me after reading a book by A.Shen "Programming: Theorems and problems" (in Russian), see discussion of Rabin algorithm. Not sure if an English translation exists.
2013-11-26T01:20:45.207Z+00:00
1
This answer is useful
9
# tl;dr
`index[hash(input)%2]` would result in a collision for half of all possible hashes and a range of values. `index[hash(input)%prime]` results in a collision of \<2 of all possible hashes. Fixing the divisor to the table size also ensures that the number cannot be greater than the table.
[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/ "The current license for this post: CC BY-SA 3.0")
[edited Mar 4, 2013 at 6:33](https://stackoverflow.com/posts/13243123/revisions "show all edits to this post")
answered Nov 6, 2012 at 1:31
[Indolering](https://stackoverflow.com/users/328144/indolering)
3,134 reputation, 33 silver badges, 46 bronze badges
## 2 Comments
[Ganesh Chowdhary Sadanala](https://stackoverflow.com/users/8706759/ganesh-chowdhary-sadanala)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment92890916_13243123)
2 is a prime number dude
2018-10-25T15:25:15.03Z+00:00
12
[alelom](https://stackoverflow.com/users/3873799/alelom)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment131020254_13243123)
Even if we interpreted this answer as *`index[hash(input)%prime]` results in a collision of \<2 of all possible hashes, **where prime \> 2*** it would still make no sense, as that condition would be true for any number \>2.
2022-10-26T14:11:55.75Z+00:00
3
This answer is useful
5
Just to provide an alternate viewpoint there's this site:
[http://www.codexon.com/posts/hash-functions-the-modulo-prime-myth](https://web.archive.org/web/20101114013835/http://www.codexon.com/posts/hash-functions-the-modulo-prime-myth)
Which contends that you should use the largest number of buckets possible, as opposed to rounding down to a prime number of buckets. It seems like a reasonable possibility. Intuitively, I can certainly see how a larger number of buckets would be better, but I'm unable to make a mathematical argument for this.
[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/ "The current license for this post: CC BY-SA 3.0")
[edited Nov 29, 2015 at 3:50](https://stackoverflow.com/posts/1145300/revisions "show all edits to this post")
[josliber](https://stackoverflow.com/users/3093387/josliber)
44.4k reputation, 12 gold badges, 103 silver badges, 136 bronze badges
answered Jul 17, 2009 at 19:44
[Falaina](https://stackoverflow.com/users/134772/falaina)
6,695 reputation, 31 silver badges, 31 bronze badges
## 6 Comments
[Unknown](https://stackoverflow.com/users/57757/unknown)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment1244694_1145300)
Larger number of buckets means less collisions: See the pigeonhole principle.
2009-09-10T03:23:47.227Z+00:00
0
[Falaina](https://stackoverflow.com/users/134772/falaina)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment1247250_1145300)
@Unknown: I don't believe that's true. Please correct me if I'm wrong, but I believe applying the pigeonhole principle to hash tables only allows you to assert that there WILL be collisions if you have more elements than bins, not to draw any conclusions on the amount or density of collisions. I still believe that the larger number of bins is the correct route, however.
2009-09-10T14:18:43.59Z+00:00
12
[Unknown](https://stackoverflow.com/users/57757/unknown)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment1342355_1145300)
If you assume that the collisions are for all intents and purposes random, then by the birthday paradox a larger space (buckets) will reduce the probability of a collision occurring.
2009-09-29T02:57:52.267Z+00:00
0
[Suraj Chandran](https://stackoverflow.com/users/119772/suraj-chandran)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment10153362_1145300)
@Unknown you have missed that collisions also depend on the hash function itself. So if the hash function is really bad, then no matter how much you increase the size, there still might be a significant number of collisions.
2011-11-24T02:25:40.73Z+00:00
2
[Adrian McCarthy](https://stackoverflow.com/users/1386054/adrian-mccarthy)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment41890824_1145300)
The original article seems to be gone, but there are some insightful comments here, including a discussion with the original author. [news.ycombinator.com/item?id=650487](https://news.ycombinator.com/item?id=650487)
2014-10-29T22:52:55.29Z+00:00
0
This answer is useful
5
> Copying from my other answer <https://stackoverflow.com/a/43126969/917428>. See it for more details and examples.
I believe that it just has to do with the fact that computers work in base 2. Just think about how the same thing works for base 10:
- 8 % 10 = 8
- 18 % 10 = 8
- 87865378 % 10 = 8
It doesn't matter what the number is: as long as it ends with 8, its modulo 10 will be 8.
**Picking a big enough, non-power-of-two number will make sure the hash function really is a function of all the input bits, rather than a subset of them.**
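The base-2 version of the same effect can be shown in a few lines of Python. This is just a sketch: `buckets_hit` is a made-up helper, and 256 and 251 are example moduli (a power of two and a nearby prime).

```python
def buckets_hit(keys, modulus):
    """Count how many distinct buckets the keys land in under key % modulus."""
    return len({key % modulus for key in keys})

# 1000 keys that differ only above their lowest 8 bits.
keys = [i << 8 for i in range(1000)]

print(buckets_hit(keys, 256))  # 1: a 2^8 modulus only ever looks at the low 8 bits
print(buckets_hit(keys, 251))  # 251: the prime modulus depends on every bit
```

All of these keys end in eight zero bits, so modulo 256 they behave exactly like the numbers ending in 8 do modulo 10.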
[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/ "The current license for this post: CC BY-SA 3.0")
[edited May 23, 2017 at 11:47](https://stackoverflow.com/posts/43127092/revisions "show all edits to this post")
[Community](https://stackoverflow.com/users/-1/community) Bot
1 reputation, 1 silver badge
answered Mar 30, 2017 at 19:48
[Ste\_95](https://stackoverflow.com/users/917428/ste-95)
371 reputation, 5 silver badges, 17 bronze badges
## 1 Comment
[dz902](https://stackoverflow.com/users/902036/dz902)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment124649153_43127092)
This is great, even if it may not be complete. I don't know what other people are talking about.
2021-12-29T03:44:27.24Z+00:00
0
This answer is useful
5
"The nature of math" regarding prime power moduli is that they are one building block of a [finite field](https://en.wikipedia.org/wiki/Finite_field). The other two building blocks are an addition and a multiplication operation. The special property of prime moduli is that they form a finite field with the "regular" addition and multiplication operations, just taken to the modulus. This means every multiplication maps to a different integer modulo the prime, so does every addition.
Prime moduli are advantageous because:
- They give the most freedom when choosing the secondary multiplier in secondary hashing, all multipliers except 0 will end up visiting all elements exactly once
- If all hashes are less than the modulus there will be no collisions at all
- Random primes mix better than power of two moduli and compress the information of all the bits not just a subset
They do, however, have a big downside: they require an integer division, which takes many (~ 15-40) cycles, even on a modern CPU. With around half the computation one can make sure the hash is mixed up very well. Two multiplications and xorshift operations will mix better than a prime modulus. Then we can use whatever hash table size and hash reduction is fastest, giving 7 operations in total for power of 2 table sizes and around 9 operations for arbitrary sizes.
I recently looked at many of the [fastest hash table implementations](https://1ykos.github.io/ordered_patch_map/) and most of them don't use prime moduli.
The distribution of the hash table indices is mainly dependent on the hash function in use. **A prime modulus can't fix a bad hash function and a [good hash function](https://stackoverflow.com/a/57556517/2119377) does not benefit from a prime modulus.** There are cases where a prime modulus can be advantageous, however. It can mend a half-bad hash function, for example.
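As a rough sketch of the "mix first, then reduce cheaply" idea: the mixer below is the 64-bit finalizer from MurmurHash3, picked here only as one well-known example of two multiplications interleaved with xorshifts. It is not necessarily the function this answer has in mind, and the names `mix64` and `bucket` are made up for the illustration.

```python
MASK64 = (1 << 64) - 1

def mix64(h):
    """MurmurHash3's 64-bit finalizer: xorshifts interleaved with two multiplications."""
    h &= MASK64
    h ^= h >> 33
    h = (h * 0xFF51AFD7ED558CCD) & MASK64
    h ^= h >> 33
    h = (h * 0xC4CEB9FE1A85EC53) & MASK64
    h ^= h >> 33
    return h

def bucket(h, table_size):
    """Mix the hash, then reduce it to a power-of-two table with a mask (no division)."""
    assert (table_size & (table_size - 1)) == 0, "table_size must be a power of two"
    return mix64(h) & (table_size - 1)

# Keys that are all multiples of 16 share their low 4 bits.
keys = [i * 16 for i in range(64)]
print(len({k & 15 for k in keys}))         # 1: without mixing, everything lands in bucket 0
print(len({bucket(k, 16) for k in keys}))  # with mixing, the keys spread over several buckets
```

Each step of the mixer is invertible, so distinct inputs stay distinct. The mask in `bucket` then only decides which bits of the mixed value become the index.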
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/ "The current license for this post: CC BY-SA 4.0")
[edited Sep 16, 2020 at 18:33](https://stackoverflow.com/posts/57948442/revisions "show all edits to this post")
answered Sep 15, 2019 at 21:53
[Wolfgang Brehm](https://stackoverflow.com/users/2119377/wolfgang-brehm)
1,761 reputation, 20 silver badges, 25 bronze badges
## 2 Comments
[Dr. Hans-Peter Störr](https://stackoverflow.com/users/21499/dr-hans-peter-st%C3%B6rr)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment130965191_57948442)
Why do you say a prime modulus require an integer division? I guess people were talking about that common implementation that requires addition and multiplication only: hash = modulus \* hash + field1; hash = modulus \* hash + field2; ...
2022-10-24T06:42:49.93Z+00:00
0
[Wolfgang Brehm](https://stackoverflow.com/users/2119377/wolfgang-brehm)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment130970002_57948442)
@Hans-PeterStörr The way a computer calculates the modulus is via an integer division: `n%m = n - m*floor(n/m)`. What you are referring to is not a prime modulus, but it seems to be linear probing (field1, field2 ...) with a multiplier paradoxically called modulus. But how do you make sure that your `hash` is a number smaller than the size of the table? You need to reduce it, and often this is done with modulo.
2022-10-24T11:05:48.987Z+00:00
0
This answer is useful
4
It depends on the choice of hash function.
Many hash functions combine the various elements in the data by multiplying them with some factors modulo the power of two corresponding to the word size of the machine (that modulus is free by just letting the calculation overflow).
You don't want any common factor between a multiplier for a data element and the size of the hash table, because then it could happen that varying the data element doesn't spread the data over the whole table. If you choose a prime for the size of the table such a common factor is highly unlikely.
On the other hand, those factors are usually made up from odd primes, so you should also be safe using powers of two for your hash table (e.g. Eclipse uses 31 when it generates the Java hashCode() method).
[CC BY-SA 2.5](https://creativecommons.org/licenses/by-sa/2.5/ "The current license for this post: CC BY-SA 2.5")
answered Jul 18, 2009 at 7:32
[starblue](https://stackoverflow.com/users/49246/starblue)
57k reputation, 14 gold badges, 101 silver badges, 153 bronze badges
This answer is useful
3
> Primes are unique numbers. They are unique in that, the product of a prime with any other number has the best chance of being unique (not as unique as the prime itself of-course) due to the fact that a prime is used to compose it. This property is used in hashing functions.
>
> Given a string “Samuel”, you can generate a unique hash by multiply each of the constituent digits or letters with a prime number and adding them up. This is why primes are used.
>
> However using primes is an old technique. The key here to understand that as long as you can generate a sufficiently unique key you can move to other hashing techniques too. Go here for more on this topic about <http://www.azillionmonkeys.com/qed/hash.html>
<http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/>
[CC BY-SA 2.5](https://creativecommons.org/licenses/by-sa/2.5/ "The current license for this post: CC BY-SA 2.5")
[edited Jul 17, 2009 at 19:50](https://stackoverflow.com/posts/1145244/revisions "show all edits to this post")
[AlbertoPL](https://stackoverflow.com/users/88383/albertopl)
11.5k reputation, 5 gold badges, 51 silver badges, 73 bronze badges
answered Jul 17, 2009 at 19:34
[user105033](https://stackoverflow.com/users/105033/user105033)
19.7k reputation, 20 gold badges, 60 silver badges, 70 bronze badges
## 2 Comments
[HasaniH](https://stackoverflow.com/users/7141/hasanih)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment963621_1145244)
hahahah.... actually doesn't the product of 2 primes have a better chance of being 'unique' than the product of a prime and any other number?
2009-07-17T20:14:54.577Z+00:00
1
[TT\_ stands with Russia](https://stackoverflow.com/users/2503111/tt-stands-with-russia)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment68510272_1145244)
@Beska Here "uniqueness" is defined recursively, so I believe "non-uniqueness" should be defined in the same way :)
2016-11-16T17:56:43.463Z+00:00
0
This answer is useful
3
I would say the first answer at [this link](https://cs.stackexchange.com/questions/11029/why-is-it-best-to-use-a-prime-number-as-a-mod-in-a-hashing-function) is the clearest answer I found regarding this question.
Consider the set of keys ***K* = {0,1,...,100}** and a hash table where the number of buckets is **m = 12**. Since **3** is a factor of **12**, the keys that are multiples of **3** will be hashed to buckets that are multiples of **3**:
- Keys **{0,12,24,36,...}** will be hashed to bucket 0.
- Keys **{3,15,27,39,...}** will be hashed to bucket 3.
- Keys **{6,18,30,42,...}** will be hashed to bucket 6.
- Keys **{9,21,33,45,...}** will be hashed to bucket 9.
If ***K*** is uniformly distributed (i.e., every key in ***K*** is equally likely to occur), then the choice of m is not so critical. But, what happens if ***K*** is not uniformly distributed? Imagine that the keys that are most likely to occur are the multiples of **3**. In this case, all of the buckets that are not multiples of **3** will be empty with high probability (which is really bad in terms of hash table performance).
This situation is more common than it may seem. Imagine, for instance, that you are keeping track of objects based on where they are stored in memory. If your computer's word size is four bytes, then you will be hashing keys that are multiples of **4**. Needless to say, choosing m to be a multiple of **4** would be a terrible choice: you would have **3m/4** buckets completely empty, and all of your keys colliding in the remaining **m/4** buckets.
In general:
> *Every key in K that shares a common factor with the number of buckets m will be hashed to a bucket that is a multiple of this factor.*
Therefore, to minimize collisions, it is important to reduce the number of common factors between m and the elements of ***K***. How can this be achieved? By choosing m to be a number that has very few factors: a **prime number**.
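To check the general rule numerically, here is a small sketch. It assumes keys that are multiples of a fixed stride (3, or 4 as in the memory-address case) and the plain hash k mod m; `occupied_buckets` is a hypothetical helper name. With enough keys, exactly m / gcd(stride, m) buckets are reachable:

```python
from math import gcd

def occupied_buckets(stride, m, n_keys=1000):
    """Number of distinct buckets hit by keys 0, stride, 2*stride, ... under k mod m."""
    return len({(i * stride) % m for i in range(n_keys)})

for m in (12, 13, 16, 17):
    for stride in (3, 4):
        used = occupied_buckets(stride, m)
        # Only m / gcd(stride, m) buckets can ever be reached.
        assert used == m // gcd(stride, m)
        print(f"m={m:2d} stride={stride}: {used} of {m} buckets used")
```

For the prime sizes 13 and 17 the gcd is 1 for both strides, so no bucket is left unreachable.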
**FROM THE ANSWER BY [Mario](https://cs.stackexchange.com/users/57681/mario-cervera).**
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/ "The current license for this post: CC BY-SA 4.0")
[edited Jun 22, 2020 at 7:49](https://stackoverflow.com/posts/62509607/revisions "show all edits to this post")
answered Jun 22, 2020 at 7:42
[Y.Wang](https://stackoverflow.com/users/11079546/y-wang)
This answer is useful
2
Suppose your table size (or the modulus) is T = (B\*C). If the hashes of your inputs have the form (N\*A\*B), where N can be any integer, then your output won't be well distributed: every time N reaches C, 2C, 3C etc., the output starts repeating, i.e. your output is spread over at most C positions. Note that C here is (T / HCF(table-size, hash)), where HCF is the highest common factor (the greatest common divisor).
This problem can be eliminated by making HCF 1. Prime numbers are very good for that.
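A quick way to see this is to plug in numbers. The values A = 5, B = 4, C = 6 below are purely illustrative (chosen so that A and C are coprime), and 23 is just a prime close to T:

```python
from math import gcd

A, B, C = 5, 4, 6          # illustrative values; A and C are coprime
T = B * C                  # table size 24

hashes = [N * A * B for N in range(100)]   # hashes of the form N*A*B
positions = sorted({h % T for h in hashes})

print(positions)                  # slots actually used
print(T // gcd(T, A * B))         # T / HCF(table size, hash step)

prime_positions = {h % 23 for h in hashes}   # same hashes, prime table size
print(len(prime_positions))
```

Only C of the 24 slots are ever hit with T = 24, whereas the prime size 23 has HCF 1 with the hash step and all 23 slots get used.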
Another interesting case is when T is 2^N. The output is then exactly the lower N bits of the input hash. Every number can be written as a sum of powers of 2, and taking it modulo T removes every power 2^i with i \>= N, so the result always follows a specific pattern that depends only on the low bits of the input. This is also a bad choice.
Similarly, T as 10^N is bad as well because of similar reasons (pattern in decimal notation of numbers instead of binary).
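Both patterns can be seen with a few hand-picked keys. This is only a sketch; the key values are made up so that they agree in their low bits (or last decimal digits) and differ everywhere else:

```python
N = 4
keys = [0x12345, 0xABC45, 0x00045, 0xFFF45]   # differ only above the low byte

# T = 2**N keeps the low N bits, so h % T equals h & (T - 1) for non-negative h.
T = 2 ** N
assert all(k % T == k & (T - 1) for k in keys)
binary_buckets = {k % T for k in keys}
print(binary_buckets)        # a single bucket

# T = 10**N keeps the last N decimal digits.
decimal_keys = [1230042, 990042, 70042]
decimal_buckets = {k % 10 ** N for k in decimal_keys}
print(decimal_buckets)       # again a single bucket
```

In both cases everything above the low N digits of the key is ignored, so keys that differ only there always collide.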
So prime numbers tend to give better-distributed results, and hence are a good choice for table size.
[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/ "The current license for this post: CC BY-SA 3.0")
[edited Sep 6, 2016 at 4:23](https://stackoverflow.com/posts/39340564/revisions "show all edits to this post")
answered Sep 6, 2016 at 4:16
[nishantbhardwaj2002](https://stackoverflow.com/users/3340994/nishantbhardwaj2002)
This answer is useful
1
I'd like to add something to Steve Jessop's answer (I can't comment on it since I don't have enough reputation), as I found some helpful material. His answer is very helpful, but he made a mistake: the bucket size should not be a power of 2. I'll just quote from the book "Introduction to Algorithms" by Thomas Cormen, Charles Leiserson, et al., page 263:
> When using the division method, we usually avoid certain values of m. For example, m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k. Unless we know that all low-order p-bit patterns are equally likely, we are better off designing the hash function to depend on all the bits of the key. As Exercise 11.3-3 asks you to show, choosing m = 2^p-1 when k is a character string interpreted in radix 2^p may be a poor choice, because permuting the characters of k does not change its hash value.
Hope it helps.
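The last remark in the quote (m = 2^p - 1 with strings read in radix 2^p) can be tried out directly. A minimal sketch, assuming p = 8 and ASCII strings; `radix_hash` is a name I made up for illustration:

```python
def radix_hash(s, p=8):
    """Interpret s as a radix-2^p number and reduce it modulo m = 2^p - 1."""
    m = 2 ** p - 1
    k = 0
    for ch in s.encode("ascii"):
        k = k * 2 ** p + ch      # append the next radix-2^p digit
    return k % m

words = ["stop", "pots", "tops", "spot"]
print([radix_hash(w) for w in words])   # all permutations share one value
```

The reason is that 2^p is congruent to 1 modulo 2^p - 1, so the hash collapses to the sum of the character codes modulo m, and a sum does not care about order.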
[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/ "The current license for this post: CC BY-SA 3.0")
[edited Dec 3, 2015 at 17:48](https://stackoverflow.com/posts/34072726/revisions "show all edits to this post")
answered Dec 3, 2015 at 17:43
[iefgnoix](https://stackoverflow.com/users/2708746/iefgnoix)
This answer is useful
1
This question was merged with the more appropriate question of why hash tables should use prime-sized arrays rather than power-of-2 sizes. For hash functions themselves there are plenty of good answers here, but for the related question of why some security-critical hash tables, like glibc's, use prime-sized arrays, there is none yet.
Generally, power-of-2 tables are much faster. There the expensive `h % n` becomes `h & bitmask`, where the bitmask can be calculated via `clz` ("count leading zeros") of the size n. A modulo needs an integer division, which is about 50x slower than a logical `and`. There are some tricks to avoid a modulo, like Lemire's <https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/>, but generally fast hash tables use powers of 2, and secure hash tables use primes.
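For reference, the two reductions mentioned above look roughly like this. It is a Python sketch for readability only (the speed difference obviously only shows up in compiled code); the function names are mine, and `fast_range32` follows the multiply-shift idea from the linked Lemire post for 32-bit hashes:

```python
def mask_reduce(h, n):
    """Bucket index for a power-of-two table size n: same as h % n."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    return h & (n - 1)

def fast_range32(h, n):
    """Multiply-shift reduction of a 32-bit hash h into [0, n).
    It is not equal to h % n, just another map into the same range."""
    return ((h & 0xFFFFFFFF) * n) >> 32

h = 0xDEADBEEF
print(mask_reduce(h, 1024), h % 1024)   # identical
print(fast_range32(h, 1000))            # some index below 1000
```

Note that the mask only looks at the low bits of the hash, while the multiply-shift variant is dominated by the high bits, so neither uses all bits the way a prime modulus does.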
Why so?
Security in this case is defined by attacks on the collision resolution strategy, which in most hash tables is just a linear search through a linked list of collisions, or, in the faster open-addressing tables, a linear search in the table directly. With power-of-2 tables and some internal knowledge of the table, e.g. the size or the order of the list of keys provided by some JSON interface, you learn the number of low bits used, i.e. the number of ones in the bitmask. This is typically lower than 10 bits, and for 5-10 bits it's trivial to brute-force collisions even with the strongest and slowest hash functions. You don't get the full security of your 32-bit or 64-bit hash function anymore. And the point is to use fast, small hash functions, not monsters such as murmur or even siphash.
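To illustrate how cheap such a brute force is, here is a sketch that uses SHA-256 as a stand-in for "a strong, slow hash" and an assumed table of 2^10 buckets. The key format `key<i>` and the helper names are made up for the example:

```python
import hashlib
from itertools import count

def bucket(key, bits=10):
    """Bucket of key in a table of 2**bits slots: only the low bits of the hash matter."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[-4:], "big") & ((1 << bits) - 1)

def colliding_keys(target, wanted, bits=10):
    """Try key0, key1, ... and keep those that fall into the target bucket."""
    found = []
    for i in count():
        key = f"key{i}"
        if bucket(key, bits) == target:
            found.append(key)
            if len(found) == wanted:
                return found, i + 1

keys, tried = colliding_keys(target=0, wanted=20)
print(len(keys), "colliding keys after", tried, "candidates")
```

On average one candidate in 1024 lands in the chosen bucket, so collecting a long chain costs only a few thousand hash evaluations per key, no matter how strong the hash function is.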
So if you provide an external interface to your hash table, like a DNS resolver, a programming language, ... you want to care about abusive folks who like to DoS such services. It's normally easier for such folks to shut down your public service with much simpler methods, but it did happen. So people did care.
So the best options to prevent such collision attacks are either
1\) to use prime tables, because then
- all 32 or 64 bits are relevant to find the bucket, not just a few.
- the hash table resize function is more natural than just doubling. The best growth function is the Fibonacci sequence, and primes come closer to that than doubling.
2\) use better measures against the actual attack, together with fast power of 2 sizes.
- count the collisions and abort or sleep on detected attacks, i.e. on collision counts that would occur with a probability of \<1%, like 100 with 32-bit hash tables. This is what e.g. djb's DNS resolver does.
- convert the linked list of collisions to a tree with O(log n) search instead of O(n) when a collision attack is detected. This is what e.g. Java does.
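The "count the collisions" idea from option 2 could be sketched like this. It is a toy chained table, not how djb's resolver or any real library implements it; the class name, the threshold of 8 and the missing resize logic are all simplifications of mine:

```python
class GuardedTable:
    """Chained hash table with a power-of-two size that refuses overly long chains."""

    def __init__(self, bits=4, max_chain=8):
        self.mask = (1 << bits) - 1
        self.max_chain = max_chain          # hypothetical threshold
        self.buckets = [[] for _ in range(1 << bits)]

    def insert(self, key, value):
        chain = self.buckets[hash(key) & self.mask]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)     # overwrite an existing key
                return
        if len(chain) >= self.max_chain:
            # Too many collisions in one bucket: treat it as a possible attack.
            raise RuntimeError("collision limit exceeded")
        chain.append((key, value))

table = GuardedTable()
try:
    for k in range(0, 16 * 20, 16):   # small ints hash to themselves in CPython,
        table.insert(k, None)         # so these all land in bucket 0
except RuntimeError as exc:
    print("aborted:", exc)
```

A real table would resize long before a chain gets this long under normal load, which is exactly why an unusually long chain is a usable attack signal.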
There's a widespread myth that more secure hash functions help to prevent such attacks, which is wrong, as I explained. There's no security with low bits only. It would only work with prime-sized tables, but that would combine the two slowest methods: a slow hash plus a slow prime modulo.
Hash functions for hash tables primarily need to be small (to be inlinable) and fast. Security can come only from preventing linear search in the collisions, and from not using trivially bad hash functions, like ones insensitive to some values (like \\0 when using multiplication).
Using random seeds is also a good option, and people started with that first. But with enough information about the table even a random seed does not help much, and dynamic languages typically make it trivial to get the seed via other methods, as it's stored in known memory locations.
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/ "The current license for this post: CC BY-SA 4.0")
answered Mar 27, 2020 at 10:56
[rurban](https://stackoverflow.com/users/414279/rurban)
This answer is useful
0
For a hash function it's not only important to minimize collisions in general, but also to make it impossible to keep the same hash while changing a few bytes.
Say you have an equation: `(x + y*z) % key = x` with `0<x<key` and `0<z<key`. If key is a prime number, the equation holds exactly when y is a multiple of key, i.e. y = n\*key for some n in N, and is false for every other number.
An example where key isn't prime: x=1, z=2 and key=8. Because key/z=4 is still a natural number, y=4 becomes a solution of our equation, and in this case every y = n\*(key/2) with n in N is a solution. The number of solutions has practically doubled because 8 isn't prime.
If our attacker already knows that 8 is a possible solution of the equation, he can change the file from producing 8 to 4 and still get the same hash.
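The example can be checked by simply enumerating y. A small sketch; the search limit of 16 and the prime 7 used for comparison are arbitrary choices of mine:

```python
def solutions(x, z, key, limit):
    """All y in 1..limit with (x + y*z) % key == x."""
    return [y for y in range(1, limit + 1) if (x + y * z) % key == x]

print(solutions(x=1, z=2, key=8, limit=16))   # composite key
print(solutions(x=1, z=2, key=7, limit=16))   # prime key of similar size
```

With key = 8 every multiple of 4 is a solution, while with the prime key 7 only the multiples of 7 itself are.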
[CC BY-SA 2.5](https://creativecommons.org/licenses/by-sa/2.5/ "The current license for this post: CC BY-SA 2.5")
answered Jul 18, 2009 at 14:01
[Christian](https://stackoverflow.com/users/25282/christian)
This answer is useful
0
I've read the popular wordpress website linked in some of the above popular answers at the top. From what I've understood, I'd like to share a simple observation I made.
You can find all the details in the article [here](http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/), but assume the following holds true:
- Using a prime number gives us the "best chance" of a **unique value**
A general hashmap implementation wants 2 things to be unique.
- **Unique** hash code for the **key**
- **Unique** index to store the actual **value**
*How* do we get the unique index? By making the initial size of the internal container a prime as well. So basically, prime is involved because it possesses this unique trait of producing unique numbers which we end up using to ID objects and finding indexes inside the internal container.
Example:
key = "key"
value = "value"
maps to **unique id**
Now we want a **unique location** for our value - so we
`uniqueId % internalContainerSize == uniqueLocationForValue` , assuming `internalContainerSize` is also a prime.
I know this is simplified, but I'm hoping to get the general idea through.
[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/ "The current license for this post: CC BY-SA 3.0")
answered Mar 11, 2018 at 8:25
[Ryhan](https://stackoverflow.com/users/1061644/ryhan)
## 2 Comments
[AminM](https://stackoverflow.com/users/6169894/aminm)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment132446567_49218084)
I don't think it's the best chance of a "unique" value.. but rather that a prime number will spread out the numbers throughout the array in a more "random-like" way than what you'd get with numbers that share common factors. The uniqueness is going to depend on how much room you have in the array. If you only have 4 buckets for 5 items, a collision is inevitable. But 5 items in 100 buckets can be rare to the extreme. So load factor is what will determine "uniqueness". But a good hash algorithm (for example a prime) will give you a good spread.
2023-01-09T06:34:35.79Z+00:00
1
[AminM](https://stackoverflow.com/users/6169894/aminm)
[Over a year ago](https://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus#comment132446590_49218084)
If you don't want to be bound by prime numbers, you can use hashing by multiplication, where you multiply the key by the fractional part of some irrational number like pi or the golden ratio. This gives the same randomization as I explained above.
2023-01-09T06:35:52.747Z+00:00
0
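For what it's worth, the multiplication method from the comment above can be sketched as follows. This is the textbook floating-point form with the golden ratio constant; the table size 16 and the test keys are arbitrary, and `mult_hash` is just an illustrative name:

```python
import math

A = (math.sqrt(5) - 1) / 2     # fractional part of the golden ratio

def mult_hash(k, m):
    """Multiplication method: scale the fractional part of k*A to [0, m)."""
    return int(m * ((k * A) % 1.0))

keys = range(0, 64, 4)                       # keys that are all multiples of 4
by_mod = {k % 16 for k in keys}              # division method, m = 16
by_mult = {mult_hash(k, 16) for k in keys}   # multiplication method, m = 16

print(len(by_mod), len(by_mult))
```

Even with a power-of-2 table size, the keys spread over far more buckets than with plain `k % 16`, because the multiplication mixes all bits of the key into the result.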
Site design / logo © 2025 Stack Exchange Inc; user contributions licensed under [CC BY-SA](https://stackoverflow.com/help/licensing) . rev 2025.10.15.35293 |
| Readable Markdown | null |
| Shard | 169 (laksa) |
| Root Hash | 714406497480128969 |
| Unparsed URL | com,stackoverflow!/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus s443 |