ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.1 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://statisticsbyjim.com/basics/central-limit-theorem/ |
| Last Crawled | 2026-04-11 20:59:23 (2 days ago) |
| First Indexed | 2019-01-08 16:17:09 (7 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Central Limit Theorem Explained - Statistics By Jim |
| Meta Description | The central limit theorem is vital in statistics for two main reasons—the normality assumption and the precision of the estimates. |
| Meta Canonical | null |
| Boilerpipe Text | The central limit theorem in statistics states that, given a sufficiently large sample size, the sampling distribution of the mean for a variable will approximate a normal distribution regardless of that variable's distribution in the population. … |
| Markdown |
# Central Limit Theorem Explained
By [Jim Frost](https://statisticsbyjim.com/author/statis11_wp/) [107 Comments](https://statisticsbyjim.com/basics/central-limit-theorem/#comments)
The central limit theorem in [statistics](https://statisticsbyjim.com/glossary/statistics/) states that, given a sufficiently large [sample size](https://statisticsbyjim.com/glossary/sample-size/), the sampling [distribution](https://statisticsbyjim.com/glossary/distribution/) of the mean for a variable will approximate a normal distribution regardless of that variable's distribution in the [population](https://statisticsbyjim.com/glossary/population/).
Unpacking the meaning from that complex definition can be difficult. That's the topic for this post! I'll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is vital in statistics.
## Distribution of the Variable in the Population
Part of the definition for the central limit theorem states, "regardless of the variable's distribution in the population." This part is easy! In a population, the values of a variable can follow different probability distributions. These distributions include normal, left-skewed, right-skewed, and uniform distributions, among others.
**Normal**

**Right-Skewed**

**Left-Skewed**

**Uniform**

This part of the definition refers to the distribution of the variable's values in the population from which you draw a random sample.
The central limit theorem applies to almost all types of probability distributions, but there are exceptions. For example, the population must have a finite variance. That restriction rules out the Cauchy distribution because it has infinite variance.
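The Cauchy exception is easy to see numerically. The sketch below (a Python/NumPy illustration, not part of the original post) compares the spread of sample means drawn from a normal population against those drawn from a standard Cauchy population: the normal means tighten as the sample size grows, while the Cauchy means never do.

```python
import numpy as np

rng = np.random.default_rng(42)
reps = 10_000  # simulated samples per condition

def iqr_of_means(draw, n):
    """Interquartile range of `reps` sample means, each from a sample of size n."""
    means = draw(size=(reps, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return q75 - q25

# Normal population: the spread of the sample means shrinks like 1/sqrt(n).
norm10 = iqr_of_means(rng.standard_normal, 10)
norm400 = iqr_of_means(rng.standard_normal, 400)

# Standard Cauchy population (infinite variance): the mean of n Cauchy draws
# is itself standard Cauchy, so the spread never shrinks -- the CLT fails here.
cauchy10 = iqr_of_means(rng.standard_cauchy, 10)
cauchy400 = iqr_of_means(rng.standard_cauchy, 400)

print(norm10, norm400)      # the second is roughly 1/6 of the first
print(cauchy10, cauchy400)  # both stay near 2, no matter the sample size
```

The seed and sample sizes here are arbitrary choices for the demonstration; the qualitative contrast holds for any of them.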
Additionally, the central limit theorem applies to independent, identically distributed variables. In other words, the value of one observation does not depend on the value of another observation. And, the distribution of that variable must remain constant across all measurements.
**Related Post**: [Understanding Probability Distributions](https://statisticsbyjim.com/basics/probability-distributions/) and [Independent and Identically Distributed Variables](https://statisticsbyjim.com/basics/independent-identically-distributed-data/)
## Sampling Distribution of the Mean
The definition for the central limit theorem also refers to "the sampling distribution of the mean." What's that?
Typically, you perform a study once, and you might calculate the mean of that one sample. Now, imagine that you repeat the study many times and collect the same sample size for each one. Then, you calculate the mean for each of these samples and graph them on a histogram. The histogram displays the distribution of sample means, which statisticians refer to as the sampling distribution of the mean.
Fortunately, we don't have to repeat studies many times to estimate the sampling distribution of the mean. Statistical procedures can estimate that from a single random sample.
The shape of the sampling distribution depends on the sample size. If you perform the study using the same procedure and change only the sample size, the shape of the sampling distribution will differ for each sample size. And, that brings us to the next part of the CLT definition!
## Central Limit Theorem and a Sufficiently Large Sample Size
As the previous section states, the shape of the sampling distribution changes with the sample size. And, the definition of the central limit theorem states that when you have a sufficiently large sample size, the sampling distribution starts to approximate a normal distribution. How large does the sample size have to be for that approximation to occur?
It depends on the shape of the variable's distribution in the underlying population. The more the population distribution differs from being normal, the larger the sample size must be. Typically, statisticians say that a sample size of 30 is sufficient for most distributions. However, strongly skewed distributions can require larger sample sizes. We'll see the sample size aspect in action during the empirical demonstration below.
## Central Limit Theorem and Approximating the Normal Distribution
To recap, the central limit theorem links the following two distributions:
- The distribution of the variable in the population.
- The sampling distribution of the mean.
Specifically, the CLT states that regardless of the variable's distribution in the population, the sampling distribution of the mean will tend to approximate the normal distribution.
In other words, the population distribution can look like the following:

But, the sampling distribution can appear like below:

It's not surprising that a normally distributed variable produces a sampling distribution that also follows the normal distribution. But, surprisingly, nonnormal population distributions can also create normal sampling distributions.
**Related Post**: [Normal Distribution in Statistics](https://statisticsbyjim.com/basics/normal-distribution/)
## Properties of the Central Limit Theorem
Let's get more specific about the normality features of the central limit theorem. Normal distributions have two parameters, the mean and standard deviation. What values do these parameters converge on?
As the sample size increases, the sampling distribution converges on a normal distribution where the mean equals the population mean, and the standard deviation equals σ/√n, where:
- σ = the population standard deviation
- n = the sample size
As the sample size (n) increases, the standard deviation of the sampling distribution becomes smaller because the square root of the sample size is in the denominator. In other words, the sampling distribution clusters more tightly around the mean as sample size increases.
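The σ/√n formula is easy to verify numerically. A quick check sketched in Python/NumPy (not from the original post), using a uniform(0, 1) population, whose standard deviation is known to be 1/√12:

```python
import numpy as np

rng = np.random.default_rng(7)
reps, n = 200_000, 25

# Uniform(0, 1) population: sigma = sqrt(1/12) ~ 0.2887.
sigma = np.sqrt(1 / 12)

# Simulate the sampling distribution of the mean for samples of size n.
means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)

observed_se = means.std()              # spread of the sampling distribution
expected_se = sigma / np.sqrt(n)       # the CLT's sigma / sqrt(n)
print(observed_se, expected_se)        # the two should agree closely
```

The seed, population, and sample size are arbitrary; the agreement between the observed and predicted standard error is the point.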
Let's put all of this together. As sample size increases, the sampling distribution more closely approximates the normal distribution, and the spread of that distribution tightens. These properties have essential implications in statistics that I'll discuss later in this post.
**Related Posts**: [Measures of Central Tendency](https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/) and [Measures of Variability](https://statisticsbyjim.com/basics/variability-range-interquartile-variance-standard-deviation/)
## Empirical Demonstration of the Central Limit Theorem
Now the fun part! There is a mathematical proof for the central limit theorem, but that goes beyond the scope of this blog post. However, I will show how it works empirically by using statistical simulation software. I'll define population distributions and have the software draw many thousands of random samples from them. The software will calculate the mean of each sample and then graph these sample means on a histogram to display the sampling distribution of the mean.
For the following examples, I'll vary the sample size to show how that affects the sampling distribution. To produce the sampling distribution, I'll draw 500,000 random samples because that creates a fairly smooth distribution in the histogram.
Keep this critical difference in mind. While I'll collect a consistent 500,000 samples per condition, the size of those samples will vary, and that affects the shape of the sampling distribution.
Let's test this theory! To do that, I'll use [Statistics101](https://sourceforge.net/projects/statistics101/), which is a freeware computer program. This is a great simulation program that I've also used to [tackle the Monty Hall Problem](https://statisticsbyjim.com/hypothesis-testing/monty-hall-problem-hypothesis-testing/)!
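The same procedure is easy to reproduce in other tools. Here is a minimal Python/NumPy sketch of the approach described above (using a generic `draw` function and fewer repetitions than the post's 500,000, purely for illustration): draw many samples of size n, take each sample's mean, and histogram the means.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampling_distribution(draw, n, reps=100_000, bins=60):
    """Simulate the sampling distribution of the mean.

    draw: function mapping a `size` tuple to random values (the population)
    n:    size of each sample
    reps: number of samples to draw
    Returns histogram counts and bin edges for the `reps` sample means.
    """
    means = draw(size=(reps, n)).mean(axis=1)
    return np.histogram(means, bins=bins)

# Example: sampling distribution of the mean of 40 draws from a lognormal.
counts, edges = sampling_distribution(rng.lognormal, n=40)
print(counts.sum())  # one mean per simulated sample
```

Any population generator that accepts a `size` argument can be plugged in for `draw`, which mirrors how the sections below swap in different population distributions.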
## Testing the Central Limit Theorem with Three Probability Distributions
I'll show you how the central limit theorem works with three different distributions: moderately skewed, severely skewed, and a uniform distribution. The first two distributions skew to the right and follow the lognormal distribution. The probability distribution plot below displays the population's distribution of values. Notice how the red dashed distribution is much more severely skewed. It actually extends quite a way off the graph! We'll see how this makes a difference in the sampling distributions.

Let's see how the central limit theorem handles these two distributions and the uniform distribution.
## Moderately Skewed Distribution and the Central Limit Theorem
The graph below shows the moderately skewed lognormal distribution. This distribution fits the body fat percentage dataset that I use in my post about [identifying the distribution of your data](https://statisticsbyjim.com/hypothesis-testing/identify-distribution-data/). These data correspond to the blue line in the probability distribution plot above. I use the simulation software to draw random samples from this population 500,000 times for each sample size (5, 20, 40).

In the graph above, the gray color shows the skewed distribution of the values in the population. The other colors represent the sampling distributions of the means for different sample sizes. The red color shows the distribution of means when your sample size is 5. Blue denotes a sample size of 20. Green is 40. The red curve (n=5) is still skewed a bit, but the blue and green (20 and 40) are not visibly skewed.
As the sample size increases, the sampling distributions more closely approximate the normal distribution and become more tightly clustered around the population mean—just as the central limit theorem states!
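The visual pattern above can be quantified: the skewness of the sampling distribution should drop as n rises from 5 to 20 to 40. A Python/NumPy sketch of that check (using an arbitrary moderately skewed lognormal, not the post's exact body-fat-fitted parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
reps = 100_000  # simulated samples per sample size

def skewness(x):
    """Standardized third moment."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

def skew_of_means(n):
    # Moderately right-skewed lognormal population (sigma = 0.5 is an
    # illustrative choice, giving population skewness of about 1.75).
    means = rng.lognormal(mean=0.0, sigma=0.5, size=(reps, n)).mean(axis=1)
    return skewness(means)

s5, s20, s40 = skew_of_means(5), skew_of_means(20), skew_of_means(40)
print(s5, s20, s40)  # skewness shrinks roughly like 1 / sqrt(n)
```

The monotone decrease matches the graph's story: the n = 5 curve is visibly skewed while the n = 20 and n = 40 curves are nearly symmetric.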
## Very Skewed Distribution and the Central Limit Theorem
Now, let's try this with the very skewed lognormal distribution. These data follow the red dashed line in the probability distribution plot above. I follow the same process but use larger sample sizes of 40 (grey), 60 (red), and 80 (blue). I do not include the population distribution in this one because it is so skewed that it messes up the X-axis scale!

The population distribution is extremely skewed. It's probably more skewed than real data tend to be. As you can see, even with the largest sample size (blue, n=80), the sampling distribution of the mean is still skewed right. However, it is less skewed than the sampling distributions for the smaller sample sizes. Also, notice how the peaks of the sampling distribution shift to the right as the sample size increases. Eventually, with a large enough sample size, the sampling distributions will become symmetric, and the peak will stop shifting and center on the actual population mean.
If your population distribution is extremely skewed, be aware that you might need a substantial sample size for the central limit theorem to kick in and produce sampling distributions that approximate a normal distribution!
## Uniform Distribution and the Central Limit Theorem
Now, let's change gears and look at an entirely different type of distribution. Imagine that we roll a die and take the average value of the rolls. The probabilities for rolling the numbers on a die follow a uniform distribution because all numbers have the same chance of occurring. Can the central limit theorem work with discrete numbers and uniform probabilities? Let's see!
In the graph below, I follow the same procedure as above. In this example, the sample size refers to the number of times we roll the die. The process calculates the mean for each sample.

In the graph above, I use sample sizes of 5, 20, and 40. We'd expect the average to be (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. The sampling distributions of the means center on this value. Just as the central limit theorem predicts, as we increase the sample size, the sampling distributions more closely approximate a normal distribution and have a tighter spread of values.
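The die-rolling experiment translates directly into code. A small Python/NumPy sketch (illustrative, not the post's Statistics101 program): the means of repeated rolls center on 3.5, and their spread matches σ/√n, where σ = √(35/12) ≈ 1.71 for a fair die.

```python
import numpy as np

rng = np.random.default_rng(11)
reps, n = 100_000, 40

# Roll a fair six-sided die n times per sample, for `reps` samples.
rolls = rng.integers(1, 7, size=(reps, n))  # upper bound is exclusive
means = rolls.mean(axis=1)

print(means.mean())                   # centers on 3.5
print(means.std())                    # spread of the sampling distribution
print(np.sqrt(35 / 12) / np.sqrt(n))  # the CLT's predicted sigma / sqrt(n)
```

Even though individual rolls are discrete and uniform, the histogram of these means would show the bell shape the graph above displays.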
You could perform a similar experiment using the [binomial distribution](https://statisticsbyjim.com/basics/binary-data-binomial-distribution/) with coin flips and obtain the same types of results when it comes to, say, the probability of getting heads. All thanks to the central limit theorem!
## Why is the Central Limit Theorem Important?
The central limit theorem is vital in statistics for two main reasons—the normality assumption and the precision of the estimates.
### Central limit theorem and the normality assumption
The fact that sampling distributions can approximate a normal distribution has critical implications. In statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the [t-test](https://statisticsbyjim.com/hypothesis-testing/t-tests-t-values-t-distributions-probabilities/). Consequently, you might think that these tests are not valid when the data are nonnormally distributed. However, if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are nonnormally distributed—as long as your sample size is large enough.
You might have heard that parametric tests of the mean are robust to departures from the normality assumption when your sample size is sufficiently large. That's thanks to the central limit theorem!
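This robustness can be checked directly: run a one-sample t-test many times on skewed data where the null hypothesis is true, and see whether the false-positive rate stays near the nominal 5%. A Python/NumPy sketch (the t statistic is computed by hand; the exponential population and the 1.984 two-sided critical value for df = 99 are assumptions made for this illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
reps, n = 20_000, 100
t_crit = 1.984  # approximate two-sided 5% critical value, t distribution, df = 99

# Exponential population with true mean 1; we test H0: mu = 1, which is TRUE,
# so every rejection is a false positive.
x = rng.exponential(scale=1.0, size=(reps, n))
t = (x.mean(axis=1) - 1.0) / (x.std(axis=1, ddof=1) / np.sqrt(n))
false_positive_rate = np.mean(np.abs(t) > t_crit)
print(false_positive_rate)  # stays close to the nominal 0.05 despite the skew
```

With a much smaller n, the same experiment would show the error rate drifting away from 5%, which is the "large enough sample size" caveat in action.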
For more information about this aspect, read my post that compares [parametric and nonparametric tests](https://statisticsbyjim.com/hypothesis-testing/nonparametric-parametric-tests/).
### Precision of estimates
In all of the graphs, notice how the sampling distributions of the mean cluster more tightly around the population mean as the sample sizes increase. This property of the central limit theorem becomes relevant when using a sample to estimate the mean of an entire population. With a larger sample size, your sample mean is more likely to be close to the real population mean. In other words, your estimate is more precise.
Conversely, the sampling distributions of the mean for smaller sample sizes are much broader. For small sample sizes, it's not unusual for sample means to be further away from the actual population mean. You obtain less precise estimates.
In closing, understanding the central limit theorem is crucial when it comes to trusting the validity of your results and assessing the precision of your estimates. Use large sample sizes to satisfy the normality assumption even when your data are nonnormally distributed and to obtain more precise estimates!
Filed Under: [Basics](https://statisticsbyjim.com/basics/) Tagged With: [assumptions](https://statisticsbyjim.com/tag/assumptions/), [conceptual](https://statisticsbyjim.com/tag/conceptual/), [distributions](https://statisticsbyjim.com/tag/distributions/), [graphs](https://statisticsbyjim.com/tag/graphs/)
## Reader Interactions
### Comments
1. Sharad says
[November 21, 2024 at 3:03 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-12896)
Hello,
This is an absolutely brilliant example. Only thing I'd add is that when making the smoothie, you can't just pick your favorite fruit out of the basket (i.e., need to [sample](https://statisticsbyjim.com/glossary/sample/) randomly from the fruit distribution).
Cheers,
Sharad
2. Dr Dilip Raj says
[July 12, 2024 at 8:53 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-12638)
very informative sir.
3. Todd says
[March 31, 2024 at 5:38 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-12317)
Hello Jim:
First of all, terrific site so keep up the good work. I will certainly visit frequently given my upcoming coursework.
I've been trying to find software or a website that would allow me to play w/ data and see how the histogram changes. What would be your recommendations?
For example, I'm looking at a histogram that is [left skewed](https://statisticsbyjim.com/glossary/skewness/) w/ n of 12K. I want to see how the distribution shape changes if n is significantly less, significantly more, and then around 10K. How would this histogram change in appearance with these varying sample sizes?
Thanks
- Jim Frost says
[March 31, 2024 at 6:59 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-12321)
Hi Todd,
I've used freeware called [Statistics101](http://www.statistics101.net/) for that. In fact, that's what I've used in this post. You can sample data from various distributions and create histograms from them. You can do that for the distribution of individual values as well as for sampling distributions with samples of whichever size you set. You can see both in this post.
I'd suspect that you won't notice a significant visual difference between 10k and 12k. Both are very large sample sizes. Starting with small samples (n < 100), you'll get blocky distributions that don't represent the shape of the parent distribution well. As the sample size grows from there, the histograms should consistently reflect the shape of the parent distribution but still have some "blockiness." As the sample size increases further, the histogram bars become narrower and start to closely follow the smooth curve of the parent distribution. Eventually, as the sample size grows, the distribution of bars almost looks like a continuous distribution.
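(An illustrative sketch, not the Statistics101 script from the post: drawing increasingly large samples from a right-skewed gamma distribution and counting how many of 50 histogram bins receive any data shows the "blockiness" fading as n grows.)

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw samples of increasing size from a right-skewed parent distribution
# (gamma) and count how many of 50 histogram bins receive any data.
# Sparse, gap-filled bins are the "blockiness" visible with small samples.
for n in (50, 1_000, 100_000):
    sample = rng.gamma(shape=2.0, scale=1.0, size=n)
    counts, _ = np.histogram(sample, bins=50, range=(0, 12))
    print(f"n={n:>6}: {np.count_nonzero(counts)}/50 bins occupied")
```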
- Jakob says
[July 3, 2024 at 7:09 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-12611)
Hi Jim,
Sorry for replying to a previous comment; I can't figure out how to create a new comment.
Is it possible to have a sample size that is so large that the central limit theorem no longer applies? If the population distribution is skewed, but a sample of sufficient size is not skewed, wouldn't a sample size that is too large start to look like the skewed population distribution?
Many thanks,
Jakob
- Jim Frost says
[July 3, 2024 at 11:30 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-12614)
Hi Jakob,
You should just be able to click in the comment box at the bottom to post a new one.
I think I understand what you're asking. The thing to remember is that you can have large sample sizes, but you're plotting the sample means (or another statistic) for those samples. That's the distribution in question. And the distribution of sample means will converge on the normal distribution as sample size increases.
So, the parent distribution can be skewed, and your large samples might be similarly skewed, but the distribution of their means won't be skewed with a sufficiently large sample size.
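(A minimal simulation of this point, assuming an exponential population, whose skewness is 2: each individual sample stays heavily skewed, but the distribution of the sample means is nearly symmetric.)

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

# Skewed population: exponential (population skewness = 2).
n, reps = 500, 5_000
samples = rng.exponential(scale=1.0, size=(reps, n))

# One large sample is still skewed, roughly like the population...
print(f"skewness of one sample (n={n}): {skew(samples[0]):.2f}")
# ...but the distribution of the 5,000 sample means is nearly symmetric.
print(f"skewness of the sample means:  {skew(samples.mean(axis=1)):.2f}")
```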
4. Ary Agrawal says
[December 18, 2023 at 11:47 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-11989)
Hey Jim, I enjoy reading your articles. I am working on an assignment that deals with the central limit theorem, and I was curious to know what simulation software you are using. I wish to calculate the probability of each mean value that a sample of a certain size from a discrete distribution will give me, and even writing a program in Java doesn't seem to help for sample sizes exceeding 20.
- Jim Frost says
[December 18, 2023 at 6:03 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-11990)
The simulation software I used was Statistics101. I mention it and have a link to it in the "Empirical Demonstration . . ." section in this post. It's giftware, meaning they hope you donate some money to use it.
5. Harald Foidl says
[December 5, 2023 at 12:50 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-11961)
Hi Jim!
Great post, well explained!
Although I understand the theorem and can demonstrate it with simulations, I am struggling with an intuitive explanation of why the CLT works.
I am aware that the theorem was mathematically proven, but what should I say to my grandma when she asks "why, as sample sizes increase, does the sampling distribution of the mean approach a normal distribution"?
- Jim Frost says
[December 5, 2023 at 3:26 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-11962)
Hi Harald,
That's a great question! As you might know, I love presenting intuitive explanations whenever possible on my website. I'll admit, I had to think about this one for a while! Here's what I came up with. I hope it works. It's a tough one to explain intuitively!
Imagine you have a large basket of fruit that's quite varied: some apples, a bunch of bananas, a few oranges, etc. This basket represents a population with a non-normal distribution, where each type of fruit represents different values in your population.
Now, let's say you start making smoothies. Each smoothie is like a sample from your population. In each smoothie (sample), you put a random assortment of fruit (data points). If you make a small smoothie (a small sample), the taste of that smoothie might be heavily influenced by the particular fruits you picked. Maybe you got more bananas or mostly apples, so the flavor (sample mean) is skewed towards those fruits.
However, as you start making larger and larger smoothies (increasing your sample size), something interesting happens. The chance of getting a skewed combination of fruits decreases. You're more likely to get a balanced mix of all the fruits in your basket in each smoothie. This is because, with a larger sample, the peculiarities of any single type of fruit ([outliers](https://statisticsbyjim.com/glossary/outliers/) in your population) are averaged out by the presence of all the other types.
As you keep increasing the size of your smoothies, the flavor (sample mean) of each smoothie begins to converge towards a consistent, "average" taste, regardless of the original fruit distribution in the basket. This average flavor is akin to the normal distribution in the Central Limit Theorem. It represents the mean of all your smoothies (sample means) becoming more predictable and less variable as your sample size increases.
- Harald Foidl says
[December 6, 2023 at 8:47 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-11969)
Hey Jim! Thank you very much for taking the time to think about my question. You really were able to explain it intuitively to me! Great example!
Best,
Harald
6. Alex F says
[August 3, 2022 at 7:53 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10978)
Hi Jim,
Just discovered your website and love your content.
Suppose we took a sample consisting of the entire population of size N, and let X be the random variable for the value of an observation and sd(X) be the standard deviation of X across the whole population. The standard deviation of the distribution of sample means would then be exactly zero (rather than sd(X)/sqrt(N) as indicated by the CLT), because each time you repeated the sampling, the sample would be the entire population, so the sample mean would be exactly the same every time.
Does this indicate that the standard deviation of the sample means is only approximately sd(X)/sqrt(N), rather than exactly sd(X)/sqrt(N)?
Many thanks,
Alex
- Jim Frost says
[August 4, 2022 at 12:23 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10979)
Hi Alex, I'm so glad my website has been helpful!
As for your question: remember that the central limit theorem applies to samples, and you're talking about the entire population. By definition, that's not a sample. The CLT is not applicable in that scenario.
7. David Connell says
[June 21, 2022 at 3:55 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10879)
Hi Jim
I think I am right in summarising that when you use the CLT, you get the probability of an event (for the population) based on the sample. As the sample size increases, the accuracy of this probability increases. This is independent of the underlying distribution type. My question is whether the calculated probability is an [estimate](https://statisticsbyjim.com/glossary/estimator/) or not. I read that if the underlying distribution is normal it is not an estimate, and I wonder whether this is also the case if the underlying distribution is not normal, e.g., Poisson or geometric.
- Jim Frost says
[June 21, 2022 at 4:09 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10881)
Hi David,
That's sort of it. The CLT applies to sampling distributions of the mean. Consequently, it doesn't apply to individual events but to sample means. You can use sampling distributions of the mean to find the probability of obtaining a particular sample mean with a specified sample size in populations with specified properties.
The blog post explains this in more detail but, in summary, the CLT states that when the distribution of values in a population is nonnormal, the sampling distribution of the mean will approach a normal distribution with specific properties as sample size grows.
So, if you have a nonnormal distribution of values, then, yes, if you use a normal sampling distribution, it is an approximation of the real sampling distribution. As your sample size grows, the difference between the true sampling distribution and your normal approximation becomes smaller. Frequently, statisticians say that a sample size of 30 will often produce a sampling distribution that closely approximates a normal distribution. However, some severely skewed distributions might require much larger sample sizes. I use simulations in this blog post to show the approximation process.
In one sense, if you start with a normal distribution of values in the population, then the normal sampling distribution is correct and *not* an approximation of the real one. However, assuming you're working with a sample and not the full population, then you're by definition working with estimates. The sample estimates the distribution of values in the population, which in turn estimates the sampling distribution. But, yes, if the population distribution of values is normal, then the sampling distribution is normal even with small sample sizes. However, keep in mind that small samples produce relatively imprecise estimates of the population distribution, and in turn an imprecise estimate of the sampling distribution. So, a normal distribution of population values doesn't get you off the hook for needing a good sample size!
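(To put rough numbers on the "sample size of 30" guideline, here is an illustrative simulation I've added, assuming an exponential population with skewness 2: the skewness of the sampling distribution of the mean shrinks as n grows, and for i.i.d. draws theory says it falls as 2/√n.)

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(7)

# Skewness of the sampling distribution of the mean for an exponential
# population (skewness 2). Theory for i.i.d. draws: it shrinks as 2/sqrt(n).
for n in (5, 30, 100):
    means = rng.exponential(scale=1.0, size=(50_000, n)).mean(axis=1)
    print(f"n={n:>3}: skewness of sample means = {skew(means):.2f}"
          f"  (theory: {2 / np.sqrt(n):.2f})")
```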
8. Prachi says
[June 18, 2022 at 2:49 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10871)
Hi Jim! I really, really love your blogs; they are very helpful and well explained.
It would be even better if this website had an option for switching to dark mode. Please do consider it!
Love from India <3
9. Ben says
[May 11, 2022 at 9:38 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10769)
Hi Jim,
That was a really interesting read, very well explained, thank you for that.
I just have one hopefully simple question. When calculating [confidence intervals](https://statisticsbyjim.com/glossary/confidence-interval/) for a large non-normal distribution, can the central limit theorem be applied so that the confidence interval is calculated using the same method as for a normal distribution (with z-values)? Or does the confidence interval need to be calculated using t-values instead, or maybe another method?
Sorry if this has already been explained!
Thanks in advance,
Ben
- Jim Frost says
[May 12, 2022 at 7:55 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10776)
Hi Ben,
In the sense that the sampling distribution converges on the normal distribution, yes, you can use z-values to calculate the confidence intervals. However, CIs based on z-values will always be somewhat narrower than those based on t-values. t-values account for the changes in precision at different sample sizes, whereas z-values assume you know the standard deviation of the *population* or have an infinitely large sample!
The size of the difference between the two methods depends on your DF. With larger sample sizes, the difference becomes trivial. I show this difference between t-values and z-scores in my post about [Confidence Intervals](https://statisticsbyjim.com/hypothesis-testing/confidence-interval/). Read that for more details!
Generally, I'd just use t-values because they are designed with the sample size in mind.
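(The gap between the two methods is easy to quantify. A sketch using SciPy, not taken from the post: compare the 95% critical values from the t and standard normal distributions at several sample sizes.)

```python
from scipy import stats

# Ratio of the t critical value to the z critical value for a 95% CI.
# The t interval is wider; the difference fades as df = n - 1 grows.
z = stats.norm.ppf(0.975)  # ~1.96
for n in (5, 30, 100, 1000):
    t = stats.t.ppf(0.975, df=n - 1)
    print(f"n={n:>4}: t = {t:.3f}, CI wider than z-based by {100 * (t / z - 1):.1f}%")
```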
10. Ekaveera Kumar says
[April 24, 2022 at 10:34 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10712)
Thanks Jim. Can I ask any statistics questions here?
- Jim Frost says
[April 24, 2022 at 10:38 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10713)
Hi Ekaveera,
Yes, you can! Please post your questions in the comment section of the *relevant* post. That helps keep the questions and answers organized in the right posts. If you need help finding the relevant post, use the search box near the top of the right margin.
11. Jeson says
[March 25, 2022 at 2:53 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10587)
Hi Jim
May I have your insight on the question below:
You state in the post that "the sampling distribution converges on a normal distribution where the mean equals the population mean, and the standard deviation equals σ/√n." Why, then, does SPC use a different estimator for the population standard deviation? For example, in an X-bar/s chart, the population standard deviation is estimated by the sum of the sample standard deviations divided by the number of samples.
I'd appreciate your comments.
12. Faith says
[March 22, 2022 at 10:46 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10568)
You've saved my stats grade! Thank you so much. I have a question.
If the question asks for the probability that the average duration lies within 0.8 of the population mean (let's assume the average is 2.5 in this case), how would that be solved? I got a question like this, and the word "within" is confusing.
- Jim Frost says
[March 22, 2022 at 10:59 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10569)
Hi Faith,
Yay! I'm so glad my website was helpful!
"Within" in this context refers to the mean +/- 0.8. However, there should be some units with that 0.8. Is it within +/- 0.8 standard deviations of the mean? Or is it the mean of 2.5 +/- 0.8? If the data follow the normal distribution, you will presumably have some way to convert it to a z-score (or it is the z-score) and use that to find the probability. My [z-table article](https://statisticsbyjim.com/hypothesis-testing/z-table/) not only has the z-table but shows you how to use it for various purposes.
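(If the 0.8 is in the original units and the standard error of the mean is known, the "within" probability is the area between two z-scores. A sketch with made-up numbers — the σ = 2 and n = 25 below are my assumptions, not from the question — using only the Python standard library:)

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setup: population mean 2.5, population sd 2.0, n = 25.
mu, sigma, n = 2.5, 2.0, 25
se = sigma / sqrt(n)           # standard error of the mean = 0.4
z = 0.8 / se                   # 0.8 expressed in standard-error units = 2.0
p = NormalDist().cdf(z) - NormalDist().cdf(-z)  # area between -z and +z
print(f"P(|sample mean - {mu}| < 0.8) = {p:.4f}")
```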
13. gabrielle says
[March 11, 2022 at 7:05 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10537)
Hi Jim, thank you so much for the post!
I have been wondering what the sample size means in the central limit theorem, as it could mean one of two things: 1) the number of observations in each sample, or 2) the number of samples repeatedly drawn from the population. After several readings and running simulations myself, I lean toward the 2nd: the number of samples repeatedly drawn from the population, although the number of observations also plays some role in determining the sampling distribution of the sample means. Here is what I found; please let me know if otherwise.
The sampling distribution of sample means will approach a normal distribution, regardless of the underlying population distribution, if you repeatedly draw an infinite number of times. However, if the number of observations is large (say, >30), the sampling distribution will be tighter and more normal, compared to a smaller sample, given the same number of repeated draws.
Does it sound right to you?
- Jim Frost says
[March 11, 2022 at 10:31 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10539)
Hi Gabrielle,
I write about it in this post. Sample size is the number of observations in each sample. If you have a skewed population distribution and you use a small sample size (e.g., n = 5), the sampling distribution will be skewed even if you draw hundreds of thousands of [random samples](https://statisticsbyjim.com/glossary/random-sample/) from the population.
Please read the examples I show in the post more closely to understand this issue. Pay particular attention to the moderately skewed distribution example where I use different sample sizes (n = 5, 20, and 40) but draw each sample size the same number of times (500,000) from the population.
14. Mihoby says
[March 4, 2022 at 4:19 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10521)
Indeed, thank you!
15. Mihoby Razafinimanana says
[March 4, 2022 at 11:10 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10517)
Hello,
First, thank you for your super clear and helpful post; it helped me a lot with my data!
Second, if I remember well, you used to provide a table with a minimum sample size depending on the statistical test we want to apply, and if I'm not mistaken, it isn't present in your post anymore.
Is there a reason for that? Or else, do you have a reference for getting a clear sample size number?
Thanks a lot!
MR
- Jim Frost says
[March 4, 2022 at 4:06 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10520)
Hi Mihoby,
I know the table you're referring to. I actually include it in a different post about [parametric vs. nonparametric tests](https://statisticsbyjim.com/hypothesis-testing/nonparametric-parametric-tests/). Click the link to see it!
16. Kewal says
[March 3, 2022 at 12:45 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10507)
Hi Jim, amazing post. I'm new to stats, so accept my apologies in advance if this is a stupid question.
Let's say I have data on 500,000 bank withdrawals. I take a sample of 100 withdrawals, find its mean, and repeat this process 100,000 times. The distribution of these means would be normal, and the mean of this distribution would be the mean of my 500,000 withdrawals, right?
Secondly, assume that one fine day a customer withdraws $100. Can I use the CLT and the normal distribution I plotted above to find the probability of a $100 withdrawal? As in, how likely is it that a customer would withdraw $100 from the bank?
- Jim Frost says
[March 3, 2022 at 1:38 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10508)
Hi Kewal,
Thanks for the kind words! And your questions definitely are not stupid!
You're correct about the things in your paragraph about repeating a sample size of 100 for 100,000 times. That sampling distribution should follow a normal distribution and center on the mean of the 500,000 withdrawals. The only caveat is if the full population of 500,000 withdrawals is *extremely* skewed, then a sample size of 100 might not be large enough. But usually a sample size of 100 would be sufficient.
Unfortunately, your next paragraph isn't quite right. Sampling distributions apply to sample means, not individual observations. So, you wouldn't use the sampling distribution to assess an individual withdrawal of $100, and the CLT only applies to sampling distributions. Instead, you'd take your sample and use it to estimate the probability distribution that the *population* follows. After determining that, you can use that probability distribution function to calculate the probability associated with that withdrawal. The important difference is that you're using the probability distribution for the population rather than the sampling distribution because you're looking at an individual value.
The following two posts provide details about the above:
[Identifying the distribution of your data](https://statisticsbyjim.com/hypothesis-testing/identify-distribution-data/)
[Understanding probability distributions](https://statisticsbyjim.com/basics/probability-distributions/)
I hope that helps!
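(The distinction can be sketched in code. The data below are hypothetical, and lognormal is just a stand-in for whatever distribution the withdrawals actually follow: the fitted population distribution's CDF answers the individual-withdrawal question, while the sampling distribution answers questions about sample means.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical right-skewed withdrawal amounts (lognormal stand-in).
withdrawals = rng.lognormal(mean=5.0, sigma=0.6, size=5_000)

# Individual value: fit a population distribution and use its CDF.
shape, loc, scale = stats.lognorm.fit(withdrawals, floc=0)
p_single = stats.lognorm.cdf(100, shape, loc, scale)

# Sample mean: the sampling distribution answers a different question.
se = withdrawals.std(ddof=1) / np.sqrt(len(withdrawals))
p_mean = stats.norm.cdf(100, loc=withdrawals.mean(), scale=se)

print(f"P(a single withdrawal <= $100) ~ {p_single:.3f}")
print(f"P(a sample mean <= $100)       ~ {p_mean:.2e}")  # effectively zero
```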
17. Niloufar says
[February 1, 2022 at 11:06 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10368)
Thank you for the answer.
I didn't use ANOVA because the variances of the different groups aren't the same, and the data don't have a normal distribution. So, I was thinking a z-test for a large data set is appropriate because I can apply the CLT.
- Jim Frost says
[February 1, 2022 at 5:44 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10369)
Hi Niloufar,
The CLT also applies to ANOVA. So, that's not a problem. And read my post about post hoc tests, and you'll see why running a series of z-tests is not a good idea!
Unequal variance can be a problem. In that case, you should use [Welch's ANOVA](https://statisticsbyjim.com/anova/welchs-anova-compared-to-classic-one-way-anova/) with a post hoc test. It can handle the unequal variances. Read the link for more details.
18. Niloufar says
[January 31, 2022 at 4:58 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10364)
Hi Jim. Thank you so much for such a great explanation of the CLT. I'm not a statistician, so my question may be ridiculous.
I have different experiments, and I want to apply statistics to compare these experiments with a base experiment. For each experiment I have almost 300 data points, which are detected via sensors. Given the conditions, I want to use a two-sample z-test to compare each experiment with the base one statistically, using Python. So, my question is: how does Python calculate z and report the [p-value](https://statisticsbyjim.com/glossary/p-value/)? Does it select a proper sample from my data, calculate the average of that sample, and then report the p-value? Or do I have to select a sample from my data and then give that sample to Python to calculate the p-value?
Regards,
Nilou
- Jim Frost says
[January 31, 2022 at 5:10 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10365)
Hi Niloufar,
I'm not an expert with Python, so I can't answer that part. However, from a statistical standpoint, if you have three or more experiments (including the base experiment), then you'll need to perform one-way ANOVA and follow that up with a post hoc test. There are post hoc tests specifically designed for comparing each experiment to a baseline. Read my article about [using post hoc tests with ANOVA](https://statisticsbyjim.com/anova/post-hoc-tests-anova/) and you'll see why you should NOT perform a series of z-tests to do this but instead use ANOVA with a post hoc test.
19. [Chris Gibbons](https://search.crossref.org/citation?format=apa&doi=10.21203/rs.3.rs-1021633/v1) says
[December 22, 2021 at 6:29 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10223)
Hi Jim, thank you for your kindness in helping a global community of learners.
In hypothesis testing, when exploring differences, we compare the means of the control and experimental conditions, and we look at the size of the difference in the means and how likely a difference of that size could have occurred by chance.
But from a purely statistical point of view, the calculation just looks at how likely the observed mean could have occurred by chance, i.e., where it would sit in a distribution of multiple means. So why have a control condition?
I understand the obvious reason from a research standpoint, but my query is from the stats standpoint.
- Jim Frost says
[December 22, 2021 at 11:40 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10225)
Hi Chris,
Technically, the hypothesis test tells you the likelihood of obtaining your result, or one more extreme, under the assumption that the null hypothesis is correct. It's not quite accurate to say "occurring by chance." It's occurring by chance when you assume the null is true. So, it's a conditional probability. That actually makes an important difference in how you interpret the p-value; it's not just a minor semantic difference. For more information, read my post about [interpreting p-values](https://statisticsbyjim.com/hypothesis-testing/interpreting-p-values/).
Now, on to your question! Why do you have a control group or condition? Without a control group, you wouldn't have any basis of comparison for the outcome in the [treatment group](https://statisticsbyjim.com/glossary/experimental-group/). You'd know what the outcome was for that group, but is it better than if you didn't administer the treatment? There's no way to know. For more information, read my post about [control groups](https://statisticsbyjim.com/basics/control-group/), where I discuss this issue specifically. You say you know this from a research standpoint, but I'd say it's the same from a statistics standpoint. Unless I'm misunderstanding what you're asking.
However, comparing groups isn't required for hypothesis testing. You can perform a 1-sample test and estimate its CI if you don't want to make a comparison. So, it's not required. You can do that if you just want to use a sample to estimate the population value of, say, the mean. Alternatively, you can supply a reference value in a 1-sample test to determine whether the sample value is significantly different from that value. You'd need to choose a reference value that is meaningful to your study area.
I'm not sure if I'm answering your question. Let me know if I'm not!
20. De says
[December 2, 2021 at 8:12 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10144)
Hi Jim,
How do you know if a given distribution is normally distributed? How do you "prove" that, and thereby justify, for example, using a parametric test on a small sample size?
Regards,
De
- Jim Frost says
[December 2, 2021 at 9:19 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10145)
Hi De,
That can be a tricky situation! When you have a small sample size, a [normality test](https://statisticsbyjim.com/glossary/normality-test/) might produce a high p-value (indicating a normal distribution) because of the small sample size rather than because the data are consistent with a normal distribution. With a small sample size, the test has low power for detecting deviations from the normal distribution.
Typically, you'd use subject-area knowledge and previous research to establish that your measurements follow a normal distribution. If you can't do that, then you really need to consider a nonparametric test!
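(A quick simulation illustrates the low-power problem. This is my sketch, not from the post: draw small samples from a clearly non-normal exponential population and count how often Shapiro-Wilk fails to reject normality.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# How often Shapiro-Wilk FAILS to flag non-normality (p > 0.05) when the
# samples actually come from a skewed exponential population.
for n in (10, 50, 200):
    misses = sum(
        stats.shapiro(rng.exponential(size=n)).pvalue > 0.05
        for _ in range(500)
    )
    print(f"n={n:>3}: normality 'passed' in {misses}/500 samples")
```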
21. [accioyugen](http://accioyugen.wordpress.com/) says
[November 10, 2021 at 8:08 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10055)
I learned a lot. Thanks
Can I ask what app/site you used to make the graph in the "Moderately Skewed Distribution and the Central Limit Theorem" section?
Thank you so much
22. Gihan Ragab says
[November 4, 2021 at 2:34 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10042)
Hi Jim,
Thanks for the illustration.
What is a z-statistic, and what qualifies a statistic to be a z-statistic, based on the central limit theorem and the basic properties of normal distributions?
- Jim Frost says
[November 4, 2021 at 2:37 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10043)
Hi Gihan,
Please read my post about [z-scores](https://statisticsbyjim.com/basics/z-score/). And you can also read about the [normal distribution](https://statisticsbyjim.com/basics/normal-distribution/). They should answer some of your questions.
I'm not exactly sure what you're asking. Hopefully, those articles help. And then if you have more specific questions, you can post them in the appropriate place.
23. Marq Piper says
[November 2, 2021 at 6:43 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10036)
Hi Jim,
Thank you for the excellent post. I have a question regarding the iid assumption for the CLT to hold.
I want to do a t-test on two means from a distribution that is not normally distributed. According to the CLT, I can do this given a large sample (which I have). However, it may be that the underlying data are not independently distributed. Is it sufficient that the sampling distribution of means is iid, or do the actual underlying data have to be iid?
Thank you very much for your help.
- Jim Frost says
[November 2, 2021 at 3:34 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-10041)
Hi Marq,
[Independent and identically distributed (IID)](https://statisticsbyjim.com/basics/independent-identically-distributed-data/) data combine two broad characteristics of a sample. Click the link to learn more. I'd need to know more about how you suspect your data violate IID. There is no "independently distributed" assumption per se, so I'm not sure what you mean exactly. IID means that the observations are independent events and that they all follow an identical distribution. Are your data not independent events, do they not all follow the same distribution, or both?
24. ps says
[August 11, 2021 at 4:55 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-9645)
Hi Jim, can you please tell me whether this theorem holds only for the mean, or also for sampling distributions of variances?
- Jim Frost says
[August 11, 2021 at 5:54 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-9655)
Hi, the central limit theorem applies only to the sampling distribution of the mean, not variances.
25. Mohd says
[July 27, 2021 at 12:58 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-9591)
What can you do with the outputs of normalized data?
- Jim Frost says
[July 29, 2021 at 11:29 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-9599)
Hi Mohd,
Thatās a fairly broad question. Iāll start by referring you to [my post about the normal distribution](https://statisticsbyjim.com/basics/normal-distribution/). For your question, focus on the portion about [standardization](https://statisticsbyjim.com/glossary/standardization/) and z-scores. If you have a more specific question about this topic, please post it in the comments section of that post. Thanks.
26. Anderson Andrade says
[April 14, 2021 at 5:44 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8936)
Hi Jim,
if the population distribution is normal, then the distribution of sample means will always be normal, regardless of the size of the samples, right? Even if the "sample size" is 1, the sampling distribution will be normal, as long as I take several replicates (in this case, we are reproducing the original distribution, right?).
Well, If I know that a given variable is normally distributed in a population (e.g., IQ) can I always use parametric tests to analyse samples, regardless of the sample size and distribution of the sample?
best,
Anderson
- Jim Frost says
[April 15, 2021 at 4:42 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8946)
Hi Anderson,
Yes, that's correct. As long as you are taking representative samples from the population, the sampling distribution should be normally distributed regardless of the sample size. The trick is to really know that the population is normally distributed!
27. Mohit Kumar says
[April 7, 2021 at 5:47 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8863)
Hi Jim, thanks for the excellent article. I think when estimating a mean, doing multiple iterations (let us say 20) of a fixed sample size (let us say 50) is equivalent to doing one iteration with a sample of 1000. Is that correct? Does this mean we never need to do multiple iterations?
- Jim Frost says
[April 8, 2021 at 4:08 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8876)
Hi Mohit,
In the context of the central limit theorem, it's different. If we draw 20 random samples of size 50 from the same population, they will have a range of characteristics. You can plot the distribution of those characteristics (i.e., means and standard deviations) on histograms. On the whole, you'd expect the overall mean of the sample means to equal the population mean. The standard deviation of the distribution of the means (known as the standard error of the mean) would be a particular value. Now, if you have one large random sample of 1000 from the same population, you'd expect its mean to fall near the population mean. However, because the sample size is so much larger, you'd expect the precision of the estimate to be much higher.
In reality, a study would almost never do multiple iterations. Statistical process control does use repeated sampling, but generally no. Just collect a large sample! The reason I show multiple iterations here is so you can see how the central limit theorem works. It works out mathematically, but it also works out when you use simulations to draw random samples from the same population. In most real studies, just collect one good-sized sample! From that one sample, your statistical software will estimate the sampling distribution using the appropriate formulas. You want those calculations to be based on the largest single sample you can reasonably collect!
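To see why repeated iterations aren't needed in practice, here's a minimal simulation sketch (assuming numpy and a made-up normal population with mean 100 and SD 15, not any dataset from the post): the spread of many simulated sample means matches the standard error that software estimates from a single sample via s/√n.

```python
import numpy as np

rng = np.random.default_rng(42)
population_mean, population_sd = 100, 15  # hypothetical population parameters

# Simulate the sampling distribution: many samples of size 50,
# keeping each sample's mean.
sample_means = [rng.normal(population_mean, population_sd, 50).mean()
                for _ in range(5000)]

# The spread of those means is the standard error of the mean.
# It matches the theoretical value sigma / sqrt(n) -- the quantity
# software estimates from one sample as s / sqrt(n).
empirical_se = float(np.std(sample_means))
theoretical_se = population_sd / np.sqrt(50)
print(round(empirical_se, 3), round(theoretical_se, 3))
```

The simulated iterations and the one-sample formula agree, which is why collecting one large sample suffices.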
28. Amanda Muller says
[March 31, 2021 at 8:11 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8816)
Can you explain how the central limit theorem relates to sample numbers for [categorical variables](https://statisticsbyjim.com/glossary/categorical-variables/) please?
29. Evangelist Ndifreke Akpan says
[March 31, 2021 at 5:37 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8814)
Please for the right top above can you give me references, I need them. Thanks
- Jim Frost says
[March 31, 2021 at 7:55 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8815)
Hi, I'm not sure what you're referring to when you say "the right top above." Please be more specific so I can help you. Thanks!
30. Praise Ololade Farayola says
[January 30, 2021 at 3:35 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8322)
Wow, nice article. How does the equation change when the sample size (n) is not the same for all samples?
31. Thant Ku Tay Aung says
[December 24, 2020 at 3:00 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8125)
Hello, Jim
I don't understand this part: "Fortunately, we don't have to repeat studies many times to estimate the sampling distribution of the mean. Statistical procedures can estimate that from a single random sample."
How can we estimate the sampling distribution of the mean if we have only a single random sample?
- Jim Frost says
[December 27, 2020 at 2:37 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-8146)
Hi Thant,
Thatās the magic of statistics and it took statisticians to develop the distributions that estimate sampling distributions! These distributions form the basis of many hypothesis tests, such as [t-tests](https://statisticsbyjim.com/hypothesis-testing/t-tests-t-values-t-distributions-probabilities/), [F-tests](https://statisticsbyjim.com/anova/f-tests-anova/), and [chi-square tests](https://statisticsbyjim.com/hypothesis-testing/chi-squared-independence/). The tests are named after the distributions that estimate sampling distributions. The [bootstrapping method](https://statisticsbyjim.com/hypothesis-testing/bootstrapping/) uses an entirely different technique to estimate sampling distributions. For more information about how it works, click the links.
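As one concrete illustration of estimating a sampling distribution from a single sample, here is a rough bootstrap sketch (numpy, with an invented exponential sample; the sample data and sizes are assumptions for illustration): resampling the one sample with replacement yields roughly the same standard error as the s/√n formula.

```python
import numpy as np

rng = np.random.default_rng(0)
# One observed random sample (hypothetical skewed data).
sample = rng.exponential(scale=10.0, size=200)

# Bootstrap: resample the single sample with replacement many times,
# computing the mean of each resample.
boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(2000)]

# The spread of the bootstrap means estimates the standard error of
# the mean, close to the formula-based estimate s / sqrt(n).
print(round(float(np.std(boot_means)), 3),
      round(float(sample.std(ddof=1) / np.sqrt(sample.size)), 3))
```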
32. Mark says
[October 11, 2020 at 9:20 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-7549)
Thank you. Very helpful blog.
33. Junaid Ali says
[August 22, 2020 at 4:03 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-7236)
What is the relationship of CLT with inferential statistics?
34. Steve Montgomery says
[July 16, 2020 at 3:42 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6979)
I am running a Monte Carlo analysis on a project schedule and I am interested in both realistic duration and cost. If I understand your blog correctly, as the number of Monte Carlo iterations increases the more the Central Limit Theorem (CLT) causes the random value outputs to form a narrow normal distribution. So the random numbers cluster in the middle and the range of min/max is narrower than reality. If this is the case, would it be best to reduce the number of iterations in the Monte Carlo analysis?
āSteve
- Jim Frost says
[July 16, 2020 at 5:52 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6980)
Hi Steve,
There are two things to distinguish between.
The first is the number of iterations. Increasing the iterations does two main things: it creates smoother curves and more consistent results if you repeat the analysis. Increasing the iterations does not narrow the distribution.
The other aspect is the sample size for each iteration. It's the sample size that reduces the spread of the distribution. Larger samples produce tighter sampling distributions of the mean, and those sampling distributions more closely approximate the normal distribution. The sample size you'd use depends on your scenario. In this post, I show different sample sizes to illustrate how that affects the spread of the sample means. It shows why large samples produce more precise estimates along with more closely approximating the normal distribution.
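The distinction can be checked directly with a small simulation sketch (assuming numpy and an arbitrary exponential input distribution, not the questioner's schedule model): raising the iteration count leaves the spread of the simulated outputs essentially unchanged, while raising the per-iteration sample size shrinks it.

```python
import numpy as np

rng = np.random.default_rng(1)

def spread_of_means(n_iterations, sample_size):
    """Standard deviation of simulated sample means (the output spread)."""
    means = rng.exponential(scale=5.0, size=(n_iterations, sample_size)).mean(axis=1)
    return float(means.std())

# 100x more iterations: the spread barely changes (estimates just get smoother).
print(spread_of_means(500, 30), spread_of_means(50_000, 30))

# 10x larger sample size per iteration: the spread shrinks by about sqrt(10).
print(spread_of_means(5_000, 30), spread_of_means(5_000, 300))
```

So reducing the iterations would not widen the distribution; it would only make the estimates noisier.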
I hope that helps!
35. ayushi says
[June 9, 2020 at 6:38 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6616)
Hello,
I have used z-values along with descriptive and graphical methods to establish normality until now. There were slight discrepancies, but normality was established by referring to research papers to help cover the discrepancy.
I have read that it is helpful to use a combo of plots and tests for large samples, and that's why I decided to use the [Shapiro-Wilk test](https://statisticsbyjim.com/glossary/shapiro-wilk-normality-test/). If normality is established by the graphical and descriptive methods, do I still need to use the normality tests?
- Jim Frost says
[June 10, 2020 at 12:14 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6630)
Hi Ayushi,
You should be using normality tests and normal probability plots to assess normality, and the distribution of [continuous data](https://statisticsbyjim.com/glossary/continuous-variables/) in general. I have two posts that will help you: [identifying the distribution of your data](https://statisticsbyjim.com/hypothesis-testing/identify-distribution-data/) and [using normal probability plots to assess distributions](https://statisticsbyjim.com/basics/assessing-normality-histograms-probability-plots/). Those posts should answer your questions.
36. Ayushi says
[June 3, 2020 at 8:07 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6541)
Hello
My question: I am working on my dissertation and establishing normality for a sample of 200 respondents. There was a slight deviation in normality, which was worked on in descriptive statistics. But when I calculated the Shapiro-Wilk test, both my variables ended up being less than the standard p-value. Can I skip Shapiro-Wilk and rely solely on graphs and descriptive stats for establishing normality?
- Jim Frost says
[June 3, 2020 at 7:33 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6546)
Hi Ayushi,
Which descriptive statistics are you using to assess normality? The top two things you should be using to assess normality are normality tests and normal probability plots. I cover both of those in my post about [identifying the distribution of your data](https://statisticsbyjim.com/hypothesis-testing/identify-distribution-data/). That doesnāt focus entirely on normality tests. It includes them but also tests for other distributions.
Also read my post about [using normal probability plots](https://statisticsbyjim.com/basics/assessing-normality-histograms-probability-plots/) for assessing normality. There are cases where youād actually use the graph results over the hypothesis test results.
Finally, with 200 observations, you might not need to worry so much about whether your data follow the normal distribution. I write about that in my post about [parametric vs. nonparametric tests](https://statisticsbyjim.com/hypothesis-testing/nonparametric-parametric-tests/).
I hope that all helps!
37. Anita says
[April 26, 2020 at 12:28 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6138)
Hi Jim,
I have some very basic stats questions.
I am having trouble determining what question #5 below is asking me for. Would possible answers be "normal, Poisson, uniform," or is the question asking me to come up with the population mean and standard deviation from the sample and use the CLT?
5. You conduct an experiment where you collect the data in the table below. What population distribution does this sample come from? How confident are you of your conclusion?
14.79911 28.23858 22.7928 24.51667 22.50702
35.50089 21.75057 28.0851 … 23.47746 … 17.71201
etc. there are 100 data points (20 columns and 5 rows)
Here is the output when all the data is put in one column:
Column1
Mean 25.1957477
Standard Error 0.486145044
Median 25.213175
Mode #N/A
Standard Deviation 4.86145044
[Sample Variance](https://statisticsbyjim.com/glossary/sample-variance/) 23.63370038
Kurtosis -0.332161421
Skewness -0.02021877
Range 23.44958
Minimum 13.42521
Maximum 36.87479
Sum 2519.57477
Count 100
[Confidence Level](https://statisticsbyjim.com/glossary/confidence-level/)(95.0%) 0.964617237
Also, I need a push in the right direction with question #1 below. Would this be a z-test?
1. You receive a number of complaints from your employees that this year's promotions were not assigned fairly (in that some VPs favored different hair colors), so you decide to determine if the distribution of promotions differed by region. You conduct a hypothesis test to this [effect](https://statisticsbyjim.com/glossary/effect/). (There are 5 different regions.)
What is the critical value of your test statistic if you are willing to reject your null hypothesis at the α = .05 level of significance? (Ensure you identify what type of statistic it is.)
Your articles are so helpful; they are clear and concise and much appreciated!
Thanks so much for your time!
38. Leyla Depret-Bixio says
[April 15, 2020 at 3:37 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6014)
Dear Jim,
Thank you very much for this useful post.
Sometimes as statisticians we face situations where we have to explain the CLT to a non-statistical audience in an easy, understandable way! Your explanation really helps.
Thanks
- Jim Frost says
[April 16, 2020 at 11:16 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-6035)
Hi Leyla, you're very welcome! I'm glad it was helpful!
39. Ben says
[March 28, 2020 at 11:43 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-5730)
If a sample size is large enough to rely on CLT and assume normality for the purpose of conducting a t-test, is it necessary to conduct normality tests (such as the Shapiro Wilk test) to establish that the population is normally distributed or is this essentially irrelevant to this hypothesis test?
- Jim Frost says
[March 29, 2020 at 12:13 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-5732)
Hi Ben,
If you have a large enough sample size, you donāt need to check for normality. If you do check and the data are nonnormal, you donāt have to worry because the sampling distribution for the means will still be normally distributed, which is what counts.
However, with smaller sample sizes, satisfying the normality assumption is important.
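A quick way to see this effect is a simulation sketch (numpy, with made-up heavily right-skewed exponential data; the sizes and seed are illustrative assumptions): the skewness of the sample means fades as the sample size grows, which is what makes a normality check unnecessary for large samples.

```python
import numpy as np

rng = np.random.default_rng(7)

def mean_skewness(sample_size, n_samples=20_000):
    """Skewness of the simulated sampling distribution of the mean
    for strongly right-skewed (exponential) data."""
    means = rng.exponential(1.0, size=(n_samples, sample_size)).mean(axis=1)
    z = (means - means.mean()) / means.std()
    return float((z ** 3).mean())

# Raw exponential data have skewness near 2; the sampling distribution
# of the mean loses that skew (theory: 2 / sqrt(n)) as n grows.
print(mean_skewness(5), mean_skewness(50), mean_skewness(500))
```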
40. Luana says
[March 11, 2020 at 7:18 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-5570)
Hello
Does the CLT also work with statistics other than the mean, e.g., the median? And is there no distribution that is not affected by the CLT?
sincerely
Luana
- Jim Frost says
[March 12, 2020 at 3:17 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-5574)
Hi Luana,
As I point out in this post, the CLT applies to most but not all distributions. First, the distribution must be for independent, identically distributed variables. And, the population must have a finite variance.
The CLT can work for the median with certain distributions, but it must satisfy more conditions than for the mean. These are too complex to discuss here. So, the short answer is not to expect that the CLT applies to the sampling distribution of the medians without checking the properties of that distribution specifically.
I hope this helps!
41. Rizza says
[February 24, 2020 at 6:55 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-5494)
Can you give me an example
- Jim Frost says
[February 25, 2020 at 12:34 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-5499)
Youāll need to be more specific. I provide multiple examples throughout this post.
42. SHRINIVAS SHIVAJIRAO JADHAV says
[December 19, 2019 at 4:07 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-5149)
Hi … I have 30 pieces of dimensional data which are non-normal. How do I apply the CLT to these data to calculate a subgroup sample size <30 such that the resulting output will be normal and remain within specification limits?
43. Jerome says
[October 25, 2019 at 7:05 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4892)
Greetings!
I truly enjoyed reading your post! But why didn't you plot your sampling distributions on a normal probability plot? It would help visualize how closely they approximate a strict normal distribution: not all bell-shaped distributions are indeed normal (e.g., the t with low d.f., the [Cauchy](https://statisticsbyjim.com/glossary/cauchy-distribution/)).
I believe that your example with the dice toss is not a good example of a uniform distribution, because the middle value sums can be obtained from a greater number of dice combinations than the extreme value sums.
Besides, if your point is to show that the sampling distribution of a statistic is continuously distributed and approximates (I would emphasize this word!) a normal distribution even if the observed data does not come from a continuous distribution, why donāt you sample from a discrete distribution? I am especially curious to see the results from sampling a negative binomial or a multinomial, and from a mixture distribution such as a zero-inflated Poisson: the central limit theorem does not work with the zero component of the latter distribution.
Respectfully yours,
Jerome
- Jim Frost says
[October 25, 2019 at 11:21 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4895)
Hi Jerome,
That's a good idea to plot the sample means on a normal probability plot to show how closely they follow the normal distribution. Unfortunately, it's a software issue on my end. The software I used for the [random sampling](https://statisticsbyjim.com/glossary/random-sampling/) can't create normal probability plots. I might be able to export those sample means, but I'm calculating hundreds of thousands of sample means, which my statistical software might balk at importing. I might give it some more thought, perhaps using fewer sample means. I use such a large number of means because it produces the nice smooth curves!
The uniform distribution is defined as a distribution where all possible values have an equal probability of occurring. As such, the outcomes of rolling a die do follow the uniform distribution. In my example, I roll only one die, not a pair of dice. So, we're using the values of 1 to 6 with a probability of 1/6 for each possible outcome. Consequently, they follow the uniform distribution.
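A short sketch of that single-die case (numpy, arbitrary seed and roll counts chosen for illustration): the six faces come up with roughly equal frequency, which is the uniform distribution, while the means of many rolls cluster near 3.5.

```python
import numpy as np

rng = np.random.default_rng(3)

# One fair die: values 1-6, each with probability 1/6 (a uniform distribution).
rolls = rng.integers(1, 7, size=100_000)
face_props = np.bincount(rolls)[1:] / rolls.size
print(face_props.round(3))  # all six proportions near 1/6

# Sample means of 30 rolls pile up near the population mean of 3.5,
# approximating a normal distribution per the CLT.
means = rng.integers(1, 7, size=(50_000, 30)).mean(axis=1)
print(round(float(means.mean()), 2), round(float(means.std()), 3))
```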
As for discrete outcomes, I do show the die example, which is a discrete distribution. Also, if you want to see an example that uses the binary distribution, read my post about [Revisiting the Monty Hall Problem](https://statisticsbyjim.com/hypothesis-testing/monty-hall-problem-hypothesis-testing/). That post is not about the central limit theorem, but I do show the sampling distributions for two different binary distributions (i.e., different probabilities of events occurring) for sample sizes ranging from 10 to 400. And you'll see that as the sample size increases, it more closely approximates the normal distribution.
I think the additional distributions you mention are interesting possibilities. The central limit theorem does not apply to the Cauchy distribution because that distribution does not have a finite variance, which is one of the assumptions of the CLT. I could include only so many distributions in a blog post before it becomes too long. But the software is free and easy for anyone to use.
Thanks for writing!
44. M.Khidhir says
[September 18, 2019 at 1:30 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4743)
Awesome content! Just to clarify: if I have a right-skewed histogram from a large sample of data, will it meet the CLT requirements?
45. Olga says
[September 12, 2019 at 12:06 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4730)
Hi Jim,
How do I determine a sufficient sample size?
Thanks,
- Jim Frost says
[September 14, 2019 at 6:20 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4733)
Hi Olga,
I'm assuming you mean how large a sample size you need before you can use a parametric test with a nonnormal distribution? For most cases, sample sizes of 20-30 will be sufficient. If you have groups in your data, for, say, ANOVA and the [2-sample t-test](https://statisticsbyjim.com/glossary/two-sample-t-test/), the sample size per group depends on the number of groups. I have a table that summarizes this property in my post about [parametric vs. nonparametric analyses](https://statisticsbyjim.com/hypothesis-testing/nonparametric-parametric-tests/). These sample sizes were determined by a thorough simulation study.
I hope this helps!
46. F.Dali says
[August 9, 2019 at 8:22 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4592)
Hi Jim,
Is CLT applicable for large sample size derived from a [non-probability sampling](https://statisticsbyjim.com/glossary/non-probability-sampling/) method (i.e convenience sampling)? I noticed that CLT carries a statement saying that the draw of the samples for mean must be random.
- Jim Frost says
[August 9, 2019 at 10:20 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4593)
Hi, yes, it must be a random sample. Convenience sampling wonāt necessarily produce a sampling distribution that approximates the normal distribution.
47. Aditya Kumar says
[August 7, 2019 at 2:07 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4584)
Every time you say "the sampling distributions more closely approximate a normal distribution as sample size increases," does that mean the sampling distribution of means approximates the normal distribution?
- Jim Frost says
[August 7, 2019 at 8:13 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4585)
Yes, it includes that. But, itās not restricted to the sampling distribution of the means. Even the sampling distribution for a binomial distribution will approximate the normal distribution with a large enough sample size.
48. Omar Albatayneh says
[March 19, 2019 at 10:32 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4025)
Hello, Jim,
Do you believe the sampling distributions for the slope [coefficients](https://statisticsbyjim.com/glossary/regression-coefficient/) are at least approximately Normal (necessary for [validity](https://statisticsbyjim.com/glossary/validity/) of, for example, the p-values)? Explain.
- Jim Frost says
[March 21, 2019 at 9:41 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-4037)
Hi Omar,
When the [residuals](https://statisticsbyjim.com/glossary/residuals/) are normally distributed, you expect the sampling distributions for the slope coefficients to be normally distributed.
49. EDE WILLIAMS says
[March 13, 2019 at 3:49 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3977)
Hi Jim
Really appreciate your efforts. You make the CLT simple enough to understand that I don't need anybody to explain it any further.
Williams from Federal Polytechnic nekede, studying STATISTICS
50. Patrick says
[February 16, 2019 at 8:33 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3822)
Hi Jim,
thank you, I greatly appreciate your detailed answer to my question!
Best regards,
Patrick
51. Patrick says
[February 15, 2019 at 10:50 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3816)
Hi Jim,
thanks for this excellent post!
I was just wondering about the following: you are saying āIn statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the t-test. (ā¦) if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are nonnormally distributedāas long as your sample size is large enough.ā
Does that basically apply to all parametric hypothesis tests, including linear [regression analysis](https://statisticsbyjim.com/glossary/regression-analysis/)? I once discussed this with a statistician, who objected that the normality assumption does not apply to the distribution of the dependent variable (and this is also true for the t-test, which is just a special case of linear regression) but rather to the distribution of the residuals. He then argued that if I have a poor set of [predictors](https://statisticsbyjim.com/glossary/predictor-variable/), the model will most likely not achieve normality of the residuals, regardless of sample size.
This left me wondering whether or not I can use linear regression with large sample sizes without having to worry about distributional assumptions. Do I need normally distributed residuals at all with a large sample size, or does the CLT also apply to the residuals?
Iād greatly appreciate your view on this aspect.
Thank you and best regards,
Patrick
- Jim Frost says
[February 15, 2019 at 4:50 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3819)
Hi Patrick,
Off hand, Iād say that it applies to most types of parametric hypothesis tests. Even populations that follow the binomial and Poisson distributions will have sampling distributions that follow the normal distribution with a large enough sample size. Consequently, it applies to proportions tests and Poisson rates of occurrence tests. However, I havenāt thought it through enough whether it applies to all parametric tests.
As for regression analysis, that gets a bit complicated! First, yes, the normality assumption applies to the distribution of the residuals rather than the dependent variable. However, that assumption is an optional one that applies only if you want to use hypothesis testing and confidence intervals, as you can read about in my post about [OLS Assumptions](https://statisticsbyjim.com/regression/ols-linear-regression-assumptions/).
As for whether the hypothesis test results are valid in regression analysis when residuals are nonnormally distributed with a sufficiently large sample size, I'd say the answer is both yes and no! How's that for covering my bases?!
Hereās the rationale for both answers.
Yes, I do believe the central limit theorem kicks in with the sampling distributions for the [coefficient](https://statisticsbyjim.com/glossary/coefficient/) estimates. With a large enough sample size, these sampling distributions should follow a normal distribution even when the residuals are nonnormal. It's for those sampling distributions of the coefficient estimates where the CLT would come into play. The p-values for the coefficients are, of course, based on those coefficient sampling distributions.
However, the answer can also be no! Oftentimes the residuals won't follow a normal distribution because you're specifying an incorrect model. You might not be including all the relevant variables, not modeling the curvature correctly, not including interaction terms, etc. Model specification errors can produce nonnormal residuals. In that case, I don't think having a sufficiently large sample size fixes the problem. Chances are your coefficients are biased and not meaningful because the model is just wrong.
Consequently, the answer depends on what is causing the nonnormal residuals. Also, I don't have time to thoroughly research this issue, but if you're doing this for a paper or report, I'd find some articles to support it just to be sure. I'd also imagine (again, check) that it's really the sample size per number of model terms that is important. You'd need many observations per model term. And, of course, you'd have to be certain that you're specifying the correct model!
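The "yes" part can be spot-checked with a simulation sketch (numpy, using an invented model y = 2x plus centered, right-skewed errors; every parameter here is an assumption for illustration): even with nonnormal residuals, the OLS slope estimates from large samples cluster around the true slope.

```python
import numpy as np

rng = np.random.default_rng(11)
true_slope = 2.0  # hypothetical model: y = 2x + skewed noise

def fitted_slope(n):
    x = rng.uniform(0, 10, n)
    # Exponential errors shifted to mean zero: centered but right-skewed,
    # so the residuals are deliberately nonnormal.
    y = true_slope * x + (rng.exponential(3.0, n) - 3.0)
    return float(np.polyfit(x, y, 1)[0])  # OLS slope estimate

# The simulated sampling distribution of the slope centers on the
# true value despite the nonnormal residuals.
slopes = np.array([fitted_slope(200) for _ in range(2000)])
print(round(float(slopes.mean()), 2))
```

This only illustrates the correctly specified case; it says nothing about the "no" case, where a misspecified model biases the coefficients themselves.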
I hope this helps!
52. Nicole Paschal says
[November 23, 2018 at 10:16 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3491)
For the histogram with the line over it, are you saying that the line is the actual or the estimated data that the collected histogram data fits into? Thank you.
- Jim Frost says
[November 23, 2018 at 11:57 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3493)
Hi, which section is this graph in? I'm not exactly sure which one you're referring to.
53. JOHN HAROLD says
[November 13, 2018 at 12:26 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3448)
Iāll look out for your maiden book on Regression Analysis next year.
54. JOHN HAROLD says
[November 12, 2018 at 8:59 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3444)
āMaking complex concepts simplerā. That is your trademark.
I often refer my students to read some of your posts especially after I have introduced them to the topic. They thank āmeā for showing them another perspective. But the credit is rightfully yours. Thanks.
- Jim Frost says
[November 13, 2018 at 9:33 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3446)
Hi John, thanks so much for your kind words. They mean a lot to me because that quote is what I strive for with this blog. Thanks for sharing with your students too!
55. Linda says
[November 12, 2018 at 7:49 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3443)
Hi Jim,
Thanks for a great post. I just have one quick question about the application of the CLT.
When we use the CLT we can find the probability of a certain event but I am wondering how that probability works with a skewed population.
Suppose a population has a large right skew, with the most common values around 0 minutes, while in the normally distributed sampling distribution the most common values center around the mean (which is the same mean as in the population). Having 68% of the values around the mean in the sampling distribution versus most of the values around 0 in the population makes me feel as if the probabilities from the sample do not apply to the population. Would you be willing to explain how this works?
Thanks,
Linda
- Jim Frost says
[November 13, 2018 at 10:07 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3447)
Hi Linda,
I think I see where there might be a slight misunderstanding if Iām reading your comment correctly. In this post, I think the example graph I show in the section āModerately Skewed Distribution and the Central Limit Theoremā roughly matches the scenario in your comment other than the fact that the most common values are not around zero. So, Iāll use the graph in that section to answer your question.
There are two different types of distribution in play here. There is the distribution of the data values in the population, which is the grey distribution in the graph. Thatās the distribution of the actual data. That distribution estimates the probability of *individual values* occurring in the population.
Then, there is the sampling distribution of the mean, which I show using different colors to represent different sample sizes. This is a distribution of *sample means* rather than individual values. Youāre correct that the probabilities are different. You can calculate the probabilities of individual values (or ranges technically for continuous data) from the data distribution. And, you can calculate probabilities associated with sample means using the sampling distribution of the means.
For example, if your sample size is large enough so that the sampling distribution approximates a normal distribution, then roughly 68% of *sample means* from that population will fall within +/- 1 standard deviation of the sampling distribution. The standard deviation for the sampling distribution of the means is called the standard error of the mean and it equals the population standard deviation divided by the square root of the sample size.
Point being that the sampling distribution has different properties (particularly the standard deviation) than the data distribution. Hence, youāll obtain different probabilities for a particular individual value versus a sample mean of the same valueāwhich makes sense when you think about it. An individual value is very different from a sample mean even when they have the same numeric value. The graph in the section that I reference shows this visually.
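As a sanity check on the 68% figure, here is a small simulation (my own sketch, with an arbitrary lognormal population and an arbitrary sample size of 100) that counts how many sample means land within one standard error of the population mean.

```python
import numpy as np

rng = np.random.default_rng(0)
pop_mean = np.exp(0.5)               # mean of a lognormal(0, 1) population
pop_sd = np.sqrt((np.e - 1) * np.e)  # its standard deviation
n, reps = 100, 100_000

# 100,000 sample means, each from a sample of 100 skewed values
means = rng.lognormal(0, 1, (reps, n)).mean(axis=1)
se = pop_sd / np.sqrt(n)  # standard error of the mean
within = np.mean(np.abs(means - pop_mean) < se)
print(within)  # roughly 0.68, as the normal approximation predicts
```

The proportion is close to the normal distribution's 68% even though the individual values are strongly skewed, which is exactly the distinction between the data distribution and the sampling distribution discussed above.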
I hope this answers your question. Please let me know if there's anything else I can clarify!
56. Sandeep Ray Chaudhuri says
[November 7, 2018 at 11:49 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3407)
Hi Jim,
Your blog is really helpful for brushing up on and learning key concepts in statistics. One suggestion: if the key topics were organized into a structured sequence, it would help people who are new to the subject learn in an orderly way.
Keep up the awesome work!
- Jim Frost says
[November 8, 2018 at 9:47 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3412)
Hi Sandeep,
Thanks for the kind words! I'll be writing various books that present these topics in an organized manner. The first one, on regression analysis, will be available in early 2019, and there will be more to follow!
57. Surya says
[October 29, 2018 at 9:55 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3335)
Jim, you are a gem of a person :-) I have suggested that my friend subscribe to your posts as well.
- Jim Frost says
[October 30, 2018 at 2:26 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3339)
Thank you so much, Surya! And, thanks for sharing!
58. Janna Beckerman says
[October 29, 2018 at 2:28 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3334)
It totally helps. Thank you!
59. Janna Beckerman says
[October 29, 2018 at 1:05 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3330)
Thank you so much for your blog. My question: you said, "Typically, statisticians say that a sample size of 30 is sufficient for most distributions." I, too, was taught to obtain a sample size of ~30, but I can't figure out where we all came up with that number. I've asked colleagues. No one knows. Do you? And will you share how this number came about?
- Jim Frost says
[October 29, 2018 at 1:49 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3332)
Hi Janna,
Youāre very welcome! This number is what emerges as a good rule of thumb for sample sizes that will generally produce an approximately normal sampling distribution for most types of probability distributions. The idea is that when you meet this threshold, you donāt have to worry about whether your data are normally distributed when you use a parametric test that otherwise assumes normal data. (Of course, depending on your study area, you might need a larger sample size to have adequate statistical power, but thatās a different matter.)
There is a fudge [factor](https://statisticsbyjim.com/glossary/factors/) around this number in several ways. For one thing, how closely does the sampling distribution need to approximate the normal distribution to be good enough? And, the degree to which the population distribution differs from the normal distribution affects this number. In this blog post, some of the examples illustrate how sometimes n=20 is sufficient while in the extremely skewed distribution, a sample size of 80 was not sufficient. That example is probably more skewed than most real data. But, it illustrates the point that the number depends on the shape of the distribution in the underlying population.
Not all statisticians agree. I think 30 is the most common number that I hear, but others say 40 just to be safe. I used to work at Minitab statistical software, and a group there did a study assessing what sample size is required for nonnormal data so that the actual type I error rate matches the [significance level](https://statisticsbyjim.com/glossary/significance-level/) for various parametric tests. That ultimately links back to the CLT and the ideas discussed in this post.
They developed a table of sample sizes based on the type of analysis. You can see this table in my post about [parametric vs. nonparametric analysis](https://statisticsbyjim.com/hypothesis-testing/nonparametric-parametric-tests/). In that post, there is also a link to the white paper that they developed. For example, they conclude that a 1-sample t-test requires at least a sample size of 20. Indeed, the moderately skewed example in this post produces a fairly normal looking sampling distribution with a sample size of 20.
I think it's hard to find a concrete reference for this number because it's more of a rule of thumb. There's no formula or calculation that spits out this number. Both a researcher's notion of how closely the sampling distribution needs to approximate the normal distribution and how different their population's distribution is from the normal distribution affect the number they will use! A sample size of 20-40 should be good for most distributions.
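A rough way to probe this rule of thumb yourself is to simulate the actual type I error rate of a test on skewed data. This numpy sketch (my own, not the Minitab study mentioned above) runs a one-sample t-test on exponential data with n = 30; the 2.045 cutoff is the two-sided 5% critical value for a t-distribution with 29 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 30, 20_000
t_crit = 2.045  # two-sided 5% critical value, t-distribution with 29 df

samples = rng.exponential(1.0, (reps, n))  # skewed data, true mean = 1
t = (samples.mean(axis=1) - 1.0) / (samples.std(axis=1, ddof=1) / np.sqrt(n))
rate = np.mean(np.abs(t) > t_crit)
print(rate)  # near the nominal 0.05, though the skew inflates it a little
```

The rejection rate lands in the neighborhood of the nominal 5%, illustrating why n = 30 is usually good enough, while a more skewed population would push it further from 0.05.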
I hope this helps!
60. Debashis Dalai says
[October 29, 2018 at 4:00 am](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3323)
Thank you so much, Jim, for all the effort you put into simplifying a complex subject like statistics! You help us a lot. Thanks again!
61. MG says
[October 28, 2018 at 11:37 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3319)
Great job Jim.
- Jim Frost says
[October 28, 2018 at 11:41 pm](https://statisticsbyjim.com/basics/central-limit-theorem/#comment-3320)
Thanks, MG!
Copyright Ā© 2026 Ā· Jim Frost Ā· [Privacy Policy](https://statisticsbyjim.com/privacy-policy/)
The central limit theorem in [statistics](https://statisticsbyjim.com/glossary/statistics/) states that, given a sufficiently large [sample size](https://statisticsbyjim.com/glossary/sample-size/), the sampling [distribution](https://statisticsbyjim.com/glossary/distribution/) of the mean for a variable will approximate a normal distribution regardless of that variable's distribution in the [population](https://statisticsbyjim.com/glossary/population/).
Unpacking the meaning from that complex definition can be difficult. Thatās the topic for this post! Iāll walk you through the various aspects of the central limit theorem (CLT) definition, and show you why it is vital in statistics.
## Distribution of the Variable in the Population
Part of the definition for the central limit theorem states, āregardless of the variableās distribution in the population.ā This part is easy! In a population, the values of a variable can follow different probability distributions. These distributions can range from normal, left-skewed, right-skewed, and uniform among others.
**Normal**
**Right-Skewed**
**Left-Skewed**
**Uniform**
This part of the definition refers to the distribution of the variableās values in the population from which you draw a random sample.
The central limit theorem applies to almost all types of probability distributions, but there are exceptions. For example, the population must have a finite variance. That restriction rules out the Cauchy distribution because it has infinite variance.
Additionally, the central limit theorem applies to independent, identically distributed variables. In other words, the value of one observation does not depend on the value of another observation. And, the distribution of that variable must remain constant across all measurements.
**Related Post**: [Understanding Probability Distributions](https://statisticsbyjim.com/basics/probability-distributions/) and [Independent and Identically Distributed Variables](https://statisticsbyjim.com/basics/independent-identically-distributed-data/)
## Sampling Distribution of the Mean
The definition for the central limit theorem also refers to āthe sampling distribution of the mean.ā Whatās that?
Typically, you perform a study once, and you might calculate the mean of that one sample. Now, imagine that you repeat the study many times and collect the same sample size for each one. Then, you calculate the mean for each of these samples and graph them on a histogram. The histogram displays the distribution of sample means, which statisticians refer to as the sampling distribution of the mean.
Fortunately, we donāt have to repeat studies many times to estimate the sampling distribution of the mean. Statistical procedures can estimate that from a single random sample.
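The repeat-the-study thought experiment is easy to mimic in a few lines of code. This sketch (my own, using an arbitrary right-skewed exponential population) draws 10,000 samples of 25 values each and collects their means:

```python
import numpy as np

rng = np.random.default_rng(1)

# 10,000 "studies", each a sample of 25 values drawn from a
# right-skewed exponential population with mean 1
sample_means = rng.exponential(1.0, (10_000, 25)).mean(axis=1)

# This collection of 10,000 means is the sampling distribution of the mean
print(sample_means.mean())  # centers near the population mean of 1
```

Plotting `sample_means` as a histogram would display the sampling distribution directly, just as the simulations later in this post do.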
The shape of the sampling distribution depends on the sample size. If you perform the study using the same procedure and change only the sample size, the shape of the sampling distribution will differ for each sample size. And, that brings us to the next part of the CLT definition!
## Central Limit Theorem and a Sufficiently Large Sample Size
As the previous section states, the shape of the sampling distribution changes with the sample size. And, the definition of the central limit theorem states that when you have a sufficiently large sample size, the sampling distribution starts to approximate a normal distribution. How large does the sample size have to be for that approximation to occur?
It depends on the shape of the variableās distribution in the underlying population. The more the population distribution differs from being normal, the larger the sample size must be. Typically, statisticians say that a sample size of 30 is sufficient for most distributions. However, strongly skewed distributions can require larger sample sizes. Weāll see the sample size aspect in action during the empirical demonstration below.
## Central Limit Theorem and Approximating the Normal Distribution
To recap, the central limit theorem links the following two distributions:
- The distribution of the variable in the population.
- The sampling distribution of the mean.
Specifically, the CLT states that regardless of the variableās distribution in the population, the sampling distribution of the mean will tend to approximate the normal distribution.
In other words, the population distribution can look like the following:

But, the sampling distribution can appear like below:

Itās not surprising that a normally distributed variable produces a sampling distribution that also follows the normal distribution. But, surprisingly, nonnormal population distributions can also create normal sampling distributions.
**Related Post**: [Normal Distribution in Statistics](https://statisticsbyjim.com/basics/normal-distribution/)
## Properties of the Central Limit Theorem
Letās get more specific about the normality features of the central limit theorem. Normal distributions have two parameters, the mean and standard deviation. What values do these parameters converge on?
As the sample size increases, the sampling distribution converges on a normal distribution where the mean equals the population mean, and the standard deviation equals Ļ/ān. Where:
- Ļ = the population standard deviation
- n = the sample size
As the sample size (n) increases, the standard deviation of the sampling distribution becomes smaller because the square root of the sample size is in the denominator. In other words, the sampling distribution clusters more tightly around the mean as sample size increases.
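You can verify the σ/√n relationship numerically. In this sketch (which assumes a uniform population on [0, 10], so σ = 10/√12), the observed spread of the sample means matches σ/√n closely:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 36, 50_000
pop_sd = 10 / np.sqrt(12)  # standard deviation of Uniform(0, 10)

# 50,000 sample means, each from a sample of size 36
means = rng.uniform(0, 10, (reps, n)).mean(axis=1)
print(means.std())          # observed spread of the sampling distribution
print(pop_sd / np.sqrt(n))  # theoretical sigma / sqrt(n), about 0.481
```

Doubling `n` to 144 would halve the spread, since the sample size sits under a square root in the denominator.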
Letās put all of this together. As sample size increases, the sampling distribution more closely approximates the normal distribution, and the spread of that distribution tightens. These properties have essential implications in statistics that Iāll discuss later in this post.
**Related Posts**: [Measures of Central Tendency](https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/) and [Measures of Variability](https://statisticsbyjim.com/basics/variability-range-interquartile-variance-standard-deviation/)
## Empirical Demonstration of the Central Limit Theorem
Now the fun part! There is a mathematical proof for the central limit theorem, but that goes beyond the scope of this blog post. However, I will show how it works empirically using statistical simulation software. I'll define a population distribution and have the software draw many thousands of random samples from it. The software will calculate the mean of each sample and then graph these sample means on a histogram to display the sampling distribution of the mean.
For the following examples, Iāll vary the sample size to show how that affects the sampling distribution. To produce the sampling distribution, Iāll draw 500,000 random samples because that creates a fairly smooth distribution in the histogram.
Keep this critical difference in mind. While Iāll collect a consistent 500,000 samples per condition, the size of those samples will vary, and that affects the shape of the sampling distribution.
Let's test this out! To do that, I'll use [Statistics101](https://sourceforge.net/projects/statistics101/), a freeware computer program. It's a great simulation program that I've also used to [tackle the Monty Hall Problem](https://statisticsbyjim.com/hypothesis-testing/monty-hall-problem-hypothesis-testing/)!
## Testing the Central Limit Theorem with Three Probability Distributions
Iāll show you how the central limit theorem works with three different distributions: moderately skewed, severely skewed, and a uniform distribution. The first two distributions skew to the right and follow the lognormal distribution. The probability distribution plot below displays the populationās distribution of values. Notice how the red dashed distribution is much more severely skewed. It actually extends quite a way off the graph! Weāll see how this makes a difference in the sampling distributions.

Letās see how the central limit theorem handles these two distributions and the uniform distribution.
## Moderately Skewed Distribution and the Central Limit Theorem
The graph below shows the moderately skewed lognormal distribution. This distribution fits the body fat percentage dataset that I use in my post about [identifying the distribution of your data](https://statisticsbyjim.com/hypothesis-testing/identify-distribution-data/). These data correspond to the blue line in the probability distribution plot above. I use the simulation software to draw random samples from this population 500,000 times for each sample size (5, 20, 40).

In the graph above, the gray color shows the skewed distribution of the values in the population. The other colors represent the sampling distributions of the means for different sample sizes. The red color shows the distribution of means when your sample size is 5. Blue denotes a sample size of 20. Green is 40. The red curve (n=5) is still skewed a bit, but the blue and green (20 and 40) are not visibly skewed.
As the sample size increases, the sampling distributions more closely approximate the normal distribution and become more tightly clustered around the population mean, just as the central limit theorem states!
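You can reproduce the gist of this experiment with any simulation tool. The sketch below (my own, using a lognormal with an arbitrary shape parameter rather than the body-fat fit) measures the skewness of the sampling distribution at each sample size and shows it shrinking toward zero:

```python
import numpy as np

def skewness(a):
    """Sample skewness: the third standardized moment."""
    d = a - a.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

rng = np.random.default_rng(3)
skews = {}
for n in (5, 20, 40):
    # 100,000 sample means at each sample size, moderately skewed lognormal
    means = rng.lognormal(0, 0.5, (100_000, n)).mean(axis=1)
    skews[n] = skewness(means)
    print(n, round(skews[n], 2))  # skewness falls toward 0 as n grows
```

A perfectly normal sampling distribution would have skewness 0, so the shrinking values quantify what the colored curves in the graph show visually.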
## Very Skewed Distribution and the Central Limit Theorem
Now, let's try this with the very skewed lognormal distribution. These data follow the red dashed line in the probability distribution plot above. I follow the same process but use larger sample sizes of 40 (grey), 60 (red), and 80 (blue). I do not include the population distribution in this one because it is so skewed that it distorts the x-axis scale!

The population distribution is extremely skewed. It's probably more skewed than real data tend to be. As you can see, even with the largest sample size (blue, n=80), the sampling distribution of the mean is still skewed right. However, it is less skewed than the sampling distributions for the smaller sample sizes. Also, notice how the peaks of the sampling distributions shift to the right as the sample size increases. Eventually, with a large enough sample size, the sampling distributions will become symmetric, and the peak will stop shifting and center on the actual population mean.
If your population distribution is extremely skewed, be aware that you might need a substantial sample size for the central limit theorem to kick in and produce sampling distributions that approximate a normal distribution!
## Uniform Distribution and the Central Limit Theorem
Now, let's change gears and look at an entirely different type of distribution. Imagine that we roll a die and take the average value of the rolls. The probabilities for rolling the numbers on a die follow a uniform distribution because all numbers have the same chance of occurring. Can the central limit theorem work with discrete numbers and uniform probabilities? Let's see!
In the graph below, I follow the same procedure as above. In this example, the sample size refers to the number of times we roll the die. The process calculates the mean for each sample.

In the graph above, I use sample sizes of 5, 20, and 40. We'd expect the average to be (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. The sampling distributions of the means center on this value. Just as the central limit theorem predicts, as we increase the sample size, the sampling distributions more closely approximate a normal distribution and have a tighter spread of values.
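This die-rolling experiment takes only a few lines to replicate. A sketch, assuming a fair six-sided die and a sample size of 40 rolls:

```python
import numpy as np

rng = np.random.default_rng(4)
# 200,000 samples, each the mean of 40 fair die rolls (integers 1-6)
means = rng.integers(1, 7, (200_000, 40)).mean(axis=1)
print(means.mean())  # close to (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5
print(means.std())   # close to sigma / sqrt(n) = 1.708 / sqrt(40), about 0.27
```

Even though individual rolls are discrete and uniform, the histogram of these means forms the familiar bell shape.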
You could perform a similar experiment using the [binomial distribution](https://statisticsbyjim.com/basics/binary-data-binomial-distribution/) with coin flips and obtain the same types of results when it comes to, say, the probability of getting heads. All thanks to the central limit theorem!
## Why is the Central Limit Theorem Important?
The central limit theorem is vital in statistics for two main reasonsāthe normality assumption and the precision of the estimates.
### Central limit theorem and the normality assumption
The fact that sampling distributions can approximate a normal distribution has critical implications. In statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the [t-test](https://statisticsbyjim.com/hypothesis-testing/t-tests-t-values-t-distributions-probabilities/). Consequently, you might think that these tests are not valid when the data are nonnormally distributed. However, if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are nonnormally distributedāas long as your sample size is large enough.
You might have heard that parametric tests of the mean are robust to departures from the normality assumption when your sample size is sufficiently large. That's thanks to the central limit theorem!
For more information about this aspect, read my post that compares [parametric and nonparametric tests](https://statisticsbyjim.com/hypothesis-testing/nonparametric-parametric-tests/).
### Precision of estimates
In all of the graphs, notice how the sampling distributions of the mean cluster more tightly around the population mean as the sample sizes increase. This property of the central limit theorem becomes relevant when using a sample to estimate the mean of an entire population. With a larger sample size, your sample mean is more likely to be close to the real population mean. In other words, your estimate is more precise.
Conversely, the sampling distributions of the mean for smaller sample sizes are much broader. For small sample sizes, itās not unusual for sample means to be further away from the actual population mean. You obtain less precise estimates.
In closing, understanding the central limit theorem is crucial for trusting the validity of your results and assessing the precision of your estimates. Use large sample sizes to satisfy the normality assumption even when your data are nonnormally distributed and to obtain more precise estimates!