🕷️ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 77 (from laksa074)

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled

📄 INDEXABLE
✅ CRAWLED (2 months ago)
🤖 ROBOTS ALLOWED

Page Info Filters

| Filter | Status | Condition | Details |
| --- | --- | --- | --- |
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 2.4 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |

Page Details

| Property | Value |
| --- | --- |
| URL | https://medium.com/@datasciencedelight/central-limit-theorem-a-beginners-guide-82d0f06cd2df |
| Last Crawled | 2026-02-07 08:11:07 (2 months ago) |
| First Indexed | not set |
| HTTP Status Code | 200 |
| Meta Title | Central Limit Theorem: A Beginner’s Guide \| by Data Science Delight \| Medium |
| Meta Description | Central Limit Theorem: A Beginner’s Guide Welcome to the beginner’s guide to the Central Limit Theorem, where we will be discussing everything that you should know about the central limit theorem … |
| Meta Canonical | null |
Markdown
# Central Limit Theorem: A Beginner’s Guide

[Data Science Delight](https://medium.com/@datasciencedelight?source=post_page---byline--82d0f06cd2df---------------------------------------) · 6 min read · Apr 1, 2024

Welcome to the beginner’s guide to the Central Limit Theorem, where we discuss what you should know about the central limit theorem, along with real-world examples.

Image source: author

### What is the Central Limit Theorem (CLT)?
It states that, regardless of the shape of the original population distribution, the distribution of the sample mean approaches a normal (bell-shaped) distribution as the sample size increases, provided the population has a finite variance.

***In simple words***, the CLT says that if we take many random samples of the same size from a population and calculate the mean of each sample, the distribution of those sample means will approximate a normal distribution (a bell-shaped curve), whatever the shape of the original population distribution.

*For example:* Imagine you’re at a carnival surrounded by many people of different heights: tall, short, young, old, and so on. Your task is to find the average height of the population (i.e., the people present at the carnival).

Photo by [CHUTTERSNAP](https://unsplash.com/@chuttersnap?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com/?utm_source=medium&utm_medium=referral)

You take a random sample of, say, 50 people and calculate their average height **(x1)**. Then you take another random sample of 50 people and calculate their average height **(x2)**, and so on.

***Now, if you were to plot all those sample averages, what do you think the plot would look like?*** You guessed it: a bell-shaped curve. Even if the heights vary widely, the distribution of sample averages will tend to follow a normal distribution.
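For reference, the verbal statement above corresponds to the classical (Lindeberg-Lévy) form of the theorem. The notation is introduced here and does not appear in the original article: it assumes independent, identically distributed observations $X_1, \dots, X_n$ with finite mean $\mu$ and finite variance $\sigma^2$.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}\bigl(0, \sigma^2\bigr)
\quad \text{as } n \to \infty .
```

Informally, for large $n$ the sample mean is approximately distributed as $\mathcal{N}(\mu, \sigma^2 / n)$, so in the carnival example the averages x1, x2, ... cluster around the true mean height with a spread that shrinks as each sample gets larger.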
### Demonstrating CLT using Titanic Dataset:

Photo by [Daniele D'Andreti](https://unsplash.com/@danieledandreti?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com/?utm_source=medium&utm_medium=referral)

Remember, we said, ***“As the sample size increases, the distribution of the sample averages tends to be a normal distribution”.***

Let’s illustrate this concept with an example. Before starting, make sure to download the Titanic dataset from [*Kaggle*](https://www.kaggle.com/competitions/titanic/overview). Here, I have treated the entire Titanic dataset as the population and plotted the original distribution of the ‘Fare’ column.

Image source: author

From the above graph, we can see that our original population distribution is right-skewed.

You might be thinking, ***what is the problem if it is skewed?*** Many common statistical methods, such as confidence intervals and hypothesis tests for a mean, rely on normality, and a normal distribution is easy to interpret. The CLT does not transform the skewed data itself; the raw fares stay skewed. What it tells us is that the means of repeated samples drawn from that data will be approximately normally distributed, so those methods remain usable.

For the entire code, you can visit [*GitHub*](https://github.com/Data-Science-Delight/CLT), [*here*](https://github.com/Data-Science-Delight/CLT/blob/main/code.ipynb).

Then I took a random sample of 50 passengers, calculated their mean fare, and repeated this 100 times. When we plot the distribution of the sample means, we get this:

Image source: author

### TASK FOR YOU:

1. What sample size have we used for our analysis?
2. Plot a distribution plot for the above.
3. Can the CLT be applied to discrete data?
4. Plot the original population mean on the distribution of sample means.
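The sampling procedure described in this section (samples of 50, repeated 100 times) can be sketched roughly as follows. This is not the author's notebook code: because the Titanic CSV is not bundled here, the sketch substitutes a synthetic right-skewed "fare" population drawn from a log-normal distribution, and the function names `sample_means` and `skewness` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def sample_means(population, sample_size=50, n_samples=100, seed=0):
    """Draw n_samples random samples and return the mean of each one."""
    rng = np.random.default_rng(seed)
    # Within a sample, draw without replacement (no passenger picked twice).
    return np.array([
        rng.choice(population, size=sample_size, replace=False).mean()
        for _ in range(n_samples)
    ])


def skewness(x):
    """Sample skewness: about 0 for symmetric data, > 0 for right skew."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


# ASSUMPTION: synthetic stand-in for the 'Fare' column, not real Titanic data.
rng = np.random.default_rng(42)
fares = rng.lognormal(mean=3.0, sigma=1.0, size=891)

means = sample_means(fares, sample_size=50, n_samples=100)

print("population mean:", fares.mean())
print("mean of sample means:", means.mean())
print("population skewness:", skewness(fares))
print("skewness of sample means:", skewness(means))
```

With the real dataset, one would replace the synthetic `fares` array with the non-missing values of the ‘Fare’ column. The point of comparing the two skewness values is that the raw data stays skewed while the sample means are much closer to symmetric, and they centre on the population mean.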
### **Steps to follow while applying the Central Limit Theorem (CLT):**

- **Define the Population and Parameter of Interest:** Identify the population of interest and the parameter you want to estimate or test. For example, if you’re studying the heights of individuals, the population would be all individuals, and the parameter might be the population mean height.
- **Choose a Sampling Method:** Select an appropriate sampling method to draw random samples from the population. Simple random sampling gives each member of the population an equal chance of being selected, which supports the independence that the CLT relies on.
- **Determine Sample Size:** Decide on the sample size for each random sample. Larger samples give a closer normal approximation, but the CLT can still provide reasonable approximations for moderate sample sizes. A common rule of thumb is at least 30 observations per sample, although strongly skewed populations need more.
- **Collect Data:** Gather data by taking random samples from the population according to the chosen sampling method and sample size. Ensure that the samples are representative of the population to avoid bias.
- **Calculate Sample Means:** Calculate the mean of each random sample by adding up all the values in the sample and dividing by the sample size.
- **Repeat Sampling Process:** Repeat the sampling process multiple times to obtain several sample means. The more samples you collect, the clearer your picture of the sampling distribution. How close that distribution is to normal depends on the size of each sample, not on the number of samples.
- **Analyze Sampling Distribution:** Plot a histogram or frequency distribution of the sample means obtained from the random samples. Visualizing the distribution allows you to observe its shape and assess its similarity to a normal distribution.
- **Check Assumptions:** Verify that the assumptions of the CLT are met, including random sampling, independence of observations, and a sufficiently large sample size. When sampling without replacement, each sample should also be small relative to the population (a common guideline is under 10%) so that observations stay approximately independent.
If any assumptions are violated, adjustments or alternative methods may be necessary.
- **Make Inferences or Conduct Tests:** Once you have obtained the sampling distribution of the sample mean, you can use it to make inferences about the population parameter of interest. This may involve constructing confidence intervals, performing hypothesis tests, or estimating population parameters.
- **Interpret Results:** Interpret the results of your analysis in the context of the research question or problem at hand. Discuss the implications of your findings and any limitations or assumptions made during the analysis.

### Properties of CLT

- **Robustness to Population Distribution:** The CLT applies to a wide range of population distributions. Whether the data follow a normal distribution, a skewed distribution, or a completely irregular one, the distribution of sample means will tend toward a normal distribution as long as the population variance is finite and the sample size is sufficiently large.
- **Impact of Sample Size:** As the sample size increases, the distribution of sample means gets closer to a normal distribution and its spread shrinks. Larger samples therefore produce more reliable estimates of population parameters.
- **Broad Applicability:** The CLT applies not only to continuous data but also to discrete data, and it has a multivariate version. This makes it a versatile tool in fields such as science, engineering, economics, and the social sciences.

### When to use CLT?

The Central Limit Theorem (CLT) is used when sample sizes are large, including when the population itself is not normally distributed.
It’s particularly valuable in statistical inference, hypothesis testing, and the construction of confidence intervals, where it justifies normal-based methods regardless of the shape of the original distribution, as long as its assumptions hold.

### Applications of CLT

Photo by [m.](https://unsplash.com/@m_____me?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com/?utm_source=medium&utm_medium=referral)

- **Hypothesis Testing:** CLT aids in comparing sample data to hypothesized population parameters, enabling researchers to determine statistical significance.
- **Financial Analysis:** Used in risk assessment, portfolio management, and asset pricing.
- **Polling and Surveys:** Provides estimates of population opinions and preferences from sample data.
- **Educational Assessment:** Analyzes student performance on standardized tests and monitors educational progress over time.
- **Biostatistics and Epidemiology:** Helps identify health risk factors and evaluate interventions.

### Limitations of CLT

**Dependence on Sample Size:** The CLT requires sufficiently large sample sizes to reliably approximate a normal distribution. For small sample sizes, the distribution of sample means may not closely resemble a normal distribution, potentially leading to inaccurate conclusions.

**Dependence on Independence:** The CLT assumes that individual observations within each sample are independent of each other. Where observations are correlated or dependent, as in time series data or clustered observations, the classical CLT may not hold, leading to incorrect inferences.

**Sensitivity to Outliers:** Outliers or extreme values can disproportionately influence sample means. With such data, more observations per sample are needed before the distribution of sample means looks normal.
**Population Distribution Requirements:** While the CLT is robust to a wide range of population distributions, the classical version requires a finite variance, so it does not hold for heavy-tailed distributions with infinite variance, such as the Cauchy distribution. Under extreme skewness it still holds, but convergence is slow and very large samples may be needed. In such cases, alternative methods or adjustments may be necessary.

### Conclusion

Stay tuned for Part 2 of the Central Limit Theorem.

For the code, you can visit [*GitHub*](https://github.com/Data-Science-Delight/CLT), [*here*](https://github.com/Data-Science-Delight/CLT).

If you liked this explanation, ***please like/clap, and follow*** [***Data Science Delight***](https://medium.com/u/47e28d120ff3?source=post_page---user_mention--82d0f06cd2df---------------------------------------) ***for more such blogs.***
| Property | Value |
| --- | --- |
| Shard | 77 (laksa) |
| Root Hash | 13179037029838926277 |
| Unparsed URL | com,medium!/@datasciencedelight/central-limit-theorem-a-beginners-guide-82d0f06cd2df s443 |