šŸ•·ļø Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 107 (from laksa013)
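The query and response fields above are blank, so the tool's actual shard formula is not recoverable from this output. Purely as an illustration, crawl fleets commonly assign a URL to a shard by hashing it modulo the shard count; the sketch below assumes exactly that scheme (the MD5 choice, the 512-shard count, and the function name are all hypothetical stand-ins, not this tool's real logic).

```python
import hashlib

def shard_for_url(url: str, num_shards: int = 512) -> int:
    """Hypothetical shard assignment: stable hash of the URL, modulo shard count."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across runs
    return int.from_bytes(digest[:8], "big") % num_shards

print(shard_for_url("https://www.analyticsvidhya.com/blog/"))
```

A scheme like this keeps every lookup for the same URL on the same shard without any coordination between crawler nodes.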

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ā„¹ļø Skipped - page is already crawled

📄 INDEXABLE
✅ CRAWLED (48 minutes ago)
🤖 ROBOTS ALLOWED

Page Info Filters

| Filter | Status | Condition | Details |
| --- | --- | --- | --- |
| HTTP status | PASS | `download_http_code = 200` | HTTP 200 |
| Age cutoff | PASS | `download_stamp > now() - 6 MONTH` | 0 months ago |
| History drop | PASS | `isNull(history_drop_reason)` | No drop reason |
| Spam/ban | PASS | `fh_dont_index != 1 AND ml_spam_score = 0` | ml_spam_score=0 |
| Canonical | PASS | `meta_canonical IS NULL OR = '' OR = src_unparsed` | Not set |
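The five filters above combine into a single indexability predicate. A rough sketch of that check (the field names come straight from the table; representing a page as a dict, the exact 183-day window, and the way the conditions are combined are my assumptions, not the tool's verified logic):

```python
from datetime import datetime, timedelta

def is_indexable(page: dict, now: datetime) -> bool:
    """Evaluate the five page-info filters from the table above (sketch)."""
    canonical = page.get("meta_canonical")
    return (
        page["download_http_code"] == 200                       # HTTP status
        and page["download_stamp"] > now - timedelta(days=183)  # age cutoff (~6 months)
        and page.get("history_drop_reason") is None             # history drop
        and page["fh_dont_index"] != 1                          # not banned
        and page["ml_spam_score"] == 0                          # not spam
        and (canonical in (None, "") or canonical == page["src_unparsed"])
    )

page = {
    "download_http_code": 200,
    "download_stamp": datetime(2026, 4, 12, 22, 34, 28),
    "history_drop_reason": None,
    "fh_dont_index": 0,
    "ml_spam_score": 0,
    "meta_canonical": None,
    "src_unparsed": "https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/",
}
print(is_indexable(page, now=datetime(2026, 4, 12, 23, 22)))  # prints True
```

Failing any one condition flips the whole predicate, which matches how a single FAIL row would mark the page non-indexable.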

Page Details

| Property | Value |
| --- | --- |
| URL | https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/ |
| Last Crawled | 2026-04-12 22:34:28 (48 minutes ago) |
| First Indexed | 2019-05-03 03:40:19 (6 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Central Limit Theorem : Definition ,Formula & Examples |
| Meta Description | Learn about the central limit theorem, a crucial concept in statistics that enhances predictive modeling and hypothesis testing. Read Now! |
| Meta Canonical | null |
Boilerpipe Text
What is one of the most important and core concepts of statistics, one that enables us to do predictive modeling and yet often confuses aspiring data scientists? Yes, I'm talking about the central limit theorem (CLT). It is a powerful statistical concept that every data scientist MUST know. Why? Because the central limit theorem is at the heart of hypothesis testing, a critical component of the data science and machine learning lifecycle. The idea that lets us explore the vast possibilities of the data we are given springs from the CLT. It is actually a simple notion to understand, yet most data scientists flounder at this question during interviews. In this article, you will learn all about the central limit theorem: its definition, formula, examples, and practical applications.

In this article:
- What is the Central Limit Theorem?
- The Central Limit Theorem with an Example
- The Central Limit Theorem Formula
- Distribution of the Variable in the Population
- Conditions of the Central Limit Theorem
- Significance of the Central Limit Theorem
- Practical Applications of the CLT
- Assumptions Behind the Central Limit Theorem
- What Is Standard Error?
- Implementing the Central Limit Theorem in R
- Conclusion
- Frequently Asked Questions

What is the Central Limit Theorem?

The Central Limit Theorem (CLT) states that when large enough random samples are taken from any population, regardless of its original distribution, the distribution of the sample means will approximate a normal distribution (bell curve), with its mean equal to the population mean and its standard deviation decreasing as the sample size increases.

The Central Limit Theorem with an Example

Let's understand the central limit theorem with the help of an example; it will help you intuitively grasp how the CLT works underneath. Consider that there are 15 sections in the science department of a university, and each section hosts around 100 students.
Our task is to calculate the average weight of students in the science department. Sounds simple, right? The approach most aspiring data scientists suggest is to simply calculate the average:

1. Measure the weights of all the students in the science department.
2. Add all the weights.
3. Divide the total sum of weights by the total number of students to get the average.

But what if the dataset is huge? Does this approach still make sense? Not really: measuring the weight of every student would be a tiresome and long process. So what can we do instead? Let's look at an alternative approach:

1. Draw groups of students at random from the department; each group is a sample. We'll draw multiple samples, each consisting of 30 students.
2. Calculate the individual mean of each sample.
3. Calculate the mean of these sample means.

This value gives us the approximate mean weight of the students in the science department. Additionally, a histogram of the sample mean weights will resemble a bell curve (normal distribution).

Central Limit Theorem Formula

The shape of the sampling distribution of the mean can be determined without repeatedly sampling the population. Its parameters are based on the population:

- The mean of the sampling distribution (μ_x̄) equals the mean of the population (μ).
- The standard deviation of the sampling distribution (σ_x̄) is the population standard deviation (σ) divided by the square root of the sample size (n).

Notation: X̄ ~ N(μ, σ/√n)

Where:
- X̄ is the sampling distribution of the sample means.
- ~ means "follows the distribution."
- N is the normal distribution.
- μ is the mean of the population.
- σ is the standard deviation of the population.
- n is the sample size.
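The sampling procedure described above is easy to verify with a short simulation. The sketch below uses only the Python standard library and an invented population of 1,500 student weights (the numbers are made up for illustration; the article's own code uses R): drawing many samples of 30 students, the mean of the sample means lands very close to the population mean.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 15 sections x 100 students, weights in kg
population = [random.uniform(45, 95) for _ in range(1500)]

# Draw 2000 random samples of 30 students and record each sample's mean
sample_means = [statistics.mean(random.sample(population, 30)) for _ in range(2000)]

print(round(statistics.mean(population), 2))    # population mean
print(round(statistics.mean(sample_means), 2))  # mean of the sample means: very close
```

A histogram of `sample_means` would show the bell shape the CLT predicts, even though the underlying weights here are uniformly distributed.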
Distribution of the Variable in the Population

Part of the definition of the central limit theorem states "regardless of the variable's distribution in the population." This part is easy! In a population, the values of a variable can follow different probability distributions, ranging from normal, left-skewed, and right-skewed to uniform, among others.

- Normal: Also known as the Gaussian distribution. It is symmetric about the mean, so data near the mean occur more frequently than data far from the mean.
- Right-skewed: Also known as positively skewed. The distribution has a long tail extending to the right of the peak.
- Left-skewed: Also known as negatively skewed. The distribution has a long tail extending to the left of the peak.
- Uniform: The data are distributed equally across the whole range.

This part of the definition refers to the distribution of the variable's values in the population from which you draw a random sample. The central limit theorem applies to almost all probability distributions, but there are exceptions. For example, the population must have a finite variance; that restriction rules out the Cauchy distribution, which has infinite variance. Additionally, the central limit theorem applies to independent, identically distributed variables: the value of one observation does not depend on the value of another, and the distribution of the variable must remain constant across all measurements.

Conditions of the Central Limit Theorem

The central limit theorem states that the sampling distribution of the mean will follow a normal distribution under the following conditions:

1. The sample size is sufficiently large. This condition is usually met if n ≄ 30.
2. The samples are independent and identically distributed random variables; the sampling should be random.
3. The population's distribution has a finite variance.
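The finite-variance condition is easy to see empirically. In the standard-library sketch below (the IQR-based measure of spread is my choice, since the variance of Cauchy sample means is undefined), the spread of the sample means shrinks with n for a normal population but stays put for a Cauchy population:

```python
import math
import random
import statistics

random.seed(0)

def cauchy() -> float:
    # Standard Cauchy via the inverse CDF; this distribution has no finite variance
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_sample_means(draw, n: int, reps: int = 2000) -> float:
    """Interquartile range of `reps` sample means, each from a sample of size n."""
    means = [statistics.mean(draw() for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

iqr_normal_10 = iqr_of_sample_means(lambda: random.gauss(0, 1), 10)
iqr_normal_100 = iqr_of_sample_means(lambda: random.gauss(0, 1), 100)
iqr_cauchy_10 = iqr_of_sample_means(cauchy, 10)
iqr_cauchy_100 = iqr_of_sample_means(cauchy, 100)

print(iqr_normal_10, iqr_normal_100)  # shrinks roughly like 1/sqrt(n), as the CLT predicts
print(iqr_cauchy_10, iqr_cauchy_100)  # does not shrink: the CLT does not apply
```

The mean of n Cauchy draws is itself standard Cauchy, so no amount of extra sampling tightens it; that is exactly the exception the text describes.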
The central limit theorem doesn't apply to distributions with infinite variance.

Significance of the Central Limit Theorem

The central limit theorem has both statistical significance and practical applications; isn't that the sweet spot we aim for when learning a new concept? As a data scientist, you should understand this theorem deeply: be able to explain it, know why it is so important, know the criteria for it to be valid, and know the statistical inferences that can be made from it. We'll look at both aspects to gauge where we can use them.

Statistical Significance of the CLT

Analyzing data involves statistical methods such as hypothesis testing and constructing confidence intervals, and these methods assume that the population is normally distributed. When the distribution is unknown or non-normal, the central limit theorem lets us treat the sampling distribution as normal. If we increase the size of the samples drawn from the population, the standard deviation of the sample means decreases, which lets us estimate the population mean more accurately. The sample mean can also be used to construct a confidence interval: a range of values that is likely to contain the population mean.

Practical Applications of the CLT

The central limit theorem has applications in many fields. Political and election polls are a prime example: they estimate the percentage of people who support a particular candidate, and the confidence intervals you see reported alongside the results on news channels are calculated using the CLT. Confidence intervals, an application of the CLT, are likewise used to estimate quantities such as the mean family income for a particular region.

Assumptions Behind the Central Limit Theorem

Before we dive into the implementation of the central limit theorem, it's important to understand the assumptions behind this technique. The data must follow the randomization condition:
It must be sampled randomly. Samples should be independent of each other: one sample should not influence the others. When sampling is done without replacement, the sample size should be no more than 10% of the population. Finally, the sample size should be sufficiently large. How large? It depends on the population: when the population is skewed or asymmetric, the sample size should be large, while for a symmetric population small samples can work as well. In general, a sample size of 30 is considered sufficient when the population is symmetric.

The mean of the sample means is denoted as:

µ_X̄ = µ

where µ_X̄ = mean of the sample means and µ = population mean.

And the standard deviation of the sample means is denoted as:

σ_X̄ = σ/√n

where σ_X̄ = standard deviation of the sample means, σ = standard deviation of the population, and n = sample size.

And that's it for the concept behind the central limit theorem. Time to fire up RStudio and dig into the CLT's implementation!

The central limit theorem also has important implications in applied machine learning. It informs the solution of linear algorithms such as linear regression, but not of complex models like artificial neural networks (deep learning), because those are solved with numerical optimization methods.

What Is Standard Error?

Standard error is another important term that springs from the sampling distribution and is closely tied to the central limit theorem: the standard error is the standard deviation of the distribution formed by the sample means. It is used in almost all statistical tests because it is a probabilistic measure of how well you have approximated the truth. It decreases as the sample size increases: the bigger the samples, the better the approximation of the population.

Implementing the Central Limit Theorem in R

Are you excited to see how we can code the central limit theorem in R? Let's dig in.
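Before switching to R, the two formulas above (µ_X̄ = µ and σ_X̄ = σ/√n, the standard error) can be checked with a quick standard-library Python simulation on an invented uniform population; the 95% confidence-interval step at the end uses the usual z ≈ 1.96, an application of the CLT rather than anything specific to the article's dataset:

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical population (uniform, i.e. clearly not normal)
population = [random.uniform(0, 100) for _ in range(50_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation

n = 40
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(3000)]

# sigma_xbar = sigma / sqrt(n): compare the simulated and theoretical values
empirical_se = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)
print(round(empirical_se, 3), round(theoretical_se, 3))  # nearly identical

# A 95% confidence interval for mu from a single sample, justified by the CLT
sample = random.sample(population, n)
se = statistics.stdev(sample) / math.sqrt(n)
low, high = statistics.mean(sample) - 1.96 * se, statistics.mean(sample) + 1.96 * se
print(round(low, 1), round(high, 1))
```

Shrinking se as n grows is exactly the "bigger samples approximate the population better" claim made above.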
Understanding the Problem Statement

A pipe manufacturing organization produces different kinds of pipes. We are given monthly data on the wall thickness of certain types of pipes. You can download the data here. The organization wants to analyze the data by performing hypothesis testing and constructing confidence intervals in order to implement some strategies in the future. The challenge is that the distribution of the data is not normal.

Note: This analysis rests on a few assumptions, one of which is that the data should be normally distributed.

Solution Methodology

The central limit theorem will help us get around the problem that this population is not normal. We will simulate the CLT on the given dataset in R, step by step. Let's get started.

First, import the CSV file in R and validate the data for correctness:

```r
# Step 1 - Import the data
data <- read.csv(file.choose())

# Step 2 - Validate the data for correctness
# Count of rows and columns
dim(data)

# View the top 10 rows of the dataset
head(data, 10)

# View the last 10 rows of the dataset
tail(data, 10)
```

Output:

```
# Count of rows and columns
[1] 9000    1

# Top 10 rows of the dataset
   Wall.Thickness
1        12.35487
2        12.61742
3        12.36972
4        13.22335
5        13.15919
6        12.67549
7        12.36131
8        12.44468
9        12.62977
10       12.90381

# Last 10 rows of the dataset
     Wall.Thickness
8991       12.65444
8992       12.80744
8993       12.93295
8994       12.33271
8995       12.43856
8996       12.99532
8997       13.06003
8998       12.79500
8999       12.77742
9000       13.01416
```

Next, calculate the population mean and plot all the observations of the data.
```r
# Step 3 - Calculate the population mean and plot the observations
mean(data$Wall.Thickness)

hist(data$Wall.Thickness, col = "pink",
     main = "Histogram for Wall Thickness", xlab = "wall thickness")
abline(v = 12.8, col = "red", lty = 1)
```

Output:

```
[1] 12.80205
```

See the red vertical line above? That is the population mean. We can also see from the plot that the population is not normal. Therefore, we need to draw a sufficient number of samples of different sizes, compute their means (known as sample means), and plot those sample means to get a normal distribution. In our example, we will draw 9000 samples of size 10, calculate their means, and plot them in R. I know the minimum recommended sample size is 30, but let's see what happens when we draw samples of 10:

```r
# Sample size = 10, number of samples = 9000
# Calculate the arithmetic mean of each sample and plot the 9000 sample means
s10 <- c()
n <- 9000
for (i in 1:n) {
  s10[i] <- mean(sample(data$Wall.Thickness, 10, replace = TRUE))
}
hist(s10, col = "lightgreen", main = "Sample size = 10", xlab = "wall thickness")
abline(v = mean(s10), col = "red")
abline(v = 12.8, col = "blue")
```

Even at this size we get a recognizable bell shape, and we know the curve will get smoother as the sample size increases.
Let us now increase our sample size and see what we get:

```r
# Sample sizes = 30, 50 and 500; number of samples = 9000
s30  <- c()
s50  <- c()
s500 <- c()
n <- 9000
for (i in 1:n) {
  s30[i]  <- mean(sample(data$Wall.Thickness, 30,  replace = TRUE))
  s50[i]  <- mean(sample(data$Wall.Thickness, 50,  replace = TRUE))
  s500[i] <- mean(sample(data$Wall.Thickness, 500, replace = TRUE))
}
par(mfrow = c(1, 3))
hist(s30, col = "lightblue", main = "Sample size = 30", xlab = "wall thickness")
abline(v = mean(s30), col = "red")
hist(s50, col = "lightgreen", main = "Sample size = 50", xlab = "wall thickness")
abline(v = mean(s50), col = "red")
hist(s500, col = "orange", main = "Sample size = 500", xlab = "wall thickness")
abline(v = mean(s500), col = "red")
```

Here we get good bell-shaped curves, and the sampling distribution approaches the normal distribution as the sample size increases. Therefore, we can treat the sampling distributions as normal, and the pipe manufacturing organization can use them for further analysis. You can also play around by taking different sample sizes and drawing different numbers of samples. Let me know how it works out for you!

Conclusion

The central limit theorem is an important concept in statistics and, consequently, in data science; it also helps in understanding related properties such as skewness and kurtosis. I cannot stress enough how critical it is to brush up on your statistics knowledge before getting into data science, or even before sitting for a data science interview. I recommend the Introduction to Data Science course, which takes a comprehensive look at statistics before introducing data science.

Key Takeaways

- The central limit theorem says that the sampling distribution of the mean will be approximately normal provided the sample size is large enough.
- Sampling should be random.
- The samples should not relate to one another.
One sample shouldn't affect the others.

Frequently Asked Questions

Q1. Is there a formula for the central limit theorem?
A. Yes. The central limit theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.

Q2. What are the three key points of the central limit theorem?
A. 1. Regardless of the shape of the population distribution, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases. 2. The mean of the sampling distribution equals the population mean. 3. The standard deviation of the sampling distribution (also known as the standard error) decreases as the sample size increases.

Q3. Why is the central limit theorem called "central"?
A. It is called "central" because it is fundamental in statistics and serves as a central pillar for many statistical techniques: it allows statisticians to make inferences about population parameters from sample statistics, even when the population distribution is unknown or non-normal.

Q4. What is a central limit type theorem?
A. A central limit type theorem is a generalization or extension of the classical central limit theorem to situations where the conditions of the classical CLT may not hold exactly. Such theorems give conditions under which the distribution of a sum or average of independent random variables still approaches a normal distribution, even if the variables are not identically distributed or have heavy-tailed distributions.

I'm a data lover who enjoys finding hidden patterns and turning them into useful insights.
As the Manager - Content and Growth at Analytics Vidhya, I help data enthusiasts learn, share, and grow together. Thanks for stopping by my profile - hope you found something you liked :)
Learn](https://www.analyticsvidhya.com/blog/2021/04/beginners-guide-to-decision-tree-classification-using-python/)[Pruning of Decision Trees](https://www.analyticsvidhya.com/blog/2020/10/cost-complexity-pruning-decision-trees/) ##### Feature Engineering [Introduction to Feature Engineering](https://www.analyticsvidhya.com/blog/2021/03/step-by-step-process-of-feature-engineering-for-machine-learning-algorithms-in-data-science/)[Feature Transformation](https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/)[Feature Scaling](https://www.analyticsvidhya.com/blog/2020/12/feature-engineering-feature-improvements-scaling/)[Feature Engineering](https://www.analyticsvidhya.com/blog/2018/08/guide-automated-feature-engineering-featuretools-python/)[Frequency Encoding](https://www.analyticsvidhya.com/blog/2021/05/complete-guide-on-encode-numerical-features-in-machine-learning/)[Automated Feature Engineering: Feature Tools](https://www.analyticsvidhya.com/blog/2020/06/feature-engineering-guide-data-science-hackathons/) ##### Naive Bayes [Introduction to Naive Bayes](https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/)[Conditional Probability and Bayes Theorem](https://www.analyticsvidhya.com/blog/2021/09/naive-bayes-algorithm-a-complete-guide-for-data-science-enthusiasts/)[Introduction to Bayesian Adjustment Rating: The Incredible Concept Behind Online Ratings\!](https://www.analyticsvidhya.com/blog/2019/07/introduction-online-rating-systems-bayesian-adjusted-rating/)[Working of Naive Bayes](https://www.analyticsvidhya.com/blog/2022/03/building-naive-bayes-classifier-from-scratch-to-perform-sentiment-analysis/)[Math behind Naive Bayes](https://www.analyticsvidhya.com/blog/2021/01/a-guide-to-the-naive-bayes-algorithm/)[Types of Naive Bayes](https://www.analyticsvidhya.com/blog/2022/10/frequently-asked-interview-questions-on-naive-bayes-classifier/)[Implementation of Naive 
Bayes](https://www.analyticsvidhya.com/blog/2021/03/introduction-to-naive-bayes-algorithm/) ##### Multiclass and Multilabel [Understanding how to solve Multiclass and Multilabled Classification Problem](https://www.analyticsvidhya.com/blog/2021/07/demystifying-the-difference-between-multi-class-and-multi-label-classification-problem-statements-in-deep-learning/)[Evaluation Metrics: Multi Class Classification](https://www.analyticsvidhya.com/blog/2021/06/confusion-matrix-for-multi-class-classification/) ##### Basics of Ensemble Techniques [Introduction to Ensemble Techniques](https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/)[Basic Ensemble Techniques](https://www.analyticsvidhya.com/blog/2021/08/ensemble-stacking-for-machine-learning-and-deep-learning/)[Implementing Basic Ensemble Techniques](https://www.analyticsvidhya.com/blog/2021/01/exploring-ensemble-learning-in-machine-learning-world/)[Finding Optimal Weights of Ensemble Learner using Neural Network](https://www.analyticsvidhya.com/blog/2015/08/optimal-weights-ensemble-learner-neural-network/)[Why Ensemble Models Work well?](https://www.analyticsvidhya.com/blog/2021/10/ensemble-modeling-for-neural-networks-using-large-datasets-simplified/) ##### Advance Ensemble Techniques [Introduction to Stacking](https://www.analyticsvidhya.com/blog/2020/10/how-to-use-stacking-to-choose-the-best-possible-algorithm/)[Implementing Stacking](https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/)[Variants of Stacking](https://www.analyticsvidhya.com/blog/2020/12/improve-predictive-model-score-stacking-regressor/)[Implementing Variants of Stacking](https://www.analyticsvidhya.com/blog/2021/03/advanced-ensemble-learning-technique-stacking-and-its-variants/)[Introduction to Blending](https://www.analyticsvidhya.com/blog/2021/03/basic-ensemble-technique-in-machine-learning/)[Bootstrap 
Sampling](https://www.analyticsvidhya.com/blog/2020/02/what-is-bootstrap-sampling-in-statistics-and-machine-learning/)[Introduction to Random Sampling](https://www.analyticsvidhya.com/blog/2019/09/data-scientists-guide-8-types-of-sampling-techniques/)[Hyper-parameters of Random Forest](https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/)[Implementing Random Forest](https://www.analyticsvidhya.com/blog/2018/10/interpret-random-forest-model-machine-learning-programmers/)[Out-of-Bag (OOB) Score in the Random Forest](https://www.analyticsvidhya.com/blog/2020/12/out-of-bag-oob-score-in-the-random-forest-algorithm/)[IPL Team Win Prediction Project Using Machine Learning](https://www.analyticsvidhya.com/blog/2022/05/ipl-team-win-prediction-project-using-machine-learning/)[Introduction to Boosting](https://www.analyticsvidhya.com/blog/2021/09/adaboost-algorithm-a-complete-guide-for-beginners/)[Gradient Boosting Algorithm](https://www.analyticsvidhya.com/blog/2022/01/boosting-in-machine-learning-definition-functions-types-and-features/)[Math behind GBM](https://www.analyticsvidhya.com/blog/2020/02/4-boosting-algorithms-machine-learning/)[Implementing GBM in python](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/)[Regularized Greedy Forests](https://www.analyticsvidhya.com/blog/2021/04/distinguish-between-tree-based-machine-learning-algorithms/)[Extreme Gradient Boosting](https://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/)[Implementing XGBM in python](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/)[Tuning Hyperparameters of XGBoost in Python](https://www.analyticsvidhya.com/blog/2021/06/5-hyperparameter-optimization-techniques-you-must-know-for-data-science-hackathons/)[Implement XGBM in R/H2O](https://www.analyticsvidhya.com/blog/2016/01/xgboost-algorithm-easy-steps/)[Adaptive 
Boosting](https://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/)[Implementing Adaptive Boosing](https://www.analyticsvidhya.com/blog/2021/03/introduction-to-adaboost-algorithm-with-python-implementation/)[LightGBM](https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/)[Implementing LightGBM in Python](https://www.analyticsvidhya.com/blog/2021/08/complete-guide-on-how-to-use-lightgbm-in-python/)[Catboost](https://www.analyticsvidhya.com/blog/2017/08/catboost-automated-categorical-data/)[Implementing Catboost in Python](https://www.analyticsvidhya.com/blog/2021/04/how-to-use-catboost-for-mental-fatigue-score-prediction/) ##### Hyperparameter Tuning [Different Hyperparameter Tuning methods](https://www.analyticsvidhya.com/blog/2021/04/evaluating-machine-learning-models-hyperparameter-tuning/)[Implementing Different Hyperparameter Tuning methods](https://www.analyticsvidhya.com/blog/2021/10/an-effective-approach-to-hyper-parameter-tuning-a-beginners-guide/)[GridsearchCV](https://www.analyticsvidhya.com/blog/2021/06/tune-hyperparameters-with-gridsearchcv/)[RandomizedsearchCV](https://www.analyticsvidhya.com/blog/2022/11/hyperparameter-tuning-using-randomized-search/)[Bayesian Optimization for Hyperparameter Tuning](https://www.analyticsvidhya.com/blog/2020/09/alternative-hyperparameter-optimization-technique-you-need-to-know-hyperopt/)[Hyperopt](https://www.analyticsvidhya.com/blog/2021/05/bayesian-optimization-bayes_opt-or-hyperopt/) ##### Support Vector Machine [Understanding SVM Algorithm](https://www.analyticsvidhya.com/blog/2020/03/support-vector-regression-tutorial-for-machine-learning/)[SVM Kernels In-depth Intuition and Practical Implementation](https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-a-complete-guide-for-beginners/)[SVM Kernel Tricks](https://www.analyticsvidhya.com/blog/2021/06/support-vector-machine-better-understanding/)[Kernels and 
Hyperparameters in SVM](https://www.analyticsvidhya.com/blog/2021/05/support-vector-machines/)[Implementing SVM from Scratch in Python and R](https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/) ##### Advance Dimensionality Reduction [Introduction to Principal Component Analysis](https://www.analyticsvidhya.com/blog/2021/02/diminishing-the-dimensions-with-pca/)[Steps to Perform Principal Compound Analysis](https://www.analyticsvidhya.com/blog/2020/12/an-end-to-end-comprehensive-guide-for-pca/)[Computation of Covariance Matrix](https://www.analyticsvidhya.com/blog/2021/05/simplifying-maths-behind-pca/)[Finding Eigenvectors and Eigenvalues](https://www.analyticsvidhya.com/blog/2021/09/pca-and-its-underlying-mathematical-principles/)[Implementing PCA in python](https://www.analyticsvidhya.com/blog/2016/03/pca-practical-guide-principal-component-analysis-python/)[Visualizing PCA](https://www.analyticsvidhya.com/blog/2021/02/visualizing-pca-in-r-programming-with-factoshiny/)[A Brief Introduction to Linear Discriminant Analysis](https://www.analyticsvidhya.com/blog/2021/08/a-brief-introduction-to-linear-discriminant-analysis/)[Introduction to Factor Analysis](https://www.analyticsvidhya.com/blog/2020/10/dimensionality-reduction-using-factor-analysis-in-python/) ##### Unsupervised Machine Learning Methods [Introduction to Clustering](https://www.analyticsvidhya.com/blog/2020/11/introduction-to-clustering-in-python-for-beginners-in-data-science/)[Applications of Clustering](https://www.analyticsvidhya.com/blog/2022/11/hierarchical-clustering-in-machine-learning/)[Evaluation Metrics for Clustering](https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/)[Understanding K-Means](https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/)[Implementation of K-Means in 
Python](https://www.analyticsvidhya.com/blog/2021/04/k-means-clustering-simplified-in-python/)[Implementation of K-Means in R](https://www.analyticsvidhya.com/blog/2021/04/beginners-guide-to-clustering-in-r-program/)[Choosing Right Value for K](https://www.analyticsvidhya.com/blog/2021/01/in-depth-intuition-of-k-means-clustering-algorithm-in-machine-learning/)[Profiling Market Segments using K-Means Clustering](https://www.analyticsvidhya.com/blog/2020/10/a-definitive-guide-for-predicting-customer-lifetime-value-clv/)[Hierarchical Clustering](https://www.analyticsvidhya.com/blog/2021/06/single-link-hierarchical-clustering-clearly-explained/)[Implementation of Hierarchial Clustering](https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/)[DBSCAN](https://www.analyticsvidhya.com/blog/2020/09/how-dbscan-clustering-works/)[Defining Similarity between clusters](https://www.analyticsvidhya.com/blog/2017/02/test-data-scientist-clustering/)[Build Better and Accurate Clusters with Gaussian Mixture Models](https://www.analyticsvidhya.com/blog/2019/10/gaussian-mixture-models-clustering/) ##### Recommendation Engines [Understand Basics of Recommendation Engine with Case Study](https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-recommendation-engine-python/) ##### Improving ML models [8 Ways to Improve Accuracy of Machine Learning Models](https://www.analyticsvidhya.com/blog/2015/12/improve-machine-learning-results/) ##### Working with Large Datasets [Introduction to Dask](https://www.analyticsvidhya.com/blog/2018/08/dask-big-datasets-machine_learning-python/)[Working with CuML](https://www.analyticsvidhya.com/blog/2022/01/cuml-blazing-fast-machine-learning-model-training-with-nvidias-rapids/) ##### Interpretability of Machine Learning Models [Introduction to Machine Learning Interpretability](https://www.analyticsvidhya.com/blog/2021/06/beginners-guide-to-machine-learning-explainability/)[Framework and Interpretable 
Models](https://www.analyticsvidhya.com/blog/2017/06/building-trust-in-machine-learning-models/)[model Agnostic Methods for Interpretability](https://www.analyticsvidhya.com/blog/2021/01/explain-how-your-model-works-using-explainable-ai/)[Implementing Interpretable Model](https://www.analyticsvidhya.com/blog/2019/08/decoding-black-box-step-by-step-guide-interpretable-machine-learning-models-python/)[Understanding SHAP](https://www.analyticsvidhya.com/blog/2019/11/shapley-value-machine-learning-interpretability-game-theory/)[Out-of-Core ML](https://www.analyticsvidhya.com/blog/2022/09/out-of-core-ml-an-efficient-technique-to-handle-large-data/)[Introduction to Interpretable Machine Learning Models](https://www.analyticsvidhya.com/blog/2020/03/6-python-libraries-interpret-machine-learning-models/)[Model Agnostic Methods for Interpretability](https://www.analyticsvidhya.com/blog/2021/01/ml-interpretability-using-lime-in-r/)[Game Theory & Shapley Values](https://www.analyticsvidhya.com/blog/2019/12/game-theory-101-decision-making-normal-form-games/) ##### Automated Machine Learning [Introduction to AutoML](https://www.analyticsvidhya.com/blog/2021/04/does-the-popularity-of-automl-means-the-end-of-data-science-jobs/)[Implementation of MLBox](https://www.analyticsvidhya.com/blog/2017/07/mlbox-library-automated-machine-learning/)[Introduction to PyCaret](https://www.analyticsvidhya.com/blog/2021/07/anomaly-detection-using-isolation-forest-a-complete-guide/)[TPOT](https://www.analyticsvidhya.com/blog/2021/05/automate-machine-learning-using-tpot%20-%20explore-thousands-of-possible-pipelines-and-find-the-best/)[Auto-Sklearn](https://www.analyticsvidhya.com/blog/2021/10/beginners-guide-to-automl-with-an-easy-autogluon-example/)[EvalML](https://www.analyticsvidhya.com/blog/2021/04/breast-cancer-prediction-using-evalml/) ##### Model Deployment [Pickle and 
Joblib](https://www.analyticsvidhya.com/blog/2021/08/quick-hacks-to-save-machine-learning-model-using-pickle-and-joblib/)[Introduction to Model Deployment](https://www.analyticsvidhya.com/blog/2020/09/integrating-machine-learning-into-web-applications-with-flask/) ##### Deploying ML Models [Deploying Machine Learning Model using Streamlit](https://www.analyticsvidhya.com/blog/2021/06/build-web-app-instantly-for-machine-learning-using-streamlit/)[Deploying ML Models in Docker](https://www.analyticsvidhya.com/blog/2021/06/a-hands-on-guide-to-containerized-your-machine-learning-workflow-with-docker/)[Deploy Using Streamlit](https://www.analyticsvidhya.com/blog/2021/04/developing-data-web-streamlit-app/)[Deploy on Heroku](https://www.analyticsvidhya.com/blog/2021/06/deploy-your-ml-dl-streamlit-application-on-heroku/)[Deploy Using Netlify](https://www.analyticsvidhya.com/blog/2021/04/easily-deploy-your-machine-learning-model-into-a-web-app-netlify/)[Introduction to Amazon Sagemaker](https://www.analyticsvidhya.com/blog/2022/02/building-ml-model-in-aws-sagemaker/)[Setting up Amazon SageMaker](https://www.analyticsvidhya.com/blog/2022/01/huggingface-transformer-model-using-amazon-sagemaker/)[Using SageMaker Endpoint to Generate Inference](https://www.analyticsvidhya.com/blog/2020/11/deployment-of-ml-models-in-cloud-aws-sagemaker%20in-built-algorithms/)[Deploy on Microsoft Azure Cloud](https://www.analyticsvidhya.com/blog/2020/10/how-to-deploy-machine-learning-models-in-azure-cloud-with-the-help-of-python-and-flask/)[Introduction to Flask for Model](https://www.analyticsvidhya.com/blog/2021/10/easy-introduction-to-flask-framework-for-beginners/)[Deploying ML model using Flask](https://www.analyticsvidhya.com/blog/2020/04/how-to-deploy-machine-learning-model-flask/) ##### Embedded Devices [Model Deployment in Android](https://www.analyticsvidhya.com/blog/2015/12/18-mobile-apps-data-scientist-data-analysts/)[Model Deployment in 
# Central Limit Theorem : Definition , Formula & Examples

[Himanshi Singh](https://www.analyticsvidhya.com/blog/author/hs13/) | Last Updated : 31 Mar, 2025 | 10 min read

What is one of the most important and core concepts of statistics that enables us to do predictive modeling, and yet it often confuses aspiring data scientists? Yes, I’m talking about the central limit theorem (CLT). It is a powerful statistical concept that every data scientist MUST know. Now, why is that?

Well, the central limit theorem (CLT) is at the heart of hypothesis testing – a critical component of the data science and [machine learning lifecycle](https://www.analyticsvidhya.com/blog/2021/05/machine-learning-life-cycle-explained/). That’s right, the idea that lets us explore the vast possibilities of the data we are given springs from CLT. It’s actually a simple notion to understand, yet most data scientists flounder at this question during interviews.

In this article, you will learn all about the Central Limit Theorem: its examples, formulas, and practical applications, which will further clarify the concept.

## Table of contents

1. [What is Central Limit Theorem?](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Central_Limit_Theorem_Explained) 2. [Central Limit Theorem with Example](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#What_Is_the_Central_Limit_Theorem_\(CLT\)?) 3.
[Central Limit Theorem Formula](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#h-central-limit-theorem-formula) 4. [Distribution of the Variable in the Population](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Distribution_of_the_Variable_in_the_Population) 5. [Conditions of the Central Limit Theorem](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Conditions_of_the_Central_Limit_Theorem) 6. [Significance of the Central Limit Theorem](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Significance_of_the_Central_Limit_Theorem) 7. [Practical Applications of CLT](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Practical_Applications_of_CLT) 8. [Assumptions Behind the Central Limit Theorem](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Assumptions_Behind_the_Central_Limit_Theorem) 9. [What Is Standard Error?](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#What_Is_Standard_Error?) 10. [Implementing the Central Limit Theorem in R](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Implementing_the_Central_Limit_Theorem_in_R) 11. [Conclusion](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Conclusion) 12. 
[Frequently Asked Questions?](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Frequently_Asked_Questions)

## What is Central Limit Theorem?

The Central Limit Theorem (CLT) states that when large enough random samples are taken from any population (regardless of its original distribution), the distribution of the sample means will approximate a normal distribution (bell curve), with the mean equal to the population mean and the standard deviation decreasing as the sample size increases.

## Central Limit Theorem with Example

Let’s understand the central limit theorem with the help of an example. This will help you intuitively grasp how CLT works underneath.

Consider that there are 15 sections in the science department of a university, and each section hosts around 100 students. Our task is to calculate the average weight of the students in the science department. Sounds simple, right? The approach most aspiring data scientists suggest is to simply calculate the average:

- First, measure the weights of all the students in the science department.
- Add all the weights.
- Finally, divide the total sum of weights by the total number of students to get the average.

But what if the size of the data is humongous? Does this approach make sense? Not really – measuring the weight of every student would be a very tiresome and long process. So, what can we do instead? Let’s look at an alternate approach:

- First, draw groups of students at random from the class. We will call this a sample. We’ll draw multiple samples, each consisting of 30 students.
![data and sample sizes \| central limit theorem](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/capture161268-673b1d329f670.webp)

- Now, calculate the individual mean of each of these samples.
- Then, calculate the mean of these sample means.
- This value will give us the approximate mean weight of the students in the science department.
- Additionally, the histogram of the sample mean weights will resemble a bell curve (or normal distribution).

## Central Limit Theorem Formula

The shape of the sampling distribution of the mean can be determined without repeatedly sampling the population. Its parameters follow from the population’s:

- The mean of the sampling distribution (μxĢ„) equals the mean of the population (μ).
- The standard deviation of the sampling distribution (σxĢ„) is the population standard deviation (σ) divided by the square root of the sample size (n).

**Notation:** XĢ„ ~ N(μ, σ/√n)

**Where:**

- XĢ„ is the sampling distribution of the sample means.
- ~ means ā€œfollows the distribution.ā€
- N is the normal distribution.
- μ is the mean of the population.
- σ is the standard deviation of the population.
- n is the sample size.

## Distribution of the Variable in the Population

Part of the definition of the central limit theorem states, ā€œregardless of the variable’s distribution in the population.ā€ This part is easy! In a population, the values of a variable can follow different probability distributions, ranging from normal, left-skewed, and right-skewed to uniform, among others.

- **Normal:** Also known as the Gaussian distribution. It is symmetric about the mean, meaning data near the mean occur more frequently than data far from the mean.
- **Right-Skewed:** Also known as positively skewed. Most of the data lie to the right/positive side of the graph’s peak.
- **Left-Skewed:** Also known as negatively skewed. Most of the data lie to the left of the graph’s peak rather than to its right.
- **Uniform:** The data are equally distributed across the graph.

This part of the definition refers to the distribution of the variable’s values in the population from which you draw a random sample.

The central limit theorem applies to almost all types of **[probability distributions](https://www.analyticsvidhya.com/blog/2017/09/6-probability-distributions-data-science/)**, but there are exceptions. For example, the population must have a finite variance. That restriction rules out the Cauchy distribution because it has an infinite variance.

Additionally, the central limit theorem applies to independent, identically distributed variables. In other words, the value of one observation does not depend on the value of another observation, and the distribution of the variable must remain constant across all measurements.

## Conditions of the Central Limit Theorem

The central limit theorem states that the sampling distribution of the mean will follow a normal distribution under the following conditions:

- The sample size is **sufficiently large**. This condition is usually met if the sample size is *n* ≄ 30.
- The samples are **independent, identically distributed [random variables](https://www.analyticsvidhya.com/blog/2021/05/understanding-random-variables-their-distributions/)**. The sampling should be random.
- The population’s distribution has a **finite variance**. The central limit theorem doesn’t apply to distributions with infinite variance.

## Significance of the Central Limit Theorem

The central limit theorem has both statistical significance and practical applications. Isn’t that the sweet spot we aim for when we’re learning a new concept?

As a data scientist, you should be able to deeply understand this theorem. You should be able to explain it and understand why it’s so important.
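The conditions listed above are easy to check empirically. Below is a minimal R sketch; the exponential population, the seed, and the number of samples are illustrative choices, not from the article:

```r
# Empirical check of the CLT conditions: a right-skewed population
# with finite variance, independent random samples of size n >= 30.
# (The exponential population and all sizes here are illustrative.)
set.seed(42)

population <- rexp(100000, rate = 1)  # heavily right-skewed, finite variance

# Draw 1000 independent random samples of size 30 and take each mean
sample_means <- replicate(1000, mean(sample(population, size = 30)))

# The sample means cluster around the population mean...
mean(population)
mean(sample_means)

# ...and their histogram looks approximately bell-shaped
hist(sample_means, breaks = 30,
     main = "Sample means (n = 30) from a skewed population")
```

Repeating this with `rcauchy` instead of `rexp` illustrates the exception mentioned above: the sample means of a Cauchy population never settle into a bell curve, because its variance is infinite.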
You should also know the criteria for it to be valid and the statistical inferences that can be made from it. We’ll look at both aspects to gauge where we can use them.

#### Statistical Significance of CLT

![Statistical Significance of CLT](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/24-1.png)

Analyzing data involves statistical methods like hypothesis testing and constructing confidence intervals. These methods assume that the population is **[normally distributed](https://www.analyticsvidhya.com/blog/2021/05/normal-distribution-an-ultimate-guide/)**. For unknown or non-normal distributions, we treat the sampling distribution as normal, according to the central limit theorem.

If we increase the size of the samples drawn from the population, the **[standard deviation](https://www.analyticsvidhya.com/blog/2024/06/standard-deviation-in-excel/)** of the sample means will decrease. This helps us estimate the population mean much more accurately. Also, the sample mean can be used to create a range of values known as a confidence interval (which is likely to contain the population mean).

## Practical Applications of CLT

![central limit theorem used for political prediction](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/25-673b1d30f3a7b.webp)

The central limit theorem has many applications across fields. Political/election polling is a prime CLT application. These polls estimate the percentage of people who support a particular candidate. You might have seen these results on news channels, reported with confidence intervals; the central limit theorem helps calculate them. The confidence interval, an application of CLT, is also used to estimate the mean family income for a particular region.

## Assumptions Behind the Central Limit Theorem

Before we dive into the implementation of the central limit theorem, it’s important to understand the assumptions behind this technique:

- The **data must follow the randomization condition**.
It must be sampled randomly.
- **Samples should be independent of each other.** One sample should not influence the others.
- The **sample size should be no more than 10% of the population** when sampling is done without replacement.
- The **sample size should be sufficiently large**. How large? It depends on the population: when the population is skewed or asymmetric, the sample size should be large; if the population is symmetric, we can draw small samples as well. In general, **a sample size of 30 is considered sufficient when the population is symmetric**.

The mean of the sample means is denoted as:

**µxĢ„ = µ**

where:

- µxĢ„ = mean of the sample means
- µ = population mean

And the standard deviation of the sample means is denoted as:

**σxĢ„ = σ/√n**

where:

- σxĢ„ = standard deviation of the sample means
- σ = standard deviation of the population
- n = sample size

And that’s it for the concept behind the central limit theorem. Time to fire up RStudio and dig into CLT’s implementation!

The central limit theorem also has important implications in applied machine learning. It informs the solution of linear algorithms such as linear regression, but not of complex models like artificial neural networks (deep learning), because those are solved using numerical optimization methods.

## What Is Standard Error?

Another important term that springs from the sampling distribution, and is closely tied to the central limit theorem, is the **standard error**: the **standard deviation** of the **distribution** of the **sample means**.

The standard error is used in almost all statistical tests, because it is a probabilistic measure of how well you have approximated the truth. It decreases as the sample size increases: the bigger the samples, the better the approximation of the population.
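The two identities above (µxĢ„ = µ and σxĢ„ = σ/√n) can be verified numerically. Here is a short R sketch; the normal population with mean 70 and SD 12 (think of weights in kg) is an invented example:

```r
# Sketch: the standard error sd(sample means) shrinks as sigma/sqrt(n).
# (The population parameters 70 and 12 are made up for illustration.)
set.seed(1)
population <- rnorm(50000, mean = 70, sd = 12)  # hypothetical weights

for (n in c(10, 30, 100)) {
  sample_means <- replicate(2000, mean(sample(population, size = n)))
  cat(sprintf("n = %3d  mean = %.2f  sd = %.3f  sigma/sqrt(n) = %.3f\n",
              n, mean(sample_means), sd(sample_means),
              sd(population) / sqrt(n)))
}
```

In each row, the mean of the sample means stays near the population mean, while their standard deviation tracks σ/√n, shrinking as n grows, exactly the standard-error behaviour described above.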
## Implementing the Central Limit Theorem in R

Are you excited to see how we can code the central limit theorem in R? Let’s dig in.

#### Understanding the Problem Statement

A pipe manufacturing organization produces different kinds of pipes. We are given the monthly data of the wall thickness of certain types of pipes. You can download the data [**here**](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/Clt-data.csv). The organization wants to analyze the data by performing **[hypothesis testing](https://www.analyticsvidhya.com/blog/2021/07/hypothesis-testing-made-easy-for-the-data-science-beginners/)** and constructing confidence intervals to implement some strategies in the future. The challenge is that the distribution of the data is not normal.

*Note: This analysis rests on a few assumptions, one of which is that the data should be normally distributed.*

#### Solution Methodology

The central limit theorem will help us get around the fact that the population in this dataset is not normal. We will simulate the CLT on the given dataset in R, step by step. Let’s get started.

First, import the CSV file in R and validate the data for correctness. The original code blocks did not survive extraction, so the snippets below are reconstructions; in particular, the column name `Wall.Thickness` is assumed from the dataset description:

```
# Import the dataset and validate it
data <- read.csv("Clt-data.csv")
dim(data)    # number of rows and columns
head(data)   # first few observations
```

Next, **calculate the population mean and plot all the observations of the data**:

```
# Population mean of the wall thickness
pop_mean <- mean(data$Wall.Thickness)
pop_mean

# Histogram of all observations, with the population mean marked in red
hist(data$Wall.Thickness, breaks = 50, col = "lightblue",
     main = "Wall thickness of pipes", xlab = "Wall thickness")
abline(v = pop_mean, col = "red", lwd = 2)
```

![histogram for wall thickness](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/rplot1-673b1d306e0ed-1.webp)

See the red vertical line above? That’s the population mean. We can also see from the plot that the population is not normal. Therefore, we need to draw a sufficient number of samples, compute their means (known as sample means), and plot those sample means to get a normal distribution. In our example, we will draw a large number of samples of size 10, calculate their means, and plot them in R.
I know that the minimum sample size is usually taken to be 30, but let’s just see what happens when we draw samples of size 10 (again a reconstruction; the number of samples, 9000, is an assumed value):

```
# Draw 9000 samples of size 10 and plot the distribution of their means
set.seed(1)
sample_means_10 <- replicate(9000,
                             mean(sample(data$Wall.Thickness, 10, replace = TRUE)))
hist(sample_means_10, breaks = 50, col = "lightblue",
     main = "Sample size = 10", xlab = "Sample mean")
abline(v = mean(sample_means_10), col = "red", lwd = 2)
```

![sample size for testing central limit theorem](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/rplot3-673b1d2fa41cc-1.webp)

Now, we know that we’ll get a nicer bell-shaped curve as the sample size increases. Let us increase the sample size and see what we get (below with samples of size 50, again an assumed value):

```
# Repeat with a larger sample size
sample_means_50 <- replicate(9000,
                             mean(sample(data$Wall.Thickness, 50, replace = TRUE)))
hist(sample_means_50, breaks = 50, col = "lightblue",
     main = "Sample size = 50", xlab = "Sample mean")
abline(v = mean(sample_means_50), col = "red", lwd = 2)
```

![sample distribution, normal distribution](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/rplot6-673b1d2fe6592-1.webp)

Here we get a good bell-shaped curve, and the sampling distribution approaches the normal distribution as the sample size increases. Therefore, we can treat the sampling distribution as normal, and the pipe manufacturing organization can use it for further analysis. You can also play around with different sample sizes and different numbers of samples. Let me know how it works out for you!

## Conclusion

The central limit theorem is quite an important concept in statistics and, consequently, data science. It also helps in understanding related properties such as **[skewness and kurtosis](https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/)**. I cannot stress enough how critical it is to brush up on your statistics knowledge before getting into data science or even sitting for a data science interview. I recommend taking the [**Introduction to Data Science course**](https://courses.analyticsvidhya.com/courses/introduction-to-data-science-2?utm_source=blog&utm_medium=statistics-101-introduction-central-limit-theorem) – it’s a comprehensive look at statistics before introducing data science.

#### **Key Takeaways**

- The central limit theorem says that the sampling distribution of the mean will be approximately normal, provided the sample size is large enough.
- Sampling should be random; the samples should not relate to one another.
One sample shouldn’t affect the others.

## Frequently Asked Questions

**Q1. Is there a formula for the central limit theorem?**

A. Yes. The central limit theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution; its mean is µXĢ„ = µ and its standard deviation (the standard error) is σXĢ„ = σ/√n.

**Q2. What are the three key points of the central limit theorem?**

A. The three key points of the central limit theorem are:

1. Regardless of the shape of the population distribution, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
2. The mean of the sampling distribution equals the population mean.
3. The standard deviation of the sampling distribution (also known as the standard error) decreases as the sample size increases.

**Q3. Why is the central limit theorem called ā€œcentralā€?**

A. The central limit theorem is called ā€œcentralā€ because it is fundamental in statistics and serves as a central pillar for many statistical techniques. It allows statisticians to make inferences about population parameters from sample statistics, even when the population distribution is unknown or non-normal.

**Q4. What is a central-limit-type theorem?**

A. A central-limit-type theorem is a generalization or extension of the classical central limit theorem to situations where its conditions may not hold exactly. Such theorems give conditions under which the distribution of a sum or average of independent random variables still approaches a normal distribution, even if the variables are not identically distributed or have heavy-tailed distributions.
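The confidence-interval application discussed throughout this article can be sketched in a few lines of R. The log-normal "income" data below are simulated for illustration, not from any real survey, and the 1.96 multiplier is the usual normal quantile for a 95% interval, justified by the CLT:

```r
# 95% confidence interval for a population mean from a single sample,
# using the CLT: sample mean +/- 1.96 * standard error.
set.seed(7)
incomes <- rlnorm(500, meanlog = 10, sdlog = 0.5)  # illustrative skewed incomes

n    <- length(incomes)
xbar <- mean(incomes)
se   <- sd(incomes) / sqrt(n)   # standard error of the mean
ci   <- c(lower = xbar - 1.96 * se, upper = xbar + 1.96 * se)
ci
```

Even though individual incomes are strongly skewed, the CLT lets us treat the sample mean as approximately normal, which is what makes this simple interval valid.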
Author: [Himanshi Singh](https://www.analyticsvidhya.com/blog/author/hs13/)
gNQ6LLvyFk4Y78u0KU4eX+ORhzfj2TZpWvL5vIkThN5yszM02jLURdIDx1Whr/51CPeKgS3BKfcQQUeV4oHucUv8wTVg9pPmw4z6IMKA28SW9BVdmQEDbai5MQhBQr3mrFu+nghAEhEx0D7q5oFvBjisMPxPE8xjVRumDAnS/42UIJVLoFiahE9BcphzkORwtxFvamHmaPQAjPyoxVsB2glP91IqJgK11q0gw/M3deY8etK1mbgYGNPdqerfDiTbypcko/6Zvn0FZNmVllbxHD7p99xV8PIgHVVAe3xmzk1IpCvAmkyCF1yKiTaXxBYrCRtB6E+VMMG2ryfJ57K6X3wO6Dxvdvi+gGQ14CtTNfXzBNloR7c4k5XwjQMdOFyrnw3v04TBoz/wL84ZQFC7vqLvs6hP0H36Tckai10cVSJiZqQO8p0o5+m9Yx/lx/9h6twGaGbVxWEfuyt4L0Nx88CA3YHh4O7P1Hlk5DvEZ/TyX6/qk9Bhiw0A3ygQ3pWAd56M2+srkizil6xn9x4A1rp8/uuA18cfdny8ysA11iM9tXHl2H2OAbRB9apr7F76TPFbmWBCymqyeXJqbBuguRf2DHEeF5lKf73DY3GflqLzyWTpojwYauiznva8NNHe7l6wsV3i812QL4f2kx3P4jHUmGN4RAmkIU9TLhECiXYF9ARETIwgcSwD/2vlc9AvrldA8gtGSJeix4wW0FbVgbfhLtlD62HWvimgvKJXiPgcxFHsBIq0mqy8f/MJTTwojMwcEiLCSKMt9D5bVqTTeRX4h2kx9dEU7d9TClXElfGD5XAHlGaK57xmA32Dc+tqMEtDDarI2cni/FOXSuhxcENC23DkrW4Z4A0U5pbIE7hk9IQq0K1nZ3JOfBeSQwfZLH539Hg2gp9UUbeWyGj8BWHFhwzvNTsUD1E50eh8s6Mo5lfREm7dQzjzQB42PHPwfPHAYvXJWmkAi28rhvSKhXQkhvJtvA+2wUrT1gLByPiltBldOKn/p6IM84CZHLbf0Kfd4jlc5YQ+ryWh0baxVBX1oiIdSQOROeRjJsE7l3H9G/7TsqFvy/ooy/rxGrZyVr4M96fjCIohBttrtyV/h4S20elSBokwiX/rab6WcfATgUYEilm+FzjiVEwawmoyWHN5H1fkO/kJftai0hwxtZV/YdEH5jTBbDD0q61ROqlO2u9D2l0W9G9hOiObg2UcwKkDHsLAjPTm8VzCiJTpoH/qKiedWHcr0BGEMrZzyDNGDp17I1XJ4V+XsOhqMy3R41mOQszv07cdzorky9VLb2owH8kF+NXvA9s19rOpZFOHy4trU54EKMyyWpnkWaOHhjT8LztfK1B/jGuIjDAlDArbD22vMr0/9mLr70ossb3xrNAagD/1RMkhIiAJ41tETEt4N0P6q8l7HZRISBgO0w0iZSUgYC6CONtwmPzEhISEhISFhhPgfihLY+meSzmwAAAAASUVORK5CYII=) SKIP ## Continue your learning for FREE Login with Google Login with Email [Forgot your password?](https://id.analyticsvidhya.com/auth/password/reset/?utm_source=newhomepage) I accept the [Terms and Conditions](https://www.analyticsvidhya.com/terms) Receive updates on WhatsApp ![Av Logo 
White](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAKcAAAAwCAYAAAB0dWoXAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAj0SURBVHgB7VzvdeM2DGf6+r3uBFUmqDvBqRNcbgI7EySdIMoEyU1gZ4IkE8g3QXITyJ0g7gQoEYFnCAJFylYcOcffe0xk/oFACAJBkJQxCQkJCQkJCR8EJ+ZIAQCZ/ffZplObnk5OTu7MnrA0F+znagiaQ8HyNrX/LujnxvL2j3kjkGwv6T6FSeiGFdjEprlNtzZV0Mal2QNEm6MyI4LlJz8Ub5b+cii57oNfzUiBymj/5ZQ+2TQNNMF6t2Z3zMTvDBXCWo6V+WAgy4hpbfu3VqrwvBeTsAUKz6YX6IedhUj3c3hi16UZCYa0nMwyPnjK3Sg1M++IsVrO3KYJ+72xaUUJy86UNijQibUEG9MfBbtGX+7G1JYaFWJqaT5rjZgF2rg6LA/hs0yu/YTq/uhrH0st7rXR+KR7TD11PqESsjKnrNhm7e7h64O4fxcPUfWOBrZDBdT+Zc7yMujGZ9MT0LTSFbu3w21H2yWztkinVHi6Udrhi1TpXXjlRWvTspwi74UU0ccjYqbkNWgq5YXC+xX4R7a54LmEHv08WihCraBWYIfezjs0J0Lu4U1gK3z1oSv8VOBHIdqFXjKQDw48wzo0H/5nhUeuRJnCt0ZTVU7ofqkc5j362Ok2/WKOBCRY6QMVNvHh4U/TH1fs+hv+IdfAhZFQMWOUPrPp2qa/KF2zMsn3hmh+MXUozKVzVucSPC+FwCNvwwtIURyNpWd4Pj+pcWrCwFEko2vsw7Vp8v+3qV0vyctXLMeb2P+/m1o+2Ndv5iMA2sNDSfmNyUxPmtwaLTrKXjztO0MuwKyWieeJT8gyDz/cynmtvL2+Z22mHr7ngX4VLL9i+bNAP5ZaP/pgtKEkDhJgLrJfrQxaA1uObzF3/GNxwa7vSIhzlufoogKEwkobT96rsoCYWJAS4cQus+kPyv7X9ARaeUsLR4+c7jW36ZbRRzzvOwGB7eTN3Te0QLFm16Vtv6Lr/0wtl2XXZPFoAG0/R1q5su9bCk2L6yYYc/CjVGiELFCl8WWvpxD23WQb1XIqZaXSl1lPvluWE5ryCobtIC4ceNVFY/Q+JwkvY1lr0/TnEN/ZdazfWSjXFx31UQFyMwzuzbZPGL7BUQDdAuxX71AYWXTXDvlEK+cUcj3QMiznawIBf5isIvqW2D+02mtKnE7RRecYhnX5dt0pw8HS1OvsiO8mDp/YtXPMvyj1UGmc0uIwuTJ7AJoxPxxuv4hy5Cs3/YGTDierG0ZjZQYAuQ8/3BRTu1CrQJu1UWRK1j0P0Rm15SSzn7GstbYRAf0pnG1SWpsw3Tmj+8P3wf8ymVrxHWYQN4OOxUTwNZF5PfDArs/Y9XWgXZ8Ix1d2vQA9dOV87EnHSDOkDA8P8lkq4aMMspwm6OYR9Uvpg1H+rj4n98Vwdo7D2wLaPhpv4/U5PXwCeOKI0IwNOx7uPf0qWL4W53yh+1Z0Pae6maBfUuJ9xDbHp6jQDhTfmwEAO6xRgyesBLsr5xn4UXraxCjnpaA189TTJiuxK0Sa0QApBwgH4fH+Z6YDo/Q5QQ+4D7V/Ed/UJV0/xjTACQc+MPcbtmGhFf6k7LXSFOn/RtcbRu/B0sCgNQ6LLvyFk4Y78u0KU4eX+ORhzfj2TZpWvL5vIkThN5yszM02jLURdIDx1Whr/51CPeKgS3BKfcQQUeV4oHucUv8wTVg9pPmw4z6IMKA28SW9BVdmQEDbai5MQhBQr3mrFu+nghAEhEx0D7q5oFvBjisMPxPE8xjVRumDAnS/42UIJVLoFiahE9BcphzkORwtxFvamHmaPQAjPyoxVsB2glP91IqJgK11q0gw/M3deY8etK1mbgYGNPdqerfDiTbypck
o/6Zvn0FZNmVllbxHD7p99xV8PIgHVVAe3xmzk1IpCvAmkyCF1yKiTaXxBYrCRtB6E+VMMG2ryfJ57K6X3wO6Dxvdvi+gGQ14CtTNfXzBNloR7c4k5XwjQMdOFyrnw3v04TBoz/wL84ZQFC7vqLvs6hP0H36Tckai10cVSJiZqQO8p0o5+m9Yx/lx/9h6twGaGbVxWEfuyt4L0Nx88CA3YHh4O7P1Hlk5DvEZ/TyX6/qk9Bhiw0A3ygQ3pWAd56M2+srkizil6xn9x4A1rp8/uuA18cfdny8ysA11iM9tXHl2H2OAbRB9apr7F76TPFbmWBCymqyeXJqbBuguRf2DHEeF5lKf73DY3GflqLzyWTpojwYauiznva8NNHe7l6wsV3i812QL4f2kx3P4jHUmGN4RAmkIU9TLhECiXYF9ARETIwgcSwD/2vlc9AvrldA8gtGSJeix4wW0FbVgbfhLtlD62HWvimgvKJXiPgcxFHsBIq0mqy8f/MJTTwojMwcEiLCSKMt9D5bVqTTeRX4h2kx9dEU7d9TClXElfGD5XAHlGaK57xmA32Dc+tqMEtDDarI2cni/FOXSuhxcENC23DkrW4Z4A0U5pbIE7hk9IQq0K1nZ3JOfBeSQwfZLH539Hg2gp9UUbeWyGj8BWHFhwzvNTsUD1E50eh8s6Mo5lfREm7dQzjzQB42PHPwfPHAYvXJWmkAi28rhvSKhXQkhvJtvA+2wUrT1gLByPiltBldOKn/p6IM84CZHLbf0Kfd4jlc5YQ+ryWh0baxVBX1oiIdSQOROeRjJsE7l3H9G/7TsqFvy/ooy/rxGrZyVr4M96fjCIohBttrtyV/h4S20elSBokwiX/rab6WcfATgUYEilm+FzjiVEwawmoyWHN5H1fkO/kJftai0hwxtZV/YdEH5jTBbDD0q61ROqlO2u9D2l0W9G9hOiObg2UcwKkDHsLAjPTm8VzCiJTpoH/qKiedWHcr0BGEMrZzyDNGDp17I1XJ4V+XsOhqMy3R41mOQszv07cdzorky9VLb2owH8kF+NXvA9s19rOpZFOHy4trU54EKMyyWpnkWaOHhjT8LztfK1B/jGuIjDAlDArbD22vMr0/9mLr70ossb3xrNAagD/1RMkhIiAJ41tETEt4N0P6q8l7HZRISBgO0w0iZSUgYC6CONtwmPzEhISEhISFhhPgfihLY+meSzmwAAAAASUVORK5CYII=) ## Enter email address to continue Email address Get OTP ![Av Logo 
White](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAKcAAAAwCAYAAAB0dWoXAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAj0SURBVHgB7VzvdeM2DGf6+r3uBFUmqDvBqRNcbgI7EySdIMoEyU1gZ4IkE8g3QXITyJ0g7gQoEYFnCAJFylYcOcffe0xk/oFACAJBkJQxCQkJCQkJCR8EJ+ZIAQCZ/ffZplObnk5OTu7MnrA0F+znagiaQ8HyNrX/LujnxvL2j3kjkGwv6T6FSeiGFdjEprlNtzZV0Mal2QNEm6MyI4LlJz8Ub5b+cii57oNfzUiBymj/5ZQ+2TQNNMF6t2Z3zMTvDBXCWo6V+WAgy4hpbfu3VqrwvBeTsAUKz6YX6IedhUj3c3hi16UZCYa0nMwyPnjK3Sg1M++IsVrO3KYJ+72xaUUJy86UNijQibUEG9MfBbtGX+7G1JYaFWJqaT5rjZgF2rg6LA/hs0yu/YTq/uhrH0st7rXR+KR7TD11PqESsjKnrNhm7e7h64O4fxcPUfWOBrZDBdT+Zc7yMujGZ9MT0LTSFbu3w21H2yWztkinVHi6Udrhi1TpXXjlRWvTspwi74UU0ccjYqbkNWgq5YXC+xX4R7a54LmEHv08WihCraBWYIfezjs0J0Lu4U1gK3z1oSv8VOBHIdqFXjKQDw48wzo0H/5nhUeuRJnCt0ZTVU7ofqkc5j362Ok2/WKOBCRY6QMVNvHh4U/TH1fs+hv+IdfAhZFQMWOUPrPp2qa/KF2zMsn3hmh+MXUozKVzVucSPC+FwCNvwwtIURyNpWd4Pj+pcWrCwFEko2vsw7Vp8v+3qV0vyctXLMeb2P+/m1o+2Ndv5iMA2sNDSfmNyUxPmtwaLTrKXjztO0MuwKyWieeJT8gyDz/cynmtvL2+Z22mHr7ngX4VLL9i+bNAP5ZaP/pgtKEkDhJgLrJfrQxaA1uObzF3/GNxwa7vSIhzlufoogKEwkobT96rsoCYWJAS4cQus+kPyv7X9ARaeUsLR4+c7jW36ZbRRzzvOwGB7eTN3Te0QLFm16Vtv6Lr/0wtl2XXZPFoAG0/R1q5su9bCk2L6yYYc/CjVGiELFCl8WWvpxD23WQb1XIqZaXSl1lPvluWE5ryCobtIC4ceNVFY/Q+JwkvY1lr0/TnEN/ZdazfWSjXFx31UQFyMwzuzbZPGL7BUQDdAuxX71AYWXTXDvlEK+cUcj3QMiznawIBf5isIvqW2D+02mtKnE7RRecYhnX5dt0pw8HS1OvsiO8mDp/YtXPMvyj1UGmc0uIwuTJ7AJoxPxxuv4hy5Cs3/YGTDierG0ZjZQYAuQ8/3BRTu1CrQJu1UWRK1j0P0Rm15SSzn7GstbYRAf0pnG1SWpsw3Tmj+8P3wf8ymVrxHWYQN4OOxUTwNZF5PfDArs/Y9XWgXZ8Ix1d2vQA9dOV87EnHSDOkDA8P8lkq4aMMspwm6OYR9Uvpg1H+rj4n98Vwdo7D2wLaPhpv4/U5PXwCeOKI0IwNOx7uPf0qWL4W53yh+1Z0Pae6maBfUuJ9xDbHp6jQDhTfmwEAO6xRgyesBLsr5xn4UXraxCjnpaA189TTJiuxK0Sa0QApBwgH4fH+Z6YDo/Q5QQ+4D7V/Ed/UJV0/xjTACQc+MPcbtmGhFf6k7LXSFOn/RtcbRu/B0sCgNQ6LLvyFk4Y78u0KU4eX+ORhzfj2TZpWvL5vIkThN5yszM02jLURdIDx1Whr/51CPeKgS3BKfcQQUeV4oHucUv8wTVg9pPmw4z6IMKA28SW9BVdmQEDbai5MQhBQr3mrFu+nghAEhEx0D7q5oFvBjisMPxPE8xjVRumDAnS/42UIJVLoFiahE9BcphzkORwtxFvamHmaPQAjPyoxVsB2glP91IqJgK11q0gw/M3deY8etK1mbgYGNPdqerfDiTbypck
o/6Zvn0FZNmVllbxHD7p99xV8PIgHVVAe3xmzk1IpCvAmkyCF1yKiTaXxBYrCRtB6E+VMMG2ryfJ57K6X3wO6Dxvdvi+gGQ14CtTNfXzBNloR7c4k5XwjQMdOFyrnw3v04TBoz/wL84ZQFC7vqLvs6hP0H36Tckai10cVSJiZqQO8p0o5+m9Yx/lx/9h6twGaGbVxWEfuyt4L0Nx88CA3YHh4O7P1Hlk5DvEZ/TyX6/qk9Bhiw0A3ygQ3pWAd56M2+srkizil6xn9x4A1rp8/uuA18cfdny8ysA11iM9tXHl2H2OAbRB9apr7F76TPFbmWBCymqyeXJqbBuguRf2DHEeF5lKf73DY3GflqLzyWTpojwYauiznva8NNHe7l6wsV3i812QL4f2kx3P4jHUmGN4RAmkIU9TLhECiXYF9ARETIwgcSwD/2vlc9AvrldA8gtGSJeix4wW0FbVgbfhLtlD62HWvimgvKJXiPgcxFHsBIq0mqy8f/MJTTwojMwcEiLCSKMt9D5bVqTTeRX4h2kx9dEU7d9TClXElfGD5XAHlGaK57xmA32Dc+tqMEtDDarI2cni/FOXSuhxcENC23DkrW4Z4A0U5pbIE7hk9IQq0K1nZ3JOfBeSQwfZLH539Hg2gp9UUbeWyGj8BWHFhwzvNTsUD1E50eh8s6Mo5lfREm7dQzjzQB42PHPwfPHAYvXJWmkAi28rhvSKhXQkhvJtvA+2wUrT1gLByPiltBldOKn/p6IM84CZHLbf0Kfd4jlc5YQ+ryWh0baxVBX1oiIdSQOROeRjJsE7l3H9G/7TsqFvy/ooy/rxGrZyVr4M96fjCIohBttrtyV/h4S20elSBokwiX/rab6WcfATgUYEilm+FzjiVEwawmoyWHN5H1fkO/kJftai0hwxtZV/YdEH5jTBbDD0q61ROqlO2u9D2l0W9G9hOiObg2UcwKkDHsLAjPTm8VzCiJTpoH/qKiedWHcr0BGEMrZzyDNGDp17I1XJ4V+XsOhqMy3R41mOQszv07cdzorky9VLb2owH8kF+NXvA9s19rOpZFOHy4trU54EKMyyWpnkWaOHhjT8LztfK1B/jGuIjDAlDArbD22vMr0/9mLr70ossb3xrNAagD/1RMkhIiAJ41tETEt4N0P6q8l7HZRISBgO0w0iZSUgYC6CONtwmPzEhISEhISFhhPgfihLY+meSzmwAAAAASUVORK5CYII=) ## Enter OTP sent to Edit Wrong OTP. ### Enter the OTP Resend OTP Resend OTP in 45s Verify OTP [![Popup Banner](https://imgcdn.analyticsvidhya.com/freecourses_cms/Frame%201437255970%201.jpg)](https://www.analyticsvidhya.com/pinnacleplus/?utm_source=website_property&utm_medium=desktop_popup&utm_campaign=non_technical_blogsutm_content=pinnacleplus%0A)
Readable Markdown
What is one of the most important and core concepts of statistics that enables us to do predictive modeling, and yet it often confuses aspiring data scientists? Yes, I’m talking about the central limit theorem (CLT). It is a powerful statistical concept that every data scientist MUST know. Now, why is that? Well, the central limit theorem (CLT) is at the heart of hypothesis testing – a critical component of the data science and [machine learning lifecycle](https://www.analyticsvidhya.com/blog/2021/05/machine-learning-life-cycle-explained/). That’s right, the idea that lets us explore the vast possibilities of the data we are given springs from CLT. It’s actually a simple notion to understand, yet most data scientists flounder at this question during interviews. In this article, you will get to know all about the central limit theorem, its examples, formulas, and practical applications, which will give you a clearer understanding of the concept.

1. [What is Central Limit Theorem?](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Central_Limit_Theorem_Explained)
2. [Central Limit Theorem with Example](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#What_Is_the_Central_Limit_Theorem_\(CLT\)?)
3. [Central Limit Theorem Formula](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#h-central-limit-theorem-formula)
4. [Distribution of the Variable in the Population](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Distribution_of_the_Variable_in_the_Population)
5. [Conditions of the Central Limit Theorem](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Conditions_of_the_Central_Limit_Theorem)
6. 
[Significance of the Central Limit Theorem](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Significance_of_the_Central_Limit_Theorem)
7. [Practical Applications of CLT](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Practical_Applications_of_CLT)
8. [Assumptions Behind the Central Limit Theorem](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Assumptions_Behind_the_Central_Limit_Theorem)
9. [What Is Standard Error?](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#What_Is_Standard_Error?)
10. [Implementing the Central Limit Theorem in R](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Implementing_the_Central_Limit_Theorem_in_R)
11. [Conclusion](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Conclusion)
12. [Frequently Asked Questions](https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/#Frequently_Asked_Questions)

## What is Central Limit Theorem?

The Central Limit Theorem (CLT) states that when large enough random samples are taken from any population (regardless of its original distribution), the distribution of the sample means will approximate a normal distribution (bell curve), with the mean equal to the population mean and the standard deviation decreasing as sample size increases.

## Central Limit Theorem with Example

Let’s understand the central limit theorem with the help of an example. This will help you intuitively grasp how CLT works underneath. Consider that there are 15 sections in the science department of a university, and each section hosts around 100 students. Our task is to calculate the average weight of students in the science department. Sounds simple, right? 
The approach I get from aspiring data scientists is to simply calculate the average:

- First, measure the weights of all the students in the science department.
- Add all the weights.
- Finally, divide the total sum of weights by the total number of students to get the average.

But what if the size of the data is humongous? Does this approach make sense? Not really – measuring the weight of all the students will be a very tiresome and long process. So, what can we do instead? Let’s look at an alternate approach.

- First, draw groups of students at random from the class. We will call this a sample. We’ll draw multiple samples, each consisting of 30 students.

![data and sample sizes \| central limit theorem](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/capture161268-673b1d329f670.webp)

- Now, calculate the individual mean of these samples.
- Then, calculate the mean of these sample means.
- This value will give us the approximate mean weight of the students in the science department.
- Additionally, the histogram of the sample mean weights of students will resemble a bell curve (or normal distribution).

## Central Limit Theorem Formula

The shape of the sampling distribution of the mean can be determined without repeatedly sampling a population. Its parameters are based on those of the population:

- The mean (μXĢ„) of the sampling distribution equals the mean of the population (μ).
- The standard deviation (σXĢ„) of the sampling distribution is the population standard deviation (σ) divided by the square root of the sample size (n).

**Notation:** XĢ„ ~ N(μ, σ/√n)

**Where:**

- XĢ„ is the sampling distribution of the sample means.
- ~ means ā€œfollows the distribution.ā€
- N is the normal distribution.
- μ is the mean of the population.
- σ is the standard deviation of the population.
- n is the sample size. 
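The formula can also be checked empirically. The sketch below is a minimal illustration in Python (the article’s own implementation later uses R): it draws many samples of size 30 from a decidedly non-normal uniform population and compares the mean and spread of the sample means against the μ and σ/√n the formula predicts.

```python
import random
import statistics

random.seed(42)

# Population: uniform on [0, 10]; mean mu = 5, sd sigma = 10 / sqrt(12)
mu = 5.0
sigma = 10 / 12 ** 0.5
n = 30            # sample size
n_samples = 5000  # number of samples drawn

# Draw n_samples samples of size n and record each sample's mean
sample_means = [
    statistics.fmean(random.uniform(0, 10) for _ in range(n))
    for _ in range(n_samples)
]

# CLT prediction: the mean of the sample means is close to mu,
# and their standard deviation is close to sigma / sqrt(n)
print(round(statistics.fmean(sample_means), 2))   # close to 5.0
print(round(statistics.stdev(sample_means), 2))   # close to sigma / n**0.5
```

The population here is uniform, yet the spread of the sample means already matches σ/√n closely, which is exactly what the notation XĢ„ ~ N(μ, σ/√n) asserts.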
## Distribution of the Variable in the Population

Part of the definition of the central limit theorem states, ā€œregardless of the variable’s distribution in the population.ā€ This part is easy! In a population, the values of a variable can follow different probability distributions, ranging from normal, left-skewed, and right-skewed to uniform, among others.

- **Normal:** Also known as the Gaussian distribution. It is symmetric about the mean, meaning that data near the mean occur more frequently than data far from the mean.
- **Right-Skewed:** Also known as positively skewed. Most of the data lie to the right/positive side of the graph’s peak.
- **Left-Skewed:** Also known as negatively skewed. Most of the data lie to the left of the graph’s peak.
- **Uniform:** The data are equally distributed across the graph.

This part of the definition refers to the distribution of the variable’s values in the population from which you draw a random sample.

The central limit theorem applies to almost all types of **[probability distributions](https://www.analyticsvidhya.com/blog/2017/09/6-probability-distributions-data-science/)**, but there are exceptions. For example, the population must have a finite variance. That restriction rules out the Cauchy distribution because it has an infinite variance. Additionally, the central limit theorem applies to independent, identically distributed variables. In other words, the value of one observation does not depend on the value of another observation, and the distribution of the variable must remain constant across all measurements.

## Conditions of the Central Limit Theorem

The central limit theorem states that the sampling distribution of the mean will always follow a normal distribution under the following conditions:

- The sample size is **sufficiently large**. This condition is usually met if the size of the sample is *n* ≄ 30. 
- The samples are **independent and identically distributed [random variables](https://www.analyticsvidhya.com/blog/2021/05/understanding-random-variables-their-distributions/)**. The sampling should be random.
- The population’s distribution has a **finite variance**. The central limit theorem doesn’t apply to distributions with infinite variance.

## Significance of the Central Limit Theorem

The central limit theorem has both statistical significance and practical applications. Isn’t that the sweet spot we aim for when we’re learning a new concept? As a data scientist, you should understand this theorem deeply: be able to explain it, know why it’s so important, the criteria for it to be valid, and the statistical inferences that can be made from it. We’ll look at both aspects to gauge where we can use them.

#### Statistical Significance of CLT

![Statistical Significance of CLT](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/24-1.png)

Analyzing data involves statistical methods like hypothesis testing and constructing confidence intervals. These methods assume that the population is **[normally distributed](https://www.analyticsvidhya.com/blog/2021/05/normal-distribution-an-ultimate-guide/)**. In the case of unknown or non-normal distributions, we treat the sampling distribution as normal, according to the central limit theorem.

If we increase the size of the samples drawn from the population, the **[standard deviation](https://www.analyticsvidhya.com/blog/2024/06/standard-deviation-in-excel/)** of the sample means will decrease. This helps us estimate the population mean much more accurately. The sample mean can also be used to create a range of values known as a confidence interval, which is likely to contain the population mean. 
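That last point is easy to make concrete. The sketch below builds a 95% confidence interval for a population mean from a single sample, using the normal approximation the CLT justifies. It is a minimal Python illustration with made-up numbers (the sample mean, σ, and n are assumptions, not figures from the article; the article’s own code later uses R).

```python
import math

# Hypothetical sample summary (illustrative values only)
sample_mean = 68.4   # e.g., mean weight of one sample of students
sigma = 9.0          # population standard deviation (assumed known here)
n = 36               # sample size

# By the CLT, the sample mean is approximately N(mu, sigma / sqrt(n)),
# so a 95% confidence interval is the mean +/- 1.96 standard errors.
standard_error = sigma / math.sqrt(n)   # 9 / 6 = 1.5
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error
print((round(lower, 2), round(upper, 2)))   # (65.46, 71.34)
```

Note the interval's width is driven entirely by σ/√n: quadruple the sample size and the interval halves.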
## Practical Applications of CLT

![central limit theorem used for political prediction](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/25-673b1d30f3a7b.webp)

The central limit theorem has many applications in different fields. Political and election polls are prime CLT applications. These polls estimate the percentage of people who support a particular candidate. You might have seen these results on news channels, reported with confidence intervals; the central limit theorem helps calculate them. The confidence interval, an application of CLT, is also used to estimate quantities such as the mean family income for a particular region.

## Assumptions Behind the Central Limit Theorem

Before we dive into the implementation of the central limit theorem, it’s important to understand the assumptions behind this technique:

- The **data must follow the randomization condition**: it must be sampled randomly.
- **Samples should be independent of each other.** One sample should not influence the others.
- The **sample size should be no more than 10% of the population** when sampling is done without replacement.
- The **sample size should be sufficiently large**. Now, how will we figure out how large this size should be? Well, it depends on the population. When the population is skewed or asymmetric, the sample size should be large. If the population is symmetric, we can draw small samples as well. In general, **a sample size of 30 is considered sufficient when the population is symmetric**.

The mean of the sample means is denoted as:

**µXĢ„ = µ**

where,

- µXĢ„ = Mean of the sample means
- µ = Population mean

And the standard deviation of the sample mean is denoted as:

**σXĢ„ = σ/sqrt(n)**

where,

- σXĢ„ = Standard deviation of the sample mean
- σ = Standard deviation of the population
- n = Sample size

And that’s it for the concept behind the central limit theorem. Time to fire up RStudio and dig into CLT’s implementation! 
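One quick aside before the implementation. The 10% guideline above has a standard justification that the article doesn’t spell out: when sampling without replacement, the standard error picks up a finite population correction factor √((N āˆ’ n)/(N āˆ’ 1)), and as long as n is at most about 10% of N this factor stays close to 1, so the plain σ/√n formula remains a good approximation. A minimal Python sketch (N = 1500 echoes the 15 Ɨ 100 students from the earlier example; the factor itself is a textbook result, not from the article):

```python
import math

# Finite population correction: sampling without replacement from a
# population of size N shrinks the standard error by this factor.
def fpc(N, n):
    return math.sqrt((N - n) / (N - 1))

N = 1500  # e.g., 15 sections x 100 students
for n in (30, 150, 750):
    print(n, round(fpc(N, n), 3))
# At n = 150 (10% of N) the correction is still ~0.95, close enough to 1
# that sigma / sqrt(n) remains a good approximation; at 50% of N it is not.
```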
The central limit theorem also has important implications in applied machine learning. It informs the solution of linear algorithms such as linear regression, but not of complex models like artificial neural networks (deep learning), because those are solved using numerical optimization methods.

## What Is Standard Error?

Another important term that arises from the sampling distribution, and is closely related to the central limit theorem, is the **standard error**: the **standard deviation** of the **distribution** formed by the **sample means**. The standard error is used in almost all statistical tests because it is a probabilistic measure of how well you have approximated the truth. It decreases as the sample size increases: the bigger the samples, the better the approximation of the population.

## Implementing the Central Limit Theorem in R

Are you excited to see how we can code the central limit theorem in R? Let’s dig in then.

#### Understanding the Problem Statement

A pipe manufacturing organization produces different kinds of pipes. We are given the monthly data of the wall thickness of certain types of pipes. You can download the data [**here**](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/Clt-data.csv).

The organization wants to analyze the data by performing **[hypothesis testing](https://www.analyticsvidhya.com/blog/2021/07/hypothesis-testing-made-easy-for-the-data-science-beginners/)** and constructing confidence intervals to implement some strategies in the future. The challenge is that the distribution of the data is not normal.

*Note: This analysis works on a few assumptions, and one of them is that the data should be normally distributed.*

#### Solution Methodology

The central limit theorem will help us get around the problem with this data, where the population is not normal. Therefore, we will simulate the CLT on the given dataset in R, step by step. So, let’s get started. 
First, import the CSV file in R and then validate the data for correctness:

```
#Step 1 - Importing Data
#_______________________________________________________
#Importing the csv data
data <- read.csv(file.choose())

#Step 2 - Validate data for correctness
#______________________________________________________
#Count of Rows and columns
dim(data)

#View top 10 rows of the dataset
head(data, 10)

#View last 10 rows of the dataset
tail(data, 10)
```

**Output:**

```
#Count of Rows and columns
9000 1

#View top 10 rows of the dataset
   Wall.Thickness
1  12.35487
2  12.61742
3  12.36972
4  13.22335
5  13.15919
6  12.67549
7  12.36131
8  12.44468
9  12.62977
10 12.90381

#View last 10 rows of the dataset
     Wall.Thickness
8991 12.65444
8992 12.80744
8993 12.93295
8994 12.33271
8995 12.43856
8996 12.99532
8997 13.06003
8998 12.79500
8999 12.77742
9000 13.01416
```

Next, **calculate the population mean and plot all the observations of the data:**

```
#Step 3 - Calculate the population mean and plot the observations
#___________________________________________________________________
#Calculate the population mean
mean(data$Wall.Thickness)

#Plot all the observations in the data
hist(data$Wall.Thickness, col = "pink", main = "Histogram for Wall Thickness", xlab = "wall thickness")
abline(v = 12.8, col = "red", lty = 1)
```

**Output:**

```
#Calculate the population mean
[1] 12.80205
```

![histogram for wall thickness](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/rplot1-673b1d306e0ed-1.webp)

See the red vertical line above? That’s the population mean. We can also see from the plot that the population is not normal, right? Therefore, we need to draw sufficient samples of different sizes and compute their means (known as sample means). We will then plot those sample means to get a normal distribution.

In our example, we will draw a sufficient number of samples of size 10, calculate their means, and plot them in R. 
I know that the minimum sample size should be 30, but let’s just see what happens when we draw samples of 10:

```
#We will take sample size=10, samples=9000
#Calculate the arithmetic mean and plot the means of the 9000 samples
s10 <- c()
n <- 9000
for (i in 1:n) {
  s10[i] = mean(sample(data$Wall.Thickness, 10, replace = TRUE))
}
hist(s10, col = "lightgreen", main = "Sample size = 10", xlab = "wall thickness")
abline(v = mean(s10), col = "red")
abline(v = 12.8, col = "blue")
```

![sample size for testing central limit theorem](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/rplot3-673b1d2fa41cc-1.webp)

Now, we know that we’ll get a very nice bell-shaped curve as the sample sizes increase. Let us now increase our sample size and see what we get:

```
#We will take sample sizes = 30, 50 & 500, samples = 9000
#Calculate the arithmetic mean and plot the means of the 9000 samples
s30 <- c()
s50 <- c()
s500 <- c()
n <- 9000
for (i in 1:n) {
  s30[i]  = mean(sample(data$Wall.Thickness, 30, replace = TRUE))
  s50[i]  = mean(sample(data$Wall.Thickness, 50, replace = TRUE))
  s500[i] = mean(sample(data$Wall.Thickness, 500, replace = TRUE))
}
par(mfrow = c(1, 3))
hist(s30, col = "lightblue", main = "Sample size = 30", xlab = "wall thickness")
abline(v = mean(s30), col = "red")
hist(s50, col = "lightgreen", main = "Sample size = 50", xlab = "wall thickness")
abline(v = mean(s50), col = "red")
hist(s500, col = "orange", main = "Sample size = 500", xlab = "wall thickness")
abline(v = mean(s500), col = "red")
```

![sample distribution, normal distribution](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/05/rplot6-673b1d2fe6592-1.webp)

Here we get a good bell-shaped curve, and the sampling distribution approaches the normal distribution as the sample size increases. We can therefore treat the sampling distributions as normal, and the pipe manufacturing organization can use them for further analysis. You can also play around by taking different sample sizes and drawing a different number of samples. 
Let me know how it works out for you!

## Conclusion

The central limit theorem is quite an important concept in statistics and, consequently, data science. It also helps in understanding related properties such as **[skewness and kurtosis](https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/)**. I cannot stress enough how critical it is to brush up on your statistics knowledge before getting into data science or even sitting for a data science interview. I recommend taking the [**Introduction to Data Science course**](https://courses.analyticsvidhya.com/courses/introduction-to-data-science-2?utm_source=blog&utm_medium=statistics-101-introduction-central-limit-theorem) – it’s a comprehensive look at statistics before introducing data science.

#### **Key Takeaways**

- The central limit theorem says that the sampling distribution of the mean will always be approximately normal, as long as the sample size is large enough.
- Sampling should be random, and the samples should be independent of one another: one sample shouldn’t affect the others.

## Frequently Asked Questions

**Q1. Is there a formula for the central limit theorem?**

A. Yes, the central limit theorem (CLT) does have a formula. It states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.

**Q2. What are the three points of the central limit theorem?**

A. The three key points of the central limit theorem are:

1. Regardless of the shape of the population distribution, the sampling distribution of the sample mean will approach a normal distribution as the sample size increases.
2. The mean of the sampling distribution will be equal to the population mean.
3. The standard deviation of the sampling distribution (also known as the standard error) decreases as the sample size increases.

**Q3. Why is the central limit theorem called central?**

A. 
The central limit theorem is called ā€œcentralā€ because it is fundamental in statistics and serves as a central pillar for many statistical techniques. It is central in the sense that it allows statisticians to make inferences about population parameters based on sample statistics, even when the population distribution is unknown or non-normal.

**Q4. What is a central limit type theorem?**

A. A central limit type theorem is a generalization or extension of the classical central limit theorem to situations where the conditions of the classical CLT may not hold exactly. These theorems provide conditions under which the distribution of a sum or average of random variables still approaches a normal distribution, even if the variables are not identically distributed or have heavy-tailed distributions.

[![Himanshi Singh](https://av-eks-lekhak.s3.amazonaws.com/media/lekhak-profile-images/converted_image_WxTrFfG.webp)](https://www.analyticsvidhya.com/blog/author/hs13/)

I’m a data lover who enjoys finding hidden patterns and turning them into useful insights. As the Manager - Content and Growth at Analytics Vidhya, I help data enthusiasts learn, share, and grow together. Thanks for stopping by my profile - hope you found something you liked :)
Shard: 107 (laksa)
Root Hash: 2772082033814679907
Unparsed URL: com,analyticsvidhya!www,/blog/2019/05/statistics-101-introduction-central-limit-theorem/ s443