ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | FAIL | download_stamp > now() - 6 MONTH | 8.1 months ago |
| History drop | FAIL | isNull(history_drop_reason) | disallowed |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://www.dotlayer.org/en/stein-baseball/ |
| Last Crawled | 2025-08-16 09:21:42 (8 months ago) |
| First Indexed | 2021-09-03 03:12:10 (4 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Stein's paradox and batting averages |
| Meta Description | A simple explanation of Stein's paradox through the famous baseball example of Efron and Morris |
| Meta Canonical | null |
| Boilerpipe Text | Statistics is, of course, important and people are interested in applying it.

— Charles Stein, in an interview by Y.K. Leong.

There is nothing from my first stats course that I remember more clearly than Prof. Asgharian repeating "I have seen what I should have seen" to describe the idea behind maximum likelihood theory. Given a family of models, maximum likelihood estimation consists of finding the values of the parameters that maximize the probability of observing the dataset we have observed. This idea, popularized in part by Sir Ronald A. Fisher, profoundly changed the field of statistics at a time when access to data wasn't at all like today. In their book Computer Age Statistical Inference: Algorithms, Evidence, and Data Science (Chapter 7), Bradley Efron and Trevor Hastie write:

If Fisher had lived in the era of "apps," maximum likelihood estimation might have made him a billionaire.

What makes the maximum likelihood estimator (MLE) so useful is that it is consistent (it converges in probability to the value it estimates) and efficient (no other estimator has lower asymptotic mean squared error). But watch out: these are statements about the MLE's asymptotic ($n \to \infty$) behaviour when the dimension is held fixed. The story is quite different in the finite-sample situation, and this is what Stein's paradox reminds us of: in some circumstances, the MLE is bound to be (potentially grossly) sub-optimal. Hence Hastie's and Efron's claim: "maximum likelihood estimation has shown itself to be an inadequate and dangerous tool in many twenty-first-century applications. […] unbiasedness can be an unaffordable luxury when there are hundreds or thousands of parameters to estimate at the same time."

So who is Stein and what is Stein's paradox? Read on.

Stein's paradox is attributed to Charles Stein (1920-2016), an American mathematical statistician who spent most of his career at Stanford University. Stein is remembered by his colleagues not only for his exceptional work, but also for his strong belief in basic human rights and his passionate social activism. In an interview by Y.K. Leong, the very first question touched on his statistical work for the Air Force during World War II (verifying weather forecasts to understand how weather might affect wartime activities); Stein's first words are unequivocal:

First I should say that I am strongly opposed to war and to military work. Our participation in World War II was necessary in the fight against fascism and, in a way, I am ashamed that I was never close to combat. However, I have opposed all wars by the United States since then and cannot imagine any circumstances that would justify war by the United States at the present time, other than very limited defensive actions.

According to Stanford News, the man who was called the "Einstein of the Statistics Department" was also the first Stanford professor arrested for protesting apartheid and was often involved in anti-war protests.

On the math-stat side, Stein is the author of a very influential/controversial paper entitled Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution (1956). To understand what it is about, we need to understand what it means for an estimator to be admissible, which is a statement about its performance. For that purpose, let us use Stein's setup, where $X = (X_1, X_2, X_3)$ is such that the components $X_i$ are independent and normally distributed, that is, $X_i \sim N(\theta_i, 1)$. Assessing the performance of an estimator $\hat\theta(X) = \hat\theta = (\hat\theta_1, \hat\theta_2, \hat\theta_3)$ usually involves a loss function $L(\hat\theta, \theta)$, whose purpose is to quantify how much $\hat\theta$ differs from the true value $\theta$ it estimates. The loss function that interested Stein was the (very popular) squared Euclidean distance
$$ L(\hat\theta(X), \theta) = ||\hat\theta - \theta||^2 = \sum_i (\hat\theta_i - \theta_i)^2. $$

Assessing the performance of $\hat\theta$ directly with $L$ has the disadvantage of depending on the observed data $X$. We overcome this limitation by taking the expectation of $L(\hat\theta(X), \theta)$ over $X$, that is, we consider

$$ R(\hat\theta, \theta) = \mathbb{E}_X\ L(\hat\theta(X), \theta). $$

The latter is called the mean squared error, and this is what we try to minimize. An estimator $\hat\theta$ is deemed inadmissible when there exists at least one estimator $\hat\varphi$ for which $R(\hat\theta, \theta) \geq R(\hat\varphi, \theta)$ for all $\theta$, with the inequality being strict for at least one value of $\theta$. In other words, an estimator is inadmissible if and only if there exists another estimator that dominates it. Stein showed that, for the normal model, the "usual estimator", which is the MLE, not only isn't always the best (in terms of mean squared error), but never is the best! In their 1977 paper Stein's Paradox in Statistics (where the term "Stein's paradox" seems to have appeared for the first time), Bradley Efron and Carl Morris loosely described it as follows:

The best guess about the future is usually obtained by computing the average of past events. Stein's paradox defines circumstances in which there are estimators better than the arithmetic average.

In the normal model discussed, the MLE is the arithmetic average: for iid observations $Z_1, Z_2, Z_3 \sim N(\varphi, 1)$, the MLE of $\varphi$ is $\sum_i Z_i / 3$. Stein's setup is a little different in that each variable $X_i$ has its own location parameter $\theta_i$, and so the MLE of $(\theta_1, \theta_2, \theta_3)$ simply is $(X_1, X_2, X_3)$. Now, since the variables are independent, it seems natural to think that this is the best estimator we can find. After all, the groups are totally unrelated. Well, nope. As long as we consider three or more distinct location parameters $\theta_i$, we can do better! Or, to put it differently, the MLE is optimal only when we consider one or two unique location parameters. In 1961, a few years after the publication of Inadmissibility, Stein and his graduate student, Willard James, strengthened the argument in Estimation with Quadratic Loss: they provided an explicit estimator out-performing the MLE.
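The claim above — that the MLE can be beaten whenever three or more means are estimated jointly — is easy to check numerically. Below is a minimal Monte Carlo sketch (not from the original post) comparing the total squared error of the MLE with that of the classic positive-part James-Stein estimator, a variant that shrinks towards the origin rather than towards a grand mean; the dimension `d`, the true means and the trial count are arbitrary choices for illustration:

```python
import random

def risk_comparison(d=10, trials=2000, seed=0):
    """Estimate by Monte Carlo the mean squared error of the MLE
    and of the (positive-part) James-Stein estimator."""
    rng = random.Random(seed)
    theta = [i / d for i in range(d)]  # arbitrary true means
    err_mle = err_js = 0.0
    for _ in range(trials):
        # One observation per coordinate: X_i ~ N(theta_i, 1); the MLE is X itself.
        x = [t + rng.gauss(0, 1) for t in theta]
        # James-Stein shrinkage factor towards the origin (positive part)
        shrink = max(0.0, 1 - (d - 2) / sum(xi * xi for xi in x))
        js = [shrink * xi for xi in x]
        err_mle += sum((xi - t) ** 2 for xi, t in zip(x, theta))
        err_js += sum((ji - t) ** 2 for ji, t in zip(js, theta))
    return err_mle / trials, err_js / trials

mle_risk, js_risk = risk_comparison()
# mle_risk is close to d; js_risk comes out strictly lower.
```

With the small true means chosen here the shrinkage is aggressive and the gap is large; when the true means sit far from the shrinkage target, the two risks become nearly identical, although the James-Stein risk never exceeds that of the MLE.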
Hastie and Efron highlight this as an important moment in the history of statistics: "It begins the story of shrinkage estimation, in which deliberate biases are introduced to improve overall performance, at a possible danger to individual estimates."

For the rest of the post, I focus on a concrete application of Stein's theory. In that department, there's nothing like the famous baseball example of Efron and Morris, which first appeared in their 1975 paper Data Analysis Using Stein's Estimator and its Generalizations. The example involves the batting averages (number of hits / number of times at bat) of 18 major-league baseball players. During the 1970 season, when top batter Roberto Clemente had appeared 45 times at bat, seventeen other players also had $n = 45$ times at bat. The first column (Hits/AB) of the following table provides the batting averages for the eighteen players in question (reproduced with Professor Efron's permission, whom we would like to thank).

Let us go back in time and suppose we are at this exact date in 1970. To predict the batting average of each player for the remainder of the season, it seems natural (with no other information) to use their current batting averages. Again, this coincides with the MLE under the normal model, and this is why the column labeled $\hat\mu_i^{(\mathrm{MLE})}$ simply shows the same ratios, this time in decimal form. To get a glimpse of why the MLE is sub-optimal, define a class of estimators parametrized by $c$ as follows: let $\hat\mu^{(c)} = (\hat\mu_1^{(c)}, \dots, \hat\mu_{18}^{(c)})$, where

$$ \hat\mu_{i}^{(c)} = \bar{\mu} + c(\hat\mu_i^{(\mathrm{MLE})} - \bar{\mu}), $$

and $\bar\mu$ is the average of all means, *i.e.* the grand mean. In our case,

$$ \bar{\mu} = \frac{1}{18} \sum_{i=1}^{18} \hat\mu_i^{(\mathrm{MLE})} $$

is the overall batting average of the $d = 18$ players. Note that $\hat\mu^{(\mathrm{MLE})} = \hat\mu^{(1)}$, and so the MLE is indeed part of the class of estimators we consider. Now, define the James-Stein estimator as the one minimizing the mean squared error, that is, $\hat\mu_i^{(\mathrm{JS})} = \hat\mu_{i}^{(c^*)}$ for

$$ c^* := {\rm argmin}_{c \in [0,1]}\ R(\hat\mu^{(c)}, \mu), $$

where $\mu = (\mu_1, \dots, \mu_{18})$ represents the *true/ideal* batting averages of the players (may they exist) that we are trying to estimate. As the definition of $\hat\mu^{(c)}$ suggests, Stein's method consists of shrinking the individual batting average of each player towards the grand average: in Efron's and Morris' words, if a player's hitting record is better than the grand average, then it must be reduced; if he is not hitting as well as the grand average, then his hitting record must be increased. If the MLE $\hat\mu_i^{(\mathrm{MLE})}$ were to minimize the mean squared error $R(\hat\mu^{(c)}, \mu)$, it would mean $c^* = 1$. This is where Stein's result comes into play: it states that if $c^*$ minimizes the mean squared error, then it must be the case that $c^* < 1$, and so $\hat\mu^{(\mathrm{JS})} \neq \hat\mu^{(\mathrm{MLE})}$. The method proposed by James and Stein (1961) to estimate $c^*$ leads to the value $c^* = .212$, and since in our example the grand average is $\bar\mu = .265$, we get
$$ \hat\mu_i^{(\mathrm{JS})} = .265 + (.212)(\hat\mu_i^{(\mathrm{MLE})} - .265). $$
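As a quick sketch of this last step (the player-by-player table is not reproduced here, so the batting averages below are illustrative placeholders rather than the actual 1970 figures):

```python
def shrink_toward_grand_mean(mle, c):
    """Pull each estimate toward the grand mean by the factor c."""
    grand_mean = sum(mle) / len(mle)
    return [grand_mean + c * (m - grand_mean) for m in mle]

# Placeholder batting averages (NOT the real Efron-Morris data).
averages = [0.400, 0.378, 0.356, 0.222, 0.156]
shrunk = shrink_toward_grand_mean(averages, c=0.212)
# High averages are pulled down and low ones pulled up, toward the grand mean;
# the grand mean itself is unchanged by the shrinkage.
```

Note that because the map is affine in each estimate, the average of the shrunken values equals the original grand mean, which is exactly the "everyone is pulled by the grand average" picture described next.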
Efron and Morris provide a very intuitive illustration of the effect of shrinkage in this particular example. As we can see, everyone is pulled by the grand average of batting averages.

Coming back to 2018, let us look at what actually happened during the remainder of the 1970 season. The table's bold column provides the batting averages computed for the rest of the season. Let us say that those, which were often computed on more than 200 (new) times at bat, are good approximations of the true/ideal batting averages. It turns out that, for 16 of the 18 players, $\hat\mu_i^{(\mathrm{JS})}$ actually does a better job than $\hat\mu_i^{(\mathrm{MLE})}$ at predicting $\mu_i$. It does a better job in terms of total squared error as well.

It can seem puzzling that, to estimate Clemente's batting average (the highest), using Alvis' batting average (the lowest) should help. According to our formulas, if Alvis' batting average $\hat\mu_{\mathrm{Alvis}}^{(\mathrm{MLE})}$ were different, then our guess $\hat\mu_{\mathrm{Clemente}}^{(\mathrm{JS})}$ for Clemente would be different as well (because $\bar\mu$ would be different). It becomes more intuitive when you realize that the value of $c^*$ actually depends on $n$, the number of observations available to us (times at bat in our example). As $n$ increases, the optimal value $c^*$ gets closer to $1$, and so less shrinkage is applied. Stein's theorem states that $c^* < 1$ no matter what; yet, it might be very close to one, and so $\hat\mu_i^{(\mathrm{MLE})}$ and $\hat\mu_i^{(\mathrm{JS})}$ might be extremely similar. This last property of the James-Stein estimator is similar to that of Bayesian estimators, which rely more and more on the data (and less on the prior distribution) as the number of observations increases. Indeed, Efron and Morris identified Stein's method, designed under a strictly frequentist regime, as an empirical Bayes rule (an inference procedure where the prior is estimated from the data as well, instead of being set by the user).
For more details on Stein's paradox and its relation to empirical Bayes methods, I recommend the book Computer Age Statistical Inference: Algorithms, Evidence, and Data Science by Efron and Hastie, which provides, among many other things, a nice historical perspective on modern methods used by statisticians and a gentle introduction to the connections and disagreements relating and opposing the two main statistical theories: frequentism and Bayesianism. I have to give some credit to Laurent Caron, Stéphane Caron and Simon Youde for their useful comments on preliminary versions of the post. |
| Markdown | null |
| Readable Markdown | null |
| Shard | 96 (laksa) |
| Root Hash | 14454927226388175096 |
| Unparsed URL | org,dotlayer!www,/en/stein-baseball/ s443 |