⚠️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 3.4 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://andrewcharlesjones.github.io/journal/james-stein-estimator.html |
| Last Crawled | 2026-01-11 00:28:56 (3 months ago) |
| First Indexed | 2021-04-04 23:32:36 (5 years ago) |
| HTTP Status Code | 200 |
| Content | |
| Meta Title | Andy Jones |
| Meta Description | The James-Stein estimator dominates the MLE by sharing information across seemingly unrelated variables. |
| Meta Canonical | null |
| Boilerpipe Text | The James-Stein estimator dominates the MLE by sharing information across seemingly unrelated variables.
Admissibility
Suppose we want to estimate a parameter (or parameter vector) $\theta$ in some statistical model. Broadly, we do this by constructing an "estimator" $\hat{\theta}(x)$, which is a function of the data $x$. Let $\theta^\star$ denote the true value of $\theta$ (the one that actually corresponds to the data-generating process).
Given a set of data observations $x$, we can assess the quality of the estimator using a loss function $\mathcal{L}(\theta^\star, \hat{\theta})$, which compares the true $\theta^\star$ to our estimate. Lower loss values typically correspond to a "better" estimator. Example loss functions are the squared (L2) error $\mathcal{L}(\theta^\star, \hat{\theta}) = ||\hat{\theta} - \theta^\star||_2^2$ and the L1 error $\mathcal{L}(\theta^\star, \hat{\theta}) = ||\hat{\theta} - \theta^\star||_1$.
If we want to assess the estimator over *all possible* data (not just one set of observations), we can compute the estimator's **risk**, which is the expectation of $\mathcal{L}$ over the data distribution $p(x | \theta^\star)$.
Specifically, given a loss function $\mathcal{L}$, the risk function is defined as
\[R(\theta^\star, \hat{\theta}) = \mathbb{E}_{p(x | \theta^\star)}[\mathcal{L}(\theta^\star, \hat{\theta}(x))].\]
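As a sanity check on this definition, the risk of an estimator can be approximated by Monte Carlo: draw many datasets from $p(x | \theta^\star)$, apply the estimator to each, and average the loss. The sketch below (assuming Python with NumPy; `mc_risk` is our own name, not from the post) does this for the squared error loss under a Gaussian model, where the MLE $\hat{\theta}(x) = x$ has exact risk $p\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_risk(estimator, theta_star, sigma=1.0, n_trials=100_000):
    """Monte Carlo estimate of the squared-error risk
    E_{p(x | theta*)}[ ||estimator(x) - theta*||_2^2 ]."""
    # Each row of X is one dataset: a single draw of the p-vector x.
    X = rng.normal(theta_star, sigma, size=(n_trials, theta_star.size))
    est = estimator(X)  # estimator maps each row to an estimate
    return np.mean(np.sum((est - theta_star) ** 2, axis=1))

theta_star = np.array([1.0, -2.0, 0.5])
risk_mle = mc_risk(lambda X: X, theta_star)  # MLE just returns the data
# Exact risk of the MLE here is p * sigma^2 = 3; the estimate should be close.
```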
Now, we can use the risk function to compare different estimators. Suppose we have two different estimators $\hat{\theta}^{(1)}$ and $\hat{\theta}^{(2)}$. For any true parameter value $\theta^\star$, we can then compute the risk of each, $R(\theta^\star, \hat{\theta}^{(1)})$ and $R(\theta^\star, \hat{\theta}^{(2)})$, and compare them.
Often, one estimator will have a lower risk for some values of $\theta^\star$ and a higher risk for others. However, we say that an estimator $\hat{\theta}^{(1)}$ **dominates** another estimator $\hat{\theta}^{(2)}$ if $\hat{\theta}^{(1)}$ doesn't have a higher risk for any $\theta^\star$ and has a lower risk for at least one value of $\theta^\star$. In other words, $\hat{\theta}^{(1)}$ dominates $\hat{\theta}^{(2)}$ if:
1. $R(\theta^\star, \hat{\theta}^{(1)}) \leq R(\theta^\star, \hat{\theta}^{(2)}) \;\;\; \forall \theta^\star \in \Theta$, and
2. $\exists \theta^\star \text{ such that } R(\theta^\star, \hat{\theta}^{(1)}) < R(\theta^\star, \hat{\theta}^{(2)})$.
Finally, an estimator is **admissible** if it's not dominated by any other estimator. Otherwise, we say it's **inadmissible**.
Stein phenomenon
Consider $p$ Gaussian random variables $X_1, \dots, X_p$, where
\[X_i \sim \mathcal{N}(\mu_i, \sigma^2), \;\;\; i = 1, \dots, p\]
where $\sigma^2$ is known, and we'd like to estimate each $\mu_i$.
Suppose our data consists of one observation of each variable, $x_1, \dots, x_p$. With so little information to work with, under the squared error loss, the least squares estimator (the maximum likelihood estimator, also called the "ordinary" or "usual" estimator) would simply estimate each mean as the observed value:
\[\hat{\mu}_i^{(LS)} = x_i.\]
However, Charles Stein discovered an interesting and surprising result:
> The least squares estimator is **inadmissible** with respect to the squared error loss when $p \geq 3$. In other words, the least squares estimator is **dominated** by another estimator.
Prof. John Carlos Baez summarizes this unintuitive result nicely in this Twitter thread:
> I have a Gaussian distribution like this in 2d. You know its variance is 1 but don't know its mean. I randomly pick a point (x₁, x₂) according to this distribution and tell you. You try to guess the mean.
>
> Your best guess is (x₁, x₂).
>
> But this is not true in 3d!!!
>
> (1/n) pic.twitter.com/pWPD8sFmZ6
>
> — John Carlos Baez (@johncarlosbaez) August 25, 2020
So, what's this other estimator?
James-Stein estimator
The James-Stein estimator (concocted by Charles Stein and Willard James) is
\[\hat{\mu}_i^{(JS)} = \left( 1 - \frac{(p - 2) \sigma^2}{||\mathbf{x}||_2^2} \right) x_i.\]
where $\mathbf{x}$ is the $p$-vector of observations.
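In code, the formula above is a one-liner. The following is a minimal sketch (assuming Python with NumPy; the name `james_stein` is our own) that applies the estimator to a single observed $p$-vector:

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """James-Stein estimate of the mean vector from one observation x (p >= 3)."""
    x = np.asarray(x, dtype=float)
    p = x.size
    # Shared shrinkage coefficient: depends on all coordinates through ||x||^2.
    shrinkage = 1.0 - (p - 2) * sigma2 / np.sum(x ** 2)
    return shrinkage * x

x = np.array([2.0, -1.0, 0.5, 1.5])  # p = 4, ||x||_2^2 = 7.5
mu_js = james_stein(x)               # each coordinate is scaled by 1 - 2/7.5
```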
Notice that this estimator is essentially multiplying each $x_i$ by a term $\left( 1 - \frac{(p - 2) \sigma^2}{||\mathbf{x}||_2^2} \right)$ that depends on the other variables as well.
To start building intuition about what this estimator is doing, consider the case when $p = 3$, and $\sigma^2 = 1$. Then the James-Stein estimator reduces to
\[\hat{\mu}_i^{(JS)} = \left( 1 - \frac{1}{||\mathbf{x}||_2^2} \right) x_i.\]
Since $||\mathbf{x}||_2^2 > 0$, we know that $\left( 1 - \frac{1}{||\mathbf{x}||_2^2} \right) < 1$. If $||\mathbf{x}||_2^2 > 1$, the estimator shrinks each $\hat{\mu}_i$ toward 0 relative to the least squares estimator. If $||\mathbf{x}||_2^2 < 1$, the coefficient becomes negative, so the estimator not only rescales the LS estimate but also flips its sign.
More generally, if $||\mathbf{x}||_2^2 > (p - 2) \sigma^2$, then the James-Stein estimator shrinks each $\hat{\mu}_i$ toward zero. In other words, if the overall (squared L2) magnitude of the data vector $\mathbf{x}$ exceeds the variance (multiplied by $p-2$), the James-Stein estimator "regularizes" the estimates $\hat{\mu}_i$ by shrinking them toward zero.
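The dominance claim itself can be checked numerically. The simulation below is a sketch (assuming Python with NumPy; the particular true mean vector is an arbitrary choice) comparing Monte Carlo risk estimates of the least squares and James-Stein estimators at $p = 10$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, sigma2, n_trials = 10, 1.0, 200_000
mu = rng.normal(0.0, 1.0, size=p)  # an arbitrary fixed true mean vector

# Each row is one dataset: a single draw of (X_1, ..., X_p).
X = rng.normal(mu, np.sqrt(sigma2), size=(n_trials, p))

norms2 = np.sum(X ** 2, axis=1, keepdims=True)
js = (1.0 - (p - 2) * sigma2 / norms2) * X   # James-Stein estimates

risk_ls = np.mean(np.sum((X - mu) ** 2, axis=1))   # should be near p * sigma2 = 10
risk_js = np.mean(np.sum((js - mu) ** 2, axis=1))  # smaller than risk_ls
```

Rerunning this with other choices of `mu` shows the same ordering: the James-Stein risk never exceeds the least squares risk, with the largest gains when $||\boldsymbol{\mu}||$ is small.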
Another way to think about the James-Stein estimator is as a "shrinkage" estimator. A closely related variant nudges each individual $\hat{\mu}_i$ toward the grand mean of the data points, $\bar{x} = \frac{1}{p} \sum_{i=1}^p x_i$, rather than toward zero. Brad Efron and Carl Morris have a nice figure demonstrating this in their paper ["Stein's Paradox in Statistics"](https://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf). In the figure below, the top row shows the batting averages for 18 baseball players, and the bottom row shows the corresponding James-Stein estimate of each. Notice how the estimates move closer together, thereby sharing information across the players.

Relationship to empirical Bayes
The James-Stein estimator also has strong connections to the empirical Bayes methodology. Under an empirical Bayes framework, instead of completely marginalizing out the prior (as in a fully Bayesian treatment), we estimate the prior from the data.
For example, consider again $p$ Gaussian random variables $X_1, \dots, X_p$, where
\[X_i \sim \mathcal{N}(\mu_i, \sigma^2), \;\;\; i = 1, \dots, p\]
where $\sigma^2$ is known, and we'd like to estimate each $\mu_i$. Now, let's place a shared normal prior on each $\mu_i$:
\[\mu_i \sim \mathcal{N}(0, \tau^2), \;\;\; i = 1, \dots, p.\]
We could manually set $\tau^2$ to some value, e.g. $\tau^2 = 1$, or we could place another prior on it and integrate it out.
On the other hand, the empirical Bayes approach seeks to estimate $\tau^2$ from the data itself, leveraging information across observations.
First, notice that the posterior $p(\mu_i | X_i)$ is
\[p(\mu_i | X_i) = \frac{p(X_i | \mu_i) p(\mu_i)}{\int p(X_i | \mu_i) p(\mu_i) d\mu_i}.\]
After some arithmetic and extra work to solve the integral (e.g., through completing the square), we see that the posterior is also Gaussian:
\[\mu_i | X_i \sim \mathcal{N}\left(\frac{\tau^2}{\tau^2 + \sigma^2} X_i, \frac{\tau^2 \sigma^2}{\tau^2 + \sigma^2} \right).\]
So the "Bayes" estimator (if we just take the expectation of the posterior above) is
\[\hat{\mu}_i^{(\text{Bayes})} = \frac{\tau^2}{\tau^2 + \sigma^2} X_i.\]
Notice that this estimator is effectively shrinking $X_i$ toward zero.
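Because $(\mu_i, X_i)$ are jointly Gaussian under this model, the posterior mean is linear in $X_i$, and the shrinkage coefficient can be recovered empirically as a least-squares slope. Below is a small simulation sketch (assuming Python with NumPy), with $\tau^2 = 2$ and $\sigma^2 = 1$ so that $\frac{\tau^2}{\tau^2 + \sigma^2} = \frac{2}{3}$:

```python
import numpy as np

rng = np.random.default_rng(2)
tau2, sigma2, n = 2.0, 1.0, 500_000

mu = rng.normal(0.0, np.sqrt(tau2), size=n)  # mu_i ~ N(0, tau^2)
X = rng.normal(mu, np.sqrt(sigma2))          # X_i | mu_i ~ N(mu_i, sigma^2)

# The least-squares slope of mu on X (through the origin) estimates
# the posterior-mean coefficient tau^2 / (tau^2 + sigma^2) = 2/3.
slope = np.sum(mu * X) / np.sum(X ** 2)
```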
Now, what if we want a more principled way to set $\tau^2$ above, as opposed to just setting it to some value manually? One way to do this would be to look for an unbiased estimator $\hat{\alpha}$ of the "shrinkage coefficient" $\frac{\tau^2}{\tau^2 + \sigma^2}$ such that
\[\mathbb{E}_{p(X_i)}[\hat{\alpha}] = \frac{\tau^2}{\tau^2 + \sigma^2}.\]
Notice that the marginal distribution
\[p(X_i) = \int p(X_i | \mu_i) p(\mu_i) d\mu_i\]
is again Gaussian with
\[X_i \sim \mathcal{N}(0, \sigma^2 + \tau^2).\]
Recall that for a standard Gaussian random vector $\mathbf{z} = (z_1, \dots, z_p)^\top$ with $z_i \sim \mathcal{N}(0, 1)$, the squared L2-norm $||\mathbf{z}||_2^2$ follows a $\chi^2$ distribution with $p$ degrees of freedom. Furthermore, $1 / ||\mathbf{z}||_2^2$ follows an inverse-$\chi^2$ distribution, again with $p$ degrees of freedom, whose mean is $\frac{1}{p - 2}$.
In the case of our data, the marginal data vector $\mathbf{X} = (X_1, \dots, X_p)^\top$ has squared norm $||\mathbf{X}||_2^2$ that follows a scaled $\chi^2_p$ distribution (scaled by $\sigma^2 + \tau^2$). Consequently,
\[\frac{1}{||\mathbf{X}||_2^2} \sim \frac{1}{\tau^2 + \sigma^2} \cdot \text{inverse-}\chi^2_p.\]
Notice that the James-Stein estimator's coefficient (let's call it $\hat{\alpha}^{(JS)}$) is exactly such an unbiased estimator!
\begin{align} \mathbb{E}[\hat{\alpha}^{(JS)}] &= \mathbb{E}\left[\left( 1 - \frac{(p - 2) \sigma^2}{||\mathbf{X}||_2^2} \right)\right] \\ &= 1 - \mathbb{E}\left[\left(\frac{(p - 2) \sigma^2}{||\mathbf{X}||_2^2} \right)\right] \\ &= 1 - (p - 2) \sigma^2 \mathbb{E}\left[\frac{1}{||\mathbf{X}||_2^2} \right] \\ &= 1 - (p - 2) \sigma^2 \frac{1}{(\tau^2 + \sigma^2) (p - 2)} \\ &= 1 - \frac{\sigma^2}{\tau^2 + \sigma^2} \\ &= \frac{\tau^2}{\tau^2 + \sigma^2} \\ \end{align}
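This unbiasedness is easy to verify by simulation. The sketch below (assuming Python with NumPy) draws from the marginal $X_i \sim \mathcal{N}(0, \sigma^2 + \tau^2)$ and averages the James-Stein coefficient, which should match $\frac{\tau^2}{\tau^2 + \sigma^2} = \frac{2}{3}$ for the chosen values:

```python
import numpy as np

rng = np.random.default_rng(3)
p, tau2, sigma2, n_trials = 8, 2.0, 1.0, 500_000

# Marginally, each X_i ~ N(0, sigma^2 + tau^2), independently.
X = rng.normal(0.0, np.sqrt(sigma2 + tau2), size=(n_trials, p))

# James-Stein coefficient for each simulated data vector.
alpha_js = 1.0 - (p - 2) * sigma2 / np.sum(X ** 2, axis=1)

mean_alpha = np.mean(alpha_js)  # should be close to tau^2/(tau^2+sigma^2) = 2/3
```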
Thus, although Charles Stein didn't originally derive the James-Stein estimator this way, we see that it also arises as a particular case of performing empirical Bayes.
References
- Stein, Charles. "Inadmissibility of the usual estimator for the mean of a multivariate normal distribution." Proc. Third Berkeley Symp. Math. Statist. Prob. Vol. 1. 1956.
- Wikipedia entry on [admissibility](https://www.wikiwand.com/en/Admissible_decision_rule).
- Wikipedia entry on the [James-Stein estimator](https://www.wikiwand.com/en/James%E2%80%93Stein_estimator).
- Efron, Bradley, and Carl Morris. "Stein's paradox in statistics." Scientific American 236.5 (1977): 119-127.
- Professor Efron's [notes on the James-Stein estimator](https://statweb.stanford.edu/~ckirby/brad/LSI/chapter1.pdf).
- This [Cross Validated post](https://stats.stackexchange.com/questions/304308/why-is-the-james-stein-estimator-called-a-shrinkage-estimator) on why the James-Stein estimator is called a shrinkage estimator.
- [Blog post](https://austinrochford.com/posts/2013-11-30-steins-paradox-and-empirical-bayes.html) by Austin Rochford on the relationship between empirical Bayes and the James-Stein estimator. |
| Readable Markdown | null |
| ML Classification | |
| ML Categories | null |
| ML Page Types | null |
| ML Intent Types | null |
| Content Metadata | |
| Language | null |
| Author | Andy Jones |
| Publish Time | 2020-09-05 00:00:00 (5 years ago) |
| Original Publish Time | 2020-09-05 00:00:00 (5 years ago) |
| Republished | No |
| Word Count (Total) | 1,382 |
| Word Count (Content) | 1,368 |
| Links | |
| External Links | 10 |
| Internal Links | 3 |
| Technical SEO | |
| Meta Nofollow | No |
| Meta Noarchive | No |
| JS Rendered | No |
| Redirect Target | null |
| Performance | |
| Download Time (ms) | 55 |
| TTFB (ms) | 55 |
| Download Size (bytes) | 6,118 |
| Shard | 143 (laksa) |
| Root Hash | 2566890010099092343 |
| Unparsed URL | io,github!andrewcharlesjones,/journal/james-stein-estimator.html s443 |