ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 1 month ago (distributed domain, exempt) |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator |
| Last Crawled | 2026-03-19 01:38:09 (1 month ago) |
| First Indexed | 2015-05-31 01:26:37 (10 years ago) |
| HTTP Status Code | 200 |
| Meta Title | James–Stein estimator - Wikipedia |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | From Wikipedia, the free encyclopedia
The James–Stein estimator is an estimator of the mean θ = (θ₁, θ₂, …, θ_m) of a multivariate random variable Y = (Y₁, Y₂, …, Y_m).
It arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956,[1] when Charles Stein reached a relatively shocking conclusion: while the then-usual estimate of the mean, the sample mean, is admissible when m ≤ 2, it is inadmissible when m ≥ 3. Stein proposed a possible improvement to the estimator that shrinks the sample mean towards a more central mean vector ν (which can be chosen a priori, or commonly as the "average of averages" of the sample means, given that all samples share the same size). This observation is commonly referred to as Stein's example or paradox. In 1961, Willard James and Charles Stein simplified the original process.[2]
It can be shown that the James–Stein estimator dominates the "ordinary" least squares approach in the sense that the James–Stein estimator has a lower mean squared error than the "ordinary" least squares estimator for all θ. This is possible because the James–Stein estimator is biased, so the Gauss–Markov theorem does not apply.
Similar to Hodges' estimator, the James–Stein estimator is superefficient and non-regular at θ = 0.[3]
Let Y ∼ N_m(θ, σ²I), where the vector θ is the unknown mean of Y, which is m-variate normally distributed with known covariance matrix σ²I.
We are interested in obtaining an estimate, θ̂, of θ, based on a single observation, y, of Y.
In real-world applications, this is a common situation in which a set of parameters is sampled and the samples are corrupted by independent Gaussian noise. Since this noise has mean zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the least squares estimator, θ̂_LS = Y.
Stein demonstrated that in terms of mean squared error E[‖θ − θ̂‖²], the least squares estimator θ̂_LS is sub-optimal compared to shrinkage-based estimators such as the James–Stein estimator θ̂_JS.[1] The paradoxical result, that there is a (possibly) better and never any worse estimate of θ in mean squared error as compared to the sample mean, became known as Stein's example.
[Figure] MSE (R) of least squares estimator (ML) vs. James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero.
If σ² is known, the James–Stein estimator is given by θ̂_JS = (1 − (m − 2)σ² / ‖Y‖²) Y. James and Stein showed that this estimator dominates θ̂_LS for any m ≥ 3, meaning that the James–Stein estimator has a lower mean squared error (MSE) than the maximum likelihood estimator.[2][4] By definition, this makes the least squares estimator inadmissible when m ≥ 3.
Notice that if (m − 2)σ² < ‖Y‖², this estimator simply takes the natural estimator Y and shrinks it towards the origin 0. In fact this is not the only direction of shrinkage that works. Let ν be an arbitrary fixed vector of dimension m. Then there exists an estimator of the James–Stein type that shrinks toward ν, namely θ̂_JS = (1 − (m − 2)σ² / ‖Y − ν‖²)(Y − ν) + ν, m ≥ 3.
The James–Stein estimator dominates the usual estimator for any ν. A natural question is whether the improvement over the usual estimator is independent of the choice of ν. The answer is no. The improvement is small if ‖θ − ν‖ is large. Thus, to obtain a large improvement, some knowledge of the location of θ is necessary. Of course this is the quantity we are trying to estimate, so we do not have this knowledge a priori. But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective, as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that any finite guess ν improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite ν, surely a poor guess.
Seeing the James–Stein estimator as an empirical Bayes method gives some intuition to this result: one assumes that θ itself is a random variable with prior distribution N(0, A), where A is estimated from the data itself. Estimating A only gives an advantage compared to the maximum-likelihood estimator when the dimension m is large enough; hence it does not work for m ≤ 2. The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.[5]
A consequence of the above discussion is the following counterintuitive result: when three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is admissible. A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the total MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, no single component dominates the respective component of the LS estimator.
The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a telecommunication setting, it is reasonable to combine channel tap measurements in a channel estimation scenario, as the goal is to minimize the total channel estimation error.
The James–Stein estimator has also found use in fundamental quantum theory, where it has been used to improve the theoretical bounds of the entropic uncertainty principle for more than three measurements.[6]
An intuitive derivation and interpretation is given by the Galtonian perspective.[7] Under this interpretation, we aim to predict the population means using the imperfectly measured sample means. The equation of the OLS estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron–Morris estimator (when we allow the intercept to vary).
Positive-part James–Stein shrinkage operator
Despite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator Y toward ν, the estimator actually moves away from ν for small values of ‖Y − ν‖, as the multiplier on Y − ν is then negative. This can be remedied by replacing the multiplier by zero when it is negative. To this end, define the positive-part James–Stein shrinkage operator S_λ(x) = x[1 − (λ/x)²]₊, where x₊ = max{0, x}, and apply this operator component-wise to the (unbiased) least-squares estimator of θ − ν (with known ν) for each i = 1, …, m: θ̂⁺_i − ν_i = S_{λ_i}(Y_i − ν_i), with λ_i := σ√(m − 2) |Y_i − ν_i| / ‖Y − ν‖.
The resulting estimator θ̂⁺ of θ is called the positive-part James–Stein estimator and can be written in vector notation as θ̂⁺ − ν = (1 − (m − 2)σ² / ‖Y − ν‖²)₊ (Y − ν). This estimator has a smaller risk than the basic James–Stein estimator for m ≥ 4. It follows that the basic James–Stein estimator is itself inadmissible.[8] It turns out, however, that the positive-part estimator is also inadmissible.[4] This follows from a more general result which requires admissible estimators to be smooth.
Positive-part James–Stein shrinkage and model selection
Recall the initial setup: Y ∼ N(θ, σ²I), where the variance coefficient σ² is known and we wish to estimate the unknown (mean response) coefficient θ = E[Y]. In the more general setting of linear regression, the mean response is instead given by E[Y] = Xθ, where X = [v₁, …, v_m] is a matrix with m columns. As in the previous section, we can use the positive-part James–Stein shrinkage operator to obtain a shrinkage estimator of θ. In particular, any θ̂ that satisfies the James–Stein KKT conditions[9] θ̂_i = S_{σ/‖v_i‖}(θ̂_i + v_iᵀ(Y − Xθ̂) / ‖v_i‖²), i = 1, …, m, is a (positive-part) James–Stein estimator of θ with the useful property that it performs both shrinkage and model selection simultaneously. This is because, depending on the value of the known σ², there is a (possibly empty) set S ⊆ {1, …, m} such that θ̂_i = 0 for i ∈ S. In other words, some (or all) of the θ_i could be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model.
The James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect: the "ordinary" or least squares estimator is often inadmissible for simultaneous estimation of several parameters. This effect has been called Stein's phenomenon, and has been demonstrated for several different problem settings, some of which are briefly outlined below.
James and Stein demonstrated that the estimator presented above can still be used when the variance σ² is unknown, by replacing it with the standard estimator of the variance, σ̂² = (1/m) Σ (Y_i − Ȳ)². The dominance result still holds under the same condition, namely m > 2.[2]
All the results above are for the case when only a single observation vector y is available. For the more general case when n vectors are available, we consider the estimator θ̂_JS = (1 − (m − 2)(σ²/n) / ‖Ȳ‖²) Ȳ, where Ȳ is the m-length average of the n observations, so that Ȳ ∼ N_m(θ, (σ²/n) I).
The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.[10] A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a linear regression technique which outperforms the standard application of the LS estimator.[10]
Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.[11] It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.[4]
See also: Admissible decision rule; Hodges' estimator; Shrinkage estimator; Regular estimator; KL divergence
References
1. Stein, C. (1956), "Inadmissibility of the usual estimator for the mean of a multivariate distribution", Proc. Third Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 197–206, MR 0084922, Zbl 0073.35602
2. James, W.; Stein, C. (1961), "Estimation with quadratic loss", Proc. Fourth Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 361–379, MR 0133191
3. Beran, R. (1995), The Role of Hajek's Convolution Theorem in Statistical Theory
4. Lehmann, E. L.; Casella, G. (1998), Theory of Point Estimation (2nd ed.), New York: Springer
5. Efron, B.; Morris, C. (1973), "Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach", Journal of the American Statistical Association, 68 (341): 117–130, doi:10.2307/2284155, JSTOR 2284155
6. Stander, M. (2017), Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements, arXiv:1702.02440, Bibcode:2017arXiv170202440S
7. Stigler, Stephen M. (1990), "The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators", Statistical Science, 5 (1), doi:10.1214/ss/1177012274, ISSN 0883-4237
8. Anderson, T. W. (1984), An Introduction to Multivariate Statistical Analysis (2nd ed.), New York: John Wiley & Sons
9. Botev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025), Data Science and Machine Learning: Mathematical and Statistical Methods (2nd ed.), Boca Raton; London: CRC Press, pp. 277–279, ISBN 978-1-032-48868-4
10. Bock, M. E. (1975), "Minimax estimators of the mean of a multivariate normal distribution", Annals of Statistics, 3 (1): 209–218, doi:10.1214/aos/1176343009, MR 0381064, Zbl 0314.62005
11. Brown, L. D. (1966), "On the admissibility of invariant estimators of one or more location parameters", Annals of Mathematical Statistics, 37 (5): 1087–1136, doi:10.1214/aoms/1177699259, MR 0216647, Zbl 0156.39401
Further reading: Judge, George G.; Bock, M. E. (1978), The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics, New York: North Holland, pp. 229–257, ISBN 0-7204-0729-X |
| Markdown | # James–Stein estimator
From Wikipedia, the free encyclopedia
Rule for estimating the mean of a dataset
The **James–Stein estimator** is an [estimator](https://en.wikipedia.org/wiki/Estimator "Estimator") of the [mean](https://en.wikipedia.org/wiki/Mean "Mean") {\\displaystyle {\\boldsymbol {\\theta }}:=(\\theta \_{1},\\theta \_{2},\\dots \\theta \_{m})} for a multivariate [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") {\\displaystyle {\\boldsymbol {Y}}:=(Y\_{1},Y\_{2},\\dots Y\_{m})}.
It arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956,[\[1\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stein-56-1) when [Charles Stein](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") reached a relatively shocking conclusion that while the then-usual estimate of the mean, the [sample mean](https://en.wikipedia.org/wiki/Sample_mean "Sample mean"), is [admissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when {\\displaystyle m\\leq 2}, it is [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when {\\displaystyle m\\geq 3}. Stein proposed a possible improvement to the estimator that [shrinks](https://en.wikipedia.org/wiki/Shrinkage_\(statistics\) "Shrinkage (statistics)") the sample mean {\\displaystyle {\\boldsymbol {\\theta }}} towards a more central mean vector {\\displaystyle {\\boldsymbol {\\nu }}} (which can be chosen [a priori](https://en.wikipedia.org/wiki/A_priori_and_a_posteriori "A priori and a posteriori") or commonly as the "average of averages" of the sample means, given all samples share the same size). This observation is commonly referred to as [Stein's example or paradox](https://en.wikipedia.org/wiki/Stein%27s_example "Stein's example"). In 1961, [Willard James](https://en.wikipedia.org/wiki/Willard_D._James "Willard D. James") and Charles Stein simplified the original process.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)
It can be shown that the James–Stein estimator [dominates](https://en.wikipedia.org/wiki/Dominating_decision_rule "Dominating decision rule") the "ordinary" [least squares](https://en.wikipedia.org/wiki/Least_squares "Least squares") approach in the sense that the James–Stein estimator has a lower [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") than the "ordinary" least squares estimator for all {\\displaystyle {\\boldsymbol {\\theta }}}. This is possible because the James–Stein estimator is [biased](https://en.wikipedia.org/wiki/Bias_of_an_estimator "Bias of an estimator"), so that the [Gauss–Markov theorem](https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem "Gauss–Markov theorem") does not apply.
Similar to the [Hodges' estimator](https://en.wikipedia.org/wiki/Hodges%27_estimator "Hodges' estimator"), the James–Stein estimator is [superefficient](https://en.wikipedia.org/w/index.php?title=Superefficient&action=edit&redlink=1 "Superefficient (page does not exist)") and [non-regular](https://en.wikipedia.org/wiki/Regular_estimator "Regular estimator") at {\\displaystyle {\\boldsymbol {\\theta }}=\\mathbf {0} }.[\[3\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-3)
## Setting
Let {\\displaystyle {\\mathbf {Y} }\\sim N\_{m}({\\boldsymbol {\\theta }},\\sigma ^{2}I),} where the vector {\\displaystyle {\\boldsymbol {\\theta }}} is the unknown [mean](https://en.wikipedia.org/wiki/Expected_value "Expected value") of {\\displaystyle {\\mathbf {Y} }}, which is [{\\displaystyle m}-variate normally distributed](https://en.wikipedia.org/wiki/Multivariate_normal_distribution "Multivariate normal distribution") with known [covariance matrix](https://en.wikipedia.org/wiki/Covariance_matrix "Covariance matrix") {\\displaystyle \\sigma ^{2}I}.
We are interested in obtaining an estimate, {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}}, of {\\displaystyle {\\boldsymbol {\\theta }}}, based on a single observation, {\\displaystyle {\\mathbf {y} }}, of {\\displaystyle {\\mathbf {Y} }}.
In real-world applications, this is a common situation in which a set of parameters is sampled and the samples are corrupted by independent [Gaussian noise](https://en.wikipedia.org/wiki/Gaussian_noise "Gaussian noise"). Since this noise has mean zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the [least squares](https://en.wikipedia.org/wiki/Least_squares "Least squares") estimator, which is {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}={\\mathbf {Y} }}.
Stein demonstrated that in terms of [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") {\\displaystyle \\operatorname {E} \\left\[\\left\\\|{\\boldsymbol {\\theta }}-{\\widehat {\\boldsymbol {\\theta }}}\\right\\\|^{2}\\right\]}, the least squares estimator, {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}}, is sub-optimal compared to shrinkage-based estimators such as the **James–Stein estimator**, {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}}.[\[1\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stein-56-1) The paradoxical result, that there is a (possibly) better and never any worse estimate of {\\displaystyle {\\boldsymbol {\\theta }}} in mean squared error as compared to the sample mean, became known as [Stein's example](https://en.wikipedia.org/wiki/Stein%27s_example "Stein's example").
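The claimed dominance is easy to check numerically. Below is a minimal Monte-Carlo sketch (assuming NumPy; the dimension, true mean, and trial count are arbitrary illustrative choices, not values from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma, trials = 10, 1.0, 100_000        # illustrative choices
theta = rng.normal(size=m)                 # an arbitrary true mean vector

# One noisy observation of theta per trial.
Y = theta + sigma * rng.normal(size=(trials, m))

# Least-squares (maximum-likelihood) estimate: the observation itself.
theta_ls = Y

# James–Stein estimate: shrink each observation toward the origin.
norm2 = np.sum(Y**2, axis=1, keepdims=True)
theta_js = (1 - (m - 2) * sigma**2 / norm2) * Y

mse_ls = np.mean(np.sum((theta_ls - theta)**2, axis=1))
mse_js = np.mean(np.sum((theta_js - theta)**2, axis=1))
print(f"MSE(LS) = {mse_ls:.3f}, MSE(JS) = {mse_js:.3f}")   # JS comes out lower
```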
## Formulation
MSE (R) of least squares estimator (ML) vs. James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero.
If {\\displaystyle \\sigma ^{2}} is known, the James–Stein estimator is given by

{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }\\\|^{2}}}\\right){\\mathbf {Y} }.}
James and Stein showed that the above estimator [dominates](https://en.wikipedia.org/wiki/Dominating_decision_rule "Dominating decision rule") {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}} for any {\\displaystyle m\\geq 3}, meaning that the James–Stein estimator has a lower [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") (MSE) than the [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimator.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) By definition, this makes the least squares estimator [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when {\\displaystyle m\\geq 3}.
Notice that if {\\displaystyle (m-2)\\sigma ^{2}<\\\|{\\mathbf {Y} }\\\|^{2}} then this estimator simply takes the natural estimator {\\displaystyle \\mathbf {Y} } and shrinks it towards the origin **0**. In fact this is not the only direction of [shrinkage](https://en.wikipedia.org/wiki/Shrinkage_\(statistics\) "Shrinkage (statistics)") that works. Let {\\displaystyle {\\boldsymbol {\\nu }}} be an arbitrary fixed vector of dimension {\\displaystyle m}. Then there exists an estimator of the James–Stein type that shrinks toward {\\displaystyle {\\boldsymbol {\\nu }}}, namely
{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|^{2}}}\\right)({\\mathbf {Y} }-{\\boldsymbol {\\nu }})+{\\boldsymbol {\\nu }},\\qquad m\\geq 3.}
The James–Stein estimator dominates the usual estimator for any {\\displaystyle {\\boldsymbol {\\nu }}}. A natural question to ask is whether the improvement over the usual estimator is independent of the choice of {\\displaystyle {\\boldsymbol {\\nu }}}. The answer is no. The improvement is small if {\\displaystyle \\\|{{\\boldsymbol {\\theta }}-{\\boldsymbol {\\nu }}}\\\|} is large. Thus, to obtain a large improvement, some knowledge of the location of {\\displaystyle {\\boldsymbol {\\theta }}} is necessary. Of course this is the quantity we are trying to estimate, so we do not have this knowledge [a priori](https://en.wikipedia.org/wiki/A_priori_and_a_posteriori "A priori and a posteriori"). But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective, as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that *any* finite guess {\\displaystyle {\\boldsymbol {\\nu }}} improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite {\\displaystyle {\\boldsymbol {\\nu }}}, surely a poor guess.
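The ν-centered form transcribes directly into code. A minimal sketch (the function name and the default shrinkage target are my choices, not from the article):

```python
import numpy as np

def james_stein(y, sigma2, nu=None):
    """James–Stein estimate of the mean from one observation y (m >= 3),
    shrinking toward the fixed vector nu (the origin by default)."""
    y = np.asarray(y, dtype=float)
    m = y.size
    nu = np.zeros(m) if nu is None else np.asarray(nu, dtype=float)
    resid = y - nu
    factor = 1.0 - (m - 2) * sigma2 / np.dot(resid, resid)
    return nu + factor * resid
```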
## Interpretation
Seeing the James–Stein estimator as an [empirical Bayes method](https://en.wikipedia.org/wiki/Empirical_Bayes_method "Empirical Bayes method") gives some intuition to this result: One assumes that {\\displaystyle {\\boldsymbol {\\theta }}} itself is a random variable with [prior distribution](https://en.wikipedia.org/wiki/Prior_probability "Prior probability") {\\displaystyle \\sim N(0,A)}, where {\\displaystyle A} is estimated from the data itself. Estimating {\\displaystyle A} only gives an advantage compared to the [maximum-likelihood estimator](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") when the dimension {\\displaystyle m} is large enough; hence it does not work for {\\displaystyle m\\leq 2}. The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.[\[5\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-5)
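The standard empirical-Bayes calculation behind this intuition fits in a few lines (a sketch, taking the prior covariance to be a multiple of the identity for simplicity; the article does not spell these steps out here):

```latex
\documentclass{article}\usepackage{amsmath,amssymb}\begin{document}
With prior $\theta \sim N(0, A I)$ and likelihood $Y \mid \theta \sim N(\theta, \sigma^2 I)$,
the posterior mean is the shrunken observation
\[ \mathbb{E}[\theta \mid Y] = \Bigl(1 - \tfrac{\sigma^2}{A + \sigma^2}\Bigr) Y . \]
Marginally $\|Y\|^2/(A+\sigma^2) \sim \chi^2_m$ and $\mathbb{E}[1/\chi^2_m] = 1/(m-2)$ for $m \ge 3$, so
\[ \mathbb{E}\Bigl[\tfrac{(m-2)\,\sigma^2}{\|Y\|^2}\Bigr] = \tfrac{\sigma^2}{A + \sigma^2}, \]
and substituting this unbiased estimate for the unknown shrinkage factor
recovers the James--Stein estimator.
\end{document}
```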
A consequence of the above discussion is the following counterintuitive result: When three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is [admissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule"). A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the *total* MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, any single component does not dominate the respective component of the LS estimator.
The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a [telecommunication](https://en.wikipedia.org/wiki/Telecommunication "Telecommunication") setting, it is reasonable to combine [channel](https://en.wikipedia.org/wiki/Communication_channel "Communication channel") tap measurements in a [channel estimation](https://en.wikipedia.org/wiki/Channel_estimation "Channel estimation") scenario, as the goal is to minimize the total channel estimation error.
The James–Stein estimator has also found use in fundamental quantum theory, where the estimator has been used to improve the theoretical bounds of the [entropic uncertainty principle](https://en.wikipedia.org/wiki/Entropic_uncertainty_principle "Entropic uncertainty principle") for more than three measurements.[\[6\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stander-17-6)
An intuitive derivation and interpretation is given by the [Galtonian](https://en.wikipedia.org/wiki/Francis_Galton "Francis Galton") perspective.[\[7\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-7) Under this interpretation, we aim to predict the population means using the [imperfectly measured sample means](https://en.wikipedia.org/wiki/Measurement_error_model "Measurement error model"). The equation of the [OLS](https://en.wikipedia.org/wiki/Ordinary_least_squares "Ordinary least squares") estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron-Morris estimator (when we allow the intercept to vary).
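A quick simulation makes the Galtonian picture concrete (a sketch; in practice the population means are unknown, so this regression is purely hypothetical, and all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, sigma = 50, 1.0
theta = rng.normal(scale=2.0, size=m)      # hypothetical population means
Y = theta + sigma * rng.normal(size=m)     # imperfectly measured sample means

# Hypothetical regression of population means on sample means, intercept forced to 0.
slope = (Y @ theta) / (Y @ Y)

# The slope comes out close to the James–Stein shrinkage multiplier.
js_mult = 1 - (m - 2) * sigma**2 / (Y @ Y)
print(f"OLS slope = {slope:.3f}, JS multiplier = {js_mult:.3f}")
```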
## Positive-part James–Stein shrinkage operator
Despite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator {\\displaystyle {\\mathbf {Y} }} *toward* {\\displaystyle {\\boldsymbol {\\nu }}}, the estimator actually moves *away* from {\\displaystyle {\\boldsymbol {\\nu }}} for small values of {\\displaystyle \\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|,} as the multiplier on {\\displaystyle {\\mathbf {Y} }-{\\boldsymbol {\\nu }}} is then negative. This can be remedied by replacing this multiplier by zero when it is negative. To this end, define the *positive-part James–Stein shrinkage operator*:
{\\displaystyle S\_{\\lambda }(x)=x\\left\[1-\\left(\\lambda /x\\right)^{2}\\right\]\_{+},}
where {\\displaystyle x\_{+}=\\max\\{0,x\\}}, and apply this operator component-wise to the (unbiased) least-squares estimator of {\\displaystyle {\\boldsymbol {\\theta }}-{\\boldsymbol {\\nu }}} (with known {\\displaystyle {\\boldsymbol {\\nu }}}) for each {\\displaystyle i=1,\\ldots ,m}:
{\\displaystyle {\\widehat {\\theta }}\_{i}^{+}-\\nu \_{i}=S\_{\\lambda \_{i}}(Y\_{i}-\\nu \_{i}),\\quad \\lambda \_{i}:=\\sigma {\\sqrt {m-2}}\\,{\\frac {\|Y\_{i}-\\nu \_{i}\|}{\\\|\\mathbf {Y} -{\\boldsymbol {\\nu }}\\\|}}.}
The resulting estimator {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}^{+}} of {\\displaystyle {\\boldsymbol {\\theta }}} is called the *positive-part James–Stein estimator* and can be written in vector notation as:
{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}^{+}-{\\boldsymbol {\\nu }}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|^{2}}}\\right)\_{+}({\\mathbf {Y} }-{\\boldsymbol {\\nu }}).}
This estimator has a smaller risk than the basic James–Stein estimator for {\\displaystyle m\\geq 4}. It follows that the basic James–Stein estimator is itself [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule").[\[8\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-Anderson-84-8)
It turns out, however, that the positive-part estimator is also inadmissible.[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) This follows from a more general result which requires admissible estimators to be smooth.
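In code, the positive part is a single clamp on the shrinkage multiplier. A minimal sketch in the same conventions as above (the function name is my choice):

```python
import numpy as np

def james_stein_plus(y, sigma2, nu=None):
    """Positive-part James–Stein estimate: as the basic estimator, but the
    shrinkage multiplier is clipped at zero, so the estimate never moves
    past (or away from) the target nu."""
    y = np.asarray(y, dtype=float)
    m = y.size
    nu = np.zeros(m) if nu is None else np.asarray(nu, dtype=float)
    resid = y - nu
    factor = max(0.0, 1.0 - (m - 2) * sigma2 / np.dot(resid, resid))
    return nu + factor * resid
```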
## Positive-part James–Stein shrinkage and model selection
Recall the initial setup:
{\\displaystyle {\\mathbf {Y} }\\sim N({\\boldsymbol {\\theta }},\\sigma ^{2}I),}
where the variance coefficient {\\displaystyle \\sigma ^{2}} is known and we wish to estimate the unknown (mean response) coefficient {\\displaystyle {\\boldsymbol {\\theta }}=\\mathbb {E} \\mathbf {Y} }. In the more general setting of [linear regression](https://en.wikipedia.org/wiki/Linear_regression "Linear regression"), the mean response is instead given by
{\\displaystyle \\mathbb {E} \\mathbf {Y} =\\mathbf {X} {\\boldsymbol {\\theta }},}
where {\\displaystyle \\mathbf {X} =\[\\mathbf {v} \_{1},\\ldots ,\\mathbf {v} \_{m}\]} is a matrix with {\\displaystyle m} columns. As in the previous section, we can use the *positive-part James–Stein shrinkage operator* to obtain a [shrinkage estimator](https://en.wikipedia.org/wiki/Shrinkage_estimator "Shrinkage estimator") of {\\displaystyle {\\boldsymbol {\\theta }}}. In particular, any {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}} that satisfies the *James–Stein [KKT conditions](https://en.wikipedia.org/wiki/KKT_conditions "KKT conditions")*:[\[9\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-:0-9)
{\\displaystyle {\\hat {\\theta }}\_{i}=S\_{\\frac {\\sigma }{\\\|\\mathbf {v} \_{i}\\\|}}{\\bigg (}{\\hat {\\theta }}\_{i}+{\\frac {\\mathbf {v} \_{i}^{\\top }(\\mathbf {Y} -\\mathbf {X} {\\widehat {\\boldsymbol {\\theta }}})}{\\\|\\mathbf {v} \_{i}\\\|^{2}}}{\\bigg )},\\quad i=1,\\ldots ,m}
is a (positive-part) James–Stein estimator of {\\displaystyle {\\boldsymbol {\\theta }}} with the useful property that it performs both shrinkage and [model selection](https://en.wikipedia.org/wiki/Model_selection "Model selection") simultaneously. This is because, depending on the value of the known {\\displaystyle \\sigma ^{2}}, there is a (possibly empty) set {\\displaystyle {\\mathcal {S}}\\subseteq \\{1,\\ldots ,m\\}} such that
{\\displaystyle {\\hat {\\theta }}\_{i}=0,\\quad i\\in {\\mathcal {S}}.}
In other words, some (or all) of the {\\displaystyle \\theta \_{i}} could be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model.
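These fixed-point conditions suggest a simple coordinate-wise iteration. The sketch below is one plausible way to search for a solution under that assumption; it is not an algorithm given in the cited sources, and convergence is simply assumed here:

```python
import numpy as np

def S(x, lam):
    """Positive-part James–Stein shrinkage operator S_lambda(x)."""
    if x == 0.0:
        return 0.0
    return x * max(0.0, 1.0 - (lam / x) ** 2)

def js_kkt(X, Y, sigma, n_iter=200):
    """Coordinate-wise fixed-point iteration for the James–Stein KKT
    conditions (a sketch; coefficients driven to exactly zero amount
    to model selection)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m = X.shape[1]
    theta = np.zeros(m)
    col_norm = np.linalg.norm(X, axis=0)
    for _ in range(n_iter):
        for i in range(m):
            r = Y - X @ theta                              # current residual
            z = theta[i] + X[:, i] @ r / col_norm[i] ** 2
            theta[i] = S(z, sigma / col_norm[i])
    return theta
```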
## Further extensions
The James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect; namely, the fact that the "ordinary" or least squares estimator is often [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") for simultaneous estimation of several parameters. This effect has been called [Stein's phenomenon](https://en.wikipedia.org/wiki/Stein%27s_phenomenon "Stein's phenomenon"), and has been demonstrated for several different problem settings, some of which are briefly outlined below.
- James and Stein demonstrated that the estimator presented above can still be used when the variance {\\displaystyle \\sigma ^{2}} is unknown, by replacing it with the standard estimator of the variance, {\\displaystyle {\\widehat {\\sigma }}^{2}={\\frac {1}{m}}\\sum (Y\_{i}-{\\overline {Y}})^{2}}. The dominance result still holds under the same condition, namely, {\\displaystyle m>2}.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2) (A sketch covering this and the following extension appears after this list.)
- All the results above are for the case when only a single observation vector **y** is available. For the more general case when {\\displaystyle n} vectors are available, we consider the estimator {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2){\\frac {\\sigma ^{2}}{n}}}{\\\|{\\overline {\\mathbf {Y} }}\\\|^{2}}}\\right){\\overline {\\mathbf {Y} }},} where {\\displaystyle {\\overline {\\mathbf {Y} }}} is the {\\displaystyle m}-length average of the {\\displaystyle n} observations, so that {\\displaystyle {\\overline {\\mathbf {Y} }}\\sim N\_{m}{\\Big (}{\\boldsymbol {\\theta }},{\\frac {\\sigma ^{2}}{n}}I{\\Big )}}.
- The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.[\[10\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-bock75-10) A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a [linear regression](https://en.wikipedia.org/wiki/Linear_regression "Linear regression") technique which outperforms the standard application of the LS estimator.[\[10\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-bock75-10)
- Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.[\[11\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-brown66-11) It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4)
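A sketch combining the first two extensions above (assuming NumPy; the pooled variance estimate is an illustrative choice, not the exact estimator analyzed by James and Stein):

```python
import numpy as np

def james_stein_repeated(Y):
    """James–Stein estimate from n observation vectors (the rows of Y),
    with the noise variance estimated from the data; assumes the rows
    are i.i.d. N(theta, sigma^2 I)."""
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    ybar = Y.mean(axis=0)                        # m-vector of sample means
    sigma2_hat = Y.var(axis=0, ddof=1).mean()    # illustrative pooled estimate
    shrink = 1.0 - (m - 2) * (sigma2_hat / n) / np.dot(ybar, ybar)
    return shrink * ybar
```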
## See also
- [Admissible decision rule](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule")
- [Hodges' estimator](https://en.wikipedia.org/wiki/Hodges%27_estimator "Hodges' estimator")
- [Shrinkage estimator](https://en.wikipedia.org/wiki/Shrinkage_estimator "Shrinkage estimator")
- [Regular estimator](https://en.wikipedia.org/wiki/Regular_estimator "Regular estimator")
- [KL divergence](https://en.wikipedia.org/wiki/KL_divergence "KL divergence")
## References
1. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-1)
[Stein, C.](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") (1956), "Inadmissibility of the usual estimator for the mean of a multivariate distribution", [*Proc. Third Berkeley Symp. Math. Statist. Prob.*](http://projecteuclid.org/euclid.bsmsp/1200501656), vol. 1, pp. 197–206, [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0084922](https://mathscinet.ams.org/mathscinet-getitem?mr=0084922), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0073.35602](https://zbmath.org/?format=complete&q=an:0073.35602)
2. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-1) [***c***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-2)
James, W.; [Stein, C.](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") (1961), "Estimation with quadratic loss", [*Proc. Fourth Berkeley Symp. Math. Statist. Prob.*](http://projecteuclid.org/euclid.bsmsp/1200512173), vol. 1, pp. 361–379, [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0133191](https://mathscinet.ams.org/mathscinet-getitem?mr=0133191)
3. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-3)** Beran, R. (1995). The Role of Hajek's Convolution Theorem in Statistical Theory
4. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-1) [***c***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-2)
Lehmann, E. L.; Casella, G. (1998), *Theory of Point Estimation* (2nd ed.), New York: Springer
5. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-5)**
Efron, B.; Morris, C. (1973). "Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach". *Journal of the American Statistical Association*. **68** (341). American Statistical Association: 117–130. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.2307/2284155](https://doi.org/10.2307%2F2284155). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [2284155](https://www.jstor.org/stable/2284155).
6. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stander-17_6-0)**
Stander, M. (2017), *Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements*, [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1702\.02440](https://arxiv.org/abs/1702.02440), [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2017arXiv170202440S](https://ui.adsabs.harvard.edu/abs/2017arXiv170202440S)
7. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-7)**
Stigler, Stephen M. (1990-02-01). ["The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators"](https://doi.org/10.1214%2Fss%2F1177012274). *Statistical Science*. **5** (1). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1214/ss/1177012274](https://doi.org/10.1214%2Fss%2F1177012274). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0883-4237](https://search.worldcat.org/issn/0883-4237).
8. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-Anderson-84_8-0)**
Anderson, T. W. (1984), *An Introduction to Multivariate Statistical Analysis* (2nd ed.), New York: John Wiley & Sons
9. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-:0_9-0)**
Botev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025). *Data Science and Machine Learning: Mathematical and Statistical Methods* (2nd ed.). Boca Raton; London: CRC Press. pp. 277–279. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-032-48868-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-032-48868-4 "Special:BookSources/978-1-032-48868-4").
10. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-bock75_10-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-bock75_10-1)
Bock, M. E. (1975), "Minimax estimators of the mean of a multivariate normal distribution", *[Annals of Statistics](https://en.wikipedia.org/wiki/Annals_of_Statistics "Annals of Statistics")*, **3** (1): 209–218, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1214/aos/1176343009](https://doi.org/10.1214%2Faos%2F1176343009), [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0381064](https://mathscinet.ams.org/mathscinet-getitem?mr=0381064), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0314.62005](https://zbmath.org/?format=complete&q=an:0314.62005)
11. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-brown66_11-0)**
[Brown, L. D.](https://en.wikipedia.org/wiki/Lawrence_D._Brown "Lawrence D. Brown") (1966), "On the admissibility of invariant estimators of one or more location parameters", *Annals of Mathematical Statistics*, **37** (5): 1087–1136, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1214/aoms/1177699259](https://doi.org/10.1214%2Faoms%2F1177699259), [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0216647](https://mathscinet.ams.org/mathscinet-getitem?mr=0216647), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0156.39401](https://zbmath.org/?format=complete&q=an:0156.39401)
## Further reading
- Judge, George G.; Bock, M. E. (1978). *The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics*. New York: North Holland. pp. 229–257. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [0-7204-0729-X](https://en.wikipedia.org/wiki/Special:BookSources/0-7204-0729-X "Special:BookSources/0-7204-0729-X"). |
| Readable Markdown | From Wikipedia, the free encyclopedia
The **James–Stein estimator** is an [estimator](https://en.wikipedia.org/wiki/Estimator "Estimator") of the [mean](https://en.wikipedia.org/wiki/Mean "Mean") θ = (θ₁, θ₂, …, θ_m) for a multivariate [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") Y = (Y₁, Y₂, …, Y_m).
It arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956,[\[1\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stein-56-1) when [Charles Stein](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") reached a relatively shocking conclusion that while the then-usual estimate of the mean, the [sample mean](https://en.wikipedia.org/wiki/Sample_mean "Sample mean"), is [admissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when m ≤ 2, it is [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when m ≥ 3. Stein proposed a possible improvement to the estimator that [shrinks](https://en.wikipedia.org/wiki/Shrinkage_\(statistics\) "Shrinkage (statistics)") the sample mean towards a more central mean vector ν (which can be chosen [a priori](https://en.wikipedia.org/wiki/A_priori_and_a_posteriori "A priori and a posteriori") or commonly as the "average of averages" of the sample means, given all samples share the same size). This observation is commonly referred to as [Stein's example or paradox](https://en.wikipedia.org/wiki/Stein%27s_example "Stein's example"). In 1961, [Willard James](https://en.wikipedia.org/wiki/Willard_D._James "Willard D. James") and Charles Stein simplified the original process.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)
It can be shown that the James–Stein estimator [dominates](https://en.wikipedia.org/wiki/Dominating_decision_rule "Dominating decision rule") the "ordinary" [least squares](https://en.wikipedia.org/wiki/Least_squares "Least squares") approach in the sense that the James–Stein estimator has a lower [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") than the "ordinary" least squares estimator for all θ. This is possible because the James–Stein estimator is [biased](https://en.wikipedia.org/wiki/Bias_of_an_estimator "Bias of an estimator"), so that the [Gauss–Markov theorem](https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem "Gauss–Markov theorem") does not apply.
Similar to the [Hodges' estimator](https://en.wikipedia.org/wiki/Hodges%27_estimator "Hodges' estimator"), the James–Stein estimator is [superefficient](https://en.wikipedia.org/w/index.php?title=Superefficient&action=edit&redlink=1 "Superefficient (page does not exist)") and [non-regular](https://en.wikipedia.org/wiki/Regular_estimator "Regular estimator") at θ = 0.[\[3\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-3)
Let Y ∼ N_m(θ, σ²I), where the vector θ is the unknown [mean](https://en.wikipedia.org/wiki/Expected_value "Expected value") of Y, which is [m-variate normally distributed](https://en.wikipedia.org/wiki/Multivariate_normal_distribution "Multivariate normal distribution") with known [covariance matrix](https://en.wikipedia.org/wiki/Covariance_matrix "Covariance matrix") σ²I.
We are interested in obtaining an estimate, , of , based on a single observation, , of .
In real-world application, this is a common situation in which a set of parameters is sampled, and the samples are corrupted by independent [Gaussian noise](https://en.wikipedia.org/wiki/Gaussian_noise "Gaussian noise"). Since this noise has mean of zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the [least squares](https://en.wikipedia.org/wiki/Least_squares "Least squares") estimator, which is .
Stein demonstrated that in terms of [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") ![{\\displaystyle \\operatorname {E} \\left\[\\left\\\|{\\boldsymbol {\\theta }}-{\\widehat {\\boldsymbol {\\theta }}}\\right\\\|^{2}\\right\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/bdd1163606c619b36e0e0d2310a54e10903e7234), the least squares estimator, , is sub-optimal to shrinkage based estimators, such as the **JamesāStein estimator**, .[\[1\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stein-56-1) The paradoxical result, that there is a (possibly) better and never any worse estimate of  in mean squared error as compared to the sample mean, became known as [Stein's example](https://en.wikipedia.org/wiki/Stein%27s_example "Stein's example").
[Figure](https://en.wikipedia.org/wiki/File:MSE_of_ML_vs_JS.png): MSE (R) of the least squares estimator (ML) vs. the James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector $\boldsymbol{\theta}$ is near zero.
If $\sigma^{2}$ is known, the James–Stein estimator is given by

$$\widehat{\boldsymbol{\theta}}_{JS} = \left(1 - \frac{(m-2)\sigma^{2}}{\|\mathbf{y}\|^{2}}\right)\mathbf{y}.$$

James and Stein showed that the above estimator [dominates](https://en.wikipedia.org/wiki/Dominating_decision_rule "Dominating decision rule") $\widehat{\boldsymbol{\theta}}_{LS}$ for any $m \geq 3$, meaning that the James–Stein estimator has a lower [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") (MSE) than the [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimator.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) By definition, this makes the least squares estimator [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when $m \geq 3$.
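The dominance claim is easy to check numerically. The following is a minimal Monte Carlo sketch (an editor's illustration, not from the original sources; the dimension, noise level, true mean, and trial count are arbitrary choices) comparing the total MSE of the least-squares and James–Stein estimates:

```python
# Monte Carlo comparison of least-squares vs. James-Stein total MSE.
# Model: y ~ N_m(theta, sigma^2 I), one observation vector per trial.
import numpy as np

rng = np.random.default_rng(0)
m, sigma, n_trials = 10, 1.0, 100_000
theta = rng.normal(size=m)                 # an arbitrary fixed true mean vector

y = theta + sigma * rng.normal(size=(n_trials, m))

theta_ls = y                               # least-squares / maximum-likelihood estimate
shrink = 1.0 - (m - 2) * sigma**2 / np.sum(y**2, axis=1, keepdims=True)
theta_js = shrink * y                      # basic James-Stein estimate

mse_ls = np.mean(np.sum((theta_ls - theta) ** 2, axis=1))
mse_js = np.mean(np.sum((theta_js - theta) ** 2, axis=1))
print(f"LS total MSE: {mse_ls:.4f} (theory: m * sigma^2 = {m * sigma**2:.1f})")
print(f"JS total MSE: {mse_js:.4f} (strictly smaller whenever m >= 3)")
```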
Notice that if $(m-2)\sigma^{2} < \|\mathbf{y}\|^{2}$, then this estimator simply takes the natural estimator $\mathbf{y}$ and shrinks it towards the origin **0**. In fact this is not the only direction of [shrinkage](https://en.wikipedia.org/wiki/Shrinkage_\(statistics\) "Shrinkage (statistics)") that works. Let $\boldsymbol{\nu}$ be an arbitrary fixed vector of dimension $m$. Then there exists an estimator of the James–Stein type that shrinks toward $\boldsymbol{\nu}$, namely

$$\widehat{\boldsymbol{\theta}}_{JS} = \left(1 - \frac{(m-2)\sigma^{2}}{\|\mathbf{y} - \boldsymbol{\nu}\|^{2}}\right)(\mathbf{y} - \boldsymbol{\nu}) + \boldsymbol{\nu}.$$

The James–Stein estimator dominates the usual estimator for any $\boldsymbol{\nu}$. A natural question to ask is whether the improvement over the usual estimator is independent of the choice of $\boldsymbol{\nu}$. The answer is no. The improvement is small if $\|\boldsymbol{\theta} - \boldsymbol{\nu}\|$ is large. Thus, to get a very large improvement, some knowledge of the location of $\boldsymbol{\theta}$ is necessary. Of course this is the quantity we are trying to estimate, so we do not have this knowledge [a priori](https://en.wikipedia.org/wiki/A_priori_and_a_posteriori "A priori and a posteriori"). But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective, as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that *any* finite guess $\boldsymbol{\nu}$ improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite $\boldsymbol{\nu}$, surely a poor guess.
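To see how the gain depends on the guess, here is a small sketch (same assumed model and arbitrary constants as above) estimating the total MSE of the shrink-toward-$\boldsymbol{\nu}$ estimator for guesses at increasing distance from the truth:

```python
# Effect of the shrinkage target nu: the James-Stein gain over least squares
# (whose total MSE is m*sigma^2) fades as ||theta - nu|| grows, but never flips sign.
import numpy as np

def js_toward(y, nu, sigma):
    """James-Stein estimate of each row of y, shrinking toward nu (m >= 3)."""
    m = y.shape[1]
    d = y - nu
    return nu + (1.0 - (m - 2) * sigma**2 / np.sum(d**2, axis=1, keepdims=True)) * d

rng = np.random.default_rng(1)
m, sigma, n_trials = 10, 1.0, 100_000
theta = np.ones(m)
y = theta + sigma * rng.normal(size=(n_trials, m))

for dist in [0.0, 2.0, 10.0]:               # ||theta - nu||, in units of sigma
    nu = theta + dist / np.sqrt(m)           # a guess exactly `dist` away from theta
    mse = np.mean(np.sum((js_toward(y, nu, sigma) - theta) ** 2, axis=1))
    print(f"||theta - nu|| = {dist:4.1f}:  JS total MSE = {mse:6.3f}  (LS: {m * sigma**2:.1f})")
```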
Seeing the JamesāStein estimator as an [empirical Bayes method](https://en.wikipedia.org/wiki/Empirical_Bayes_method "Empirical Bayes method") gives some intuition to this result: One assumes that  itself is a random variable with [prior distribution](https://en.wikipedia.org/wiki/Prior_probability "Prior probability") , where  is estimated from the data itself. Estimating  only gives an advantage compared to the [maximum-likelihood estimator](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") when the dimension  is large enough; hence it does not work for . The JamesāStein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.[\[5\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-5)
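As an illustrative sketch of this empirical-Bayes reading (an editor's toy numbers, not the article's derivation): under the prior $N(0, A I)$ the Bayes estimate multiplies $\mathbf{y}$ by $A/(A+\sigma^{2})$, and since marginally $\operatorname{E}\|\mathbf{y}\|^{2} = m(A+\sigma^{2})$, the James–Stein multiplier $1-(m-2)\sigma^{2}/\|\mathbf{y}\|^{2}$ acts as a plug-in estimate of that factor:

```python
# Empirical-Bayes reading of James-Stein: the data-driven multiplier
# 1 - (m-2)*sigma^2/||y||^2 approximates the oracle Bayes factor A/(A + sigma^2).
import numpy as np

rng = np.random.default_rng(2)
m, sigma, A = 100, 1.0, 4.0
theta = rng.normal(scale=np.sqrt(A), size=m)   # theta drawn from the N(0, A*I) prior
y = theta + sigma * rng.normal(size=m)         # marginally, y ~ N(0, (A + sigma^2) I)

oracle = A / (A + sigma**2)                            # Bayes shrinkage factor
plug_in = 1.0 - (m - 2) * sigma**2 / np.sum(y**2)      # James-Stein multiplier
print(f"oracle Bayes factor: {oracle:.3f}   James-Stein plug-in: {plug_in:.3f}")
```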
A consequence of the above discussion is the following counterintuitive result: When three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is [admissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule"). A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the *total* MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, any single component does not dominate the respective component of the LS estimator.
The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a [telecommunication](https://en.wikipedia.org/wiki/Telecommunication "Telecommunication") setting, it is reasonable to combine [channel](https://en.wikipedia.org/wiki/Communication_channel "Communication channel") tap measurements in a [channel estimation](https://en.wikipedia.org/wiki/Channel_estimation "Channel estimation") scenario, as the goal is to minimize the total channel estimation error.
The James–Stein estimator has also found use in fundamental quantum theory, where the estimator has been used to improve the theoretical bounds of the [entropic uncertainty principle](https://en.wikipedia.org/wiki/Entropic_uncertainty_principle "Entropic uncertainty principle") for more than three measurements.[\[6\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stander-17-6)
An intuitive derivation and interpretation is given by the [Galtonian](https://en.wikipedia.org/wiki/Francis_Galton "Francis Galton") perspective.[\[7\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-7) Under this interpretation, we aim to predict the population means using the [imperfectly measured sample means](https://en.wikipedia.org/wiki/Measurement_error_model "Measurement error model"). The equation of the [OLS](https://en.wikipedia.org/wiki/Ordinary_least_squares "Ordinary least squares") estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron–Morris estimator (when we allow the intercept to vary).
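A small simulation (an editor's sketch; the distribution of the means is an arbitrary choice) makes the Galtonian reading concrete: regressing many true means on their noisy sample means yields a slope below 1, i.e. a shrinkage rule:

```python
# Galtonian perspective: the OLS fit of true means on noisy sample means shrinks.
# Forcing the intercept to 0 mimics James-Stein shrinkage toward the origin;
# a free intercept mimics Efron-Morris shrinkage toward the grand mean.
import numpy as np

rng = np.random.default_rng(3)
m, sigma = 10_000, 1.0
theta = rng.normal(loc=2.0, scale=1.5, size=m)   # many "population means"
y = theta + sigma * rng.normal(size=m)           # their noisy sample means

slope0 = np.sum(theta * y) / np.sum(y**2)        # OLS through the origin
slope, intercept = np.polyfit(y, theta, 1)       # OLS with a free intercept
print(f"no-intercept slope: {slope0:.3f}  (< 1: shrink y toward 0)")
print(f"slope {slope:.3f}, intercept {intercept:.3f}  (shrink y toward the grand mean)")
```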
## Positive-part James–Stein shrinkage operator
Despite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator $\widehat{\boldsymbol{\theta}}_{LS} = \mathbf{y}$ *toward* $\boldsymbol{\nu}$, the estimator actually moves *away* from $\boldsymbol{\nu}$ for small values of $\|\mathbf{y} - \boldsymbol{\nu}\|$, as the multiplier on $\mathbf{y} - \boldsymbol{\nu}$ is then negative. This can be remedied by replacing this multiplier by zero when it is negative. To this end, define the *positive-part James–Stein shrinkage operator*:

$$S_{\lambda}(x) = x\left[1 - \left(\lambda / x\right)^{2}\right]_{+},$$

where $[t]_{+} = \max(t, 0)$ denotes the positive part, and apply this operator component-wise to the (unbiased) least-squares estimator of $\theta_{i}$ (with known $\sigma$) for each $i$:

The resulting estimator $\widehat{\boldsymbol{\theta}}_{JS+}$ of $\boldsymbol{\theta}$ is called the *positive-part James–Stein estimator* and can be written in vector notation as:

$$\widehat{\boldsymbol{\theta}}_{JS+} = \left(1 - \frac{(m-2)\sigma^{2}}{\|\mathbf{y}\|^{2}}\right)_{+}\mathbf{y}.$$

This estimator has a smaller risk than the basic James–Stein estimator for all $\boldsymbol{\theta}$. It follows that the basic James–Stein estimator is itself [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule").[\[8\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-Anderson-84-8)
It turns out, however, that the positive-part estimator is also inadmissible.[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) This follows from a more general result which requires admissible estimators to be smooth.
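A quick numerical sketch (same toy model as before) shows the clipping paying off where it matters most, near $\boldsymbol{\theta} = 0$, where the basic multiplier is most likely to go negative:

```python
# Basic vs. positive-part James-Stein at theta = 0, where the basic multiplier
# often turns negative and overshoots past the origin; clipping it at 0 helps.
import numpy as np

rng = np.random.default_rng(4)
m, sigma, n_trials = 5, 1.0, 200_000
theta = np.zeros(m)

y = theta + sigma * rng.normal(size=(n_trials, m))
mult = 1.0 - (m - 2) * sigma**2 / np.sum(y**2, axis=1, keepdims=True)

mse_js  = np.mean(np.sum((mult * y - theta) ** 2, axis=1))
mse_jsp = np.mean(np.sum((np.maximum(mult, 0.0) * y - theta) ** 2, axis=1))
print(f"basic JS MSE: {mse_js:.4f}   positive-part JS MSE: {mse_jsp:.4f}")
```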
## Positive-part James–Stein shrinkage and model selection
Recall the initial setup:

$$\mathbf{y} \sim N_{m}(\boldsymbol{\theta}, \sigma^{2}I),$$

where the variance coefficient $\sigma^{2}$ is known and we wish to estimate the unknown (mean response) coefficient $\boldsymbol{\theta}$. In the more general setting of [linear regression](https://en.wikipedia.org/wiki/Linear_regression "Linear regression"), the mean response is instead given by

$$\boldsymbol{\theta} = \mathbf{X}\boldsymbol{\beta},$$

where $\mathbf{X} = [\mathbf{v}_{1}, \ldots, \mathbf{v}_{m}]$ is a matrix with $m$ columns. As in the previous section, we can use the *positive-part James–Stein shrinkage operator* to obtain a [shrinkage estimator](https://en.wikipedia.org/wiki/Shrinkage_estimator "Shrinkage estimator") of $\boldsymbol{\beta}$. In particular, any $\widehat{\boldsymbol{\beta}}$ that satisfies the *James–Stein [KKT conditions](https://en.wikipedia.org/wiki/KKT_conditions "KKT conditions")*:[\[9\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-:0-9)

is a (positive-part) James–Stein estimator of $\boldsymbol{\beta}$ with the useful property that it performs both shrinkage and [model selection](https://en.wikipedia.org/wiki/Model_selection "Model selection") simultaneously. This is because, depending on the value of the known $\sigma$, there is a (possibly empty) index set $\mathcal{S}$ such that

$$\widehat{\beta}_{i} = 0 \quad \text{for all } i \in \mathcal{S}.$$

In other words, some (or all) of the $\widehat{\beta}_{i}$ could be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model.
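The cited KKT formulation is not reproduced above, so the following is only a hypothetical illustration of the selection effect: applying the positive-part operator $S_{\lambda}$ component-wise zeroes every coefficient whose least-squares estimate is at most $\lambda$ in magnitude. The threshold $\lambda = \sigma\sqrt{2\log m}$ is an assumed, universal-threshold-style choice, not one taken from the reference.

```python
# Hypothetical illustration of shrinkage-plus-selection via the positive-part
# operator S_lam(x) = x * [1 - (lam/x)^2]_+ applied component-wise: components
# with |x| <= lam are set exactly to zero. The threshold choice is assumed.
import numpy as np

def s_pos(x, lam):
    return x * np.maximum(1.0 - (lam / x) ** 2, 0.0)

rng = np.random.default_rng(5)
m, sigma = 20, 1.0
beta = np.r_[5.0, -4.0, 3.0, np.zeros(m - 3)]     # sparse true coefficients
y = beta + sigma * rng.normal(size=m)             # noisy least-squares estimates

lam = sigma * np.sqrt(2.0 * np.log(m))            # assumed universal-style threshold
beta_hat = s_pos(y, lam)
print("nonzero (selected) components:", np.flatnonzero(beta_hat))
```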
## Extensions

The James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect; namely, the fact that the "ordinary" or least squares estimator is often [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") for simultaneous estimation of several parameters. This effect has been called [Stein's phenomenon](https://en.wikipedia.org/wiki/Stein%27s_phenomenon "Stein's phenomenon"), and has been demonstrated for several different problem settings, some of which are briefly outlined below.
- James and Stein demonstrated that the estimator presented above can still be used when the variance $\sigma^{2}$ is unknown, by replacing it with the standard estimator of the variance, $\widehat{\sigma}^{2}$. The dominance result still holds under the same condition, namely, $m \geq 3$.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)
- All the results above are for the case when only a single observation vector **y** is available. For the more general case when $n$ vectors are available, we consider the estimator $\left(1 - \frac{(m-2)\sigma^{2}/n}{\|\overline{\mathbf{y}}\|^{2}}\right)\overline{\mathbf{y}}$, where $\overline{\mathbf{y}}$ is the $m$-length average of the $n$ observations, so that $\overline{\mathbf{y}} \sim N_{m}(\boldsymbol{\theta}, \sigma^{2}I/n)$; a sketch combining this with the unknown-variance plug-in from the previous point follows this list.
- The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.[\[10\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-bock75-10) A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a [linear regression](https://en.wikipedia.org/wiki/Linear_regression "Linear regression") technique which outperforms the standard application of the LS estimator.[\[10\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-bock75-10)
- Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.[\[11\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-brown66-11) It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4)
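Combining the first two extensions above, here is a sketch (an editor's illustration; the exact constant used in the original dominance proof for unknown variance differs slightly, so treat this as an illustrative plug-in rather than the paper's estimator):

```python
# James-Stein from n observation vectors with unknown variance: average the
# observations, plug in a pooled variance estimate, and use sigma^2/n in the
# shrinkage multiplier. Illustrative plug-in; the dominance-proof constant differs.
import numpy as np

rng = np.random.default_rng(6)
m, n, sigma = 10, 8, 2.0
theta = rng.normal(size=m)
Y = theta + sigma * rng.normal(size=(n, m))      # n observation vectors

y_bar = Y.mean(axis=0)                           # m-length average of the n observations
s2 = np.sum((Y - y_bar) ** 2) / (m * (n - 1))    # pooled estimate of sigma^2
theta_js = (1.0 - (m - 2) * (s2 / n) / np.sum(y_bar**2)) * y_bar
print("James-Stein estimate from n =", n, "observations:", np.round(theta_js, 2))
```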
## See also

- [Admissible decision rule](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule")
- [Hodges' estimator](https://en.wikipedia.org/wiki/Hodges%27_estimator "Hodges' estimator")
- [Shrinkage estimator](https://en.wikipedia.org/wiki/Shrinkage_estimator "Shrinkage estimator")
- [Regular estimator](https://en.wikipedia.org/wiki/Regular_estimator "Regular estimator")
- [KL divergence](https://en.wikipedia.org/wiki/KL_divergence "KL divergence")
## References

1. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-1)
[Stein, C.](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") (1956), "Inadmissibility of the usual estimator for the mean of a multivariate distribution", [*Proc. Third Berkeley Symp. Math. Statist. Prob.*](http://projecteuclid.org/euclid.bsmsp/1200501656), vol. 1, pp. 197–206, [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0084922](https://mathscinet.ams.org/mathscinet-getitem?mr=0084922), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0073.35602](https://zbmath.org/?format=complete&q=an:0073.35602)
2. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-1) [***c***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-2)
James, W.; [Stein, C.](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") (1961), "Estimation with quadratic loss", [*Proc. Fourth Berkeley Symp. Math. Statist. Prob.*](http://projecteuclid.org/euclid.bsmsp/1200512173), vol. 1, pp. 361–379, [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0133191](https://mathscinet.ams.org/mathscinet-getitem?mr=0133191)
3. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-3)** Beran, R. (1995). "The Role of Hajek's Convolution Theorem in Statistical Theory".
4. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-1) [***c***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-2)
Lehmann, E. L.; Casella, G. (1998), *Theory of Point Estimation* (2nd ed.), New York: Springer
5. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-5)**
Efron, B.; Morris, C. (1973). "Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach". *Journal of the American Statistical Association*. **68** (341). American Statistical Association: 117–130. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.2307/2284155](https://doi.org/10.2307%2F2284155). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [2284155](https://www.jstor.org/stable/2284155).
6. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stander-17_6-0)**
Stander, M. (2017), *Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements*, [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1702.02440](https://arxiv.org/abs/1702.02440), [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2017arXiv170202440S](https://ui.adsabs.harvard.edu/abs/2017arXiv170202440S)
7. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-7)**
Stigler, Stephen M. (1990-02-01). ["The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators"](https://doi.org/10.1214%2Fss%2F1177012274). *Statistical Science*. **5** (1). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1214/ss/1177012274](https://doi.org/10.1214%2Fss%2F1177012274). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0883-4237](https://search.worldcat.org/issn/0883-4237).
8. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-Anderson-84_8-0)**
Anderson, T. W. (1984), *An Introduction to Multivariate Statistical Analysis* (2nd ed.), New York: John Wiley & Sons
9. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-:0_9-0)**
Botev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025). *Data Science and Machine Learning: Mathematical and Statistical Methods* (2nd ed.). Boca Raton; London: CRC Press. pp. 277–279. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-032-48868-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-032-48868-4 "Special:BookSources/978-1-032-48868-4").
10. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-bock75_10-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-bock75_10-1)
Bock, M. E. (1975), "Minimax estimators of the mean of a multivariate normal distribution", *[Annals of Statistics](https://en.wikipedia.org/wiki/Annals_of_Statistics "Annals of Statistics")*, **3** (1): 209–218, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1214/aos/1176343009](https://doi.org/10.1214%2Faos%2F1176343009), [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0381064](https://mathscinet.ams.org/mathscinet-getitem?mr=0381064), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0314.62005](https://zbmath.org/?format=complete&q=an:0314.62005)
11. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-brown66_11-0)**
[Brown, L. D.](https://en.wikipedia.org/wiki/Lawrence_D._Brown "Lawrence D. Brown") (1966), "On the admissibility of invariant estimators of one or more location parameters", *Annals of Mathematical Statistics*, **37** (5): 1087–1136, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1214/aoms/1177699259](https://doi.org/10.1214%2Faoms%2F1177699259), [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0216647](https://mathscinet.ams.org/mathscinet-getitem?mr=0216647), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0156.39401](https://zbmath.org/?format=complete&q=an:0156.39401)
## Further reading

- Judge, George G.; Bock, M. E. (1978). *The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics*. New York: North Holland. pp. 229–257. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [0-7204-0729-X](https://en.wikipedia.org/wiki/Special:BookSources/0-7204-0729-X "Special:BookSources/0-7204-0729-X"). |
| Shard | 152 (laksa) |
| Root Hash | 17790707453426894952 |
| Unparsed URL | org,wikipedia!en,/wiki/James%E2%80%93Stein_estimator s443 |