🕷️ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:

Response:

Calculated Shard: 152 (from laksa194)

2. Crawled Status Check

Query:

curl -X POST \
  'http://laksa152.int.ahrefs:8124/' \
  -H 'Content-Type: text/plain' \
  -H 'X-ClickHouse-Database: crawler3' \
  -H 'Authorization: Basic YXBpOg==' \
  -d 'SELECT getAhrefsURLFromUnparsed(src_unparsed) AS found_url, ifNull(toUnixTimestamp(download_stamp), 0) AS crawl_time, ifNull(toUnixTimestamp(props_url_first_seen), 0) AS first_indexed_time, download_http_code AS http_code, src_unparsed AS src_unparsed, src_root_hash AS src_root_hash, history_drop_reason AS history_drop_reason, meta_title AS meta_title, meta_descriptions AS meta_descriptions, attrs_boilerpipe_text AS attrs_boilerpipe_text, attrs_markdown AS attrs_markdown, attrs_readable_markdown AS attrs_readable_markdown, meta_canonical AS meta_canonical, ml_categories_json AS ml_categories_json, ml_types_json AS ml_types_json, ml_intent_types_json AS ml_intent_types_json, meta_language AS meta_language, attrs_author AS attrs_author, ifNull(toUnixTimestamp(attrs_publish_time), 0) AS attrs_publish_time, ifNull(toUnixTimestamp(attrs_original_publish_time), 0) AS attrs_original_publish_time, ifNull(attrs_is_republished, 0) AS attrs_is_republished, ifNull(attrs_nr_words, 0) AS attrs_nr_words, ifNull(attrs_boilerpipe_nr_words, 0) AS attrs_boilerpipe_nr_words, ifNull(body_ext_links_number, 0) AS body_ext_links_number, ifNull(body_int_links_number, 0) AS body_int_links_number, ifNull(meta_nofollow, 0) AS meta_nofollow, ifNull(meta_noarchive, 0) AS meta_noarchive, ifNull(props_was_rendered, 0) AS props_was_rendered, ifNull(src_redirect, \'\') AS src_redirect, ifNull(download_time_msec, 0) AS download_time_msec, ifNull(download_ttfb_msec, 0) AS download_ttfb_msec, ifNull(download_size, 0) AS download_size FROM crawler3.page_info_local FINAL PREWHERE (src_root_hash, src_unparsed) IN ((getAhrefsRootHashFromUnparsed(getAhrefsUnparsedNoserviceFromURL(\'https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator\')), getAhrefsUnparsedNoserviceFromURL(\'https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator\'))) FORMAT JSONEachRow'

Response:

{"found_url":"https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator","crawl_time":1773884289,"first_indexed_time":1433035597,"http_code":200,"src_unparsed":"org,wikipedia!en,\/wiki\/James%E2%80%93Stein_estimator s443","src_root_hash":"17790707453426894952","history_drop_reason":null,"meta_title":"James–Stein estimator - Wikipedia","meta_descriptions":[],"attrs_boilerpipe_text":"From Wikipedia, the free encyclopedia\nThe\nJames–Stein estimator\nis an\nestimator\nof the\nmean\nfor a multivariate\nrandom variable\n.\nIt arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956,\n[\n1\n]\nwhen\nCharles Stein\nreached a relatively shocking conclusion that while the then-usual estimate of the mean, the\nsample mean\n, is\nadmissible\nwhen\n, it is\ninadmissible\nwhen\n. Stein proposed a possible improvement to the estimator that\nshrinks\nthe sample mean\ntowards a more central mean vector\n(which can be chosen\na priori\nor commonly as the \"average of averages\" of the sample means, given all samples share the same size). This observation is commonly referred to as\nStein's example or paradox\n. In 1961,\nWillard James\nand Charles Stein simplified the original process.\n[\n2\n]\nIt can be shown that the James–Stein estimator\ndominates\nthe \"ordinary\"\nleast squares\napproach in the sense that the James–Stein estimator has a lower\nmean squared error\nthan the \"ordinary\" least squares estimator for all\n. This is possible because the James–Stein estimator is\nbiased\n, so that the\nGauss–Markov theorem\ndoes not apply.\nSimilar to the\nHodges' estimator\n, the James-Stein estimator is\nsuperefficient\nand\nnon-regular\nat\n.\n[\n3\n]\nLet\nwhere the vector\nis the unknown\nmean\nof\n, which is\n-variate normally distributed\nand with known\ncovariance matrix\n.\nWe are interested in obtaining an estimate,\n, of\n, based on a single observation,\n, of\n.\nIn real-world application, this is a common situation in which a set of parameters is sampled, and the samples are corrupted by independent\nGaussian noise\n. Since this noise has mean of zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the\nleast squares\nestimator, which is\n.\nStein demonstrated that in terms of\nmean squared error\n, the least squares estimator,\n, is sub-optimal to shrinkage based estimators, such as the\nJames–Stein estimator\n,\n.\n[\n1\n]\nThe paradoxical result, that there is a (possibly) better and never any worse estimate of\nin mean squared error as compared to the sample mean, became known as\nStein's example\n.\nMSE (R) of least squares estimator (ML) vs. James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero.\nIf\nis known, the James–Stein estimator is given by\nJames and Stein showed that the above estimator\ndominates\nfor any\n, meaning that the James–Stein estimator has a lower\nmean squared error\n(MSE) than the\nmaximum likelihood\nestimator.\n[\n2\n]\n[\n4\n]\nBy definition, this makes the least squares estimator\ninadmissible\nwhen\n.\nNotice that if\nthen this estimator simply  takes the natural estimator\nand shrinks it towards the origin\n0\n.  In fact this is not the only direction of\nshrinkage\nthat works.  Let\nbe an arbitrary fixed vector of dimension\n.  Then there exists an estimator of the James–Stein type that shrinks toward\n, namely\nThe James–Stein estimator dominates the usual estimator for any\n.  A natural question to ask is whether the improvement over the usual estimator is independent of the choice of\n.  The answer is no. The improvement is small if\nis large.  Thus to get a very great improvement some knowledge of the location of\nis necessary.  Of course this is the quantity we are trying to estimate so we don't have this knowledge\na priori\n.  But we may have some guess as to what the mean vector is.  This can be considered a disadvantage of the estimator: the choice is not objective as it may depend on the beliefs of the researcher.  Nonetheless, James and Stein's result is that\nany\nfinite guess\nimproves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite\n, surely a poor guess.\nSeeing the James–Stein estimator as an\nempirical Bayes method\ngives some intuition to this result: One assumes that\nitself is a random variable with\nprior distribution\n, where\nis estimated from the data itself. Estimating\nonly gives an advantage compared to the\nmaximum-likelihood estimator\nwhen the dimension\nis large enough; hence it does not work for\n. The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.\n[\n5\n]\nA consequence of the above discussion is the following counterintuitive result: When three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is\nadmissible\n.  A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together.  The James–Stein estimator always improves upon the\ntotal\nMSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, any single component does not dominate the respective component of the LS estimator.\nThe conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a\ntelecommunication\nsetting, it is reasonable to combine\nchannel\ntap measurements in a\nchannel estimation\nscenario, as the goal is to minimize the total channel estimation error.\nThe James–Stein estimator has also found use in fundamental quantum theory, where the estimator has been used to improve the theoretical bounds of the\nentropic uncertainty principle\nfor more than three measurements.\n[\n6\n]\nAn intuitive derivation and interpretation is given by the\nGaltonian\nperspective.\n[\n7\n]\nUnder this interpretation, we aim to predict the population means using the\nimperfectly measured sample means\n. The equation of the\nOLS\nestimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron-Morris estimator (when we allow the intercept to vary).\nPositive-part James–Stein shrinkage operator\n[\nedit\n]\nDespite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator\ntoward\n, the estimator actually moves\naway\nfrom\nfor small values of\nas the multiplier on\nis then negative. This can be  remedied by replacing this multiplier by zero when it is negative. To this end, define the\npositive-part James-Stein shrinkage operator\n:\nwhere\n, and apply this operator component-wise to the (unbiased) least-squares estimator of\n(with known\n) for each\n:\nThe resulting estimator\nof\nis called the\npositive-part James–Stein estimator\nand can  be written  in vector notation as:\nThis estimator has a smaller risk than the basic James–Stein estimator for\n.  It follows that the basic James–Stein estimator is itself\ninadmissible\n.\n[\n8\n]\nIt turns out, however, that the positive-part estimator is also inadmissible.\n[\n4\n]\nThis follows from a more general result which requires admissible estimators to be smooth.\nPositive-part James–Stein shrinkage and model selection\n[\nedit\n]\nRecall the initial setup:\nwhere the variance coefficient\nis known and we wish to estimate the unknown (mean response) coefficient\n. In the more general setting of\nlinear regression\n, the mean response is instead given by\nwhere\nis a matrix with\ncolumns. As in the previous section,\nwe can use the\npositive-part James-Stein shrinkage operator\nto obtain a\nshrinkage estimator\nof\n. In particular, any\nthat satisfies the\nJames-Stein\nKKT conditions\n:\n[\n9\n]\nis a (positive-part) James-Stein estimator of\nwith the useful property that it performs both shrinkage and\nmodel selection\nsimultaneously. This is because, depending on the value of the known\n, there is a (possibly empty) set\nsuch that\nIn other words, some (or all) of the\ncould be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model.\nThe James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect; namely, the fact that the \"ordinary\" or least squares estimator is often\ninadmissible\nfor simultaneous estimation of several parameters. This effect has been called\nStein's phenomenon\n, and has been demonstrated for several different problem settings, some of which are briefly outlined below.\nJames and Stein demonstrated that the estimator presented above can still be used when the variance\nis unknown, by replacing it with the standard estimator of the variance,\n. The dominance result still holds under the same condition, namely,\n.\n[\n2\n]\nAll the results above are for the case when only a single observation vector\ny\nis available.  For the more general case when\nvectors are available, we consider the estimator\nwhere\nis the\n-length average of the\nobservations, so that,\n.\nThe work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.\n[\n10\n]\nA similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a\nlinear regression\ntechnique which outperforms the standard application of the LS estimator.\n[\n10\n]\nStein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.\n[\n11\n]\nIt is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.\n[\n4\n]\nAdmissible decision rule\nHodges' estimator\nShrinkage estimator\nRegular estimator\nKL divergence\n^\na\nb\nStein, C.\n(1956), \"Inadmissibility of the usual estimator for the mean of a multivariate distribution\",\nProc. Third Berkeley Symp. Math. Statist. Prob.\n, vol. 1, pp. \n197–\n206,\nMR\n \n0084922\n,\nZbl\n \n0073.35602\n^\na\nb\nc\nJames, W.;\nStein, C.\n(1961), \"Estimation with quadratic loss\",\nProc. Fourth Berkeley Symp. Math. Statist. Prob.\n, vol. 1, pp. \n361–\n379,\nMR\n \n0133191\n^\nBeran, R. (1995). THE ROLE OF HAJEK’S CONVOLUTION THEOREM IN STATISTICAL THEORY\n^\na\nb\nc\nLehmann, E. L.; Casella, G. (1998),\nTheory of Point Estimation\n(2nd ed.), New York: Springer\n^\nEfron, B.; Morris, C. (1973). \"Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach\".\nJournal of the American Statistical Association\n.\n68\n(341). American Statistical Association:\n117–\n130.\ndoi\n:\n10.2307\/2284155\n.\nJSTOR\n \n2284155\n.\n^\nStander, M. (2017),\nUsing Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements\n,\narXiv\n:\n1702.02440\n,\nBibcode\n:\n2017arXiv170202440S\n^\nStigler, Stephen M. (1990-02-01).\n\"The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators\"\n.\nStatistical Science\n.\n5\n(1).\ndoi\n:\n10.1214\/ss\/1177012274\n.\nISSN\n \n0883-4237\n.\n^\nAnderson, T. W. (1984),\nAn Introduction to Multivariate Statistical Analysis\n(2nd ed.), New York: John Wiley & Sons\n^\nBotev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025).\nData Science and Machine Learning: Mathematical and Statistical Methods\n(2nd ed.). Boca Raton ; London: CRC Press. pp. \n277–\n279.\nISBN\n \n978-1-032-48868-4\n.\n^\na\nb\nBock, M. E. (1975), \"Minimax estimators of the mean of a multivariate normal distribution\",\nAnnals of Statistics\n,\n3\n(1):\n209–\n218,\ndoi\n:\n10.1214\/aos\/1176343009\n,\nMR\n \n0381064\n,\nZbl\n \n0314.62005\n^\nBrown, L. D.\n(1966), \"On the admissibility of invariant estimators of one or more location parameters\",\nAnnals of Mathematical Statistics\n,\n37\n(5):\n1087–\n1136,\ndoi\n:\n10.1214\/aoms\/1177699259\n,\nMR\n \n0216647\n,\nZbl\n \n0156.39401\nJudge, George G.; Bock, M. E. (1978).\nThe Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics\n. New York: North Holland. pp. \n229–\n257.\nISBN\n \n0-7204-0729-X\n.","attrs_markdown":"[Jump to content](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#bodyContent)\n\nMain menu\n\nMain menu\n\nmove to sidebar\n\nhide\n\nNavigation\n\n- [Main page](https:\/\/en.wikipedia.org\/wiki\/Main_Page \"Visit the main page [z]\")\n- [Contents](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Contents \"Guides to browsing Wikipedia\")\n- [Current events](https:\/\/en.wikipedia.org\/wiki\/Portal:Current_events \"Articles related to current events\")\n- [Random article](https:\/\/en.wikipedia.org\/wiki\/Special:Random \"Visit a randomly selected article [x]\")\n- [About Wikipedia](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:About \"Learn about Wikipedia and how it works\")\n- [Contact us](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Contact_us \"How to contact Wikipedia\")\n\nContribute\n\n- [Help](https:\/\/en.wikipedia.org\/wiki\/Help:Contents \"Guidance on how to use and edit Wikipedia\")\n- [Learn to edit](https:\/\/en.wikipedia.org\/wiki\/Help:Introduction \"Learn how to edit Wikipedia\")\n- [Community portal](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Community_portal \"The hub for editors\")\n- [Recent changes](https:\/\/en.wikipedia.org\/wiki\/Special:RecentChanges \"A list of recent changes to Wikipedia [r]\")\n- [Upload file](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:File_upload_wizard \"Add images or other media for use on Wikipedia\")\n- [Special pages](https:\/\/en.wikipedia.org\/wiki\/Special:SpecialPages \"A list of all special pages [q]\")\n\n[![](https:\/\/en.wikipedia.org\/static\/images\/icons\/enwiki-25.svg) ![Wikipedia](https:\/\/en.wikipedia.org\/static\/images\/mobile\/copyright\/wikipedia-wordmark-en-25.svg) ![The Free Encyclopedia](https:\/\/en.wikipedia.org\/static\/images\/mobile\/copyright\/wikipedia-tagline-en-25.svg)](https:\/\/en.wikipedia.org\/wiki\/Main_Page)\n\n[Search](https:\/\/en.wikipedia.org\/wiki\/Special:Search \"Search Wikipedia [f]\")\n\nAppearance\n\n- [Donate](https:\/\/donate.wikimedia.org\/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en)\n- [Create account](https:\/\/en.wikipedia.org\/w\/index.php?title=Special:CreateAccount&returnto=James%E2%80%93Stein+estimator \"You are encouraged to create an account and log in; however, it is not mandatory\")\n- [Log in](https:\/\/en.wikipedia.org\/w\/index.php?title=Special:UserLogin&returnto=James%E2%80%93Stein+estimator \"You're encouraged to log in; however, it's not mandatory. [o]\")\n\nPersonal tools\n\n- [Donate](https:\/\/donate.wikimedia.org\/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en)\n- [Create account](https:\/\/en.wikipedia.org\/w\/index.php?title=Special:CreateAccount&returnto=James%E2%80%93Stein+estimator \"You are encouraged to create an account and log in; however, it is not mandatory\")\n- [Log in](https:\/\/en.wikipedia.org\/w\/index.php?title=Special:UserLogin&returnto=James%E2%80%93Stein+estimator \"You're encouraged to log in; however, it's not mandatory. [o]\")\n\n## Contents\nmove to sidebar\n\nhide\n\n- [(Top)](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator)\n- [1 Setting](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#Setting)\n- [2 Formulation](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#Formulation)\n- [3 Interpretation](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#Interpretation)\n- [4 Positive-part James–Stein shrinkage operator](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#Positive-part_James%E2%80%93Stein_shrinkage_operator)\n- [5 Positive-part James–Stein shrinkage and model selection](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#Positive-part_James%E2%80%93Stein_shrinkage_and_model_selection)\n- [6 Further extensions](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#Further_extensions)\n- [7 See also](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#See_also)\n- [8 References](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#References)\n- [9 Further reading](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#Further_reading)\n\nToggle the table of contents\n\n# James–Stein estimator\n3 languages\n\n- [Deutsch](https:\/\/de.wikipedia.org\/wiki\/James-Stein-Sch%C3%A4tzer \"James-Stein-Schätzer – German\")\n- [Español](https:\/\/es.wikipedia.org\/wiki\/Estimador_de_James-Stein \"Estimador de James-Stein – Spanish\")\n- [Polski](https:\/\/pl.wikipedia.org\/wiki\/Estymator_Jamesa-Steina \"Estymator Jamesa-Steina – Polish\")\n\n[Edit links](https:\/\/www.wikidata.org\/wiki\/Special:EntityPage\/Q6146297#sitelinks-wikipedia \"Edit interlanguage links\")\n\n- [Article](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator \"View the content page [c]\")\n- [Talk](https:\/\/en.wikipedia.org\/wiki\/Talk:James%E2%80%93Stein_estimator \"Discuss improvements to the content page [t]\")\n\nEnglish\n\n- [Read](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator)\n- [Edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit \"Edit this page [e]\")\n- [View history](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=history \"Past revisions of this page [h]\")\n\nTools\n\nTools\n\nmove to sidebar\n\nhide\n\nActions\n\n- [Read](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator)\n- [Edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit \"Edit this page [e]\")\n- [View history](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=history)\n\nGeneral\n\n- [What links here](https:\/\/en.wikipedia.org\/wiki\/Special:WhatLinksHere\/James%E2%80%93Stein_estimator \"List of all English Wikipedia pages containing links to this page [j]\")\n- [Related changes](https:\/\/en.wikipedia.org\/wiki\/Special:RecentChangesLinked\/James%E2%80%93Stein_estimator \"Recent changes in pages linked from this page [k]\")\n- [Upload file](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:File_Upload_Wizard \"Upload files [u]\")\n- [Permanent link](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&oldid=1342138970 \"Permanent link to this revision of this page\")\n- [Page information](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=info \"More information about this page\")\n- [Cite this page](https:\/\/en.wikipedia.org\/w\/index.php?title=Special:CiteThisPage&page=James%E2%80%93Stein_estimator&id=1342138970&wpFormIdentifier=titleform \"Information on how to cite this page\")\n- [Get shortened URL](https:\/\/en.wikipedia.org\/w\/index.php?title=Special:UrlShortener&url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FJames%25E2%2580%2593Stein_estimator)\n\nPrint\/export\n\n- [Download as PDF](https:\/\/en.wikipedia.org\/w\/index.php?title=Special:DownloadAsPdf&page=James%E2%80%93Stein_estimator&action=show-download-screen \"Download this page as a PDF file\")\n- [Printable version](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&printable=yes \"Printable version of this page [p]\")\n\nIn other projects\n\n- [Wikidata item](https:\/\/www.wikidata.org\/wiki\/Special:EntityPage\/Q6146297 \"Structured data on this page hosted by Wikidata [g]\")\n\nAppearance\n\nmove to sidebar\n\nhide\n\nFrom Wikipedia, the free encyclopedia\n\nRule for estimating the mean of a dataset\n\n|   |   |\n|---|---|\n| ![](https:\/\/upload.wikimedia.org\/wikipedia\/en\/thumb\/f\/f2\/Edit-clear.svg\/40px-Edit-clear.svg.png) | This article **may be too technical for most readers to understand**. Please [help improve it](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit) to [make it understandable to non-experts](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Make_technical_articles_understandable \"Wikipedia:Make technical articles understandable\"), without removing the technical details. *(November 2017)* *([Learn how and when to remove this message](https:\/\/en.wikipedia.org\/wiki\/Help:Maintenance_template_removal \"Help:Maintenance template removal\"))* |\n\nThe **James–Stein estimator** is an [estimator](https:\/\/en.wikipedia.org\/wiki\/Estimator \"Estimator\") of the [mean](https:\/\/en.wikipedia.org\/wiki\/Mean \"Mean\") θ := ( θ 1 , θ 2 , … θ m ) {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}:=(\\\\theta \\_{1},\\\\theta \\_{2},\\\\dots \\\\theta \\_{m})} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}:=(\\\\theta \\_{1},\\\\theta \\_{2},\\\\dots \\\\theta \\_{m})}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/ea925cd9dce005c92b0c86c343e8b005ecf8a3f3) for a multivariate [random variable](https:\/\/en.wikipedia.org\/wiki\/Random_variable \"Random variable\") Y := ( Y 1 , Y 2 , … Y m ) {\\\\displaystyle {\\\\boldsymbol {Y}}:=(Y\\_{1},Y\\_{2},\\\\dots Y\\_{m})} ![{\\\\displaystyle {\\\\boldsymbol {Y}}:=(Y\\_{1},Y\\_{2},\\\\dots Y\\_{m})}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c7d304d83650977659f023b20fae116a55c065b0).\n\nIt arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956,[\\[1\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-stein-56-1) when [Charles Stein](https:\/\/en.wikipedia.org\/wiki\/Charles_Stein_\\(statistician\\) \"Charles Stein (statistician)\") reached a relatively shocking conclusion that while the then-usual estimate of the mean, the [sample mean](https:\/\/en.wikipedia.org\/wiki\/Sample_mean \"Sample mean\"), is [admissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\") when m ≤ 2 {\\\\displaystyle m\\\\leq 2} ![{\\\\displaystyle m\\\\leq 2}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/9ef46309a66e2cdf7737c7246afd8e78e3052f3d), it is [inadmissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\") when m ≥ 3 {\\\\displaystyle m\\\\geq 3} ![{\\\\displaystyle m\\\\geq 3}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/4610f29d2708d1febf1c0090f58ddd3986593545). Stein proposed a possible improvement to the estimator that [shrinks](https:\/\/en.wikipedia.org\/wiki\/Shrinkage_\\(statistics\\) \"Shrinkage (statistics)\") the sample mean θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) towards a more central mean vector ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160) (which can be chosen [a priori](https:\/\/en.wikipedia.org\/wiki\/A_priori_and_a_posteriori \"A priori and a posteriori\") or commonly as the \"average of averages\" of the sample means, given all samples share the same size). This observation is commonly referred to as [Stein's example or paradox](https:\/\/en.wikipedia.org\/wiki\/Stein%27s_example \"Stein's example\"). In 1961, [Willard James](https:\/\/en.wikipedia.org\/wiki\/Willard_D._James \"Willard D. James\") and Charles Stein simplified the original process.[\\[2\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)\n\nIt can be shown that the James–Stein estimator [dominates](https:\/\/en.wikipedia.org\/wiki\/Dominating_decision_rule \"Dominating decision rule\") the \"ordinary\" [least squares](https:\/\/en.wikipedia.org\/wiki\/Least_squares \"Least squares\") approach in the sense that the James–Stein estimator has a lower [mean squared error](https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error \"Mean squared error\") than the \"ordinary\" least squares estimator for all θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf). This is possible because the James–Stein estimator is [biased](https:\/\/en.wikipedia.org\/wiki\/Bias_of_an_estimator \"Bias of an estimator\"), so that the [Gauss–Markov theorem](https:\/\/en.wikipedia.org\/wiki\/Gauss%E2%80%93Markov_theorem \"Gauss–Markov theorem\") does not apply.\n\nSimilar to the [Hodges' estimator](https:\/\/en.wikipedia.org\/wiki\/Hodges%27_estimator \"Hodges' estimator\"), the James-Stein estimator is [superefficient](https:\/\/en.wikipedia.org\/w\/index.php?title=Superefficient&action=edit&redlink=1 \"Superefficient (page does not exist)\") and [non-regular](https:\/\/en.wikipedia.org\/wiki\/Regular_estimator \"Regular estimator\") at θ \\= 0 {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}=\\\\mathbf {0} } ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}=\\\\mathbf {0} }](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/3278daf129fb6d34f19a067b7fd6b7a90cee5aeb).[\\[3\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-3)\n\n## Setting\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=1 \"Edit section: Setting\")\\]\n\nLet Y ∼ N m ( θ , σ 2 I ) , {\\\\displaystyle {\\\\mathbf {Y} }\\\\sim N\\_{m}({\\\\boldsymbol {\\\\theta }},\\\\sigma ^{2}I),\\\\,} ![{\\\\displaystyle {\\\\mathbf {Y} }\\\\sim N\\_{m}({\\\\boldsymbol {\\\\theta }},\\\\sigma ^{2}I),\\\\,}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/9f91e318ab7b67f7cf60d538cc31280dde768387)where the vector θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) is the unknown [mean](https:\/\/en.wikipedia.org\/wiki\/Expected_value \"Expected value\") of Y {\\\\displaystyle {\\\\mathbf {Y} }} ![{\\\\displaystyle {\\\\mathbf {Y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7020214a70ec832bbdd74738ec96ca9989a695e6), which is [m {\\\\displaystyle m} ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc)\\-variate normally distributed](https:\/\/en.wikipedia.org\/wiki\/Multivariate_normal_distribution \"Multivariate normal distribution\") and with known [covariance matrix](https:\/\/en.wikipedia.org\/wiki\/Covariance_matrix \"Covariance matrix\") σ 2 I {\\\\displaystyle \\\\sigma ^{2}I} ![{\\\\displaystyle \\\\sigma ^{2}I}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/6e2975cf33cb590e842a2c26a906f0949af795ff).\n\nWe are interested in obtaining an estimate, θ ^ {\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}} ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/42013234f6f8ffb7d8cc4268ec825db065113510), of θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf), based on a single observation, y {\\\\displaystyle {\\\\mathbf {y} }} ![{\\\\displaystyle {\\\\mathbf {y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c4ce8e1f631864c8e56c48f7861efef6666236d9), of Y {\\\\displaystyle {\\\\mathbf {Y} }} ![{\\\\displaystyle {\\\\mathbf {Y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7020214a70ec832bbdd74738ec96ca9989a695e6).\n\nIn real-world application, this is a common situation in which a set of parameters is sampled, and the samples are corrupted by independent [Gaussian noise](https:\/\/en.wikipedia.org\/wiki\/Gaussian_noise \"Gaussian noise\"). Since this noise has mean of zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the [least squares](https:\/\/en.wikipedia.org\/wiki\/Least_squares \"Least squares\") estimator, which is θ ^ L S \\= Y {\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}={\\\\mathbf {Y} }} ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}={\\\\mathbf {Y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/83ee2a6a995c8a02c8f565486b778e1e70bd2fac).\n\nStein demonstrated that in terms of [mean squared error](https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error \"Mean squared error\") E ⁡ \\[ ‖ θ − θ ^ ‖ 2 \\] {\\\\displaystyle \\\\operatorname {E} \\\\left\\[\\\\left\\\\\\|{\\\\boldsymbol {\\\\theta }}-{\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\\\right\\\\\\|^{2}\\\\right\\]} ![{\\\\displaystyle \\\\operatorname {E} \\\\left\\[\\\\left\\\\\\|{\\\\boldsymbol {\\\\theta }}-{\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\\\right\\\\\\|^{2}\\\\right\\]}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/bdd1163606c619b36e0e0d2310a54e10903e7234), the least squares estimator, θ ^ L S {\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}} ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/58b39a6819b4e3e61fbcb24dd72a258cd0455d3d), is sub-optimal to shrinkage based estimators, such as the **James–Stein estimator**, θ ^ J S {\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}} ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/805e2127008d46c29a7952101fada0840f6fe252).[\\[1\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-stein-56-1) The paradoxical result, that there is a (possibly) better and never any worse estimate of θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) in mean squared error as compared to the sample mean, became known as [Stein's example](https:\/\/en.wikipedia.org\/wiki\/Stein%27s_example \"Stein's example\").\n\n## Formulation\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=2 \"Edit section: Formulation\")\\]\n\n[![](https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/0\/06\/MSE_of_ML_vs_JS.png\/500px-MSE_of_ML_vs_JS.png)](https:\/\/en.wikipedia.org\/wiki\/File:MSE_of_ML_vs_JS.png)\n\nMSE (R) of least squares estimator (ML) vs. James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero.\n\nIf σ 2 {\\\\displaystyle \\\\sigma ^{2}} ![{\\\\displaystyle \\\\sigma ^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/53a5c55e536acf250c1d3e0f754be5692b843ef5) is known, the James–Stein estimator is given by\n\nθ\n\n^\n\nJ\n\nS\n\n\\=\n\n(\n\n1\n\n−\n\n(\n\nm\n\n−\n\n2\n\n)\n\nσ\n\n2\n\n‖\n\nY\n\n‖\n\n2\n\n)\n\nY\n\n.\n\n{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }\\\\\\|^{2}}}\\\\right){\\\\mathbf {Y} }.}\n\n![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }\\\\\\|^{2}}}\\\\right){\\\\mathbf {Y} }.}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/abf3aef7dbf3a1cd7401484e59500b5504961d65)\n\nJames and Stein showed that the above estimator [dominates](https:\/\/en.wikipedia.org\/wiki\/Dominating_decision_rule \"Dominating decision rule\") θ ^ L S {\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}} ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/58b39a6819b4e3e61fbcb24dd72a258cd0455d3d) for any m ≥ 3 {\\\\displaystyle m\\\\geq 3} ![{\\\\displaystyle m\\\\geq 3}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/4610f29d2708d1febf1c0090f58ddd3986593545), meaning that the James–Stein estimator has a lower [mean squared error](https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error \"Mean squared error\") (MSE) than the [maximum likelihood](https:\/\/en.wikipedia.org\/wiki\/Maximum_likelihood \"Maximum likelihood\") estimator.[\\[2\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)[\\[4\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) By definition, this makes the least squares estimator [inadmissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\") when m ≥ 3 {\\\\displaystyle m\\\\geq 3} ![{\\\\displaystyle m\\\\geq 3}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/4610f29d2708d1febf1c0090f58ddd3986593545).\n\nNotice that if ( m − 2 ) σ 2 \\< ‖ Y ‖ 2 {\\\\displaystyle (m-2)\\\\sigma ^{2}\\<\\\\\\|{\\\\mathbf {Y} }\\\\\\|^{2}} ![{\\\\displaystyle (m-2)\\\\sigma ^{2}\\<\\\\\\|{\\\\mathbf {Y} }\\\\\\|^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/791e4ba2fdd291a31cd1e0da5f16826c143f55c1) then this estimator simply takes the natural estimator Y {\\\\displaystyle \\\\mathbf {Y} } ![{\\\\displaystyle \\\\mathbf {Y} }](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c92a7716a99fadda050469747fce1e475e0ec549) and shrinks it towards the origin **0**. In fact this is not the only direction of [shrinkage](https:\/\/en.wikipedia.org\/wiki\/Shrinkage_\\(statistics\\) \"Shrinkage (statistics)\") that works. Let ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160) be an arbitrary fixed vector of dimension m {\\\\displaystyle m} ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc). Then there exists an estimator of the James–Stein type that shrinks toward ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160), namely\n\nθ\n\n^\n\nJ\n\nS\n\n\\=\n\n(\n\n1\n\n−\n\n(\n\nm\n\n−\n\n2\n\n)\n\nσ\n\n2\n\n‖\n\nY\n\n−\n\nν\n\n‖\n\n2\n\n)\n\n(\n\nY\n\n−\n\nν\n\n)\n\n\\+\n\nν\n\n,\n\nm\n\n≥\n\n3\\.\n\n{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|^{2}}}\\\\right)({\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }})+{\\\\boldsymbol {\\\\nu }},\\\\qquad m\\\\geq 3.}\n\n![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|^{2}}}\\\\right)({\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }})+{\\\\boldsymbol {\\\\nu }},\\\\qquad m\\\\geq 3.}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/fda106b6c468638610f33348f5c7549428c9810f)\n\nThe James–Stein estimator dominates the usual estimator for any ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160). A natural question to ask is whether the improvement over the usual estimator is independent of the choice of ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160). The answer is no. The improvement is small if ‖ θ − ν ‖ {\\\\displaystyle \\\\\\|{{\\\\boldsymbol {\\\\theta }}-{\\\\boldsymbol {\\\\nu }}}\\\\\\|} ![{\\\\displaystyle \\\\\\|{{\\\\boldsymbol {\\\\theta }}-{\\\\boldsymbol {\\\\nu }}}\\\\\\|}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c4f285d34fe737275b0d397ec39b887f4fdabd4c) is large. Thus to get a very great improvement some knowledge of the location of θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) is necessary. Of course this is the quantity we are trying to estimate so we don't have this knowledge [a priori](https:\/\/en.wikipedia.org\/wiki\/A_priori_and_a_posteriori \"A priori and a posteriori\"). But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that *any* finite guess ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160) improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160), surely a poor guess.\n\n## Interpretation\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=3 \"Edit section: Interpretation\")\\]\n\nSeeing the James–Stein estimator as an [empirical Bayes method](https:\/\/en.wikipedia.org\/wiki\/Empirical_Bayes_method \"Empirical Bayes method\") gives some intuition to this result: One assumes that θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) itself is a random variable with [prior distribution](https:\/\/en.wikipedia.org\/wiki\/Prior_probability \"Prior probability\") ∼ N ( 0 , A ) {\\\\displaystyle \\\\sim N(0,A)} ![{\\\\displaystyle \\\\sim N(0,A)}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/70b48c6623236065e6e4d8940bd524e756c7b641), where A {\\\\displaystyle A} ![{\\\\displaystyle A}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7daff47fa58cdfd29dc333def748ff5fa4c923e3) is estimated from the data itself. Estimating A {\\\\displaystyle A} ![{\\\\displaystyle A}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7daff47fa58cdfd29dc333def748ff5fa4c923e3) only gives an advantage compared to the [maximum-likelihood estimator](https:\/\/en.wikipedia.org\/wiki\/Maximum_likelihood \"Maximum likelihood\") when the dimension m {\\\\displaystyle m} ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) is large enough; hence it does not work for m ≤ 2 {\\\\displaystyle m\\\\leq 2} ![{\\\\displaystyle m\\\\leq 2}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/9ef46309a66e2cdf7737c7246afd8e78e3052f3d). The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.[\\[5\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-5)\n\nA consequence of the above discussion is the following counterintuitive result: When three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is [admissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\"). A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the *total* MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, any single component does not dominate the respective component of the LS estimator.\n\nThe conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a [telecommunication](https:\/\/en.wikipedia.org\/wiki\/Telecommunication \"Telecommunication\") setting, it is reasonable to combine [channel](https:\/\/en.wikipedia.org\/wiki\/Communication_channel \"Communication channel\") tap measurements in a [channel estimation](https:\/\/en.wikipedia.org\/wiki\/Channel_estimation \"Channel estimation\") scenario, as the goal is to minimize the total channel estimation error.\n\nThe James–Stein estimator has also found use in fundamental quantum theory, where the estimator has been used to improve the theoretical bounds of the [entropic uncertainty principle](https:\/\/en.wikipedia.org\/wiki\/Entropic_uncertainty_principle \"Entropic uncertainty principle\") for more than three measurements.[\\[6\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-stander-17-6)\n\nAn intuitive derivation and interpretation is given by the [Galtonian](https:\/\/en.wikipedia.org\/wiki\/Francis_Galton \"Francis Galton\") perspective.[\\[7\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-7) Under this interpretation, we aim to predict the population means using the [imperfectly measured sample means](https:\/\/en.wikipedia.org\/wiki\/Measurement_error_model \"Measurement error model\"). The equation of the [OLS](https:\/\/en.wikipedia.org\/wiki\/Ordinary_least_squares \"Ordinary least squares\") estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron-Morris estimator (when we allow the intercept to vary).\n\n## Positive-part James–Stein shrinkage operator\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=4 \"Edit section: Positive-part James–Stein shrinkage operator\")\\]\n\nDespite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator Y {\\\\displaystyle {\\\\mathbf {Y} }} ![{\\\\displaystyle {\\\\mathbf {Y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7020214a70ec832bbdd74738ec96ca9989a695e6) *toward* ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160), the estimator actually moves *away* from ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160) for small values of ‖ Y − ν ‖ , {\\\\displaystyle \\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|,} ![{\\\\displaystyle \\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|,}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/65e1a7e674ea9579de967e045b596907ae5fde77) as the multiplier on Y − ν {\\\\displaystyle {\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/6fddab8608ca8b09eeef298a960defea57c78395) is then negative. This can be remedied by replacing this multiplier by zero when it is negative. To this end, define the *positive-part James-Stein shrinkage operator*:\n\nS\n\nλ\n\n(\n\nx\n\n)\n\n\\=\n\nx\n\n\\[\n\n1\n\n−\n\n(\n\nλ\n\n\/\n\nx\n\n)\n\n2\n\n\\]\n\n\\+\n\n,\n\n{\\\\displaystyle S\\_{\\\\lambda }(x)=x\\\\left\\[1-\\\\left(\\\\lambda \/x\\\\right)^{2}\\\\right\\]\\_{+},}\n\n![{\\\\displaystyle S\\_{\\\\lambda }(x)=x\\\\left\\[1-\\\\left(\\\\lambda \/x\\\\right)^{2}\\\\right\\]\\_{+},}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/682ea45fcbaf678f56319df06a3029306a5e5e51)\n\nwhere x \\+ \\= max { 0 , x } {\\\\displaystyle x\\_{+}=\\\\max\\\\{0,x\\\\}} ![{\\\\displaystyle x\\_{+}=\\\\max\\\\{0,x\\\\}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/5a5cf1f9bd0134bd7a9e50fa8b4e96acbbd05e32), and apply this operator component-wise to the (unbiased) least-squares estimator of θ − ν {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}-{\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}-{\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/ad81208959563b1216bdeb1e64c895fe4c1fddf4) (with known ν {\\\\displaystyle {\\\\boldsymbol {\\\\nu }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160)) for each i \\= 1 , … , m {\\\\displaystyle i=1,\\\\ldots ,m} ![{\\\\displaystyle i=1,\\\\ldots ,m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/74690f54a3c93a332ecb2935e900178b9a555483):\n\nθ\n\n^\n\ni\n\n\\+\n\n−\n\nν\n\ni\n\n\\=\n\nS\n\nλ\n\ni\n\n(\n\nY\n\ni\n\n−\n\nν\n\ni\n\n)\n\n,\n\nλ\n\ni\n\n:=\n\nσ\n\nm\n\n−\n\n2\n\n\\|\n\nY\n\ni\n\n−\n\nν\n\ni\n\n\\|\n\n‖\n\nY\n\n−\n\nν\n\n‖\n\n.\n\n{\\\\displaystyle {\\\\widehat {\\\\theta }}\\_{i}^{+}-\\\\nu \\_{i}=S\\_{\\\\lambda \\_{i}}(Y\\_{i}-\\\\nu \\_{i}),\\\\quad \\\\lambda \\_{i}:=\\\\sigma {\\\\sqrt {m-2}}\\\\,{\\\\frac {\\|Y\\_{i}-\\\\nu \\_{i}\\|}{\\\\\\|\\\\mathbf {Y} -{\\\\boldsymbol {\\\\nu }}\\\\\\|}}.}\n\n![{\\\\displaystyle {\\\\widehat {\\\\theta }}\\_{i}^{+}-\\\\nu \\_{i}=S\\_{\\\\lambda \\_{i}}(Y\\_{i}-\\\\nu \\_{i}),\\\\quad \\\\lambda \\_{i}:=\\\\sigma {\\\\sqrt {m-2}}\\\\,{\\\\frac {\\|Y\\_{i}-\\\\nu \\_{i}\\|}{\\\\\\|\\\\mathbf {Y} -{\\\\boldsymbol {\\\\nu }}\\\\\\|}}.}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/26dfce05be8073ab053938126bd30ae4555cad60)\n\nThe resulting estimator θ ^ \\+ {\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}^{+}} ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}^{+}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7f7f54f4ecec860db59a9850ffa278bda136898d) of θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) is called the *positive-part James–Stein estimator* and can be written in vector notation as:\n\nθ\n\n^\n\n\\+\n\n−\n\nν\n\n\\=\n\n(\n\n1\n\n−\n\n(\n\nm\n\n−\n\n2\n\n)\n\nσ\n\n2\n\n‖\n\nY\n\n−\n\nν\n\n‖\n\n2\n\n)\n\n\\+\n\n(\n\nY\n\n−\n\nν\n\n)\n\n.\n\n{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}^{+}-{\\\\boldsymbol {\\\\nu }}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|^{2}}}\\\\right)\\_{+}({\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}).}\n\n![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}^{+}-{\\\\boldsymbol {\\\\nu }}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|^{2}}}\\\\right)\\_{+}({\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}).}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/f8dada56b5049df33fce01cf1d6f9ddbdab23eb3)\n\nThis estimator has a smaller risk than the basic James–Stein estimator for m ≥ 4 {\\\\displaystyle m\\\\geq 4} ![{\\\\displaystyle m\\\\geq 4}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/e9ba367983142680a2456aa5ae1f61e21e1aa186). It follows that the basic James–Stein estimator is itself [inadmissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\").[\\[8\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-Anderson-84-8)\n\nIt turns out, however, that the positive-part estimator is also inadmissible.[\\[4\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) This follows from a more general result which requires admissible estimators to be smooth.\n\n## Positive-part James–Stein shrinkage and model selection\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=5 \"Edit section: Positive-part James–Stein shrinkage and model selection\")\\]\n\nRecall the initial setup:\n\nY\n\n∼\n\nN\n\n(\n\nθ\n\n,\n\nσ\n\n2\n\nI\n\n)\n\n,\n\n{\\\\displaystyle {\\\\mathbf {Y} }\\\\sim N({\\\\boldsymbol {\\\\theta }},\\\\sigma ^{2}I),\\\\,}\n\n![{\\\\displaystyle {\\\\mathbf {Y} }\\\\sim N({\\\\boldsymbol {\\\\theta }},\\\\sigma ^{2}I),\\\\,}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/b19ee31bd76ccbb7a75983bad82ac0421914d40a)\n\nwhere the variance coefficient σ 2 {\\\\displaystyle \\\\sigma ^{2}} ![{\\\\displaystyle \\\\sigma ^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/53a5c55e536acf250c1d3e0f754be5692b843ef5) is known and we wish to estimate the unknown (mean response) coefficient θ \\= E Y {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}=\\\\mathbb {E} \\\\mathbf {Y} } ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}=\\\\mathbb {E} \\\\mathbf {Y} }](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a9b78572d3e5dd5cb8362412e27f57d1a5241fdd). In the more general setting of [linear regression](https:\/\/en.wikipedia.org\/wiki\/Linear_regression \"Linear regression\"), the mean response is instead given by\n\nE\n\nY\n\n\\=\n\nX\n\nθ\n\n,\n\n{\\\\displaystyle \\\\mathbb {E} \\\\mathbf {Y} =\\\\mathbf {X} {\\\\boldsymbol {\\\\theta }},}\n\n![{\\\\displaystyle \\\\mathbb {E} \\\\mathbf {Y} =\\\\mathbf {X} {\\\\boldsymbol {\\\\theta }},}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a9b27e84b5235faaa4c502229ce10644dbba2ac5)\n\nwhere X \\= \\[ v 1 , … , v m \\] {\\\\displaystyle \\\\mathbf {X} =\\[\\\\mathbf {v} \\_{1},\\\\ldots ,\\\\mathbf {v} \\_{m}\\]} ![{\\\\displaystyle \\\\mathbf {X} =\\[\\\\mathbf {v} \\_{1},\\\\ldots ,\\\\mathbf {v} \\_{m}\\]}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0ad1c826b3a19756dc60523573d2d769dd2315d6) is a matrix with m {\\\\displaystyle m} ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) columns. As in the previous section, we can use the *positive-part James-Stein shrinkage operator* to obtain a [shrinkage estimator](https:\/\/en.wikipedia.org\/wiki\/Shrinkage_estimator \"Shrinkage estimator\") of θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf). In particular, any θ ^ {\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}} ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/42013234f6f8ffb7d8cc4268ec825db065113510) that satisfies the *James-Stein [KKT conditions](https:\/\/en.wikipedia.org\/wiki\/KKT_conditions \"KKT conditions\")*:[\\[9\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-:0-9)\n\nθ\n\n^\n\ni\n\n\\=\n\nS\n\nσ\n\n‖\n\nv\n\ni\n\n‖\n\n(\n\nθ\n\n^\n\ni\n\n\\+\n\nv\n\ni\n\n⊤\n\n(\n\nY\n\n−\n\nX\n\nθ\n\n^\n\n)\n\n‖\n\nv\n\ni\n\n‖\n\n2\n\n)\n\n,\n\ni\n\n\\=\n\n1\n\n,\n\n…\n\n,\n\nm\n\n{\\\\displaystyle {\\\\hat {\\\\theta }}\\_{i}=S\\_{\\\\frac {\\\\sigma }{\\\\\\|\\\\mathbf {v} \\_{i}\\\\\\|}}{\\\\bigg (}{\\\\hat {\\\\theta }}\\_{i}+{\\\\frac {\\\\mathbf {v} \\_{i}^{\\\\top }(\\\\mathbf {Y} -\\\\mathbf {X} {\\\\widehat {\\\\boldsymbol {\\\\theta }}})}{\\\\\\|\\\\mathbf {v} \\_{i}\\\\\\|^{2}}}{\\\\bigg )},\\\\quad i=1,\\\\ldots ,m}\n\n![{\\\\displaystyle {\\\\hat {\\\\theta }}\\_{i}=S\\_{\\\\frac {\\\\sigma }{\\\\\\|\\\\mathbf {v} \\_{i}\\\\\\|}}{\\\\bigg (}{\\\\hat {\\\\theta }}\\_{i}+{\\\\frac {\\\\mathbf {v} \\_{i}^{\\\\top }(\\\\mathbf {Y} -\\\\mathbf {X} {\\\\widehat {\\\\boldsymbol {\\\\theta }}})}{\\\\\\|\\\\mathbf {v} \\_{i}\\\\\\|^{2}}}{\\\\bigg )},\\\\quad i=1,\\\\ldots ,m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/936cbd3895bad590cedb7d6a87a98a32fb35ce2a)\n\nis a (positive-part) James-Stein estimator of θ {\\\\displaystyle {\\\\boldsymbol {\\\\theta }}} ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) with the useful property that it performs both shrinkage and [model selection](https:\/\/en.wikipedia.org\/wiki\/Model_selection \"Model selection\") simultaneously. This is because, depending on the value of the known σ 2 {\\\\displaystyle \\\\sigma ^{2}} ![{\\\\displaystyle \\\\sigma ^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/53a5c55e536acf250c1d3e0f754be5692b843ef5), there is a (possibly empty) set S ⊆ { 1 , … , m } {\\\\displaystyle {\\\\mathcal {S}}\\\\subseteq \\\\{1,\\\\ldots ,m\\\\}} ![{\\\\displaystyle {\\\\mathcal {S}}\\\\subseteq \\\\{1,\\\\ldots ,m\\\\}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/720aba8454507e0c7eded393c9d045ff1417c5ce) such that\n\nθ\n\n^\n\ni\n\n\\=\n\n0\n\n,\n\ni\n\n∈\n\nS\n\n.\n\n{\\\\displaystyle {\\\\hat {\\\\theta }}\\_{i}=0,\\\\quad i\\\\in {\\\\mathcal {S}}.}\n\n![{\\\\displaystyle {\\\\hat {\\\\theta }}\\_{i}=0,\\\\quad i\\\\in {\\\\mathcal {S}}.}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/2192ecfb09e8f50a2d6a6d23f9fd842e4fbe6321)\n\nIn other words, some (or all) of the θ i {\\\\displaystyle \\\\theta \\_{i}} ![{\\\\displaystyle \\\\theta \\_{i}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/302b19204ed378e99ff4575341a67eebdbe5a555) could be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model.\n\n## Further extensions\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=6 \"Edit section: Further extensions\")\\]\n\nThe James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect; namely, the fact that the \"ordinary\" or least squares estimator is often [inadmissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\") for simultaneous estimation of several parameters. This effect has been called [Stein's phenomenon](https:\/\/en.wikipedia.org\/wiki\/Stein%27s_phenomenon \"Stein's phenomenon\"), and has been demonstrated for several different problem settings, some of which are briefly outlined below.\n\n- James and Stein demonstrated that the estimator presented above can still be used when the variance\n  σ\n  \n  2\n  \n  {\\\\displaystyle \\\\sigma ^{2}}\n  \n  ![{\\\\displaystyle \\\\sigma ^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/53a5c55e536acf250c1d3e0f754be5692b843ef5)\n  is unknown, by replacing it with the standard estimator of the variance,\n  σ\n  \n  ^\n  \n  2\n  \n  \\=\n  \n  1\n  \n  m\n  \n  ∑\n  \n  (\n  \n  Y\n  \n  i\n  \n  −\n  \n  Y\n  \n  ¯\n  \n  )\n  \n  2\n  \n  {\\\\displaystyle {\\\\widehat {\\\\sigma }}^{2}={\\\\frac {1}{m}}\\\\sum (Y\\_{i}-{\\\\overline {Y}})^{2}}\n  \n  ![{\\\\displaystyle {\\\\widehat {\\\\sigma }}^{2}={\\\\frac {1}{m}}\\\\sum (Y\\_{i}-{\\\\overline {Y}})^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/950af315e27d5b7a8dcb8a34f12250702441a8ad)\n  . The dominance result still holds under the same condition, namely,\n  m\n  \n  \\>\n  \n  2\n  \n  {\\\\displaystyle m\\>2}\n  \n  ![{\\\\displaystyle m\\>2}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/44e4ce1c04edd8f9602e60f0ec4457b7ac12fcd4)\n  .[\\[2\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)\n- All the results above are for the case when only a single observation vector **y** is available. For the more general case when\n  n\n  \n  {\\\\displaystyle n}\n  \n  ![{\\\\displaystyle n}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a601995d55609f2d9f5e233e36fbe9ea26011b3b)\n  vectors are available, we consider the estimator\n  θ\n  \n  ^\n  \n  J\n  \n  S\n  \n  \\=\n  \n  (\n  \n  1\n  \n  −\n  \n  (\n  \n  m\n  \n  −\n  \n  2\n  \n  )\n  \n  σ\n  \n  2\n  \n  n\n  \n  ‖\n  \n  Y\n  \n  ¯\n  \n  ‖\n  \n  2\n  \n  )\n  \n  Y\n  \n  ¯\n  \n  ,\n  \n  {\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2){\\\\frac {\\\\sigma ^{2}}{n}}}{\\\\\\|{\\\\overline {\\\\mathbf {Y} }}\\\\\\|^{2}}}\\\\right){\\\\overline {\\\\mathbf {Y} }},}\n  \n  ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2){\\\\frac {\\\\sigma ^{2}}{n}}}{\\\\\\|{\\\\overline {\\\\mathbf {Y} }}\\\\\\|^{2}}}\\\\right){\\\\overline {\\\\mathbf {Y} }},}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a97236e07a0088f3d9a1b8adc410e9e6de0fd599)\n  where\n  Y\n  \n  ¯\n  \n  {\\\\displaystyle {\\\\overline {\\\\mathbf {Y} }}}\n  \n  ![{\\\\displaystyle {\\\\overline {\\\\mathbf {Y} }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/9415319ff8cfc6377fe06bf6ae88700802497c3c)\n  is the\n  m\n  \n  {\\\\displaystyle m}\n  \n  ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc)\n  \\-length average of the\n  n\n  \n  {\\\\displaystyle n}\n  \n  ![{\\\\displaystyle n}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a601995d55609f2d9f5e233e36fbe9ea26011b3b)\n  observations, so that,\n  Y\n  \n  ¯\n  \n  ∼\n  \n  N\n  \n  m\n  \n  (\n  \n  θ\n  \n  ,\n  \n  σ\n  \n  2\n  \n  n\n  \n  I\n  \n  )\n  \n  {\\\\displaystyle {\\\\overline {\\\\mathbf {Y} }}\\\\sim N\\_{m}{\\\\Big (}{\\\\boldsymbol {\\\\theta }},{\\\\frac {\\\\sigma ^{2}}{n}}I{\\\\Big )}}\n  \n  ![{\\\\displaystyle {\\\\overline {\\\\mathbf {Y} }}\\\\sim N\\_{m}{\\\\Big (}{\\\\boldsymbol {\\\\theta }},{\\\\frac {\\\\sigma ^{2}}{n}}I{\\\\Big )}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/e72554ac36054093a779d821974a6bbedec2d3df)\n  .\n- The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.[\\[10\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-bock75-10) A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a [linear regression](https:\/\/en.wikipedia.org\/wiki\/Linear_regression \"Linear regression\") technique which outperforms the standard application of the LS estimator.[\\[10\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-bock75-10)\n- Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.[\\[11\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-brown66-11) It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.[\\[4\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4)\n\n## See also\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=7 \"Edit section: See also\")\\]\n\n- [Admissible decision rule](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\")\n- [Hodges' estimator](https:\/\/en.wikipedia.org\/wiki\/Hodges%27_estimator \"Hodges' estimator\")\n- [Shrinkage estimator](https:\/\/en.wikipedia.org\/wiki\/Shrinkage_estimator \"Shrinkage estimator\")\n- [Regular estimator](https:\/\/en.wikipedia.org\/wiki\/Regular_estimator \"Regular estimator\")\n- [KL divergence](https:\/\/en.wikipedia.org\/wiki\/KL_divergence \"KL divergence\")\n\n## References\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=8 \"Edit section: References\")\\]\n\n1. ^ [***a***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-0) [***b***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-1)\n   [Stein, C.](https:\/\/en.wikipedia.org\/wiki\/Charles_Stein_\\(statistician\\) \"Charles Stein (statistician)\") (1956), \"Inadmissibility of the usual estimator for the mean of a multivariate distribution\", [*Proc. Third Berkeley Symp. Math. Statist. Prob.*](http:\/\/projecteuclid.org\/euclid.bsmsp\/1200501656), vol. 1, pp. 197–206, [MR](https:\/\/en.wikipedia.org\/wiki\/MR_\\(identifier\\) \"MR (identifier)\") [0084922](https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=0084922), [Zbl](https:\/\/en.wikipedia.org\/wiki\/Zbl_\\(identifier\\) \"Zbl (identifier)\") [0073\\.35602](https:\/\/zbmath.org\/?format=complete&q=an:0073.35602)\n2. ^ [***a***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-0) [***b***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-1) [***c***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-2)\n   James, W.; [Stein, C.](https:\/\/en.wikipedia.org\/wiki\/Charles_Stein_\\(statistician\\) \"Charles Stein (statistician)\") (1961), \"Estimation with quadratic loss\", [*Proc. Fourth Berkeley Symp. Math. Statist. Prob.*](http:\/\/projecteuclid.org\/euclid.bsmsp\/1200512173), vol. 1, pp. 361–379, [MR](https:\/\/en.wikipedia.org\/wiki\/MR_\\(identifier\\) \"MR (identifier)\") [0133191](https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=0133191)\n3. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-3)** Beran, R. (1995). THE ROLE OF HAJEK’S CONVOLUTION THEOREM IN STATISTICAL THEORY\n4. ^ [***a***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-0) [***b***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-1) [***c***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-2)\n   Lehmann, E. L.; Casella, G. (1998), *Theory of Point Estimation* (2nd ed.), New York: Springer\n5. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-5)**\n   Efron, B.; Morris, C. (1973). \"Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach\". *Journal of the American Statistical Association*. **68** (341). American Statistical Association: 117–130\\. [doi](https:\/\/en.wikipedia.org\/wiki\/Doi_\\(identifier\\) \"Doi (identifier)\"):[10\\.2307\/2284155](https:\/\/doi.org\/10.2307%2F2284155). [JSTOR](https:\/\/en.wikipedia.org\/wiki\/JSTOR_\\(identifier\\) \"JSTOR (identifier)\") [2284155](https:\/\/www.jstor.org\/stable\/2284155).\n6. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-stander-17_6-0)**\n   Stander, M. (2017), *Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements*, [arXiv](https:\/\/en.wikipedia.org\/wiki\/ArXiv_\\(identifier\\) \"ArXiv (identifier)\"):[1702\\.02440](https:\/\/arxiv.org\/abs\/1702.02440), [Bibcode](https:\/\/en.wikipedia.org\/wiki\/Bibcode_\\(identifier\\) \"Bibcode (identifier)\"):[2017arXiv170202440S](https:\/\/ui.adsabs.harvard.edu\/abs\/2017arXiv170202440S)\n7. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-7)**\n   Stigler, Stephen M. (1990-02-01). [\"The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators\"](https:\/\/doi.org\/10.1214%2Fss%2F1177012274). *Statistical Science*. **5** (1). [doi](https:\/\/en.wikipedia.org\/wiki\/Doi_\\(identifier\\) \"Doi (identifier)\"):[10\\.1214\/ss\/1177012274](https:\/\/doi.org\/10.1214%2Fss%2F1177012274). [ISSN](https:\/\/en.wikipedia.org\/wiki\/ISSN_\\(identifier\\) \"ISSN (identifier)\") [0883-4237](https:\/\/search.worldcat.org\/issn\/0883-4237).\n8. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-Anderson-84_8-0)**\n   Anderson, T. W. (1984), *An Introduction to Multivariate Statistical Analysis* (2nd ed.), New York: John Wiley & Sons\n9. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-:0_9-0)**\n   Botev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025). *Data Science and Machine Learning: Mathematical and Statistical Methods* (2nd ed.). Boca Raton ; London: CRC Press. pp. 277–279\\. [ISBN](https:\/\/en.wikipedia.org\/wiki\/ISBN_\\(identifier\\) \"ISBN (identifier)\")\n   \n   [978-1-032-48868-4](https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/978-1-032-48868-4 \"Special:BookSources\/978-1-032-48868-4\")\n   \n   .\n10. ^ [***a***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-bock75_10-0) [***b***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-bock75_10-1)\n    Bock, M. E. (1975), \"Minimax estimators of the mean of a multivariate normal distribution\", *[Annals of Statistics](https:\/\/en.wikipedia.org\/wiki\/Annals_of_Statistics \"Annals of Statistics\")*, **3** (1): 209–218, [doi](https:\/\/en.wikipedia.org\/wiki\/Doi_\\(identifier\\) \"Doi (identifier)\"):[10\\.1214\/aos\/1176343009](https:\/\/doi.org\/10.1214%2Faos%2F1176343009), [MR](https:\/\/en.wikipedia.org\/wiki\/MR_\\(identifier\\) \"MR (identifier)\") [0381064](https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=0381064), [Zbl](https:\/\/en.wikipedia.org\/wiki\/Zbl_\\(identifier\\) \"Zbl (identifier)\") [0314\\.62005](https:\/\/zbmath.org\/?format=complete&q=an:0314.62005)\n11. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-brown66_11-0)**\n    [Brown, L. D.](https:\/\/en.wikipedia.org\/wiki\/Lawrence_D._Brown \"Lawrence D. Brown\") (1966), \"On the admissibility of invariant estimators of one or more location parameters\", *Annals of Mathematical Statistics*, **37** (5): 1087–1136, [doi](https:\/\/en.wikipedia.org\/wiki\/Doi_\\(identifier\\) \"Doi (identifier)\"):[10\\.1214\/aoms\/1177699259](https:\/\/doi.org\/10.1214%2Faoms%2F1177699259), [MR](https:\/\/en.wikipedia.org\/wiki\/MR_\\(identifier\\) \"MR (identifier)\") [0216647](https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=0216647), [Zbl](https:\/\/en.wikipedia.org\/wiki\/Zbl_\\(identifier\\) \"Zbl (identifier)\") [0156\\.39401](https:\/\/zbmath.org\/?format=complete&q=an:0156.39401)\n\n## Further reading\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=9 \"Edit section: Further reading\")\\]\n\n- Judge, George G.; Bock, M. E. (1978). *The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics*. New York: North Holland. pp. 229–257\\. [ISBN](https:\/\/en.wikipedia.org\/wiki\/ISBN_\\(identifier\\) \"ISBN (identifier)\")\n  \n  [0-7204-0729-X](https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/0-7204-0729-X \"Special:BookSources\/0-7204-0729-X\")\n  \n  .\n\n![](https:\/\/en.wikipedia.org\/wiki\/Special:CentralAutoLogin\/start?useformat=desktop&type=1x1&usesul3=1)\n\nRetrieved from \"<https:\/\/en.wikipedia.org\/w\/index.php?title=James–Stein_estimator&oldid=1342138970>\"\n\n[Categories](https:\/\/en.wikipedia.org\/wiki\/Help:Category \"Help:Category\"):\n\n- [Estimator](https:\/\/en.wikipedia.org\/wiki\/Category:Estimator \"Category:Estimator\")\n- [Normal distribution](https:\/\/en.wikipedia.org\/wiki\/Category:Normal_distribution \"Category:Normal distribution\")\n\nHidden categories:\n\n- [Articles with short description](https:\/\/en.wikipedia.org\/wiki\/Category:Articles_with_short_description \"Category:Articles with short description\")\n- [Short description is different from Wikidata](https:\/\/en.wikipedia.org\/wiki\/Category:Short_description_is_different_from_Wikidata \"Category:Short description is different from Wikidata\")\n- [Wikipedia articles that are too technical from November 2017](https:\/\/en.wikipedia.org\/wiki\/Category:Wikipedia_articles_that_are_too_technical_from_November_2017 \"Category:Wikipedia articles that are too technical from November 2017\")\n- [All articles that are too technical](https:\/\/en.wikipedia.org\/wiki\/Category:All_articles_that_are_too_technical \"Category:All articles that are too technical\")\n\n- This page was last edited on 7 March 2026, at 06:50 (UTC).\n- Text is available under the [Creative Commons Attribution-ShareAlike 4.0 License](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Text_of_the_Creative_Commons_Attribution-ShareAlike_4.0_International_License \"Wikipedia:Text of the Creative Commons Attribution-ShareAlike 4.0 International License\"); additional terms may apply. By using this site, you agree to the [Terms of Use](https:\/\/foundation.wikimedia.org\/wiki\/Special:MyLanguage\/Policy:Terms_of_Use \"foundation:Special:MyLanguage\/Policy:Terms of Use\") and [Privacy Policy](https:\/\/foundation.wikimedia.org\/wiki\/Special:MyLanguage\/Policy:Privacy_policy \"foundation:Special:MyLanguage\/Policy:Privacy policy\"). Wikipedia® is a registered trademark of the [Wikimedia Foundation, Inc.](https:\/\/wikimediafoundation.org\/), a non-profit organization.\n\n- [Privacy policy](https:\/\/foundation.wikimedia.org\/wiki\/Special:MyLanguage\/Policy:Privacy_policy)\n- [About Wikipedia](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:About)\n- [Disclaimers](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:General_disclaimer)\n- [Contact Wikipedia](https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Contact_us)\n- [Legal & safety contacts](https:\/\/foundation.wikimedia.org\/wiki\/Special:MyLanguage\/Legal:Wikimedia_Foundation_Legal_and_Safety_Contact_Information)\n- [Code of Conduct](https:\/\/foundation.wikimedia.org\/wiki\/Special:MyLanguage\/Policy:Universal_Code_of_Conduct)\n- [Developers](https:\/\/developer.wikimedia.org\/)\n- [Statistics](https:\/\/stats.wikimedia.org\/#\/en.wikipedia.org)\n- [Cookie statement](https:\/\/foundation.wikimedia.org\/wiki\/Special:MyLanguage\/Policy:Cookie_statement)\n- [Mobile view](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&mobileaction=toggle_view_mobile)\n\n- [![Wikimedia Foundation](https:\/\/en.wikipedia.org\/static\/images\/footer\/wikimedia.svg)](https:\/\/www.wikimedia.org\/)\n- [![Powered by MediaWiki](https:\/\/en.wikipedia.org\/w\/resources\/assets\/mediawiki_compact.svg)](https:\/\/www.mediawiki.org\/)\n\nSearch\n\nToggle the table of contents\n\nJames–Stein estimator\n\n3 languages\n\n[Add topic](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator)","attrs_readable_markdown":"From Wikipedia, the free encyclopedia\n\nThe **James–Stein estimator** is an [estimator](https:\/\/en.wikipedia.org\/wiki\/Estimator \"Estimator\") of the [mean](https:\/\/en.wikipedia.org\/wiki\/Mean \"Mean\") ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}:=(\\\\theta \\_{1},\\\\theta \\_{2},\\\\dots \\\\theta \\_{m})}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/ea925cd9dce005c92b0c86c343e8b005ecf8a3f3) for a multivariate [random variable](https:\/\/en.wikipedia.org\/wiki\/Random_variable \"Random variable\") ![{\\\\displaystyle {\\\\boldsymbol {Y}}:=(Y\\_{1},Y\\_{2},\\\\dots Y\\_{m})}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c7d304d83650977659f023b20fae116a55c065b0).\n\nIt arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956,[\\[1\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-stein-56-1) when [Charles Stein](https:\/\/en.wikipedia.org\/wiki\/Charles_Stein_\\(statistician\\) \"Charles Stein (statistician)\") reached a relatively shocking conclusion that while the then-usual estimate of the mean, the [sample mean](https:\/\/en.wikipedia.org\/wiki\/Sample_mean \"Sample mean\"), is [admissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\") when ![{\\\\displaystyle m\\\\leq 2}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/9ef46309a66e2cdf7737c7246afd8e78e3052f3d), it is [inadmissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\") when ![{\\\\displaystyle m\\\\geq 3}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/4610f29d2708d1febf1c0090f58ddd3986593545). Stein proposed a possible improvement to the estimator that [shrinks](https:\/\/en.wikipedia.org\/wiki\/Shrinkage_\\(statistics\\) \"Shrinkage (statistics)\") the sample mean ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) towards a more central mean vector ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160) (which can be chosen [a priori](https:\/\/en.wikipedia.org\/wiki\/A_priori_and_a_posteriori \"A priori and a posteriori\") or commonly as the \"average of averages\" of the sample means, given all samples share the same size). This observation is commonly referred to as [Stein's example or paradox](https:\/\/en.wikipedia.org\/wiki\/Stein%27s_example \"Stein's example\"). In 1961, [Willard James](https:\/\/en.wikipedia.org\/wiki\/Willard_D._James \"Willard D. James\") and Charles Stein simplified the original process.[\\[2\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)\n\nIt can be shown that the James–Stein estimator [dominates](https:\/\/en.wikipedia.org\/wiki\/Dominating_decision_rule \"Dominating decision rule\") the \"ordinary\" [least squares](https:\/\/en.wikipedia.org\/wiki\/Least_squares \"Least squares\") approach in the sense that the James–Stein estimator has a lower [mean squared error](https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error \"Mean squared error\") than the \"ordinary\" least squares estimator for all ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf). This is possible because the James–Stein estimator is [biased](https:\/\/en.wikipedia.org\/wiki\/Bias_of_an_estimator \"Bias of an estimator\"), so that the [Gauss–Markov theorem](https:\/\/en.wikipedia.org\/wiki\/Gauss%E2%80%93Markov_theorem \"Gauss–Markov theorem\") does not apply.\n\nSimilar to the [Hodges' estimator](https:\/\/en.wikipedia.org\/wiki\/Hodges%27_estimator \"Hodges' estimator\"), the James-Stein estimator is [superefficient](https:\/\/en.wikipedia.org\/w\/index.php?title=Superefficient&action=edit&redlink=1 \"Superefficient (page does not exist)\") and [non-regular](https:\/\/en.wikipedia.org\/wiki\/Regular_estimator \"Regular estimator\") at ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}=\\\\mathbf {0} }](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/3278daf129fb6d34f19a067b7fd6b7a90cee5aeb).[\\[3\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-3)\n\nLet ![{\\\\displaystyle {\\\\mathbf {Y} }\\\\sim N\\_{m}({\\\\boldsymbol {\\\\theta }},\\\\sigma ^{2}I),\\\\,}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/9f91e318ab7b67f7cf60d538cc31280dde768387)where the vector ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) is the unknown [mean](https:\/\/en.wikipedia.org\/wiki\/Expected_value \"Expected value\") of ![{\\\\displaystyle {\\\\mathbf {Y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7020214a70ec832bbdd74738ec96ca9989a695e6), which is [![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc)\\-variate normally distributed](https:\/\/en.wikipedia.org\/wiki\/Multivariate_normal_distribution \"Multivariate normal distribution\") and with known [covariance matrix](https:\/\/en.wikipedia.org\/wiki\/Covariance_matrix \"Covariance matrix\") ![{\\\\displaystyle \\\\sigma ^{2}I}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/6e2975cf33cb590e842a2c26a906f0949af795ff).\n\nWe are interested in obtaining an estimate, ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/42013234f6f8ffb7d8cc4268ec825db065113510), of ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf), based on a single observation, ![{\\\\displaystyle {\\\\mathbf {y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c4ce8e1f631864c8e56c48f7861efef6666236d9), of ![{\\\\displaystyle {\\\\mathbf {Y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7020214a70ec832bbdd74738ec96ca9989a695e6).\n\nIn real-world application, this is a common situation in which a set of parameters is sampled, and the samples are corrupted by independent [Gaussian noise](https:\/\/en.wikipedia.org\/wiki\/Gaussian_noise \"Gaussian noise\"). Since this noise has mean of zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the [least squares](https:\/\/en.wikipedia.org\/wiki\/Least_squares \"Least squares\") estimator, which is ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}={\\\\mathbf {Y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/83ee2a6a995c8a02c8f565486b778e1e70bd2fac).\n\nStein demonstrated that in terms of [mean squared error](https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error \"Mean squared error\") ![{\\\\displaystyle \\\\operatorname {E} \\\\left\\[\\\\left\\\\\\|{\\\\boldsymbol {\\\\theta }}-{\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\\\right\\\\\\|^{2}\\\\right\\]}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/bdd1163606c619b36e0e0d2310a54e10903e7234), the least squares estimator, ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/58b39a6819b4e3e61fbcb24dd72a258cd0455d3d), is sub-optimal to shrinkage based estimators, such as the **James–Stein estimator**, ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/805e2127008d46c29a7952101fada0840f6fe252).[\\[1\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-stein-56-1) The paradoxical result, that there is a (possibly) better and never any worse estimate of ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) in mean squared error as compared to the sample mean, became known as [Stein's example](https:\/\/en.wikipedia.org\/wiki\/Stein%27s_example \"Stein's example\").\n\n[![](https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/0\/06\/MSE_of_ML_vs_JS.png\/500px-MSE_of_ML_vs_JS.png)](https:\/\/en.wikipedia.org\/wiki\/File:MSE_of_ML_vs_JS.png)\n\nMSE (R) of least squares estimator (ML) vs. James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero.\n\nIf ![{\\\\displaystyle \\\\sigma ^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/53a5c55e536acf250c1d3e0f754be5692b843ef5) is known, the James–Stein estimator is given by\n\n![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }\\\\\\|^{2}}}\\\\right){\\\\mathbf {Y} }.}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/abf3aef7dbf3a1cd7401484e59500b5504961d65)\n\nJames and Stein showed that the above estimator [dominates](https:\/\/en.wikipedia.org\/wiki\/Dominating_decision_rule \"Dominating decision rule\") ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{LS}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/58b39a6819b4e3e61fbcb24dd72a258cd0455d3d) for any ![{\\\\displaystyle m\\\\geq 3}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/4610f29d2708d1febf1c0090f58ddd3986593545), meaning that the James–Stein estimator has a lower [mean squared error](https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error \"Mean squared error\") (MSE) than the [maximum likelihood](https:\/\/en.wikipedia.org\/wiki\/Maximum_likelihood \"Maximum likelihood\") estimator.[\\[2\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)[\\[4\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) By definition, this makes the least squares estimator [inadmissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\") when ![{\\\\displaystyle m\\\\geq 3}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/4610f29d2708d1febf1c0090f58ddd3986593545).\n\nNotice that if ![{\\\\displaystyle (m-2)\\\\sigma ^{2}\\<\\\\\\|{\\\\mathbf {Y} }\\\\\\|^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/791e4ba2fdd291a31cd1e0da5f16826c143f55c1) then this estimator simply takes the natural estimator ![{\\\\displaystyle \\\\mathbf {Y} }](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c92a7716a99fadda050469747fce1e475e0ec549) and shrinks it towards the origin **0**. In fact this is not the only direction of [shrinkage](https:\/\/en.wikipedia.org\/wiki\/Shrinkage_\\(statistics\\) \"Shrinkage (statistics)\") that works. Let ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160) be an arbitrary fixed vector of dimension ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc). Then there exists an estimator of the James–Stein type that shrinks toward ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160), namely\n\n![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|^{2}}}\\\\right)({\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }})+{\\\\boldsymbol {\\\\nu }},\\\\qquad m\\\\geq 3.}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/fda106b6c468638610f33348f5c7549428c9810f)\n\nThe James–Stein estimator dominates the usual estimator for any ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160). A natural question to ask is whether the improvement over the usual estimator is independent of the choice of ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160). The answer is no. The improvement is small if ![{\\\\displaystyle \\\\\\|{{\\\\boldsymbol {\\\\theta }}-{\\\\boldsymbol {\\\\nu }}}\\\\\\|}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/c4f285d34fe737275b0d397ec39b887f4fdabd4c) is large. Thus to get a very great improvement some knowledge of the location of ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) is necessary. Of course this is the quantity we are trying to estimate so we don't have this knowledge [a priori](https:\/\/en.wikipedia.org\/wiki\/A_priori_and_a_posteriori \"A priori and a posteriori\"). But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that *any* finite guess ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160) improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160), surely a poor guess.\n\nSeeing the James–Stein estimator as an [empirical Bayes method](https:\/\/en.wikipedia.org\/wiki\/Empirical_Bayes_method \"Empirical Bayes method\") gives some intuition to this result: One assumes that ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) itself is a random variable with [prior distribution](https:\/\/en.wikipedia.org\/wiki\/Prior_probability \"Prior probability\") ![{\\\\displaystyle \\\\sim N(0,A)}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/70b48c6623236065e6e4d8940bd524e756c7b641), where ![{\\\\displaystyle A}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7daff47fa58cdfd29dc333def748ff5fa4c923e3) is estimated from the data itself. Estimating ![{\\\\displaystyle A}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7daff47fa58cdfd29dc333def748ff5fa4c923e3) only gives an advantage compared to the [maximum-likelihood estimator](https:\/\/en.wikipedia.org\/wiki\/Maximum_likelihood \"Maximum likelihood\") when the dimension ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) is large enough; hence it does not work for ![{\\\\displaystyle m\\\\leq 2}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/9ef46309a66e2cdf7737c7246afd8e78e3052f3d). The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.[\\[5\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-5)\n\nA consequence of the above discussion is the following counterintuitive result: When three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is [admissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\"). A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the *total* MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, any single component does not dominate the respective component of the LS estimator.\n\nThe conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a [telecommunication](https:\/\/en.wikipedia.org\/wiki\/Telecommunication \"Telecommunication\") setting, it is reasonable to combine [channel](https:\/\/en.wikipedia.org\/wiki\/Communication_channel \"Communication channel\") tap measurements in a [channel estimation](https:\/\/en.wikipedia.org\/wiki\/Channel_estimation \"Channel estimation\") scenario, as the goal is to minimize the total channel estimation error.\n\nThe James–Stein estimator has also found use in fundamental quantum theory, where the estimator has been used to improve the theoretical bounds of the [entropic uncertainty principle](https:\/\/en.wikipedia.org\/wiki\/Entropic_uncertainty_principle \"Entropic uncertainty principle\") for more than three measurements.[\\[6\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-stander-17-6)\n\nAn intuitive derivation and interpretation is given by the [Galtonian](https:\/\/en.wikipedia.org\/wiki\/Francis_Galton \"Francis Galton\") perspective.[\\[7\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-7) Under this interpretation, we aim to predict the population means using the [imperfectly measured sample means](https:\/\/en.wikipedia.org\/wiki\/Measurement_error_model \"Measurement error model\"). The equation of the [OLS](https:\/\/en.wikipedia.org\/wiki\/Ordinary_least_squares \"Ordinary least squares\") estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron-Morris estimator (when we allow the intercept to vary).\n\n## Positive-part James–Stein shrinkage operator\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=4 \"Edit section: Positive-part James–Stein shrinkage operator\")\\]\n\nDespite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator ![{\\\\displaystyle {\\\\mathbf {Y} }}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7020214a70ec832bbdd74738ec96ca9989a695e6) *toward* ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160), the estimator actually moves *away* from ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160) for small values of ![{\\\\displaystyle \\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|,}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/65e1a7e674ea9579de967e045b596907ae5fde77) as the multiplier on ![{\\\\displaystyle {\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/6fddab8608ca8b09eeef298a960defea57c78395) is then negative. This can be remedied by replacing this multiplier by zero when it is negative. To this end, define the *positive-part James-Stein shrinkage operator*:\n\n![{\\\\displaystyle S\\_{\\\\lambda }(x)=x\\\\left\\[1-\\\\left(\\\\lambda \/x\\\\right)^{2}\\\\right\\]\\_{+},}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/682ea45fcbaf678f56319df06a3029306a5e5e51)\n\nwhere ![{\\\\displaystyle x\\_{+}=\\\\max\\\\{0,x\\\\}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/5a5cf1f9bd0134bd7a9e50fa8b4e96acbbd05e32), and apply this operator component-wise to the (unbiased) least-squares estimator of ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}-{\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/ad81208959563b1216bdeb1e64c895fe4c1fddf4) (with known ![{\\\\displaystyle {\\\\boldsymbol {\\\\nu }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/57d030f5358ad62e959e7907ec1508746a563160)) for each ![{\\\\displaystyle i=1,\\\\ldots ,m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/74690f54a3c93a332ecb2935e900178b9a555483):\n\n![{\\\\displaystyle {\\\\widehat {\\\\theta }}\\_{i}^{+}-\\\\nu \\_{i}=S\\_{\\\\lambda \\_{i}}(Y\\_{i}-\\\\nu \\_{i}),\\\\quad \\\\lambda \\_{i}:=\\\\sigma {\\\\sqrt {m-2}}\\\\,{\\\\frac {\\|Y\\_{i}-\\\\nu \\_{i}\\|}{\\\\\\|\\\\mathbf {Y} -{\\\\boldsymbol {\\\\nu }}\\\\\\|}}.}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/26dfce05be8073ab053938126bd30ae4555cad60)\n\nThe resulting estimator ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}^{+}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/7f7f54f4ecec860db59a9850ffa278bda136898d) of ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) is called the *positive-part James–Stein estimator* and can be written in vector notation as:\n\n![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}^{+}-{\\\\boldsymbol {\\\\nu }}=\\\\left(1-{\\\\frac {(m-2)\\\\sigma ^{2}}{\\\\\\|{\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}\\\\\\|^{2}}}\\\\right)\\_{+}({\\\\mathbf {Y} }-{\\\\boldsymbol {\\\\nu }}).}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/f8dada56b5049df33fce01cf1d6f9ddbdab23eb3)\n\nThis estimator has a smaller risk than the basic James–Stein estimator for ![{\\\\displaystyle m\\\\geq 4}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/e9ba367983142680a2456aa5ae1f61e21e1aa186). It follows that the basic James–Stein estimator is itself [inadmissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\").[\\[8\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-Anderson-84-8)\n\nIt turns out, however, that the positive-part estimator is also inadmissible.[\\[4\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) This follows from a more general result which requires admissible estimators to be smooth.\n\n## Positive-part James–Stein shrinkage and model selection\n\\[[edit](https:\/\/en.wikipedia.org\/w\/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=5 \"Edit section: Positive-part James–Stein shrinkage and model selection\")\\]\n\nRecall the initial setup:\n\n![{\\\\displaystyle {\\\\mathbf {Y} }\\\\sim N({\\\\boldsymbol {\\\\theta }},\\\\sigma ^{2}I),\\\\,}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/b19ee31bd76ccbb7a75983bad82ac0421914d40a)\n\nwhere the variance coefficient ![{\\\\displaystyle \\\\sigma ^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/53a5c55e536acf250c1d3e0f754be5692b843ef5) is known and we wish to estimate the unknown (mean response) coefficient ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}=\\\\mathbb {E} \\\\mathbf {Y} }](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a9b78572d3e5dd5cb8362412e27f57d1a5241fdd). In the more general setting of [linear regression](https:\/\/en.wikipedia.org\/wiki\/Linear_regression \"Linear regression\"), the mean response is instead given by\n\n![{\\\\displaystyle \\\\mathbb {E} \\\\mathbf {Y} =\\\\mathbf {X} {\\\\boldsymbol {\\\\theta }},}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a9b27e84b5235faaa4c502229ce10644dbba2ac5)\n\nwhere ![{\\\\displaystyle \\\\mathbf {X} =\\[\\\\mathbf {v} \\_{1},\\\\ldots ,\\\\mathbf {v} \\_{m}\\]}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0ad1c826b3a19756dc60523573d2d769dd2315d6) is a matrix with ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) columns. As in the previous section, we can use the *positive-part James-Stein shrinkage operator* to obtain a [shrinkage estimator](https:\/\/en.wikipedia.org\/wiki\/Shrinkage_estimator \"Shrinkage estimator\") of ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf). In particular, any ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/42013234f6f8ffb7d8cc4268ec825db065113510) that satisfies the *James-Stein [KKT conditions](https:\/\/en.wikipedia.org\/wiki\/KKT_conditions \"KKT conditions\")*:[\\[9\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-:0-9)\n\n![{\\\\displaystyle {\\\\hat {\\\\theta }}\\_{i}=S\\_{\\\\frac {\\\\sigma }{\\\\\\|\\\\mathbf {v} \\_{i}\\\\\\|}}{\\\\bigg (}{\\\\hat {\\\\theta }}\\_{i}+{\\\\frac {\\\\mathbf {v} \\_{i}^{\\\\top }(\\\\mathbf {Y} -\\\\mathbf {X} {\\\\widehat {\\\\boldsymbol {\\\\theta }}})}{\\\\\\|\\\\mathbf {v} \\_{i}\\\\\\|^{2}}}{\\\\bigg )},\\\\quad i=1,\\\\ldots ,m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/936cbd3895bad590cedb7d6a87a98a32fb35ce2a)\n\nis a (positive-part) James-Stein estimator of ![{\\\\displaystyle {\\\\boldsymbol {\\\\theta }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/33b025a6bf54ec02e65c871dc3e5897c921419cf) with the useful property that it performs both shrinkage and [model selection](https:\/\/en.wikipedia.org\/wiki\/Model_selection \"Model selection\") simultaneously. This is because, depending on the value of the known ![{\\\\displaystyle \\\\sigma ^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/53a5c55e536acf250c1d3e0f754be5692b843ef5), there is a (possibly empty) set ![{\\\\displaystyle {\\\\mathcal {S}}\\\\subseteq \\\\{1,\\\\ldots ,m\\\\}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/720aba8454507e0c7eded393c9d045ff1417c5ce) such that\n\n![{\\\\displaystyle {\\\\hat {\\\\theta }}\\_{i}=0,\\\\quad i\\\\in {\\\\mathcal {S}}.}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/2192ecfb09e8f50a2d6a6d23f9fd842e4fbe6321)\n\nIn other words, some (or all) of the ![{\\\\displaystyle \\\\theta \\_{i}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/302b19204ed378e99ff4575341a67eebdbe5a555) could be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model.\n\nThe James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect; namely, the fact that the \"ordinary\" or least squares estimator is often [inadmissible](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\") for simultaneous estimation of several parameters. This effect has been called [Stein's phenomenon](https:\/\/en.wikipedia.org\/wiki\/Stein%27s_phenomenon \"Stein's phenomenon\"), and has been demonstrated for several different problem settings, some of which are briefly outlined below.\n\n- James and Stein demonstrated that the estimator presented above can still be used when the variance ![{\\\\displaystyle \\\\sigma ^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/53a5c55e536acf250c1d3e0f754be5692b843ef5) is unknown, by replacing it with the standard estimator of the variance, ![{\\\\displaystyle {\\\\widehat {\\\\sigma }}^{2}={\\\\frac {1}{m}}\\\\sum (Y\\_{i}-{\\\\overline {Y}})^{2}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/950af315e27d5b7a8dcb8a34f12250702441a8ad). The dominance result still holds under the same condition, namely, ![{\\\\displaystyle m\\>2}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/44e4ce1c04edd8f9602e60f0ec4457b7ac12fcd4).[\\[2\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)\n- All the results above are for the case when only a single observation vector **y** is available. For the more general case when ![{\\\\displaystyle n}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a601995d55609f2d9f5e233e36fbe9ea26011b3b) vectors are available, we consider the estimator ![{\\\\displaystyle {\\\\widehat {\\\\boldsymbol {\\\\theta }}}\\_{JS}=\\\\left(1-{\\\\frac {(m-2){\\\\frac {\\\\sigma ^{2}}{n}}}{\\\\\\|{\\\\overline {\\\\mathbf {Y} }}\\\\\\|^{2}}}\\\\right){\\\\overline {\\\\mathbf {Y} }},}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a97236e07a0088f3d9a1b8adc410e9e6de0fd599) where ![{\\\\displaystyle {\\\\overline {\\\\mathbf {Y} }}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/9415319ff8cfc6377fe06bf6ae88700802497c3c) is the ![{\\\\displaystyle m}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/0a07d98bb302f3856cbabc47b2b9016692e3f7bc)\\-length average of the ![{\\\\displaystyle n}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/a601995d55609f2d9f5e233e36fbe9ea26011b3b) observations, so that, ![{\\\\displaystyle {\\\\overline {\\\\mathbf {Y} }}\\\\sim N\\_{m}{\\\\Big (}{\\\\boldsymbol {\\\\theta }},{\\\\frac {\\\\sigma ^{2}}{n}}I{\\\\Big )}}](https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/svg\/e72554ac36054093a779d821974a6bbedec2d3df).\n- The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.[\\[10\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-bock75-10) A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a [linear regression](https:\/\/en.wikipedia.org\/wiki\/Linear_regression \"Linear regression\") technique which outperforms the standard application of the LS estimator.[\\[10\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-bock75-10)\n- Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.[\\[11\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-brown66-11) It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.[\\[4\\]](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4)\n\n- [Admissible decision rule](https:\/\/en.wikipedia.org\/wiki\/Admissible_decision_rule \"Admissible decision rule\")\n- [Hodges' estimator](https:\/\/en.wikipedia.org\/wiki\/Hodges%27_estimator \"Hodges' estimator\")\n- [Shrinkage estimator](https:\/\/en.wikipedia.org\/wiki\/Shrinkage_estimator \"Shrinkage estimator\")\n- [Regular estimator](https:\/\/en.wikipedia.org\/wiki\/Regular_estimator \"Regular estimator\")\n- [KL divergence](https:\/\/en.wikipedia.org\/wiki\/KL_divergence \"KL divergence\")\n\n1. ^ [***a***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-0) [***b***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-1)\n   [Stein, C.](https:\/\/en.wikipedia.org\/wiki\/Charles_Stein_\\(statistician\\) \"Charles Stein (statistician)\") (1956), \"Inadmissibility of the usual estimator for the mean of a multivariate distribution\", [*Proc. Third Berkeley Symp. Math. Statist. Prob.*](http:\/\/projecteuclid.org\/euclid.bsmsp\/1200501656), vol. 1, pp. 197–206, [MR](https:\/\/en.wikipedia.org\/wiki\/MR_\\(identifier\\) \"MR (identifier)\") [0084922](https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=0084922), [Zbl](https:\/\/en.wikipedia.org\/wiki\/Zbl_\\(identifier\\) \"Zbl (identifier)\") [0073\\.35602](https:\/\/zbmath.org\/?format=complete&q=an:0073.35602)\n2. ^ [***a***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-0) [***b***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-1) [***c***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-2)\n   James, W.; [Stein, C.](https:\/\/en.wikipedia.org\/wiki\/Charles_Stein_\\(statistician\\) \"Charles Stein (statistician)\") (1961), \"Estimation with quadratic loss\", [*Proc. Fourth Berkeley Symp. Math. Statist. Prob.*](http:\/\/projecteuclid.org\/euclid.bsmsp\/1200512173), vol. 1, pp. 361–379, [MR](https:\/\/en.wikipedia.org\/wiki\/MR_\\(identifier\\) \"MR (identifier)\") [0133191](https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=0133191)\n3. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-3)** Beran, R. (1995). THE ROLE OF HAJEK’S CONVOLUTION THEOREM IN STATISTICAL THEORY\n4. ^ [***a***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-0) [***b***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-1) [***c***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-2)\n   Lehmann, E. L.; Casella, G. (1998), *Theory of Point Estimation* (2nd ed.), New York: Springer\n5. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-5)**\n   Efron, B.; Morris, C. (1973). \"Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach\". *Journal of the American Statistical Association*. **68** (341). American Statistical Association: 117–130\\. [doi](https:\/\/en.wikipedia.org\/wiki\/Doi_\\(identifier\\) \"Doi (identifier)\"):[10\\.2307\/2284155](https:\/\/doi.org\/10.2307%2F2284155). [JSTOR](https:\/\/en.wikipedia.org\/wiki\/JSTOR_\\(identifier\\) \"JSTOR (identifier)\") [2284155](https:\/\/www.jstor.org\/stable\/2284155).\n6. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-stander-17_6-0)**\n   Stander, M. (2017), *Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements*, [arXiv](https:\/\/en.wikipedia.org\/wiki\/ArXiv_\\(identifier\\) \"ArXiv (identifier)\"):[1702\\.02440](https:\/\/arxiv.org\/abs\/1702.02440), [Bibcode](https:\/\/en.wikipedia.org\/wiki\/Bibcode_\\(identifier\\) \"Bibcode (identifier)\"):[2017arXiv170202440S](https:\/\/ui.adsabs.harvard.edu\/abs\/2017arXiv170202440S)\n7. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-7)**\n   Stigler, Stephen M. (1990-02-01). [\"The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators\"](https:\/\/doi.org\/10.1214%2Fss%2F1177012274). *Statistical Science*. **5** (1). [doi](https:\/\/en.wikipedia.org\/wiki\/Doi_\\(identifier\\) \"Doi (identifier)\"):[10\\.1214\/ss\/1177012274](https:\/\/doi.org\/10.1214%2Fss%2F1177012274). [ISSN](https:\/\/en.wikipedia.org\/wiki\/ISSN_\\(identifier\\) \"ISSN (identifier)\") [0883-4237](https:\/\/search.worldcat.org\/issn\/0883-4237).\n8. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-Anderson-84_8-0)**\n   Anderson, T. W. (1984), *An Introduction to Multivariate Statistical Analysis* (2nd ed.), New York: John Wiley & Sons\n9. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-:0_9-0)**\n   Botev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025). *Data Science and Machine Learning: Mathematical and Statistical Methods* (2nd ed.). Boca Raton ; London: CRC Press. pp. 277–279\\. [ISBN](https:\/\/en.wikipedia.org\/wiki\/ISBN_\\(identifier\\) \"ISBN (identifier)\")\n   \n   [978-1-032-48868-4](https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/978-1-032-48868-4 \"Special:BookSources\/978-1-032-48868-4\")\n   \n   .\n10. ^ [***a***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-bock75_10-0) [***b***](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-bock75_10-1)\n    Bock, M. E. (1975), \"Minimax estimators of the mean of a multivariate normal distribution\", *[Annals of Statistics](https:\/\/en.wikipedia.org\/wiki\/Annals_of_Statistics \"Annals of Statistics\")*, **3** (1): 209–218, [doi](https:\/\/en.wikipedia.org\/wiki\/Doi_\\(identifier\\) \"Doi (identifier)\"):[10\\.1214\/aos\/1176343009](https:\/\/doi.org\/10.1214%2Faos%2F1176343009), [MR](https:\/\/en.wikipedia.org\/wiki\/MR_\\(identifier\\) \"MR (identifier)\") [0381064](https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=0381064), [Zbl](https:\/\/en.wikipedia.org\/wiki\/Zbl_\\(identifier\\) \"Zbl (identifier)\") [0314\\.62005](https:\/\/zbmath.org\/?format=complete&q=an:0314.62005)\n11. **[^](https:\/\/en.wikipedia.org\/wiki\/James%E2%80%93Stein_estimator#cite_ref-brown66_11-0)**\n    [Brown, L. D.](https:\/\/en.wikipedia.org\/wiki\/Lawrence_D._Brown \"Lawrence D. Brown\") (1966), \"On the admissibility of invariant estimators of one or more location parameters\", *Annals of Mathematical Statistics*, **37** (5): 1087–1136, [doi](https:\/\/en.wikipedia.org\/wiki\/Doi_\\(identifier\\) \"Doi (identifier)\"):[10\\.1214\/aoms\/1177699259](https:\/\/doi.org\/10.1214%2Faoms%2F1177699259), [MR](https:\/\/en.wikipedia.org\/wiki\/MR_\\(identifier\\) \"MR (identifier)\") [0216647](https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=0216647), [Zbl](https:\/\/en.wikipedia.org\/wiki\/Zbl_\\(identifier\\) \"Zbl (identifier)\") [0156\\.39401](https:\/\/zbmath.org\/?format=complete&q=an:0156.39401)\n\n- Judge, George G.; Bock, M. E. (1978). *The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics*. New York: North Holland. pp. 229–257\\. [ISBN](https:\/\/en.wikipedia.org\/wiki\/ISBN_\\(identifier\\) \"ISBN (identifier)\")\n  \n  [0-7204-0729-X](https:\/\/en.wikipedia.org\/wiki\/Special:BookSources\/0-7204-0729-X \"Special:BookSources\/0-7204-0729-X\")\n  \n  .","meta_canonical":null,"ml_categories_json":"{\"\/Science\":903,\"\/Science\/Mathematics\":808}","ml_types_json":"{\"\/Article\":997,\"\/Article\/Wiki\":840}","ml_intent_types_json":"{\"Informational\":999}","meta_language":"en","attrs_author":null,"attrs_publish_time":0,"attrs_original_publish_time":1433035597,"attrs_is_republished":0,"attrs_nr_words":"3289","attrs_boilerpipe_nr_words":"1989","body_ext_links_number":37,"body_int_links_number":116,"meta_nofollow":0,"meta_noarchive":0,"props_was_rendered":0,"src_redirect":"","download_time_msec":29,"download_ttfb_msec":20,"download_size":30603}

3. Robots.txt Check

Query:

Response:

4. Spam/Ban Check

Query:

Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled

📄

INDEXABLE

✅

CRAWLED

1 month ago

🤖

ROBOTS ALLOWED

Page Info Filters

Filter	Status	Condition	Details
HTTP status	PASS	`download_http_code = 200`	HTTP 200
Age cutoff	PASS	`download_stamp > now() - 6 MONTH`	1.2 months ago (distributed domain, exempt)
History drop	PASS	`isNull(history_drop_reason)`	No drop reason
Spam/ban	PASS	`fh_dont_index != 1 AND ml_spam_score = 0`	ml_spam_score=0
Canonical	PASS	`meta_canonical IS NULL OR = '' OR = src_unparsed`	Not set

Page Details

Property

Value

URL

https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator

Last Crawled

2026-03-19 01:38:09 (1 month ago)

First Indexed

2015-05-31 01:26:37 (10 years ago)

HTTP Status Code

200

Content

Meta Title

James–Stein estimator - Wikipedia

Meta Description

null

Meta Canonical

null

Boilerpipe Text

From Wikipedia, the free encyclopedia The James–Stein estimator is an estimator of the mean for a multivariate random variable . It arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956, [ 1 ] when Charles Stein reached a relatively shocking conclusion that while the then-usual estimate of the mean, the sample mean , is admissible when , it is inadmissible when . Stein proposed a possible improvement to the estimator that shrinks the sample mean towards a more central mean vector (which can be chosen a priori or commonly as the "average of averages" of the sample means, given all samples share the same size). This observation is commonly referred to as Stein's example or paradox . In 1961, Willard James and Charles Stein simplified the original process. [ 2 ] It can be shown that the James–Stein estimator dominates the "ordinary" least squares approach in the sense that the James–Stein estimator has a lower mean squared error than the "ordinary" least squares estimator for all . This is possible because the James–Stein estimator is biased , so that the Gauss–Markov theorem does not apply. Similar to the Hodges' estimator , the James-Stein estimator is superefficient and non-regular at . [ 3 ] Let where the vector is the unknown mean of , which is -variate normally distributed and with known covariance matrix . We are interested in obtaining an estimate, , of , based on a single observation, , of . In real-world application, this is a common situation in which a set of parameters is sampled, and the samples are corrupted by independent Gaussian noise . Since this noise has mean of zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the least squares estimator, which is . Stein demonstrated that in terms of mean squared error , the least squares estimator, , is sub-optimal to shrinkage based estimators, such as the James–Stein estimator , . [ 1 ] The paradoxical result, that there is a (possibly) better and never any worse estimate of in mean squared error as compared to the sample mean, became known as Stein's example . MSE (R) of least squares estimator (ML) vs. James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero. If is known, the James–Stein estimator is given by James and Stein showed that the above estimator dominates for any , meaning that the James–Stein estimator has a lower mean squared error (MSE) than the maximum likelihood estimator. [ 2 ] [ 4 ] By definition, this makes the least squares estimator inadmissible when . Notice that if then this estimator simply takes the natural estimator and shrinks it towards the origin 0 . In fact this is not the only direction of shrinkage that works. Let be an arbitrary fixed vector of dimension . Then there exists an estimator of the James–Stein type that shrinks toward , namely The James–Stein estimator dominates the usual estimator for any . A natural question to ask is whether the improvement over the usual estimator is independent of the choice of . The answer is no. The improvement is small if is large. Thus to get a very great improvement some knowledge of the location of is necessary. Of course this is the quantity we are trying to estimate so we don't have this knowledge a priori . But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that any finite guess improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite , surely a poor guess. Seeing the James–Stein estimator as an empirical Bayes method gives some intuition to this result: One assumes that itself is a random variable with prior distribution , where is estimated from the data itself. Estimating only gives an advantage compared to the maximum-likelihood estimator when the dimension is large enough; hence it does not work for . The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator. [ 5 ] A consequence of the above discussion is the following counterintuitive result: When three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is admissible . A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the total MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, any single component does not dominate the respective component of the LS estimator. The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a telecommunication setting, it is reasonable to combine channel tap measurements in a channel estimation scenario, as the goal is to minimize the total channel estimation error. The James–Stein estimator has also found use in fundamental quantum theory, where the estimator has been used to improve the theoretical bounds of the entropic uncertainty principle for more than three measurements. [ 6 ] An intuitive derivation and interpretation is given by the Galtonian perspective. [ 7 ] Under this interpretation, we aim to predict the population means using the imperfectly measured sample means . The equation of the OLS estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron-Morris estimator (when we allow the intercept to vary). Positive-part James–Stein shrinkage operator [ edit ] Despite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator toward , the estimator actually moves away from for small values of as the multiplier on is then negative. This can be remedied by replacing this multiplier by zero when it is negative. To this end, define the positive-part James-Stein shrinkage operator : where , and apply this operator component-wise to the (unbiased) least-squares estimator of (with known ) for each : The resulting estimator of is called the positive-part James–Stein estimator and can be written in vector notation as: This estimator has a smaller risk than the basic James–Stein estimator for . It follows that the basic James–Stein estimator is itself inadmissible . [ 8 ] It turns out, however, that the positive-part estimator is also inadmissible. [ 4 ] This follows from a more general result which requires admissible estimators to be smooth. Positive-part James–Stein shrinkage and model selection [ edit ] Recall the initial setup: where the variance coefficient is known and we wish to estimate the unknown (mean response) coefficient . In the more general setting of linear regression , the mean response is instead given by where is a matrix with columns. As in the previous section, we can use the positive-part James-Stein shrinkage operator to obtain a shrinkage estimator of . In particular, any that satisfies the James-Stein KKT conditions : [ 9 ] is a (positive-part) James-Stein estimator of with the useful property that it performs both shrinkage and model selection simultaneously. This is because, depending on the value of the known , there is a (possibly empty) set such that In other words, some (or all) of the could be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model. The James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect; namely, the fact that the "ordinary" or least squares estimator is often inadmissible for simultaneous estimation of several parameters. This effect has been called Stein's phenomenon , and has been demonstrated for several different problem settings, some of which are briefly outlined below. James and Stein demonstrated that the estimator presented above can still be used when the variance is unknown, by replacing it with the standard estimator of the variance, . The dominance result still holds under the same condition, namely, . [ 2 ] All the results above are for the case when only a single observation vector y is available. For the more general case when vectors are available, we consider the estimator where is the -length average of the observations, so that, . The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances. [ 10 ] A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a linear regression technique which outperforms the standard application of the LS estimator. [ 10 ] Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited. [ 11 ] It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions. [ 4 ] Admissible decision rule Hodges' estimator Shrinkage estimator Regular estimator KL divergence ^ a b Stein, C. (1956), "Inadmissibility of the usual estimator for the mean of a multivariate distribution", Proc. Third Berkeley Symp. Math. Statist. Prob. , vol. 1, pp. 197– 206, MR 0084922 , Zbl 0073.35602 ^ a b c James, W.; Stein, C. (1961), "Estimation with quadratic loss", Proc. Fourth Berkeley Symp. Math. Statist. Prob. , vol. 1, pp. 361– 379, MR 0133191 ^ Beran, R. (1995). THE ROLE OF HAJEK’S CONVOLUTION THEOREM IN STATISTICAL THEORY ^ a b c Lehmann, E. L.; Casella, G. (1998), Theory of Point Estimation (2nd ed.), New York: Springer ^ Efron, B.; Morris, C. (1973). "Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach". Journal of the American Statistical Association . 68 (341). American Statistical Association: 117– 130. doi : 10.2307/2284155 . JSTOR 2284155 . ^ Stander, M. (2017), Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements , arXiv : 1702.02440 , Bibcode : 2017arXiv170202440S ^ Stigler, Stephen M. (1990-02-01). "The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators" . Statistical Science . 5 (1). doi : 10.1214/ss/1177012274 . ISSN 0883-4237 . ^ Anderson, T. W. (1984), An Introduction to Multivariate Statistical Analysis (2nd ed.), New York: John Wiley & Sons ^ Botev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025). Data Science and Machine Learning: Mathematical and Statistical Methods (2nd ed.). Boca Raton ; London: CRC Press. pp. 277– 279. ISBN 978-1-032-48868-4 . ^ a b Bock, M. E. (1975), "Minimax estimators of the mean of a multivariate normal distribution", Annals of Statistics , 3 (1): 209– 218, doi : 10.1214/aos/1176343009 , MR 0381064 , Zbl 0314.62005 ^ Brown, L. D. (1966), "On the admissibility of invariant estimators of one or more location parameters", Annals of Mathematical Statistics , 37 (5): 1087– 1136, doi : 10.1214/aoms/1177699259 , MR 0216647 , Zbl 0156.39401 Judge, George G.; Bock, M. E. (1978). The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics . New York: North Holland. pp. 229– 257. ISBN 0-7204-0729-X .

Markdown

[Jump to content](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#bodyContent) Main menu Main menu move to sidebar hide Navigation - [Main page](https://en.wikipedia.org/wiki/Main_Page "Visit the main page [z]") - [Contents](https://en.wikipedia.org/wiki/Wikipedia:Contents "Guides to browsing Wikipedia") - [Current events](https://en.wikipedia.org/wiki/Portal:Current_events "Articles related to current events") - [Random article](https://en.wikipedia.org/wiki/Special:Random "Visit a randomly selected article [x]") - [About Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:About "Learn about Wikipedia and how it works") - [Contact us](https://en.wikipedia.org/wiki/Wikipedia:Contact_us "How to contact Wikipedia") Contribute - [Help](https://en.wikipedia.org/wiki/Help:Contents "Guidance on how to use and edit Wikipedia") - [Learn to edit](https://en.wikipedia.org/wiki/Help:Introduction "Learn how to edit Wikipedia") - [Community portal](https://en.wikipedia.org/wiki/Wikipedia:Community_portal "The hub for editors") - [Recent changes](https://en.wikipedia.org/wiki/Special:RecentChanges "A list of recent changes to Wikipedia [r]") - [Upload file](https://en.wikipedia.org/wiki/Wikipedia:File_upload_wizard "Add images or other media for use on Wikipedia") - [Special pages](https://en.wikipedia.org/wiki/Special:SpecialPages "A list of all special pages [q]") [![](https://en.wikipedia.org/static/images/icons/enwiki-25.svg) ![Wikipedia](https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-en-25.svg) ![The Free Encyclopedia](https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-tagline-en-25.svg)](https://en.wikipedia.org/wiki/Main_Page) [Search](https://en.wikipedia.org/wiki/Special:Search "Search Wikipedia [f]") Appearance - [Donate](https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en) - [Create account](https://en.wikipedia.org/w/index.php?title=Special:CreateAccount&returnto=James%E2%80%93Stein+estimator "You are encouraged to create an account and log in; however, it is not mandatory") - [Log in](https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=James%E2%80%93Stein+estimator "You're encouraged to log in; however, it's not mandatory. [o]") Personal tools - [Donate](https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en) - [Create account](https://en.wikipedia.org/w/index.php?title=Special:CreateAccount&returnto=James%E2%80%93Stein+estimator "You are encouraged to create an account and log in; however, it is not mandatory") - [Log in](https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=James%E2%80%93Stein+estimator "You're encouraged to log in; however, it's not mandatory. [o]") ## Contents move to sidebar hide - [(Top)](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator) - [1 Setting](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#Setting) - [2 Formulation](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#Formulation) - [3 Interpretation](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#Interpretation) - [4 Positive-part James–Stein shrinkage operator](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#Positive-part_James%E2%80%93Stein_shrinkage_operator) - [5 Positive-part James–Stein shrinkage and model selection](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#Positive-part_James%E2%80%93Stein_shrinkage_and_model_selection) - [6 Further extensions](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#Further_extensions) - [7 See also](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#See_also) - [8 References](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#References) - [9 Further reading](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#Further_reading) Toggle the table of contents # James–Stein estimator 3 languages - [Deutsch](https://de.wikipedia.org/wiki/James-Stein-Sch%C3%A4tzer "James-Stein-Schätzer – German") - [Español](https://es.wikipedia.org/wiki/Estimador_de_James-Stein "Estimador de James-Stein – Spanish") - [Polski](https://pl.wikipedia.org/wiki/Estymator_Jamesa-Steina "Estymator Jamesa-Steina – Polish") [Edit links](https://www.wikidata.org/wiki/Special:EntityPage/Q6146297#sitelinks-wikipedia "Edit interlanguage links") - [Article](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator "View the content page [c]") - [Talk](https://en.wikipedia.org/wiki/Talk:James%E2%80%93Stein_estimator "Discuss improvements to the content page [t]") English - [Read](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator) - [Edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit "Edit this page [e]") - [View history](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=history "Past revisions of this page [h]") Tools Tools move to sidebar hide Actions - [Read](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator) - [Edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit "Edit this page [e]") - [View history](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=history) General - [What links here](https://en.wikipedia.org/wiki/Special:WhatLinksHere/James%E2%80%93Stein_estimator "List of all English Wikipedia pages containing links to this page [j]") - [Related changes](https://en.wikipedia.org/wiki/Special:RecentChangesLinked/James%E2%80%93Stein_estimator "Recent changes in pages linked from this page [k]") - [Upload file](https://en.wikipedia.org/wiki/Wikipedia:File_Upload_Wizard "Upload files [u]") - [Permanent link](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&oldid=1342138970 "Permanent link to this revision of this page") - [Page information](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=info "More information about this page") - [Cite this page](https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&page=James%E2%80%93Stein_estimator&id=1342138970&wpFormIdentifier=titleform "Information on how to cite this page") - [Get shortened URL](https://en.wikipedia.org/w/index.php?title=Special:UrlShortener&url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FJames%25E2%2580%2593Stein_estimator) Print/export - [Download as PDF](https://en.wikipedia.org/w/index.php?title=Special:DownloadAsPdf&page=James%E2%80%93Stein_estimator&action=show-download-screen "Download this page as a PDF file") - [Printable version](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&printable=yes "Printable version of this page [p]") In other projects - [Wikidata item](https://www.wikidata.org/wiki/Special:EntityPage/Q6146297 "Structured data on this page hosted by Wikidata [g]") Appearance move to sidebar hide From Wikipedia, the free encyclopedia Rule for estimating the mean of a dataset | | | |---|---| | ![](https://upload.wikimedia.org/wikipedia/en/thumb/f/f2/Edit-clear.svg/40px-Edit-clear.svg.png) | This article **may be too technical for most readers to understand**. Please [help improve it](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit) to [make it understandable to non-experts](https://en.wikipedia.org/wiki/Wikipedia:Make_technical_articles_understandable "Wikipedia:Make technical articles understandable"), without removing the technical details. *(November 2017)* *([Learn how and when to remove this message](https://en.wikipedia.org/wiki/Help:Maintenance_template_removal "Help:Maintenance template removal"))* | The **James–Stein estimator** is an [estimator](https://en.wikipedia.org/wiki/Estimator "Estimator") of the [mean](https://en.wikipedia.org/wiki/Mean "Mean") θ := ( θ 1 , θ 2 , … θ m ) {\\displaystyle {\\boldsymbol {\\theta }}:=(\\theta \_{1},\\theta \_{2},\\dots \\theta \_{m})} ![{\\displaystyle {\\boldsymbol {\\theta }}:=(\\theta \_{1},\\theta \_{2},\\dots \\theta \_{m})}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ea925cd9dce005c92b0c86c343e8b005ecf8a3f3) for a multivariate [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") Y := ( Y 1 , Y 2 , … Y m ) {\\displaystyle {\\boldsymbol {Y}}:=(Y\_{1},Y\_{2},\\dots Y\_{m})} ![{\\displaystyle {\\boldsymbol {Y}}:=(Y\_{1},Y\_{2},\\dots Y\_{m})}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c7d304d83650977659f023b20fae116a55c065b0). It arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956,[\[1\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stein-56-1) when [Charles Stein](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") reached a relatively shocking conclusion that while the then-usual estimate of the mean, the [sample mean](https://en.wikipedia.org/wiki/Sample_mean "Sample mean"), is [admissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when m ≤ 2 {\\displaystyle m\\leq 2} ![{\\displaystyle m\\leq 2}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9ef46309a66e2cdf7737c7246afd8e78e3052f3d), it is [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when m ≥ 3 {\\displaystyle m\\geq 3} ![{\\displaystyle m\\geq 3}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4610f29d2708d1febf1c0090f58ddd3986593545). Stein proposed a possible improvement to the estimator that [shrinks](https://en.wikipedia.org/wiki/Shrinkage_\(statistics\) "Shrinkage (statistics)") the sample mean θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) towards a more central mean vector ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160) (which can be chosen [a priori](https://en.wikipedia.org/wiki/A_priori_and_a_posteriori "A priori and a posteriori") or commonly as the "average of averages" of the sample means, given all samples share the same size). This observation is commonly referred to as [Stein's example or paradox](https://en.wikipedia.org/wiki/Stein%27s_example "Stein's example"). In 1961, [Willard James](https://en.wikipedia.org/wiki/Willard_D._James "Willard D. James") and Charles Stein simplified the original process.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2) It can be shown that the James–Stein estimator [dominates](https://en.wikipedia.org/wiki/Dominating_decision_rule "Dominating decision rule") the "ordinary" [least squares](https://en.wikipedia.org/wiki/Least_squares "Least squares") approach in the sense that the James–Stein estimator has a lower [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") than the "ordinary" least squares estimator for all θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf). This is possible because the James–Stein estimator is [biased](https://en.wikipedia.org/wiki/Bias_of_an_estimator "Bias of an estimator"), so that the [Gauss–Markov theorem](https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem "Gauss–Markov theorem") does not apply. Similar to the [Hodges' estimator](https://en.wikipedia.org/wiki/Hodges%27_estimator "Hodges' estimator"), the James-Stein estimator is [superefficient](https://en.wikipedia.org/w/index.php?title=Superefficient&action=edit&redlink=1 "Superefficient (page does not exist)") and [non-regular](https://en.wikipedia.org/wiki/Regular_estimator "Regular estimator") at θ \= 0 {\\displaystyle {\\boldsymbol {\\theta }}=\\mathbf {0} } ![{\\displaystyle {\\boldsymbol {\\theta }}=\\mathbf {0} }](https://wikimedia.org/api/rest_v1/media/math/render/svg/3278daf129fb6d34f19a067b7fd6b7a90cee5aeb).[\[3\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-3) ## Setting \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=1 "Edit section: Setting")\] Let Y ∼ N m ( θ , σ 2 I ) , {\\displaystyle {\\mathbf {Y} }\\sim N\_{m}({\\boldsymbol {\\theta }},\\sigma ^{2}I),\\,} ![{\\displaystyle {\\mathbf {Y} }\\sim N\_{m}({\\boldsymbol {\\theta }},\\sigma ^{2}I),\\,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9f91e318ab7b67f7cf60d538cc31280dde768387)where the vector θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) is the unknown [mean](https://en.wikipedia.org/wiki/Expected_value "Expected value") of Y {\\displaystyle {\\mathbf {Y} }} ![{\\displaystyle {\\mathbf {Y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7020214a70ec832bbdd74738ec96ca9989a695e6), which is [m {\\displaystyle m} ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc)\-variate normally distributed](https://en.wikipedia.org/wiki/Multivariate_normal_distribution "Multivariate normal distribution") and with known [covariance matrix](https://en.wikipedia.org/wiki/Covariance_matrix "Covariance matrix") σ 2 I {\\displaystyle \\sigma ^{2}I} ![{\\displaystyle \\sigma ^{2}I}](https://wikimedia.org/api/rest_v1/media/math/render/svg/6e2975cf33cb590e842a2c26a906f0949af795ff). We are interested in obtaining an estimate, θ ^ {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/42013234f6f8ffb7d8cc4268ec825db065113510), of θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf), based on a single observation, y {\\displaystyle {\\mathbf {y} }} ![{\\displaystyle {\\mathbf {y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c4ce8e1f631864c8e56c48f7861efef6666236d9), of Y {\\displaystyle {\\mathbf {Y} }} ![{\\displaystyle {\\mathbf {Y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7020214a70ec832bbdd74738ec96ca9989a695e6). In real-world application, this is a common situation in which a set of parameters is sampled, and the samples are corrupted by independent [Gaussian noise](https://en.wikipedia.org/wiki/Gaussian_noise "Gaussian noise"). Since this noise has mean of zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the [least squares](https://en.wikipedia.org/wiki/Least_squares "Least squares") estimator, which is θ ^ L S \= Y {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}={\\mathbf {Y} }} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}={\\mathbf {Y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/83ee2a6a995c8a02c8f565486b778e1e70bd2fac). Stein demonstrated that in terms of [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") E ⁡ \[ ‖ θ − θ ^ ‖ 2 \] {\\displaystyle \\operatorname {E} \\left\[\\left\\\|{\\boldsymbol {\\theta }}-{\\widehat {\\boldsymbol {\\theta }}}\\right\\\|^{2}\\right\]} ![{\\displaystyle \\operatorname {E} \\left\[\\left\\\|{\\boldsymbol {\\theta }}-{\\widehat {\\boldsymbol {\\theta }}}\\right\\\|^{2}\\right\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/bdd1163606c619b36e0e0d2310a54e10903e7234), the least squares estimator, θ ^ L S {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/58b39a6819b4e3e61fbcb24dd72a258cd0455d3d), is sub-optimal to shrinkage based estimators, such as the **James–Stein estimator**, θ ^ J S {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/805e2127008d46c29a7952101fada0840f6fe252).[\[1\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stein-56-1) The paradoxical result, that there is a (possibly) better and never any worse estimate of θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) in mean squared error as compared to the sample mean, became known as [Stein's example](https://en.wikipedia.org/wiki/Stein%27s_example "Stein's example"). ## Formulation \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=2 "Edit section: Formulation")\] [![](https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/MSE_of_ML_vs_JS.png/500px-MSE_of_ML_vs_JS.png)](https://en.wikipedia.org/wiki/File:MSE_of_ML_vs_JS.png) MSE (R) of least squares estimator (ML) vs. James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero. If σ 2 {\\displaystyle \\sigma ^{2}} ![{\\displaystyle \\sigma ^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53a5c55e536acf250c1d3e0f754be5692b843ef5) is known, the James–Stein estimator is given by θ ^ J S \= ( 1 − ( m − 2 ) σ 2 ‖ Y ‖ 2 ) Y . {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }\\\|^{2}}}\\right){\\mathbf {Y} }.} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }\\\|^{2}}}\\right){\\mathbf {Y} }.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/abf3aef7dbf3a1cd7401484e59500b5504961d65) James and Stein showed that the above estimator [dominates](https://en.wikipedia.org/wiki/Dominating_decision_rule "Dominating decision rule") θ ^ L S {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/58b39a6819b4e3e61fbcb24dd72a258cd0455d3d) for any m ≥ 3 {\\displaystyle m\\geq 3} ![{\\displaystyle m\\geq 3}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4610f29d2708d1febf1c0090f58ddd3986593545), meaning that the James–Stein estimator has a lower [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") (MSE) than the [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimator.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) By definition, this makes the least squares estimator [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when m ≥ 3 {\\displaystyle m\\geq 3} ![{\\displaystyle m\\geq 3}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4610f29d2708d1febf1c0090f58ddd3986593545). Notice that if ( m − 2 ) σ 2 \< ‖ Y ‖ 2 {\\displaystyle (m-2)\\sigma ^{2}\<\\\|{\\mathbf {Y} }\\\|^{2}} ![{\\displaystyle (m-2)\\sigma ^{2}\<\\\|{\\mathbf {Y} }\\\|^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/791e4ba2fdd291a31cd1e0da5f16826c143f55c1) then this estimator simply takes the natural estimator Y {\\displaystyle \\mathbf {Y} } ![{\\displaystyle \\mathbf {Y} }](https://wikimedia.org/api/rest_v1/media/math/render/svg/c92a7716a99fadda050469747fce1e475e0ec549) and shrinks it towards the origin **0**. In fact this is not the only direction of [shrinkage](https://en.wikipedia.org/wiki/Shrinkage_\(statistics\) "Shrinkage (statistics)") that works. Let ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160) be an arbitrary fixed vector of dimension m {\\displaystyle m} ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc). Then there exists an estimator of the James–Stein type that shrinks toward ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160), namely θ ^ J S \= ( 1 − ( m − 2 ) σ 2 ‖ Y − ν ‖ 2 ) ( Y − ν ) \+ ν , m ≥ 3\. {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|^{2}}}\\right)({\\mathbf {Y} }-{\\boldsymbol {\\nu }})+{\\boldsymbol {\\nu }},\\qquad m\\geq 3.} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|^{2}}}\\right)({\\mathbf {Y} }-{\\boldsymbol {\\nu }})+{\\boldsymbol {\\nu }},\\qquad m\\geq 3.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/fda106b6c468638610f33348f5c7549428c9810f) The James–Stein estimator dominates the usual estimator for any ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160). A natural question to ask is whether the improvement over the usual estimator is independent of the choice of ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160). The answer is no. The improvement is small if ‖ θ − ν ‖ {\\displaystyle \\\|{{\\boldsymbol {\\theta }}-{\\boldsymbol {\\nu }}}\\\|} ![{\\displaystyle \\\|{{\\boldsymbol {\\theta }}-{\\boldsymbol {\\nu }}}\\\|}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c4f285d34fe737275b0d397ec39b887f4fdabd4c) is large. Thus to get a very great improvement some knowledge of the location of θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) is necessary. Of course this is the quantity we are trying to estimate so we don't have this knowledge [a priori](https://en.wikipedia.org/wiki/A_priori_and_a_posteriori "A priori and a posteriori"). But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that *any* finite guess ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160) improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160), surely a poor guess. ## Interpretation \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=3 "Edit section: Interpretation")\] Seeing the James–Stein estimator as an [empirical Bayes method](https://en.wikipedia.org/wiki/Empirical_Bayes_method "Empirical Bayes method") gives some intuition to this result: One assumes that θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) itself is a random variable with [prior distribution](https://en.wikipedia.org/wiki/Prior_probability "Prior probability") ∼ N ( 0 , A ) {\\displaystyle \\sim N(0,A)} ![{\\displaystyle \\sim N(0,A)}](https://wikimedia.org/api/rest_v1/media/math/render/svg/70b48c6623236065e6e4d8940bd524e756c7b641), where A {\\displaystyle A} ![{\\displaystyle A}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7daff47fa58cdfd29dc333def748ff5fa4c923e3) is estimated from the data itself. Estimating A {\\displaystyle A} ![{\\displaystyle A}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7daff47fa58cdfd29dc333def748ff5fa4c923e3) only gives an advantage compared to the [maximum-likelihood estimator](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") when the dimension m {\\displaystyle m} ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) is large enough; hence it does not work for m ≤ 2 {\\displaystyle m\\leq 2} ![{\\displaystyle m\\leq 2}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9ef46309a66e2cdf7737c7246afd8e78e3052f3d). The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.[\[5\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-5) A consequence of the above discussion is the following counterintuitive result: When three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is [admissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule"). A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the *total* MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, any single component does not dominate the respective component of the LS estimator. The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a [telecommunication](https://en.wikipedia.org/wiki/Telecommunication "Telecommunication") setting, it is reasonable to combine [channel](https://en.wikipedia.org/wiki/Communication_channel "Communication channel") tap measurements in a [channel estimation](https://en.wikipedia.org/wiki/Channel_estimation "Channel estimation") scenario, as the goal is to minimize the total channel estimation error. The James–Stein estimator has also found use in fundamental quantum theory, where the estimator has been used to improve the theoretical bounds of the [entropic uncertainty principle](https://en.wikipedia.org/wiki/Entropic_uncertainty_principle "Entropic uncertainty principle") for more than three measurements.[\[6\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stander-17-6) An intuitive derivation and interpretation is given by the [Galtonian](https://en.wikipedia.org/wiki/Francis_Galton "Francis Galton") perspective.[\[7\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-7) Under this interpretation, we aim to predict the population means using the [imperfectly measured sample means](https://en.wikipedia.org/wiki/Measurement_error_model "Measurement error model"). The equation of the [OLS](https://en.wikipedia.org/wiki/Ordinary_least_squares "Ordinary least squares") estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron-Morris estimator (when we allow the intercept to vary). ## Positive-part James–Stein shrinkage operator \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=4 "Edit section: Positive-part James–Stein shrinkage operator")\] Despite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator Y {\\displaystyle {\\mathbf {Y} }} ![{\\displaystyle {\\mathbf {Y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7020214a70ec832bbdd74738ec96ca9989a695e6) *toward* ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160), the estimator actually moves *away* from ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160) for small values of ‖ Y − ν ‖ , {\\displaystyle \\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|,} ![{\\displaystyle \\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/65e1a7e674ea9579de967e045b596907ae5fde77) as the multiplier on Y − ν {\\displaystyle {\\mathbf {Y} }-{\\boldsymbol {\\nu }}} ![{\\displaystyle {\\mathbf {Y} }-{\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/6fddab8608ca8b09eeef298a960defea57c78395) is then negative. This can be remedied by replacing this multiplier by zero when it is negative. To this end, define the *positive-part James-Stein shrinkage operator*: S λ ( x ) \= x \[ 1 − ( λ / x ) 2 \] \+ , {\\displaystyle S\_{\\lambda }(x)=x\\left\[1-\\left(\\lambda /x\\right)^{2}\\right\]\_{+},} ![{\\displaystyle S\_{\\lambda }(x)=x\\left\[1-\\left(\\lambda /x\\right)^{2}\\right\]\_{+},}](https://wikimedia.org/api/rest_v1/media/math/render/svg/682ea45fcbaf678f56319df06a3029306a5e5e51) where x \+ \= max { 0 , x } {\\displaystyle x\_{+}=\\max\\{0,x\\}} ![{\\displaystyle x\_{+}=\\max\\{0,x\\}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/5a5cf1f9bd0134bd7a9e50fa8b4e96acbbd05e32), and apply this operator component-wise to the (unbiased) least-squares estimator of θ − ν {\\displaystyle {\\boldsymbol {\\theta }}-{\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\theta }}-{\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ad81208959563b1216bdeb1e64c895fe4c1fddf4) (with known ν {\\displaystyle {\\boldsymbol {\\nu }}} ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160)) for each i \= 1 , … , m {\\displaystyle i=1,\\ldots ,m} ![{\\displaystyle i=1,\\ldots ,m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/74690f54a3c93a332ecb2935e900178b9a555483): θ ^ i \+ − ν i \= S λ i ( Y i − ν i ) , λ i := σ m − 2 \| Y i − ν i \| ‖ Y − ν ‖ . {\\displaystyle {\\widehat {\\theta }}\_{i}^{+}-\\nu \_{i}=S\_{\\lambda \_{i}}(Y\_{i}-\\nu \_{i}),\\quad \\lambda \_{i}:=\\sigma {\\sqrt {m-2}}\\,{\\frac {\|Y\_{i}-\\nu \_{i}\|}{\\\|\\mathbf {Y} -{\\boldsymbol {\\nu }}\\\|}}.} ![{\\displaystyle {\\widehat {\\theta }}\_{i}^{+}-\\nu \_{i}=S\_{\\lambda \_{i}}(Y\_{i}-\\nu \_{i}),\\quad \\lambda \_{i}:=\\sigma {\\sqrt {m-2}}\\,{\\frac {\|Y\_{i}-\\nu \_{i}\|}{\\\|\\mathbf {Y} -{\\boldsymbol {\\nu }}\\\|}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/26dfce05be8073ab053938126bd30ae4555cad60) The resulting estimator θ ^ \+ {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}^{+}} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}^{+}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7f7f54f4ecec860db59a9850ffa278bda136898d) of θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) is called the *positive-part James–Stein estimator* and can be written in vector notation as: θ ^ \+ − ν \= ( 1 − ( m − 2 ) σ 2 ‖ Y − ν ‖ 2 ) \+ ( Y − ν ) . {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}^{+}-{\\boldsymbol {\\nu }}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|^{2}}}\\right)\_{+}({\\mathbf {Y} }-{\\boldsymbol {\\nu }}).} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}^{+}-{\\boldsymbol {\\nu }}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|^{2}}}\\right)\_{+}({\\mathbf {Y} }-{\\boldsymbol {\\nu }}).}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f8dada56b5049df33fce01cf1d6f9ddbdab23eb3) This estimator has a smaller risk than the basic James–Stein estimator for m ≥ 4 {\\displaystyle m\\geq 4} ![{\\displaystyle m\\geq 4}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e9ba367983142680a2456aa5ae1f61e21e1aa186). It follows that the basic James–Stein estimator is itself [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule").[\[8\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-Anderson-84-8) It turns out, however, that the positive-part estimator is also inadmissible.[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) This follows from a more general result which requires admissible estimators to be smooth. ## Positive-part James–Stein shrinkage and model selection \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=5 "Edit section: Positive-part James–Stein shrinkage and model selection")\] Recall the initial setup: Y ∼ N ( θ , σ 2 I ) , {\\displaystyle {\\mathbf {Y} }\\sim N({\\boldsymbol {\\theta }},\\sigma ^{2}I),\\,} ![{\\displaystyle {\\mathbf {Y} }\\sim N({\\boldsymbol {\\theta }},\\sigma ^{2}I),\\,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b19ee31bd76ccbb7a75983bad82ac0421914d40a) where the variance coefficient σ 2 {\\displaystyle \\sigma ^{2}} ![{\\displaystyle \\sigma ^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53a5c55e536acf250c1d3e0f754be5692b843ef5) is known and we wish to estimate the unknown (mean response) coefficient θ \= E Y {\\displaystyle {\\boldsymbol {\\theta }}=\\mathbb {E} \\mathbf {Y} } ![{\\displaystyle {\\boldsymbol {\\theta }}=\\mathbb {E} \\mathbf {Y} }](https://wikimedia.org/api/rest_v1/media/math/render/svg/a9b78572d3e5dd5cb8362412e27f57d1a5241fdd). In the more general setting of [linear regression](https://en.wikipedia.org/wiki/Linear_regression "Linear regression"), the mean response is instead given by E Y \= X θ , {\\displaystyle \\mathbb {E} \\mathbf {Y} =\\mathbf {X} {\\boldsymbol {\\theta }},} ![{\\displaystyle \\mathbb {E} \\mathbf {Y} =\\mathbf {X} {\\boldsymbol {\\theta }},}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a9b27e84b5235faaa4c502229ce10644dbba2ac5) where X \= \[ v 1 , … , v m \] {\\displaystyle \\mathbf {X} =\[\\mathbf {v} \_{1},\\ldots ,\\mathbf {v} \_{m}\]} ![{\\displaystyle \\mathbf {X} =\[\\mathbf {v} \_{1},\\ldots ,\\mathbf {v} \_{m}\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0ad1c826b3a19756dc60523573d2d769dd2315d6) is a matrix with m {\\displaystyle m} ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) columns. As in the previous section, we can use the *positive-part James-Stein shrinkage operator* to obtain a [shrinkage estimator](https://en.wikipedia.org/wiki/Shrinkage_estimator "Shrinkage estimator") of θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf). In particular, any θ ^ {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/42013234f6f8ffb7d8cc4268ec825db065113510) that satisfies the *James-Stein [KKT conditions](https://en.wikipedia.org/wiki/KKT_conditions "KKT conditions")*:[\[9\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-:0-9) θ ^ i \= S σ ‖ v i ‖ ( θ ^ i \+ v i ⊤ ( Y − X θ ^ ) ‖ v i ‖ 2 ) , i \= 1 , … , m {\\displaystyle {\\hat {\\theta }}\_{i}=S\_{\\frac {\\sigma }{\\\|\\mathbf {v} \_{i}\\\|}}{\\bigg (}{\\hat {\\theta }}\_{i}+{\\frac {\\mathbf {v} \_{i}^{\\top }(\\mathbf {Y} -\\mathbf {X} {\\widehat {\\boldsymbol {\\theta }}})}{\\\|\\mathbf {v} \_{i}\\\|^{2}}}{\\bigg )},\\quad i=1,\\ldots ,m} ![{\\displaystyle {\\hat {\\theta }}\_{i}=S\_{\\frac {\\sigma }{\\\|\\mathbf {v} \_{i}\\\|}}{\\bigg (}{\\hat {\\theta }}\_{i}+{\\frac {\\mathbf {v} \_{i}^{\\top }(\\mathbf {Y} -\\mathbf {X} {\\widehat {\\boldsymbol {\\theta }}})}{\\\|\\mathbf {v} \_{i}\\\|^{2}}}{\\bigg )},\\quad i=1,\\ldots ,m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/936cbd3895bad590cedb7d6a87a98a32fb35ce2a) is a (positive-part) James-Stein estimator of θ {\\displaystyle {\\boldsymbol {\\theta }}} ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) with the useful property that it performs both shrinkage and [model selection](https://en.wikipedia.org/wiki/Model_selection "Model selection") simultaneously. This is because, depending on the value of the known σ 2 {\\displaystyle \\sigma ^{2}} ![{\\displaystyle \\sigma ^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53a5c55e536acf250c1d3e0f754be5692b843ef5), there is a (possibly empty) set S ⊆ { 1 , … , m } {\\displaystyle {\\mathcal {S}}\\subseteq \\{1,\\ldots ,m\\}} ![{\\displaystyle {\\mathcal {S}}\\subseteq \\{1,\\ldots ,m\\}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/720aba8454507e0c7eded393c9d045ff1417c5ce) such that θ ^ i \= 0 , i ∈ S . {\\displaystyle {\\hat {\\theta }}\_{i}=0,\\quad i\\in {\\mathcal {S}}.} ![{\\displaystyle {\\hat {\\theta }}\_{i}=0,\\quad i\\in {\\mathcal {S}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/2192ecfb09e8f50a2d6a6d23f9fd842e4fbe6321) In other words, some (or all) of the θ i {\\displaystyle \\theta \_{i}} ![{\\displaystyle \\theta \_{i}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/302b19204ed378e99ff4575341a67eebdbe5a555) could be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model. ## Further extensions \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=6 "Edit section: Further extensions")\] The James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect; namely, the fact that the "ordinary" or least squares estimator is often [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") for simultaneous estimation of several parameters. This effect has been called [Stein's phenomenon](https://en.wikipedia.org/wiki/Stein%27s_phenomenon "Stein's phenomenon"), and has been demonstrated for several different problem settings, some of which are briefly outlined below. - James and Stein demonstrated that the estimator presented above can still be used when the variance σ 2 {\\displaystyle \\sigma ^{2}} ![{\\displaystyle \\sigma ^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53a5c55e536acf250c1d3e0f754be5692b843ef5) is unknown, by replacing it with the standard estimator of the variance, σ ^ 2 \= 1 m ∑ ( Y i − Y ¯ ) 2 {\\displaystyle {\\widehat {\\sigma }}^{2}={\\frac {1}{m}}\\sum (Y\_{i}-{\\overline {Y}})^{2}} ![{\\displaystyle {\\widehat {\\sigma }}^{2}={\\frac {1}{m}}\\sum (Y\_{i}-{\\overline {Y}})^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/950af315e27d5b7a8dcb8a34f12250702441a8ad) . The dominance result still holds under the same condition, namely, m \> 2 {\\displaystyle m\>2} ![{\\displaystyle m\>2}](https://wikimedia.org/api/rest_v1/media/math/render/svg/44e4ce1c04edd8f9602e60f0ec4457b7ac12fcd4) .[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2) - All the results above are for the case when only a single observation vector **y** is available. For the more general case when n {\\displaystyle n} ![{\\displaystyle n}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a601995d55609f2d9f5e233e36fbe9ea26011b3b) vectors are available, we consider the estimator θ ^ J S \= ( 1 − ( m − 2 ) σ 2 n ‖ Y ¯ ‖ 2 ) Y ¯ , {\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2){\\frac {\\sigma ^{2}}{n}}}{\\\|{\\overline {\\mathbf {Y} }}\\\|^{2}}}\\right){\\overline {\\mathbf {Y} }},} ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2){\\frac {\\sigma ^{2}}{n}}}{\\\|{\\overline {\\mathbf {Y} }}\\\|^{2}}}\\right){\\overline {\\mathbf {Y} }},}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a97236e07a0088f3d9a1b8adc410e9e6de0fd599) where Y ¯ {\\displaystyle {\\overline {\\mathbf {Y} }}} ![{\\displaystyle {\\overline {\\mathbf {Y} }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9415319ff8cfc6377fe06bf6ae88700802497c3c) is the m {\\displaystyle m} ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) \-length average of the n {\\displaystyle n} ![{\\displaystyle n}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a601995d55609f2d9f5e233e36fbe9ea26011b3b) observations, so that, Y ¯ ∼ N m ( θ , σ 2 n I ) {\\displaystyle {\\overline {\\mathbf {Y} }}\\sim N\_{m}{\\Big (}{\\boldsymbol {\\theta }},{\\frac {\\sigma ^{2}}{n}}I{\\Big )}} ![{\\displaystyle {\\overline {\\mathbf {Y} }}\\sim N\_{m}{\\Big (}{\\boldsymbol {\\theta }},{\\frac {\\sigma ^{2}}{n}}I{\\Big )}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e72554ac36054093a779d821974a6bbedec2d3df) . - The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.[\[10\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-bock75-10) A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a [linear regression](https://en.wikipedia.org/wiki/Linear_regression "Linear regression") technique which outperforms the standard application of the LS estimator.[\[10\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-bock75-10) - Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.[\[11\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-brown66-11) It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) ## See also \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=7 "Edit section: See also")\] - [Admissible decision rule](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") - [Hodges' estimator](https://en.wikipedia.org/wiki/Hodges%27_estimator "Hodges' estimator") - [Shrinkage estimator](https://en.wikipedia.org/wiki/Shrinkage_estimator "Shrinkage estimator") - [Regular estimator](https://en.wikipedia.org/wiki/Regular_estimator "Regular estimator") - [KL divergence](https://en.wikipedia.org/wiki/KL_divergence "KL divergence") ## References \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=8 "Edit section: References")\] 1. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-1) [Stein, C.](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") (1956), "Inadmissibility of the usual estimator for the mean of a multivariate distribution", [*Proc. Third Berkeley Symp. Math. Statist. Prob.*](http://projecteuclid.org/euclid.bsmsp/1200501656), vol. 1, pp. 197–206, [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0084922](https://mathscinet.ams.org/mathscinet-getitem?mr=0084922), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0073\.35602](https://zbmath.org/?format=complete&q=an:0073.35602) 2. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-1) [***c***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-2) James, W.; [Stein, C.](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") (1961), "Estimation with quadratic loss", [*Proc. Fourth Berkeley Symp. Math. Statist. Prob.*](http://projecteuclid.org/euclid.bsmsp/1200512173), vol. 1, pp. 361–379, [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0133191](https://mathscinet.ams.org/mathscinet-getitem?mr=0133191) 3. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-3)** Beran, R. (1995). THE ROLE OF HAJEK’S CONVOLUTION THEOREM IN STATISTICAL THEORY 4. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-1) [***c***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-2) Lehmann, E. L.; Casella, G. (1998), *Theory of Point Estimation* (2nd ed.), New York: Springer 5. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-5)** Efron, B.; Morris, C. (1973). "Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach". *Journal of the American Statistical Association*. **68** (341). American Statistical Association: 117–130\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.2307/2284155](https://doi.org/10.2307%2F2284155). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [2284155](https://www.jstor.org/stable/2284155). 6. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stander-17_6-0)** Stander, M. (2017), *Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements*, [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1702\.02440](https://arxiv.org/abs/1702.02440), [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2017arXiv170202440S](https://ui.adsabs.harvard.edu/abs/2017arXiv170202440S) 7. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-7)** Stigler, Stephen M. (1990-02-01). ["The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators"](https://doi.org/10.1214%2Fss%2F1177012274). *Statistical Science*. **5** (1). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1214/ss/1177012274](https://doi.org/10.1214%2Fss%2F1177012274). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0883-4237](https://search.worldcat.org/issn/0883-4237). 8. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-Anderson-84_8-0)** Anderson, T. W. (1984), *An Introduction to Multivariate Statistical Analysis* (2nd ed.), New York: John Wiley & Sons 9. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-:0_9-0)** Botev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025). *Data Science and Machine Learning: Mathematical and Statistical Methods* (2nd ed.). Boca Raton ; London: CRC Press. pp. 277–279\. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-032-48868-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-032-48868-4 "Special:BookSources/978-1-032-48868-4") . 10. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-bock75_10-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-bock75_10-1) Bock, M. E. (1975), "Minimax estimators of the mean of a multivariate normal distribution", *[Annals of Statistics](https://en.wikipedia.org/wiki/Annals_of_Statistics "Annals of Statistics")*, **3** (1): 209–218, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1214/aos/1176343009](https://doi.org/10.1214%2Faos%2F1176343009), [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0381064](https://mathscinet.ams.org/mathscinet-getitem?mr=0381064), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0314\.62005](https://zbmath.org/?format=complete&q=an:0314.62005) 11. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-brown66_11-0)** [Brown, L. D.](https://en.wikipedia.org/wiki/Lawrence_D._Brown "Lawrence D. Brown") (1966), "On the admissibility of invariant estimators of one or more location parameters", *Annals of Mathematical Statistics*, **37** (5): 1087–1136, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1214/aoms/1177699259](https://doi.org/10.1214%2Faoms%2F1177699259), [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0216647](https://mathscinet.ams.org/mathscinet-getitem?mr=0216647), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0156\.39401](https://zbmath.org/?format=complete&q=an:0156.39401) ## Further reading \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=9 "Edit section: Further reading")\] - Judge, George G.; Bock, M. E. (1978). *The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics*. New York: North Holland. pp. 229–257\. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [0-7204-0729-X](https://en.wikipedia.org/wiki/Special:BookSources/0-7204-0729-X "Special:BookSources/0-7204-0729-X") . ![](https://en.wikipedia.org/wiki/Special:CentralAutoLogin/start?useformat=desktop&type=1x1&usesul3=1) Retrieved from "<https://en.wikipedia.org/w/index.php?title=James–Stein_estimator&oldid=1342138970>" [Categories](https://en.wikipedia.org/wiki/Help:Category "Help:Category"): - [Estimator](https://en.wikipedia.org/wiki/Category:Estimator "Category:Estimator") - [Normal distribution](https://en.wikipedia.org/wiki/Category:Normal_distribution "Category:Normal distribution") Hidden categories: - [Articles with short description](https://en.wikipedia.org/wiki/Category:Articles_with_short_description "Category:Articles with short description") - [Short description is different from Wikidata](https://en.wikipedia.org/wiki/Category:Short_description_is_different_from_Wikidata "Category:Short description is different from Wikidata") - [Wikipedia articles that are too technical from November 2017](https://en.wikipedia.org/wiki/Category:Wikipedia_articles_that_are_too_technical_from_November_2017 "Category:Wikipedia articles that are too technical from November 2017") - [All articles that are too technical](https://en.wikipedia.org/wiki/Category:All_articles_that_are_too_technical "Category:All articles that are too technical") - This page was last edited on 7 March 2026, at 06:50 (UTC). - Text is available under the [Creative Commons Attribution-ShareAlike 4.0 License](https://en.wikipedia.org/wiki/Wikipedia:Text_of_the_Creative_Commons_Attribution-ShareAlike_4.0_International_License "Wikipedia:Text of the Creative Commons Attribution-ShareAlike 4.0 International License"); additional terms may apply. By using this site, you agree to the [Terms of Use](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Terms_of_Use "foundation:Special:MyLanguage/Policy:Terms of Use") and [Privacy Policy](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Privacy_policy "foundation:Special:MyLanguage/Policy:Privacy policy"). Wikipedia® is a registered trademark of the [Wikimedia Foundation, Inc.](https://wikimediafoundation.org/), a non-profit organization. - [Privacy policy](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Privacy_policy) - [About Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:About) - [Disclaimers](https://en.wikipedia.org/wiki/Wikipedia:General_disclaimer) - [Contact Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Contact_us) - [Legal & safety contacts](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Legal:Wikimedia_Foundation_Legal_and_Safety_Contact_Information) - [Code of Conduct](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Universal_Code_of_Conduct) - [Developers](https://developer.wikimedia.org/) - [Statistics](https://stats.wikimedia.org/#/en.wikipedia.org) - [Cookie statement](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Cookie_statement) - [Mobile view](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&mobileaction=toggle_view_mobile) - [![Wikimedia Foundation](https://en.wikipedia.org/static/images/footer/wikimedia.svg)](https://www.wikimedia.org/) - [![Powered by MediaWiki](https://en.wikipedia.org/w/resources/assets/mediawiki_compact.svg)](https://www.mediawiki.org/) Search Toggle the table of contents James–Stein estimator 3 languages [Add topic](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator)

Readable Markdown

From Wikipedia, the free encyclopedia The **James–Stein estimator** is an [estimator](https://en.wikipedia.org/wiki/Estimator "Estimator") of the [mean](https://en.wikipedia.org/wiki/Mean "Mean") ![{\\displaystyle {\\boldsymbol {\\theta }}:=(\\theta \_{1},\\theta \_{2},\\dots \\theta \_{m})}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ea925cd9dce005c92b0c86c343e8b005ecf8a3f3) for a multivariate [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") ![{\\displaystyle {\\boldsymbol {Y}}:=(Y\_{1},Y\_{2},\\dots Y\_{m})}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c7d304d83650977659f023b20fae116a55c065b0). It arose sequentially in two main published papers. The earlier version of the estimator was developed in 1956,[\[1\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stein-56-1) when [Charles Stein](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") reached a relatively shocking conclusion that while the then-usual estimate of the mean, the [sample mean](https://en.wikipedia.org/wiki/Sample_mean "Sample mean"), is [admissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when ![{\\displaystyle m\\leq 2}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9ef46309a66e2cdf7737c7246afd8e78e3052f3d), it is [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when ![{\\displaystyle m\\geq 3}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4610f29d2708d1febf1c0090f58ddd3986593545). Stein proposed a possible improvement to the estimator that [shrinks](https://en.wikipedia.org/wiki/Shrinkage_\(statistics\) "Shrinkage (statistics)") the sample mean ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) towards a more central mean vector ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160) (which can be chosen [a priori](https://en.wikipedia.org/wiki/A_priori_and_a_posteriori "A priori and a posteriori") or commonly as the "average of averages" of the sample means, given all samples share the same size). This observation is commonly referred to as [Stein's example or paradox](https://en.wikipedia.org/wiki/Stein%27s_example "Stein's example"). In 1961, [Willard James](https://en.wikipedia.org/wiki/Willard_D._James "Willard D. James") and Charles Stein simplified the original process.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2) It can be shown that the James–Stein estimator [dominates](https://en.wikipedia.org/wiki/Dominating_decision_rule "Dominating decision rule") the "ordinary" [least squares](https://en.wikipedia.org/wiki/Least_squares "Least squares") approach in the sense that the James–Stein estimator has a lower [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") than the "ordinary" least squares estimator for all ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf). This is possible because the James–Stein estimator is [biased](https://en.wikipedia.org/wiki/Bias_of_an_estimator "Bias of an estimator"), so that the [Gauss–Markov theorem](https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem "Gauss–Markov theorem") does not apply. Similar to the [Hodges' estimator](https://en.wikipedia.org/wiki/Hodges%27_estimator "Hodges' estimator"), the James-Stein estimator is [superefficient](https://en.wikipedia.org/w/index.php?title=Superefficient&action=edit&redlink=1 "Superefficient (page does not exist)") and [non-regular](https://en.wikipedia.org/wiki/Regular_estimator "Regular estimator") at ![{\\displaystyle {\\boldsymbol {\\theta }}=\\mathbf {0} }](https://wikimedia.org/api/rest_v1/media/math/render/svg/3278daf129fb6d34f19a067b7fd6b7a90cee5aeb).[\[3\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-3) Let ![{\\displaystyle {\\mathbf {Y} }\\sim N\_{m}({\\boldsymbol {\\theta }},\\sigma ^{2}I),\\,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9f91e318ab7b67f7cf60d538cc31280dde768387)where the vector ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) is the unknown [mean](https://en.wikipedia.org/wiki/Expected_value "Expected value") of ![{\\displaystyle {\\mathbf {Y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7020214a70ec832bbdd74738ec96ca9989a695e6), which is [![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc)\-variate normally distributed](https://en.wikipedia.org/wiki/Multivariate_normal_distribution "Multivariate normal distribution") and with known [covariance matrix](https://en.wikipedia.org/wiki/Covariance_matrix "Covariance matrix") ![{\\displaystyle \\sigma ^{2}I}](https://wikimedia.org/api/rest_v1/media/math/render/svg/6e2975cf33cb590e842a2c26a906f0949af795ff). We are interested in obtaining an estimate, ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/42013234f6f8ffb7d8cc4268ec825db065113510), of ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf), based on a single observation, ![{\\displaystyle {\\mathbf {y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c4ce8e1f631864c8e56c48f7861efef6666236d9), of ![{\\displaystyle {\\mathbf {Y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7020214a70ec832bbdd74738ec96ca9989a695e6). In real-world application, this is a common situation in which a set of parameters is sampled, and the samples are corrupted by independent [Gaussian noise](https://en.wikipedia.org/wiki/Gaussian_noise "Gaussian noise"). Since this noise has mean of zero, it may be reasonable to use the samples themselves as an estimate of the parameters. This approach is the [least squares](https://en.wikipedia.org/wiki/Least_squares "Least squares") estimator, which is ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}={\\mathbf {Y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/83ee2a6a995c8a02c8f565486b778e1e70bd2fac). Stein demonstrated that in terms of [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") ![{\\displaystyle \\operatorname {E} \\left\[\\left\\\|{\\boldsymbol {\\theta }}-{\\widehat {\\boldsymbol {\\theta }}}\\right\\\|^{2}\\right\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/bdd1163606c619b36e0e0d2310a54e10903e7234), the least squares estimator, ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/58b39a6819b4e3e61fbcb24dd72a258cd0455d3d), is sub-optimal to shrinkage based estimators, such as the **James–Stein estimator**, ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/805e2127008d46c29a7952101fada0840f6fe252).[\[1\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stein-56-1) The paradoxical result, that there is a (possibly) better and never any worse estimate of ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) in mean squared error as compared to the sample mean, became known as [Stein's example](https://en.wikipedia.org/wiki/Stein%27s_example "Stein's example"). [![](https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/MSE_of_ML_vs_JS.png/500px-MSE_of_ML_vs_JS.png)](https://en.wikipedia.org/wiki/File:MSE_of_ML_vs_JS.png) MSE (R) of least squares estimator (ML) vs. James–Stein estimator (JS). The James–Stein estimator gives its best estimate when the norm of the actual parameter vector θ is near zero. If ![{\\displaystyle \\sigma ^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53a5c55e536acf250c1d3e0f754be5692b843ef5) is known, the James–Stein estimator is given by ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }\\\|^{2}}}\\right){\\mathbf {Y} }.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/abf3aef7dbf3a1cd7401484e59500b5504961d65) James and Stein showed that the above estimator [dominates](https://en.wikipedia.org/wiki/Dominating_decision_rule "Dominating decision rule") ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{LS}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/58b39a6819b4e3e61fbcb24dd72a258cd0455d3d) for any ![{\\displaystyle m\\geq 3}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4610f29d2708d1febf1c0090f58ddd3986593545), meaning that the James–Stein estimator has a lower [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error "Mean squared error") (MSE) than the [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimator.[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2)[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) By definition, this makes the least squares estimator [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") when ![{\\displaystyle m\\geq 3}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4610f29d2708d1febf1c0090f58ddd3986593545). Notice that if ![{\\displaystyle (m-2)\\sigma ^{2}\<\\\|{\\mathbf {Y} }\\\|^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/791e4ba2fdd291a31cd1e0da5f16826c143f55c1) then this estimator simply takes the natural estimator ![{\\displaystyle \\mathbf {Y} }](https://wikimedia.org/api/rest_v1/media/math/render/svg/c92a7716a99fadda050469747fce1e475e0ec549) and shrinks it towards the origin **0**. In fact this is not the only direction of [shrinkage](https://en.wikipedia.org/wiki/Shrinkage_\(statistics\) "Shrinkage (statistics)") that works. Let ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160) be an arbitrary fixed vector of dimension ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc). Then there exists an estimator of the James–Stein type that shrinks toward ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160), namely ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|^{2}}}\\right)({\\mathbf {Y} }-{\\boldsymbol {\\nu }})+{\\boldsymbol {\\nu }},\\qquad m\\geq 3.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/fda106b6c468638610f33348f5c7549428c9810f) The James–Stein estimator dominates the usual estimator for any ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160). A natural question to ask is whether the improvement over the usual estimator is independent of the choice of ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160). The answer is no. The improvement is small if ![{\\displaystyle \\\|{{\\boldsymbol {\\theta }}-{\\boldsymbol {\\nu }}}\\\|}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c4f285d34fe737275b0d397ec39b887f4fdabd4c) is large. Thus to get a very great improvement some knowledge of the location of ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) is necessary. Of course this is the quantity we are trying to estimate so we don't have this knowledge [a priori](https://en.wikipedia.org/wiki/A_priori_and_a_posteriori "A priori and a posteriori"). But we may have some guess as to what the mean vector is. This can be considered a disadvantage of the estimator: the choice is not objective as it may depend on the beliefs of the researcher. Nonetheless, James and Stein's result is that *any* finite guess ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160) improves the expected MSE over the maximum-likelihood estimator, which is tantamount to using an infinite ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160), surely a poor guess. Seeing the James–Stein estimator as an [empirical Bayes method](https://en.wikipedia.org/wiki/Empirical_Bayes_method "Empirical Bayes method") gives some intuition to this result: One assumes that ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) itself is a random variable with [prior distribution](https://en.wikipedia.org/wiki/Prior_probability "Prior probability") ![{\\displaystyle \\sim N(0,A)}](https://wikimedia.org/api/rest_v1/media/math/render/svg/70b48c6623236065e6e4d8940bd524e756c7b641), where ![{\\displaystyle A}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7daff47fa58cdfd29dc333def748ff5fa4c923e3) is estimated from the data itself. Estimating ![{\\displaystyle A}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7daff47fa58cdfd29dc333def748ff5fa4c923e3) only gives an advantage compared to the [maximum-likelihood estimator](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") when the dimension ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) is large enough; hence it does not work for ![{\\displaystyle m\\leq 2}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9ef46309a66e2cdf7737c7246afd8e78e3052f3d). The James–Stein estimator is a member of a class of Bayesian estimators that dominate the maximum-likelihood estimator.[\[5\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-5) A consequence of the above discussion is the following counterintuitive result: When three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James–Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is [admissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule"). A quirky example would be estimating the speed of light, tea consumption in Taiwan, and hog weight in Montana, all together. The James–Stein estimator always improves upon the *total* MSE, i.e., the sum of the expected squared errors of each component. Therefore, the total MSE in measuring light speed, tea consumption, and hog weight would improve by using the James–Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values, and deteriorate for others. Thus, although the James–Stein estimator dominates the LS estimator when three or more parameters are estimated, any single component does not dominate the respective component of the LS estimator. The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a [telecommunication](https://en.wikipedia.org/wiki/Telecommunication "Telecommunication") setting, it is reasonable to combine [channel](https://en.wikipedia.org/wiki/Communication_channel "Communication channel") tap measurements in a [channel estimation](https://en.wikipedia.org/wiki/Channel_estimation "Channel estimation") scenario, as the goal is to minimize the total channel estimation error. The James–Stein estimator has also found use in fundamental quantum theory, where the estimator has been used to improve the theoretical bounds of the [entropic uncertainty principle](https://en.wikipedia.org/wiki/Entropic_uncertainty_principle "Entropic uncertainty principle") for more than three measurements.[\[6\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-stander-17-6) An intuitive derivation and interpretation is given by the [Galtonian](https://en.wikipedia.org/wiki/Francis_Galton "Francis Galton") perspective.[\[7\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-7) Under this interpretation, we aim to predict the population means using the [imperfectly measured sample means](https://en.wikipedia.org/wiki/Measurement_error_model "Measurement error model"). The equation of the [OLS](https://en.wikipedia.org/wiki/Ordinary_least_squares "Ordinary least squares") estimator in a hypothetical regression of the population means on the sample means gives an estimator of the form of either the James–Stein estimator (when we force the OLS intercept to equal 0) or of the Efron-Morris estimator (when we allow the intercept to vary). ## Positive-part James–Stein shrinkage operator \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=4 "Edit section: Positive-part James–Stein shrinkage operator")\] Despite the intuition that the James–Stein estimator shrinks the unbiased least-squares estimator ![{\\displaystyle {\\mathbf {Y} }}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7020214a70ec832bbdd74738ec96ca9989a695e6) *toward* ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160), the estimator actually moves *away* from ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160) for small values of ![{\\displaystyle \\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/65e1a7e674ea9579de967e045b596907ae5fde77) as the multiplier on ![{\\displaystyle {\\mathbf {Y} }-{\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/6fddab8608ca8b09eeef298a960defea57c78395) is then negative. This can be remedied by replacing this multiplier by zero when it is negative. To this end, define the *positive-part James-Stein shrinkage operator*: ![{\\displaystyle S\_{\\lambda }(x)=x\\left\[1-\\left(\\lambda /x\\right)^{2}\\right\]\_{+},}](https://wikimedia.org/api/rest_v1/media/math/render/svg/682ea45fcbaf678f56319df06a3029306a5e5e51) where ![{\\displaystyle x\_{+}=\\max\\{0,x\\}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/5a5cf1f9bd0134bd7a9e50fa8b4e96acbbd05e32), and apply this operator component-wise to the (unbiased) least-squares estimator of ![{\\displaystyle {\\boldsymbol {\\theta }}-{\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ad81208959563b1216bdeb1e64c895fe4c1fddf4) (with known ![{\\displaystyle {\\boldsymbol {\\nu }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/57d030f5358ad62e959e7907ec1508746a563160)) for each ![{\\displaystyle i=1,\\ldots ,m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/74690f54a3c93a332ecb2935e900178b9a555483): ![{\\displaystyle {\\widehat {\\theta }}\_{i}^{+}-\\nu \_{i}=S\_{\\lambda \_{i}}(Y\_{i}-\\nu \_{i}),\\quad \\lambda \_{i}:=\\sigma {\\sqrt {m-2}}\\,{\\frac {\|Y\_{i}-\\nu \_{i}\|}{\\\|\\mathbf {Y} -{\\boldsymbol {\\nu }}\\\|}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/26dfce05be8073ab053938126bd30ae4555cad60) The resulting estimator ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}^{+}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7f7f54f4ecec860db59a9850ffa278bda136898d) of ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) is called the *positive-part James–Stein estimator* and can be written in vector notation as: ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}^{+}-{\\boldsymbol {\\nu }}=\\left(1-{\\frac {(m-2)\\sigma ^{2}}{\\\|{\\mathbf {Y} }-{\\boldsymbol {\\nu }}\\\|^{2}}}\\right)\_{+}({\\mathbf {Y} }-{\\boldsymbol {\\nu }}).}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f8dada56b5049df33fce01cf1d6f9ddbdab23eb3) This estimator has a smaller risk than the basic James–Stein estimator for ![{\\displaystyle m\\geq 4}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e9ba367983142680a2456aa5ae1f61e21e1aa186). It follows that the basic James–Stein estimator is itself [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule").[\[8\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-Anderson-84-8) It turns out, however, that the positive-part estimator is also inadmissible.[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) This follows from a more general result which requires admissible estimators to be smooth. ## Positive-part James–Stein shrinkage and model selection \[[edit](https://en.wikipedia.org/w/index.php?title=James%E2%80%93Stein_estimator&action=edit&section=5 "Edit section: Positive-part James–Stein shrinkage and model selection")\] Recall the initial setup: ![{\\displaystyle {\\mathbf {Y} }\\sim N({\\boldsymbol {\\theta }},\\sigma ^{2}I),\\,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b19ee31bd76ccbb7a75983bad82ac0421914d40a) where the variance coefficient ![{\\displaystyle \\sigma ^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53a5c55e536acf250c1d3e0f754be5692b843ef5) is known and we wish to estimate the unknown (mean response) coefficient ![{\\displaystyle {\\boldsymbol {\\theta }}=\\mathbb {E} \\mathbf {Y} }](https://wikimedia.org/api/rest_v1/media/math/render/svg/a9b78572d3e5dd5cb8362412e27f57d1a5241fdd). In the more general setting of [linear regression](https://en.wikipedia.org/wiki/Linear_regression "Linear regression"), the mean response is instead given by ![{\\displaystyle \\mathbb {E} \\mathbf {Y} =\\mathbf {X} {\\boldsymbol {\\theta }},}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a9b27e84b5235faaa4c502229ce10644dbba2ac5) where ![{\\displaystyle \\mathbf {X} =\[\\mathbf {v} \_{1},\\ldots ,\\mathbf {v} \_{m}\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0ad1c826b3a19756dc60523573d2d769dd2315d6) is a matrix with ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc) columns. As in the previous section, we can use the *positive-part James-Stein shrinkage operator* to obtain a [shrinkage estimator](https://en.wikipedia.org/wiki/Shrinkage_estimator "Shrinkage estimator") of ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf). In particular, any ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/42013234f6f8ffb7d8cc4268ec825db065113510) that satisfies the *James-Stein [KKT conditions](https://en.wikipedia.org/wiki/KKT_conditions "KKT conditions")*:[\[9\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-:0-9) ![{\\displaystyle {\\hat {\\theta }}\_{i}=S\_{\\frac {\\sigma }{\\\|\\mathbf {v} \_{i}\\\|}}{\\bigg (}{\\hat {\\theta }}\_{i}+{\\frac {\\mathbf {v} \_{i}^{\\top }(\\mathbf {Y} -\\mathbf {X} {\\widehat {\\boldsymbol {\\theta }}})}{\\\|\\mathbf {v} \_{i}\\\|^{2}}}{\\bigg )},\\quad i=1,\\ldots ,m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/936cbd3895bad590cedb7d6a87a98a32fb35ce2a) is a (positive-part) James-Stein estimator of ![{\\displaystyle {\\boldsymbol {\\theta }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/33b025a6bf54ec02e65c871dc3e5897c921419cf) with the useful property that it performs both shrinkage and [model selection](https://en.wikipedia.org/wiki/Model_selection "Model selection") simultaneously. This is because, depending on the value of the known ![{\\displaystyle \\sigma ^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53a5c55e536acf250c1d3e0f754be5692b843ef5), there is a (possibly empty) set ![{\\displaystyle {\\mathcal {S}}\\subseteq \\{1,\\ldots ,m\\}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/720aba8454507e0c7eded393c9d045ff1417c5ce) such that ![{\\displaystyle {\\hat {\\theta }}\_{i}=0,\\quad i\\in {\\mathcal {S}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/2192ecfb09e8f50a2d6a6d23f9fd842e4fbe6321) In other words, some (or all) of the ![{\\displaystyle \\theta \_{i}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/302b19204ed378e99ff4575341a67eebdbe5a555) could be estimated as exactly zero, which is equivalent to the selection of a suitable linear regression model. The James–Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect; namely, the fact that the "ordinary" or least squares estimator is often [inadmissible](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") for simultaneous estimation of several parameters. This effect has been called [Stein's phenomenon](https://en.wikipedia.org/wiki/Stein%27s_phenomenon "Stein's phenomenon"), and has been demonstrated for several different problem settings, some of which are briefly outlined below. - James and Stein demonstrated that the estimator presented above can still be used when the variance ![{\\displaystyle \\sigma ^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53a5c55e536acf250c1d3e0f754be5692b843ef5) is unknown, by replacing it with the standard estimator of the variance, ![{\\displaystyle {\\widehat {\\sigma }}^{2}={\\frac {1}{m}}\\sum (Y\_{i}-{\\overline {Y}})^{2}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/950af315e27d5b7a8dcb8a34f12250702441a8ad). The dominance result still holds under the same condition, namely, ![{\\displaystyle m\>2}](https://wikimedia.org/api/rest_v1/media/math/render/svg/44e4ce1c04edd8f9602e60f0ec4457b7ac12fcd4).[\[2\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-james%E2%80%93stein-61-2) - All the results above are for the case when only a single observation vector **y** is available. For the more general case when ![{\\displaystyle n}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a601995d55609f2d9f5e233e36fbe9ea26011b3b) vectors are available, we consider the estimator ![{\\displaystyle {\\widehat {\\boldsymbol {\\theta }}}\_{JS}=\\left(1-{\\frac {(m-2){\\frac {\\sigma ^{2}}{n}}}{\\\|{\\overline {\\mathbf {Y} }}\\\|^{2}}}\\right){\\overline {\\mathbf {Y} }},}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a97236e07a0088f3d9a1b8adc410e9e6de0fd599) where ![{\\displaystyle {\\overline {\\mathbf {Y} }}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9415319ff8cfc6377fe06bf6ae88700802497c3c) is the ![{\\displaystyle m}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0a07d98bb302f3856cbabc47b2b9016692e3f7bc)\-length average of the ![{\\displaystyle n}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a601995d55609f2d9f5e233e36fbe9ea26011b3b) observations, so that, ![{\\displaystyle {\\overline {\\mathbf {Y} }}\\sim N\_{m}{\\Big (}{\\boldsymbol {\\theta }},{\\frac {\\sigma ^{2}}{n}}I{\\Big )}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e72554ac36054093a779d821974a6bbedec2d3df). - The work of James and Stein has been extended to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances.[\[10\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-bock75-10) A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a [linear regression](https://en.wikipedia.org/wiki/Linear_regression "Linear regression") technique which outperforms the standard application of the LS estimator.[\[10\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-bock75-10) - Stein's result has been extended to a wide class of distributions and loss functions. However, this theory provides only an existence result, in that explicit dominating estimators were not actually exhibited.[\[11\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-brown66-11) It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.[\[4\]](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_note-lehmann-casella-98-4) - [Admissible decision rule](https://en.wikipedia.org/wiki/Admissible_decision_rule "Admissible decision rule") - [Hodges' estimator](https://en.wikipedia.org/wiki/Hodges%27_estimator "Hodges' estimator") - [Shrinkage estimator](https://en.wikipedia.org/wiki/Shrinkage_estimator "Shrinkage estimator") - [Regular estimator](https://en.wikipedia.org/wiki/Regular_estimator "Regular estimator") - [KL divergence](https://en.wikipedia.org/wiki/KL_divergence "KL divergence") 1. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stein-56_1-1) [Stein, C.](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") (1956), "Inadmissibility of the usual estimator for the mean of a multivariate distribution", [*Proc. Third Berkeley Symp. Math. Statist. Prob.*](http://projecteuclid.org/euclid.bsmsp/1200501656), vol. 1, pp. 197–206, [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0084922](https://mathscinet.ams.org/mathscinet-getitem?mr=0084922), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0073\.35602](https://zbmath.org/?format=complete&q=an:0073.35602) 2. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-1) [***c***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-james%E2%80%93stein-61_2-2) James, W.; [Stein, C.](https://en.wikipedia.org/wiki/Charles_Stein_\(statistician\) "Charles Stein (statistician)") (1961), "Estimation with quadratic loss", [*Proc. Fourth Berkeley Symp. Math. Statist. Prob.*](http://projecteuclid.org/euclid.bsmsp/1200512173), vol. 1, pp. 361–379, [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0133191](https://mathscinet.ams.org/mathscinet-getitem?mr=0133191) 3. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-3)** Beran, R. (1995). THE ROLE OF HAJEK’S CONVOLUTION THEOREM IN STATISTICAL THEORY 4. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-1) [***c***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-lehmann-casella-98_4-2) Lehmann, E. L.; Casella, G. (1998), *Theory of Point Estimation* (2nd ed.), New York: Springer 5. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-5)** Efron, B.; Morris, C. (1973). "Stein's Estimation Rule and Its Competitors—An Empirical Bayes Approach". *Journal of the American Statistical Association*. **68** (341). American Statistical Association: 117–130\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.2307/2284155](https://doi.org/10.2307%2F2284155). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [2284155](https://www.jstor.org/stable/2284155). 6. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-stander-17_6-0)** Stander, M. (2017), *Using Stein's estimator to correct the bound on the entropic uncertainty principle for more than two measurements*, [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1702\.02440](https://arxiv.org/abs/1702.02440), [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2017arXiv170202440S](https://ui.adsabs.harvard.edu/abs/2017arXiv170202440S) 7. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-7)** Stigler, Stephen M. (1990-02-01). ["The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators"](https://doi.org/10.1214%2Fss%2F1177012274). *Statistical Science*. **5** (1). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1214/ss/1177012274](https://doi.org/10.1214%2Fss%2F1177012274). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0883-4237](https://search.worldcat.org/issn/0883-4237). 8. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-Anderson-84_8-0)** Anderson, T. W. (1984), *An Introduction to Multivariate Statistical Analysis* (2nd ed.), New York: John Wiley & Sons 9. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-:0_9-0)** Botev, Zdravko I.; Kroese, Dirk P.; Taimre, Thomas (2025). *Data Science and Machine Learning: Mathematical and Statistical Methods* (2nd ed.). Boca Raton ; London: CRC Press. pp. 277–279\. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-032-48868-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-032-48868-4 "Special:BookSources/978-1-032-48868-4") . 10. ^ [***a***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-bock75_10-0) [***b***](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-bock75_10-1) Bock, M. E. (1975), "Minimax estimators of the mean of a multivariate normal distribution", *[Annals of Statistics](https://en.wikipedia.org/wiki/Annals_of_Statistics "Annals of Statistics")*, **3** (1): 209–218, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1214/aos/1176343009](https://doi.org/10.1214%2Faos%2F1176343009), [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0381064](https://mathscinet.ams.org/mathscinet-getitem?mr=0381064), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0314\.62005](https://zbmath.org/?format=complete&q=an:0314.62005) 11. **[^](https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator#cite_ref-brown66_11-0)** [Brown, L. D.](https://en.wikipedia.org/wiki/Lawrence_D._Brown "Lawrence D. Brown") (1966), "On the admissibility of invariant estimators of one or more location parameters", *Annals of Mathematical Statistics*, **37** (5): 1087–1136, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1214/aoms/1177699259](https://doi.org/10.1214%2Faoms%2F1177699259), [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [0216647](https://mathscinet.ams.org/mathscinet-getitem?mr=0216647), [Zbl](https://en.wikipedia.org/wiki/Zbl_\(identifier\) "Zbl (identifier)") [0156\.39401](https://zbmath.org/?format=complete&q=an:0156.39401) - Judge, George G.; Bock, M. E. (1978). *The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics*. New York: North Holland. pp. 229–257\. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [0-7204-0729-X](https://en.wikipedia.org/wiki/Special:BookSources/0-7204-0729-X "Special:BookSources/0-7204-0729-X") .

ML Classification

ML Categories

/Science		90.3%
/Science/Mathematics		80.8%

Raw JSON

{
    "/Science": 903,
    "/Science/Mathematics": 808
}

ML Page Types

/Article		99.7%
/Article/Wiki		84.0%

Raw JSON

{
    "/Article": 997,
    "/Article/Wiki": 840
}

ML Intent Types

Informational

99.9%

Raw JSON

{
    "Informational": 999
}

Content Metadata

Language

Author

null

Publish Time

not set

Original Publish Time

2015-05-31 01:26:37 (10 years ago)

Republished

Word Count (Total)

3,289

Word Count (Content)

1,989

Links

External Links

Internal Links

116

Technical SEO

Meta Nofollow

Meta Noarchive

JS Rendered

Redirect Target

null

Performance

Download Time (ms)

TTFB (ms)

Download Size (bytes)

30,603

Shard

152 (laksa)

Root Hash

17790707453426894952

Unparsed URL

org,wikipedia!en,/wiki/James%E2%80%93Stein_estimator s443