Machine learning and the James–Stein estimator

Bradley Efron

Japanese Journal of Statistics and Data Science 7, 257–266 (2024); published 30 June 2023
https://link.springer.com/article/10.1007/s42081-023-00209-y

Abstract

It is now 62 years since the publication of James and Stein's seminal article on the estimation of a multivariate normal mean vector. The paper made a spectacular first impression on the statistical community through its demonstration of inadmissibility of the maximum likelihood estimator. It continues to be influential, but not for the initial reasons. Empirical Bayes shrinkage estimation, now a major topic, found its early justification in the James–Stein formula. Less obvious downstream topics include Tweedie's formula and Benjamini and Hochberg's false discovery rate algorithm. This is a short and mainly non-technical review of the James–Stein rule and its effects on the machine learning era of statistical innovation.

1 Introduction
By and large, the statistics world is one of heuristics, approximations, and asymptotics. The James–Stein estimator arrived in that world in 1961 on a note of startling specificity: unseen parameters $\mu_1, \mu_2, \ldots, \mu_n$ produce independent observations
$$x_i \overset{\mathrm{ind}}{\sim} N(\mu_i, 1), \qquad i = 1, \ldots, n, \tag{1}$$
$n \ge 3$. The James–Stein rule in its simplest form proposed estimating the $\mu_i$ by
$$\hat{\mu}_i^{\,JS} = \left(1 - \frac{n-2}{S}\right) x_i \qquad \left(S = \sum_{i=1}^{n} x_i^2\right). \tag{2}$$
Formula (2) looked implausible: the estimate of $\mu_i$ depended on the *other* observations $x_j$, $j \ne i$ (through $S$), as well as $x_i$, despite the independence assumption. Nevertheless, James and Stein showed that Rule (2) *always* beat the obvious maximum likelihood estimates
$$\hat{\mu}_i^{\,ML} = \bar{x}_i \qquad (i = 1, \ldots, n) \tag{3}$$
in terms of total expected squared error
$$E\left\{\sum_{i=1}^{n} \left(\hat{\mu}_i - \mu_i\right)^2\right\}. \tag{4}$$
That āalwaysā was the shocking part: two centuries of statistical theory, ANOVA, regression, multivariate analysis, etc., depended on maximum likelihood estimation. Did everything have to be rethought?
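The "always" is easy to see numerically for any one choice of the mean vector. Below is a minimal Monte Carlo sketch in Python (the paper itself does no such simulation; the values of $n$, the true means, and the replication count are illustrative assumptions) comparing the total squared error (4) of rule (2) against the maximum likelihood estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                            # number of means, n >= 3
mu = np.linspace(-2.0, 2.0, n)    # arbitrary true means (any choice works)
reps = 20000

x = rng.normal(mu, 1.0, size=(reps, n))        # model (1): x_i ~ N(mu_i, 1)
S = (x ** 2).sum(axis=1, keepdims=True)
js = (1.0 - (n - 2) / S) * x                   # James-Stein rule (2)

risk_mle = ((x - mu) ** 2).sum(axis=1).mean()  # total squared error (4), MLE
risk_js = ((js - mu) ** 2).sum(axis=1).mean()  # same for James-Stein
print(risk_js < risk_mle)                      # True: JS beats the MLE here
```

Repeating the experiment with other mean vectors (including all zeros) leaves the inequality intact, which is the content of the theorem.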
One path forward involved Bayesian thinking. If we assumed that the $\mu_i$ themselves came from a normal distribution,
$$\mu_i \overset{\mathrm{ind}}{\sim} N(0, A) \qquad \text{for } i = 1, \ldots, n, \tag{5}$$
with variance $A \ge 0$, the Bayes estimates would be
$$\hat{\mu}_i^{\,Bayes} = B x_i \qquad \big(B = A/(A+1)\big). \tag{6}$$
We don't know $A$ or $B$ but
^
=
1
ā
(
n
ā
2
)
/
S
(7)
is $B$'s unbiased estimate: we can rewrite (2) as
$$\hat{\mu}_i^{\,JS} = \hat{B} x_i, \tag{8}$$
which at least looks more plausible.
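A quick simulation (a sketch, with illustrative values of $n$ and $A$) confirms that (7) is unbiased for $B = A/(A+1)$ under the hierarchical model (5) plus (1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, A, reps = 20, 4.0, 50000
B = A / (A + 1.0)                                 # true shrinkage factor in (6)

mu = rng.normal(0.0, np.sqrt(A), size=(reps, n))  # prior (5): mu_i ~ N(0, A)
x = rng.normal(mu, 1.0)                           # sampling model (1)
S = (x ** 2).sum(axis=1)
B_hat = 1.0 - (n - 2) / S                         # unbiased estimate (7)

print(round(B, 2), round(B_hat.mean(), 2))        # 0.8 and approximately 0.8
```

The unbiasedness rests on the fact that marginally $x_i \sim N(0, A+1)$, so $S/(A+1)$ is a $\chi^2_n$ variable and $E\{(n-2)/S\} = 1/(A+1)$.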
In the language introduced by Robbins (1956), formula (8) is an *empirical Bayes* estimator, another shocking post-war statistical innovation. Carl Morris and I wrote a series of papers in the 1970s exploring Bayesian roots of the James–Stein estimator (Efron and Morris, 1973). Something is lost in the empirical Bayes formulation, namely the frequentist "always" of expected squared error minimization, but a lot is gained in flexibility and scope, as discussed in Sect. 2.
Fig. 1  Prostate data: 6033 $x$ values; mean 0.003, sd $= 1.135$; curve is proportional to a $N(0,1)$ density
Figure 1 illustrates an example of simultaneous estimation pursued in Sect. 2.1 of Efron (2010). A microarray study has compared expression levels between prostate cancer patients and control subjects for $n = 6033$ genes. For each gene, a statistic $x_i$ has been calculated (essentially a "$z$-value"),
$$x_i \sim N(\mu_i, 1), \qquad i = 1, \ldots, n, \tag{9}$$
where $\mu_i$ measures the difference between cancer and control group levels.
The solid curve in Fig. 1 is a $N(0,1)$ density scaled to have the same area as the histogram of the 6033 $x$ values. A bad result from the researchers' point of view would be a perfect fit of curve to histogram, which would imply all the genes have $\mu_i = 0$, the "null" value of no difference between cancer patients and controls.
That's not what happened: the histogram has mildly heavy tails in both directions. The researchers were hoping to find genes with large values of $|\mu_i|$, ones that might be a clue to prostate cancer etiology, as suggested by the heavy tails. How encouraged should they be?
Not very, according to the James–Stein rule. The 6033 $x_i$ values have mean 0.003, which I'll take to be zero, and empirical variance
$$\hat{\sigma}^2 = 1.289. \tag{10}$$
The James–Stein estimate (2) is
$$\hat{\mu}_i^{\,JS} = \left(1 - \frac{n-2}{n-1}\,\frac{1}{\hat{\sigma}^2}\right) x_i = 0.224 \cdot x_i, \tag{11}$$
so even $x_i = 5$ yields an estimate barely exceeding 1. Section 2 suggests a more optimistic analysis.
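The shrinkage factor in (11) is a two-line calculation; the sketch below simply replays the arithmetic of (10)-(11):

```python
n = 6033
sigma2_hat = 1.289                        # empirical variance (10)

# shrinkage factor of (11): 1 - ((n - 2)/(n - 1)) / sigma2_hat
c = 1.0 - ((n - 2) / (n - 1)) / sigma2_hat
print(round(c, 3))                        # 0.224, as in (11)
print(round(5 * c, 2))                    # x_i = 5 shrinks to about 1.12
```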
2 Tweedie's formula
The impressive precision of the JamesāStein theorem came at a cost in generality. Efforts to extend the theorem, say to Poisson rather than normal observations, or to measures of loss other than total squared error, gave encouraging asymptotic results but not the JamesāStein kind of finite sample frequentist dominance.
Better progress was possible on the empirical Bayes side of the street. *Tweedie's formula* (Efron, 2011) has been particularly useful. We wish to calculate Bayesian estimates
$$\mu_i^{\,Bayes} = E\{\mu_i \mid x_i\}, \qquad i = 1, \ldots, n, \tag{12}$$
in the normal sampling model (1), starting from a given (possibly non-normal) prior $\pi(\mu)$, applying to all $n$ cases. Let $f(x)$ be the marginal density
$$f(x) = \int_{\mathcal{R}} \pi(\mu)\,\varphi(x - \mu)\,d\mu, \tag{13}$$
with $\varphi$ the standard $N(0,1)$ density and $\mathcal{R}$ the range of $\mu$. (It isn't necessary for $\pi(\cdot)$ to be a continuous distribution but it simplifies notation.)
Tweedie's formula provides an elegant statement for $\mu_i^{\,Bayes}$, the posterior expectation of $\mu_i$ given $x_i$,
$$\mu_i^{\,Bayes} = E\{\mu_i \mid x_i\} = x_i + l'(x_i) \qquad \text{with } l'(x_i) = \frac{d}{dx}\log\big(f(x)\big)\Big|_{x = x_i}. \tag{14}$$
In the empirical Bayes situation (1), where the prior $\pi(\cdot)$ is unknown, we can use the observed data $x_1, \ldots, x_n$ to estimate the marginal density $f(x)$, say by $\hat{f}(x)$, giving empirical Bayes estimates
$$\hat{\mu}_i = x_i + \hat{l}'(x_i). \tag{15}$$
The Bayes estimate (14) can be thought of as the MLE $x_i$ plus a Bayesian correction term $l'(x_i)$. When the prior $\pi(\mu)$ is the $N(0, A)$ distribution (5), $\mu_i^{\,Bayes}$ equals $B x_i$ (6). Simple formulas for $\mu_i^{\,Bayes}$ give out for most other choices of $\pi(\mu)$ but now, in the machine learning era (Footnote 1) of statistical research, numerical methods provide useful ways forward, as discussed next.
The *log polynomial class* (Footnote 2) of marginal densities defines $f(x)$ by
$$\log\big(f_\beta(x)\big) = \beta_0 + \beta^\top c(x). \tag{16}$$
Here
$$c(x) = (x, x^2, \ldots, x^J)^\top \quad \text{and} \quad \beta = (\beta_1, \ldots, \beta_J)^\top, \tag{17}$$
with $\beta_0$ chosen to make $f_\beta(x)$ integrate to 1. The choice $J = 2$ gives normal marginals; larger values of $J$ allow for marginal non-normality.
Fig. 2  Prostate data: Tweedie's estimate of $E\{\mu \mid x\}$, 5 degrees of freedom; dashed curve is James–Stein estimate
The choice $J = 5$ was applied to the prostate cancer data of Fig. 1: Tweedie's formula (14) gave $\hat{\mu}(x) = E\{\mu \mid x\}$, graphed as the solid curve in Fig. 2. It differs markedly from the James–Stein estimate $J = 2$, the dashed line. At $x = 4$ for example, the $J = 5$ estimate is (Footnote 3)
$$E\{\mu \mid x = 4\} = 2.555 \tag{18}$$
compared to 0.901 for the James–Stein estimate.
The estimated curve $E\{\mu \mid x\}$ is *empirical Bayes* in the same sense as (8): the parameter vector $\beta$ was selected by maximum likelihood, as discussed next. With $J = 5$, the prior was able to adapt to the "fishing expedition" nature of such microarray studies, where we expect most of the genes to be null or close to null, with $\mu_i$ nearly zero (corresponding here to the flat part of the curve for $x$ between $-2$ and 2) and, hopefully, a small proportion of interestingly large $\mu_i$s.
The sample size $n = 6033$ has much to do with Fig. 2. James and Stein (1961) was usually considered in terms of small samples, perhaps $n \le 20$, for which there would be little hope of seeing the detail in Fig. 2. The term "machine learning era" seems less fanciful when considering the scale of problems statisticians are now asked to deal with, as well as the tools they use to solve them.
It looks like it might be hard work computing Fig. 2 but it's not. The histogram in Fig. 1 has 97 bins, with centerpoints
$$vv = (-4.4, -4.3, \ldots, 5.1, 5.2). \tag{19}$$
Let $y_j$ be the count in bin $j$, that is, the number of the 6033 $x_i$ values falling into it, with the vector of counts being
$$yy = (y_1, \ldots, y_{97}). \tag{20}$$
Then the single R command
$$\widehat{ll} = \log\big(\mathtt{glm}(yy \sim \mathtt{poly}(vv, 5),\ \mathtt{poisson})\mathtt{\$fit}\big) \tag{21}$$
provides a close approximation to the MLE of $\log f(x)$ in (14); numerical differentiation of $\widehat{ll}$ gives Tweedie's estimate. Section 3.4 of Efron (2023) shows why Poisson regression (21) is appropriate here.
The James–Stein theorem depends on the independence assumption in (1), unlikely to be true in the microarray study, but the estimates (2) have a certain marginal validity even under dependence. This is clearer from the empirical Bayes point of view. The Tweedie estimate $x_i + \hat{l}'(x_i)$ requires only that $\hat{l}'(x)$ be close to $l'(x)$, not that it be estimated from independent $x_i$s. (Footnote 4)
3 Shrinkage estimators
James and Stein's paper aroused excited interest in the statistics community when it arrived in 1961. Most of the excitement focused on the strict inadmissibility of the traditional maximum likelihood estimate demonstrated by the James–Stein rule. Other rules dominating the MLE were discovered, for instance the Bayes estimator of Strawderman (1971), which was itself admissible while rendering the MLE inadmissible.
Big new ideas can take a while to make their true impact felt. The JamesāStein rule had an influential side effect on subsequent theory and practice in that it demonstrated, in an inarguable way, the virtues of
shrinkage estimation
: given an ensemble of problems, individual estimates are shrunk toward a central point; that is, a deliberate bias is introduced, pulling estimates away from their MLEs for the sake of better group performances.
Admissibility and inadmissibility aren't much in the air these days, while shrinkage estimation has gone on to play a major role in modern practice. A spectacular success story is the lasso (Tibshirani, 1996). Lasso shrinkage is extreme, pulling some (often most) of the coefficient estimates all the way back to zero.
Bayes and empirical Bayes rules tend to be strong shrinkers. Tweedie's estimate in Fig. 2 ($J = 5$) shrinks the estimate of $E\{\mu \mid x = 4\}$ from its MLE value 4 down to 2.555. For $\mu$ between $-1$ and 1, the shrinkage is almost all the way to zero.
The reader may have been surprised to see that neither Tweedie's formula (14) for $E\{\mu_i \mid x_i\}$ nor its empirical version (15) requires estimation of the prior $\pi(\mu)$. This is a special property of the posterior expectation $E\{\mu_i \mid x_i\}$ and isn't available for, say, $\Pr\{\mu_i \ge 2 \mid x_i\}$, or most other Bayesian targets.
"Bayesian deconvolution" (Efron, 2016) uses low-dimensional parametric modeling of $\pi(\mu)$ for general empirical Bayes computations. It was applied to finding a prior density $\pi(\mu)$ that would give the distribution of $x$ seen in Fig. 1, assuming the normal sampling model (1). The deconvolution model for $\pi(\mu)$ used a delta function at $\mu = 0$ (for the "null" genes) and a natural spline function with four degrees of freedom for the non-null cases.
Fig. 3  Empirical Bayes conditional density of $\mu$ given $\mu$ not zero; $\Pr\{\mu = 0\}$ equals 0.825
The estimated prior (Footnote 5) $\hat{\pi}(\mu)$ is shown in Fig. 3; it put probability 0.825 on $\mu = 0$, while the conditional distribution given $\mu \ne 0$ was a moderately heavy-tailed version of $N(0, 1.33^2)$. Based on $\hat{\pi}(\mu)$ we can form estimates of *any* Bayesian target, for instance $\widehat{\Pr}\{\mu_i \ge 2 \mid x_i = 4\} = 0.80$. Figure 3 is a direct descendant of the James–Stein rule, now 60-plus years on.
A less-direct descendant, but still on the family tree, arrived in 1995. The *false discovery rate* paper by Benjamini and Hochberg concerned simultaneous hypothesis testing. Looking at Fig. 1, which of the $n = 6033$ genes can confidently be labeled as non-null, that is as having $\mu_i \ne 0$?
Suppose for convenience that the $x_i$s are ordered from smallest to largest. The right-sided significance level for testing $\mu_i = 0$ is
$$S_0(x_i) = 1 - \Phi(x_i), \tag{22}$$
where $\Phi$ is the standard normal cumulative distribution function. Of the 6033 genes, 401 had $S_i \le 0.05$, the usual rejection level for individual testing, but even if actually *all* of the genes were null we would expect 302 such rejections, so individual testing can't be right. Benjamini and Hochberg proposed a novel simultaneous testing rule that safely controls the number of "false discoveries" (genes falsely labeled "non-null") while not being discouragingly strict. (My summary here won't give the BH rule its full due; see Chapter 4 of Efron (2010) for a more complete description.)
Let $\hat{S}(x)$ be the observed proportion of $x_i$s exceeding value $x$, and define
$$\widehat{Fdr}(x) = \pi_0 S_0(x)/\hat{S}(x), \tag{23}$$
where $\pi_0$ is the proportion of null genes among all $n$. (Footnote 6)
For a fixed control level $\alpha$, such as $\alpha = 0.1$, the BH rule says to reject the null hypothesis $\mu_i = 0$ for those genes having
$$\widehat{Fdr}(x_i) \le \alpha. \tag{24}$$
The Benjamini–Hochberg theorem states that under independence assumptions like (1), the expected proportion of false discoveries by rule (24) is $\alpha$.
Fig. 4  Prostate data: left Fdr and right Fdr; dashes show 60 genes with $Fdr < 0.1$
Figure 4 shows $\widehat{Fdr}$ for the prostate cancer data and also for the left-sided Fdr estimate, where significance is defined by $S_0(x_i) = \Phi(x_i)$ rather than (22). I applied the BH rule with $\alpha = 0.1$, which labeled 60 genes as non-null, 32 on the left and 28 on the right. The BH theorem says that we can expect 6 of the 60 to actually be null.
The fdr story has evolved very much along the lines of its James–Stein predecessor. Intense initial interest focused on the exact frequentist control of false discovery rates. The Bayes and empirical Bayes implications came later: as at (5), we assume that each $x_i$ is a realization of a random variable $x$ given by
$$\mu \sim \pi(\mu) \quad \text{and} \quad x \mid \mu \sim p(x \mid \mu), \tag{25}$$
where $p(x \mid \mu)$ is a known probability kernel which I'll take here to be the normal sampling model (1). Then if $S(x)$ is 1 minus the cdf of the marginal density (13), Bayes rule gives
$$\Pr\{\mu = 0 \mid x\} = \pi_0 S_0(x)/S(x). \tag{26}$$
Comparing (26) with (23) says that the BH rule amounts to labeling case $i$ as non-null if its obvious empirical Bayes estimate of nullness is less than $\alpha$. This is less precise than the frequentist control theorem but, as with the James–Stein estimator, is more robust in not demanding independence among the $x_i$s. The family resemblance between JS and BH is through shrinkage: in the BH case the shrinkage of significance levels. For instance, $x_i = 3$ has individual significance level 0.001 against nullness, whereas $\widehat{Fdr} = 0.164$ for the prostate data, i.e., still with about a 1/6 chance of gene $i$ being null.
So what does machine learning have to do with the JamesāStein estimator? Nothing to its birth but, as the articles in this volume show, a great deal to its downstream effects on statistical theory and practice. Charles Stein, who was a good applied statistician when he put his mind to it, might have enjoyed these developments, but maybe not; his heart was always with the mathematics.
References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57(1), 289–300.

Efron, B. (2010). Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction (Vol. 1). Cambridge: Cambridge University Press.

Efron, B. (2011). Tweedie's formula and selection bias. Journal of the American Statistical Association, 106(496), 1602–1614. https://doi.org/10.1198/jasa.2011.tm11181

Efron, B. (2016). Empirical Bayes deconvolution estimates. Biometrika, 103(1), 1–20. https://doi.org/10.1093/biomet/asv068

Efron, B. (2023). Exponential families in theory and practice. Cambridge: Cambridge University Press.

Efron, B., & Morris, C. (1973). Stein's estimation rule and its competitors: An empirical Bayes approach. Journal of the American Statistical Association, 68, 117–130.

James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob. (Vol. I, pp. 361–379). Berkeley: University of California Press.

Narasimhan, B., & Efron, B. (2020). deconvolveR: A G-modeling program for deconvolution and empirical Bayes estimation. Journal of Statistical Software, 94(11), 1–20. https://doi.org/10.18637/jss.v094.i11

Robbins, H. (1956). An empirical Bayes approach to statistics. In Proc. 3rd Berkeley Sympos. Math. Statist. and Prob. (Vol. I, pp. 157–163). Berkeley: University of California Press.

Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Annals of Mathematical Statistics, 42(1), 385–388. https://doi.org/10.1214/aoms/1177693528

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1), 267–288.
## 1 Introduction
By and large, the statistics world is one of heuristics, approximations, and asymptotics. The JamesāStein estimator arrived in that world in 1961 on a note of startling specificity: unseen parameters μ 1 , μ 2 , ⦠, μ n produce independent observations
x
i
ā¼
ind
N
(
μ
i
,
1
)
,
i
\=
1
,
ā¦
,
n
,
(1)
n ā„ 3. The JamesāStein rule in its simplest form proposed estimating the μ i by
μ
^
i
J
S
\=
(
1
ā
n
ā
2
S
)
x
i
(
S
\=
ā
i
\=
1
n
x
i
2
)
.
(2)
Formula ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) looked implausible: the estimate of μ i depended on the *other* observations x j, j ā i (through *S*), as well as x i, despite the independence assumption. Nevertheless, James and Stein showed that Rule ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) *always* beat the obvious maximum likelihood estimates
μ
^
i
M
L
\=
x
ĀÆ
i
(
i
\=
1
,
ā¦
,
n
)
(3)
in terms of total expected squared error
E
{
ā
i
\=
1
n
(
μ
^
i
ā
μ
i
)
2
}
.
(4)
That āalwaysā was the shocking part: two centuries of statistical theory, ANOVA, regression, multivariate analysis, etc., depended on maximum likelihood estimation. Did everything have to be rethought?
One path forward involved Bayesian thinking. If we assumed that the μ i themselves came from a normal distribution,
μ
i
ā¼
ind
N
(
0
,
A
)
for
i
\=
1
,
ā¦
,
n
,
(5)
with variance A ā„ 0, the Bayes estimates would be
μ
^
i
B
a
y
e
s
\=
B
x
i
(
B
\=
A
/
(
A
\+
1
)
)
.
(6)
We donāt know *A* or *B* but
B
^
\=
1
ā
(
n
ā
2
)
/
S
(7)
is *B*ās unbiased estimate: we can rewrite ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) as
μ
^
i
J
S
\=
B
^
x
i
,
(8)
which at least looks more plausible.
In the language introduced by Robbins ([1956](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR9 "Robbins, H. (1956). An empirical Bayes approach to statistics. In Proc. 3rd Berkeley Sympos. Math. Statist. and Prob. (Vol. I, pp. 157ā163). Berkeley: University of California Press.")), formula ([8](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ8)) is an *empirical Bayes* estimator, another shocking post-war statistical innovation. Carl Morris and I wrote a series of papers in the 1970 s exploring Bayesian roots of the JamesāStein estimator (Efron and Morris, [1973](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR6 "Efron, B., & Morris, C. (1973). Steinās estimation rule and its competitorsāAn empirical Bayes approach. Journal of the American Statistical Association, 68, 117ā130.")). Something is lost in the empirical Bayes formulation, namely the frequentist āalwaysā of expected square error minimization, but a lot is gained in flexibility and scope, as discussed in Sect. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Sec2).
**Fig. 1**
[](https://link.springer.com/article/10.1007/s42081-023-00209-y/figures/1)
Prostate data: 6033 *x* values; mean 0.003, sd \= 1\.135; curve is proportional to a N ( 0 , 1 ) density
[Full size image](https://link.springer.com/article/10.1007/s42081-023-00209-y/figures/1)
Figure [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1) illustrates an example of simultaneous estimation pursued in Sect. 2.1 of Efron ([2010](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR2 "Efron, B. (2010). Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction (Vol. 1). Cambridge: Cambridge University Press.")). A microarray study has compared expression levels between prostate cancer patients and control subjects for n \= 6033 genes. For each gene, a statistic x i has been calculated (essentially a ā*z*\-valueā),
x
i
ā¼
N
(
μ
i
,
1
)
,
i
\=
1
,
ā¦
,
n
,
(9)
where μ i measures the difference between cancer and control group levels.
The solid curve in Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1) is a N ( 0 , 1 ) density scaled to have the same area as the histogram of the 6033 *x* values. A bad result from the researchersā point of view would be a perfect fit of curve to histogram, which would imply all the genes have μ i \= 0, the ānullā value of no difference between cancer patients and controls.
Thatās not what happened: the histogram has mildly heavy tails in both directions. The researchers were hoping to find genes with large values of ā μ i āāones that might be a clue to prostate cancer etiologyāas suggested by the heavy tails. How encouraged should they be?
Not very, according to the JamesāStein rule. The 6033 x i values have mean 0.003, which Iāll take to be zero, and empirical variance
Ļ
^
2
\=
1\.289.
(10)
The JamesāStein estimate ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) is
μ
^
i
J
S
\=
(
1
ā
n
ā
2
n
ā
1
1
Ļ
^
2
)
x
i
\=
0\.224
ā
x
i
,
(11)
so even x i \= 5 yields an estimate barely exceeding 1. Section [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Sec2) suggests a more optimistic analysis.
## 2 Tweedieās formula
The impressive precision of the JamesāStein theorem came at a cost in generality. Efforts to extend the theorem, say to Poisson rather than normal observations, or to measures of loss other than total squared error, gave encouraging asymptotic results but not the JamesāStein kind of finite sample frequentist dominance.
Better progress was possible on the empirical Bayes side of the street. *Tweedieās formula* (Efron, [2011](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR3 "Efron, B. (2011). Tweedieās formula and selection bias. Journal of the American Statistical Association, 106(496), 1602ā1614.
https://doi.org/10.1198/jasa.2011.tm11181
")) has been particularly useful. We wish to calculate Bayesian estimates
μ
i
B
a
y
e
s
\=
E
{
μ
i
ā£
x
i
}
,
i
\=
1
,
ā¦
,
n
,
(12)
in the normal sampling model ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)), starting from a given (possibly non-normal) prior Ļ ( μ ), applying to all *n* cases. Let *f*(*x*) be the marginal density
f
(
x
)
\=
ā«
R
Ļ
(
μ
)
Ļ
(
x
ā
μ
)
d
μ
,
(13)
with Ļ the standard N ( 0 , 1 ) density and R the range of μ. (It isnāt necessary for Ļ ( ā
) to be a continuous distribution but it simplifies notation.)
Tweedieās formula provides an elegant statement for μ i B a y e s, the posterior expectation of μ i given x i,
μ
i
B
a
y
e
s
\=
E
{
μ
i
ā£
x
i
}
\=
x
i
\+
l
ā²
(
x
i
)
with
l
ā²
(
x
i
)
\=
d
d
x
log
ā”
(
f
(
x
i
)
)
.
(14)
In the empirical Bayes situation ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)), where the prior Ļ ( ā
) is unknown, we can use the observed data x 1 , ⦠, x n to estimate the marginal density *f*(*x*), say by f ^ ( x ), giving empirical Bayes estimates
μ
^
i
\=
x
i
\+
l
^
ā²
(
x
i
)
.
(15)
The Bayes estimate ([14](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ14)) can be thought of as the MLE x i plus a Bayesian correction term l ā² ( x i ). When the prior Ļ ( μ ) is the N ( 0 , A ) distribution ([5](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ5)), μ i B a y e s equals B x i ([6](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ6)). Simple formulas for μ i B a y e s give out for most other choices of Ļ ( μ ) but now, in the machine learning era[Footnote 1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn1) of statistical research, numerical methods provide useful ways forward, as discussed next.
The *log polynomial class*[Footnote 2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn2) of marginal densities defines *f*(*x*) by
log
ā”
(
f
β
(
x
)
)
\=
β
0
\+
β
ā¤
c
(
x
)
.
(16)
Here
c
(
x
)
\=
(
x
,
x
2
,
ā¦
,
x
J
)
ā¤
and
β
\=
(
β
1
,
ā¦
,
β
J
)
ā¤
,
(17)
with β 0 chosen to make f β ( x ) integrate to 1. The choice J \= 2 gives normal marginals; larger values of *J* allow for marginal non-normality.
**Fig. 2**
[](https://link.springer.com/article/10.1007/s42081-023-00209-y/figures/2)
Prostate data: Tweedieās estimate of E { μ ⣠x }, 5 degrees of freedom; dashed curve is JamesāStein estimate
[Full size image](https://link.springer.com/article/10.1007/s42081-023-00209-y/figures/2)
The choice J \= 5 was applied to the prostate cancer data of Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1): Tweedieās formula ([14](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ14)) gave μ ^ ( x ) \= E { μ ⣠x }, graphed as the solid curve in Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2). It differs markedly from the JamesāStein estimate J \= 2, the dashed line. At x \= 4 for example, the J \= 5 estimate is[Footnote 3](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn3)
E
{
μ
ā£
x
\=
4
}
\=
2\.555
(18)
compared to 0.901 for the JamesāStein estimate.
The estimated curve E { μ ⣠x } is *empirical Bayes* in the same sense as ([8](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ8)): the parameter vector β was selected by maximum likelihood, as discussed next. With J \= 5, the prior was able to adapt to the āfishing expeditionā nature of such microarray studies, where we expect most of the genes to be null or close to null, with μ i nearly zero (corresponding here to the flat part of the curve for *x* between ā 2 and 2) and, hopefully, a small proportion of interestingly large μ is.
The sample size n \= 6033 has much to do with Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2). James and Stein ([1961](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR7 "James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob. (Vol. I, pp. 361ā379). Berkeley: University of California Press.")) was usually considered in terms of small samples, perhaps n ⤠20, for which there would be little hope of seeing the detail in Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2). The term āmachine learning eraā seems less fanciful when considering the scale of problems statisticians are now asked to deal with, as well as the tools they use to solve them.
It looks like it might be hard work computing Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2) but itās not. The histogram in Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1) has 97 bins, with centerpoints
$$\mathbf{v} = (-4.4, -4.3, \ldots, 5.1, 5.2). \tag{19}$$
Let $y_j$ be the count in bin *j*, that is, the number of the 6033 $x_i$ values falling into it, with the vector of counts being
$$\mathbf{y} = (y_1, \ldots, y_{97}). \tag{20}$$
Then the single R command
`lhat <- log(glm(yy ~ poly(vv, 5), poisson)$fit)`
(21)
provides a close approximation to the MLE of $\log f(x)$ in ([14](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ14)); numerical differentiation of $\hat{l}$ gives Tweedie's estimate. Section 3.4 of Efron ([2023](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR5 "Efron, B. (2023). Exponential Families in Theory and Practice. Cambridge: Cambridge University Press.")) shows why Poisson regression ([21](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ21)) is appropriate here.
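Tweedie's formula can be sanity-checked in closed form: under the normal prior (5) with variance A, the marginal density of x is N(0, A + 1), so x + l′(x) collapses to the Bayes rule (6), Bx with B = A/(A + 1). A minimal pure-Python sketch (hypothetical A, not the prostate fit), differentiating the log marginal numerically just as one differentiates the fitted log density above:

```python
import math

def log_marginal(x, A):
    # log marginal density of x when mu ~ N(0, A) and x | mu ~ N(mu, 1):
    # marginally x ~ N(0, A + 1)
    v = A + 1.0
    return -0.5 * math.log(2 * math.pi * v) - x * x / (2 * v)

def tweedie(x, A, h=1e-5):
    # Tweedie's formula: E{mu | x} = x + d/dx log f(x),
    # with the derivative taken by central difference
    lprime = (log_marginal(x + h, A) - log_marginal(x - h, A)) / (2 * h)
    return x + lprime

A = 3.0               # hypothetical prior variance
B = A / (A + 1.0)     # Bayes shrinkage factor from (6)
est = tweedie(2.0, A)  # agrees with B * 2.0 up to numerical error
```

The same recipe (differentiate an estimated log marginal) is what the Poisson-regression fit carries out nonparametrically for the real data.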
The James–Stein theorem depends on the independence assumption in ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)), unlikely to be true in the microarray study, but the estimates ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) have a certain marginal validity even under dependence. This is clearer from the empirical Bayes point of view. The Tweedie estimate $x_i + \hat{l}'(x_i)$ requires only that $\hat{l}'(x)$ be close to $l'(x)$, not that it be estimated from independent $x_i$'s.[Footnote 4](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn4)
## 3 Shrinkage estimators
James and Stein's paper aroused excited interest in the statistics community when it arrived in 1961. Most of the excitement focused on the strict inadmissibility of the traditional maximum likelihood estimate demonstrated by the James–Stein rule. Other rules dominating the MLE were discovered, for instance the Bayes estimator of Strawderman ([1971](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR10 "Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Annals of Mathematical Statistics, 42(1), 385–388. https://doi.org/10.1214/aoms/1177693528")), which was itself admissible while rendering the MLE inadmissible.
Big new ideas can take a while to make their true impact felt. The James–Stein rule had an influential side effect on subsequent theory and practice in that it demonstrated, in an inarguable way, the virtues of *shrinkage estimation*: given an ensemble of problems, individual estimates are shrunk toward a central point; that is, a deliberate bias is introduced, pulling estimates away from their MLEs for the sake of better group performance.
Admissibility and inadmissibility arenāt much in the air these days, while shrinkage estimation has gone on to play a major role in modern practice. A spectacular success story is the lasso (Tibshirani, [1996](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR11 "Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1), 267ā288.")). Lasso shrinkage is extreme, pulling some (often most) of the coefficient estimates all the way back to zero.
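Lasso shrinkage can be made concrete in its simplest setting: with an orthonormal design, each lasso coefficient is the soft-thresholded OLS estimate, shifted toward zero by the penalty λ and snapped exactly to zero when it is small. A minimal Python sketch (hypothetical coefficient values, not from any data set):

```python
def soft_threshold(x, lam):
    # lasso coefficient update in the orthonormal-design case:
    # shrink x toward zero by lam, mapping |x| <= lam exactly to zero
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

coefs = [3.0, 0.4, -1.2, 0.05]                       # hypothetical OLS estimates
shrunk = [soft_threshold(c, 0.5) for c in coefs]     # small ones go all the way to zero
```

The "all the way to zero" behavior is what distinguishes lasso shrinkage from the proportional James–Stein shrinkage of (8).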
Bayes and empirical Bayes rules tend to be strong shrinkers. Tweedie's estimate in Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2) (J = 5) shrinks the estimate of $E\{\mu \mid x = 4\}$ from its MLE value 4 down to 2.555. For *x* between −1 and 1, the shrinkage is almost all the way to zero.
The reader may have been surprised to see that neither Tweedie's formula ([14](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ14)) for $E\{\mu_i \mid x_i\}$ nor its empirical version ([15](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ15)) requires estimation of the prior $\pi(\mu)$. This is a special property of the posterior expectation $E\{\mu_i \mid x_i\}$ and isn't available for, say, $\Pr\{\mu_i \ge 2 \mid x_i\}$, or most other Bayesian targets.
"Bayesian deconvolution" (Efron, [2016](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR4 "Efron, B. (2016). Empirical Bayes deconvolution estimates. Biometrika, 103(1), 1–20. https://doi.org/10.1093/biomet/asv068")) uses low-dimensional parametric modeling of $\pi(\mu)$ for general empirical Bayes computations. It was applied to finding a prior density $\pi(\mu)$ that would give the distribution of *x* seen in Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1), assuming the normal sampling model ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)). The deconvolution model for $\pi(\mu)$ used a delta function at $\mu = 0$ (for the "null" genes) and a natural spline function with four degrees of freedom for the non-null cases.
**Fig. 3**
Empirical Bayes conditional density of $\mu$ given $\mu \ne 0$; $\Pr\{\mu = 0\}$ equals 0.825
The estimated prior[Footnote 5](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn5) $\hat{\pi}(\mu)$ is shown in Fig. [3](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig3); it put probability 0.825 on $\mu = 0$, while the conditional distribution given $\mu \ne 0$ was a moderately heavy-tailed version of $N(0, 1.33^2)$. Based on $\hat{\pi}(\mu)$ we can form estimates of *any* Bayesian target, for instance $\widehat{\Pr}\{\mu_i \ge 2 \mid x_i = 4\} = 0.80$. Figure [3](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig3) is a direct descendant of the James–Stein rule, now 60-plus years on.
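Given any estimated two-groups prior, posterior targets like this can be approximated by numerical integration. The sketch below is a stand-in, not the paper's deconvolveR fit: it replaces the heavy-tailed fitted slab of Fig. 3 with a plain normal slab (point mass 0.825 at μ = 0 plus 0.175·N(0, 1.33²)), so its answer for Pr{μ ≥ 2 | x = 4} lands somewhat below the quoted 0.80:

```python
import math

def phi(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

# hypothetical stand-in prior: spike at 0 plus a normal slab
p0, slab_sd = 0.825, 1.33

def posterior_prob(x, threshold, grid_n=4001, lo=-8.0, hi=8.0):
    # Pr{mu >= threshold | x} by Riemann-sum integration over the slab,
    # plus the point mass at mu = 0
    h = (hi - lo) / (grid_n - 1)
    num = den = 0.0
    for k in range(grid_n):
        mu = lo + k * h
        w = (1 - p0) * phi(mu / slab_sd) / slab_sd * phi(x - mu) * h
        den += w
        if mu >= threshold:
            num += w
    spike = p0 * phi(x)        # likelihood weight of mu = 0
    den += spike
    if 0.0 >= threshold:
        num += spike
    return num / den

p = posterior_prob(4.0, 2.0)   # roughly 0.72 under this normal-slab assumption
```

With the actual heavier-tailed slab, the probability rises toward the paper's 0.80.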
A less direct descendant, but still on the family tree, arrived in 1995. The *false discovery rate* paper by Benjamini and Hochberg concerned simultaneous hypothesis testing. Looking at Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1), which of the n = 6033 genes can confidently be labeled as non-null, that is, as having $\mu_i \ne 0$?
Suppose for convenience that the $x_i$'s are ordered from smallest to largest. The right-sided significance level for testing $\mu_i = 0$ is
$$S_0(x_i) = 1 - \Phi(x_i), \tag{22}$$
where Φ is the standard normal cumulative distribution function. Of the 6033 genes, 401 had $S_0(x_i) \le 0.05$, the usual rejection level for individual testing, but even if actually *all* of the genes were null we would expect about 302 such rejections, so individual testing can't be right. Benjamini and Hochberg proposed a novel simultaneous testing rule that safely controls the number of "false discoveries" (genes falsely labeled "non-null") while not being discouragingly strict. (My summary here won't give the BH rule its full due; see Chapter 4 of Efron ([2010](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR2 "Efron, B. (2010). Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction (Vol. 1). Cambridge: Cambridge University Press.")) for a more complete description.)
Let $\hat{S}(x)$ be the observed proportion of $x_i$'s exceeding value *x*, and define
$$\widehat{\mathrm{Fdr}}(x) = \pi_0 S_0(x) \big/ \hat{S}(x), \tag{23}$$
where $\pi_0$ is the proportion of null genes among all *n*.[Footnote 6](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn6) For a fixed control level α, such as α = 0.1, the BH rule says to reject the null hypothesis $\mu_i = 0$ for those genes having
$$\widehat{\mathrm{Fdr}}(x_i) \le \alpha. \tag{24}$$
The BenjaminiāHochberg theorem states that under independence assumptions like ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)), the expected proportion of false discoveries by rule ([24](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ24)) is α.
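The empirical Bayes form of the rule, (23)–(24), is short enough to code directly. A minimal Python sketch on synthetic z-values, taking π0 = 1 as in the usual practice noted in Footnote 6 (hypothetical data, not the prostate study):

```python
import math

def S0(x):
    # right-sided significance level (22): 1 - Phi(x), via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

def bh_reject(xs, alpha=0.1, pi0=1.0):
    # reject mu_i = 0 where Fdr-hat(x_i) = pi0 * S0(x_i) / S-hat(x_i) <= alpha,
    # with S-hat(x) the observed proportion of values >= x
    n = len(xs)
    rejected = []
    for i, x in enumerate(xs):
        s_hat = sum(1 for y in xs if y >= x) / n
        if pi0 * S0(x) / s_hat <= alpha:
            rejected.append(i)
    return rejected

# 95 null-looking values plus 5 extreme ones: only the extremes are flagged
flagged = bh_reject([0.0] * 95 + [5.0] * 5, alpha=0.1)
```

The quadratic scan over the data keeps the sketch transparent; sorting first gives the familiar step-up form of the BH procedure.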
**Fig. 4**
Prostate data: left Fdr and right Fdr; dashes show 60 genes with $\widehat{\mathrm{Fdr}} < 0.1$
Figure [4](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig4) shows $\widehat{\mathrm{Fdr}}$ for the prostate cancer data and also for the left-sided Fdr estimate, where significance is defined by $S_0(x_i) = \Phi(x_i)$ rather than ([22](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ22)). I applied the BH rule with α = 0.1, which labeled 60 genes as non-null, 32 on the left and 28 on the right. The BH theorem says that we can expect 6 of the 60 to actually be null.
The Fdr story has evolved very much along the lines of its James–Stein predecessor. Intense initial interest focused on the exact frequentist control of false discovery rates. The Bayes and empirical Bayes implications came later: as at ([5](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ5)), we assume that each $x_i$ is a realization of a random variable *x* given by
$$\mu \sim \pi(\mu) \quad \text{and} \quad x \mid \mu \sim p(x \mid \mu), \tag{25}$$
where $p(x \mid \mu)$ is a known probability kernel which I'll take here to be the normal sampling model ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)). Then if *S*(*x*) is 1 minus the cdf of the marginal density ([13](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ13)), Bayes rule gives
$$\Pr\{\mu = 0 \mid x\} = \pi_0 S_0(x) \big/ S(x). \tag{26}$$
Comparing ([26](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ26)) with ([23](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ23)) says that the BH rule amounts to labeling case *i* as non-null if its obvious empirical Bayes estimate of nullness is less than α. This is less precise than the frequentist control theorem but, as with the James–Stein estimator, is more robust in not demanding independence among the $x_i$'s. The family resemblance between JS and BH is through shrinkage: in the BH case, the shrinkage of significance levels. For instance, $x_i = 3$ has individual significance level 0.001 against nullness, whereas $\widehat{\mathrm{Fdr}} = 0.164$ for the prostate data, i.e., still about a 1/6 chance of gene *i* being null.
So what does machine learning have to do with the JamesāStein estimator? Nothing to its birth but, as the articles in this volume show, a great deal to its downstream effects on statistical theory and practice. Charles Stein, who was a good applied statistician when he put his mind to it, might have enjoyed these developments, but maybe not; his heart was always with the mathematics.
## Notes
1. Where algorithms can substitute for theorems.
2. For general use, a natural spline basis is preferable to polynomials, to control the behavior of $\log \pi(\mu)$ at the extremes.
3. With an estimated bootstrap standard error of 0.192.
4. The accuracy of the Tweedie estimate *does* suffer under dependence, so the previously quoted bootstrap standard error is likely to be optimistic.
5. Estimated using the CRAN package deconvolveR (Narasimhan and Efron, [2020](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR8 "Narasimhan, B., & Efron, B. (2020). deconvolveR: A G-Modeling Program for Deconvolution and Empirical Bayes Estimation. Journal of Statistical Software, 94(11), 1–20. https://doi.org/10.18637/jss.v094.i11")).
6. $\pi_0$ can be estimated, but in practice it is usually replaced by its upper bound 1 in applying rule ([24](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ24)). For cases like the prostate data where most of the genes are null, this doesn't much affect the outcome.
## References
- Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. *Journal of the Royal Statistical Society Series B,* *57*(1), 289–300.
- Efron, B. (2010). *Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction* (Vol. 1). Cambridge: Cambridge University Press.
- Efron, B. (2011). Tweedie's formula and selection bias. *Journal of the American Statistical Association,* *106*(496), 1602–1614. <https://doi.org/10.1198/jasa.2011.tm11181>
- Efron, B. (2016). Empirical Bayes deconvolution estimates. *Biometrika,* *103*(1), 1–20. <https://doi.org/10.1093/biomet/asv068>
- Efron, B. (2023). *Exponential Families in Theory and Practice*. Cambridge: Cambridge University Press.
- Efron, B., & Morris, C. (1973). Stein's estimation rule and its competitors—An empirical Bayes approach. *Journal of the American Statistical Association,* *68*, 117–130.
- James, W., & Stein, C. (1961). Estimation with quadratic loss. In *Proc. 4th Berkeley Sympos. Math. Statist. and Prob.* (Vol. I, pp. 361–379). Berkeley: University of California Press.
- Narasimhan, B., & Efron, B. (2020). deconvolveR: A G-Modeling Program for Deconvolution and Empirical Bayes Estimation. *Journal of Statistical Software,* *94*(11), 1–20. <https://doi.org/10.18637/jss.v094.i11>
- Robbins, H. (1956). An empirical Bayes approach to statistics. In *Proc. 3rd Berkeley Sympos. Math. Statist. and Prob.* (Vol. I, pp. 157–163). Berkeley: University of California Press.
- Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. *Annals of Mathematical Statistics,* *42*(1), 385–388. <https://doi.org/10.1214/aoms/1177693528>
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. *Journal of the Royal Statistical Society Series B,* *58*(1), 267–288.
## Funding
No funds, grants, or other support was received.
## Author information
### Authors and Affiliations
1. Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, CA, 94305, USA
Bradley Efron
2. Department of Biomedical Data Science, Stanford School of Medicine, 1265 Welch Road, Stanford, CA, 94305, USA
Bradley Efron
### Corresponding author
Correspondence to [Bradley Efron](mailto:efron@stanford.edu).
## Ethics declarations
### Conflict of interest
The author has no relevant financial or non-financial interests to disclose.
## Additional information
### Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Dedicated to the memory of Carl Morris.
## Rights and permissions
**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the articleās Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the articleās Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit <http://creativecommons.org/licenses/by/4.0/>.
## About this article
### Cite this article
Efron, B. Machine learning and the JamesāStein estimator. *Jpn J Stat Data Sci* **7**, 257ā266 (2024). https://doi.org/10.1007/s42081-023-00209-y
- Received: 24 March 2023
- Accepted: 13 May 2023
- Published: 30 June 2023
- Version of record: 30 June 2023
- Issue date: June 2024
- DOI: https://doi.org/10.1007/s42081-023-00209-y
### Keywords
- [Empirical bayes](https://link.springer.com/search?query=Empirical%20bayes&facet-discipline="Statistics")
- [Shrinkage](https://link.springer.com/search?query=Shrinkage&facet-discipline="Statistics")
- [Tweedieās formula](https://link.springer.com/search?query=Tweedie%E2%80%99s%20formula&facet-discipline="Statistics")
- [BenjaminiāHochberg algorithm](https://link.springer.com/search?query=Benjamini%E2%80%93Hochberg%20algorithm&facet-discipline="Statistics")
## 1 Introduction
By and large, the statistics world is one of heuristics, approximations, and asymptotics. The James–Stein estimator arrived in that world in 1961 on a note of startling specificity: unseen parameters $\mu_1, \mu_2, \ldots, \mu_n$ produce independent observations
$$x_i \overset{\mathrm{ind}}{\sim} N(\mu_i, 1), \quad i = 1, \ldots, n, \tag{1}$$
$n \ge 3$. The James–Stein rule in its simplest form proposed estimating the $\mu_i$ by
$$\hat{\mu}_i^{\,JS} = \left(1 - \frac{n-2}{S}\right) x_i \qquad \left(S = \sum_{i=1}^{n} x_i^2\right). \tag{2}$$
Formula ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) looked implausible: the estimate of $\mu_i$ depended on the *other* observations $x_j$, $j \ne i$ (through *S*), as well as $x_i$, despite the independence assumption. Nevertheless, James and Stein showed that Rule ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) *always* beat the obvious maximum likelihood estimates
$$\hat{\mu}_i^{\,ML} = x_i \quad (i = 1, \ldots, n) \tag{3}$$
in terms of total expected squared error
$$E\left\{\sum_{i=1}^{n} (\hat{\mu}_i - \mu_i)^2\right\}. \tag{4}$$
That āalwaysā was the shocking part: two centuries of statistical theory, ANOVA, regression, multivariate analysis, etc., depended on maximum likelihood estimation. Did everything have to be rethought?
One path forward involved Bayesian thinking. If we assumed that the $\mu_i$ themselves came from a normal distribution,
$$\mu_i \overset{\mathrm{ind}}{\sim} N(0, A) \quad \text{for } i = 1, \ldots, n, \tag{5}$$
with variance $A \ge 0$, the Bayes estimates would be
$$\hat{\mu}_i^{\,Bayes} = B x_i \qquad (B = A/(A+1)). \tag{6}$$
We donāt know *A* or *B* but
$$\hat{B} = 1 - (n-2)/S \tag{7}$$
is *B*ās unbiased estimate: we can rewrite ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) as
$$\hat{\mu}_i^{\,JS} = \hat{B} x_i, \tag{8}$$
which at least looks more plausible.
In the language introduced by Robbins ([1956](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR9 "Robbins, H. (1956). An empirical Bayes approach to statistics. In Proc. 3rd Berkeley Sympos. Math. Statist. and Prob. (Vol. I, pp. 157–163). Berkeley: University of California Press.")), formula ([8](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ8)) is an *empirical Bayes* estimator, another shocking post-war statistical innovation. Carl Morris and I wrote a series of papers in the 1970s exploring the Bayesian roots of the James–Stein estimator (Efron and Morris, [1973](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR6 "Efron, B., & Morris, C. (1973). Stein's estimation rule and its competitors—An empirical Bayes approach. Journal of the American Statistical Association, 68, 117–130.")). Something is lost in the empirical Bayes formulation, namely the frequentist "always" of expected squared error minimization, but a lot is gained in flexibility and scope, as discussed in Sect. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Sec2).
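The empirical Bayes reading of (7)–(8) is easy to check by simulation: draw the $\mu_i$ from (5), observe the $x_i$, and compare total squared errors of the James–Stein and ML estimates. A minimal Python sketch (hypothetical n and A, not values from the paper):

```python
import random

random.seed(0)                                       # reproducible illustration
n, A = 100, 1.0                                      # hypothetical problem size and prior variance
mu = [random.gauss(0, A ** 0.5) for _ in range(n)]   # mu_i ~ N(0, A), as in (5)
x = [random.gauss(m, 1) for m in mu]                 # x_i | mu_i ~ N(mu_i, 1), as in (1)

S = sum(xi * xi for xi in x)
B_hat = 1 - (n - 2) / S                              # unbiased estimate of B, formula (7)
js = [B_hat * xi for xi in x]                        # James-Stein estimates, formula (8)

err_ml = sum((xi - m) ** 2 for xi, m in zip(x, mu))  # MLE total squared error
err_js = sum((e - m) ** 2 for e, m in zip(js, mu))   # JS total squared error
```

With A = 1 the Bayes shrinkage factor is B = 1/2, and err_js typically comes out near half of err_ml, illustrating the "always" theorem in expectation.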
**Fig. 1** Prostate data: 6033 *x* values; mean 0.003, sd = 1.135; curve is proportional to a $N(0,1)$ density
Figure [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1) illustrates an example of simultaneous estimation pursued in Sect. 2.1 of Efron ([2010](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR2 "Efron, B. (2010). Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction (Vol. 1). Cambridge: Cambridge University Press.")). A microarray study has compared expression levels between prostate cancer patients and control subjects for $n = 6033$ genes. For each gene, a statistic $x_i$ has been calculated (essentially a "*z*-value"),
$$x_i \sim N(\mu_i, 1), \quad i = 1, \ldots, n, \tag{9}$$
where $\mu_i$ measures the difference between cancer and control group levels.
The solid curve in Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1) is a $N(0,1)$ density scaled to have the same area as the histogram of the 6033 *x* values. A bad result from the researchers' point of view would be a perfect fit of curve to histogram, which would imply that all the genes have $\mu_i = 0$, the "null" value of no difference between cancer patients and controls.
That's not what happened: the histogram has mildly heavy tails in both directions. The researchers were hoping to find genes with large values of $|\mu_i|$, ones that might be a clue to prostate cancer etiology, as suggested by the heavy tails. How encouraged should they be?
Not very, according to the James–Stein rule. The 6033 $x_i$ values have mean 0.003, which I'll take to be zero, and empirical variance
$$\hat\sigma^2 = 1.289. \tag{10}$$
The James–Stein estimate ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) is
$$\hat\mu_i^{\,JS} = \Big(1 - \frac{n-2}{n-1}\,\frac{1}{\hat\sigma^2}\Big)\, x_i = 0.224\, x_i, \tag{11}$$
so even $x_i = 5$ yields an estimate barely exceeding 1. Section [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Sec2) suggests a more optimistic analysis.
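The shrinkage factor in (11) is simple arithmetic; here is a quick check of it (using the published values $n = 6033$ and $\hat\sigma^2 = 1.289$):

```python
n = 6033
sigma2_hat = 1.289                  # empirical variance of the 6033 x values

# variance form of the James-Stein shrinker, as in (11)
shrink = 1 - ((n - 2) / (n - 1)) / sigma2_hat
# shrink is about 0.224, so x = 5 maps to roughly 1.12
mu_hat_at_5 = shrink * 5
```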
## 2 Tweedie's formula
The impressive precision of the James–Stein theorem came at a cost in generality. Efforts to extend the theorem, say to Poisson rather than normal observations, or to measures of loss other than total squared error, gave encouraging asymptotic results but not the James–Stein kind of finite-sample frequentist dominance.
Better progress was possible on the empirical Bayes side of the street. *Tweedie's formula* (Efron, [2011](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR3 "Efron, B. (2011). Tweedie's formula and selection bias. Journal of the American Statistical Association, 106(496), 1602–1614. https://doi.org/10.1198/jasa.2011.tm11181")) has been particularly useful. We wish to calculate Bayesian estimates
$$\mu_i^{\,Bayes} = E\{\mu_i \mid x_i\}, \quad i = 1, \ldots, n, \tag{12}$$
in the normal sampling model ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)), starting from a given (possibly non-normal) prior $\pi(\mu)$, applying to all *n* cases. Let $f(x)$ be the marginal density
$$f(x) = \int_{\mathcal{R}} \pi(\mu)\, \varphi(x - \mu)\, d\mu, \tag{13}$$
with $\varphi$ the standard $N(0,1)$ density and $\mathcal{R}$ the range of $\mu$. (It isn't necessary for $\pi(\cdot)$ to be a continuous distribution, but it simplifies notation.)
Tweedie's formula provides an elegant statement for $\mu_i^{\,Bayes}$, the posterior expectation of $\mu_i$ given $x_i$:
$$\mu_i^{\,Bayes} = E\{\mu_i \mid x_i\} = x_i + l'(x_i) \quad \text{with} \quad l'(x) = \frac{d}{dx}\log f(x). \tag{14}$$
In the empirical Bayes situation ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)), where the prior $\pi(\cdot)$ is unknown, we can use the observed data $x_1, \ldots, x_n$ to estimate the marginal density $f(x)$, say by $\hat f(x)$, giving empirical Bayes estimates
$$\hat\mu_i = x_i + \hat l'(x_i). \tag{15}$$
The Bayes estimate ([14](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ14)) can be thought of as the MLE $x_i$ plus a Bayesian correction term $l'(x_i)$. When the prior $\pi(\mu)$ is the $N(0,A)$ distribution ([5](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ5)), $\mu_i^{\,Bayes}$ equals $B x_i$ ([6](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ6)). Simple formulas for $\mu_i^{\,Bayes}$ give out for most other choices of $\pi(\mu)$ but now, in the machine learning era[Footnote 1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn1) of statistical research, numerical methods provide useful ways forward, as discussed next.
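Tweedie's formula is easy to verify numerically. The sketch below (my own illustration, not code from the paper) computes the marginal $f(x)$ in (13) by trapezoid-rule integration for the normal prior (5) with $A = 1$, differentiates $\log f$ numerically, and recovers the linear Bayes rule $Bx$ with $B = A/(A+1) = 0.5$; the grid limits and helper names are assumptions of the sketch.

```python
import math

A = 1.0                                   # prior variance in (5)

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def marginal(x, lo=-12.0, hi=12.0, m=4000):
    """f(x) = integral of pi(mu) * phi(x - mu) d mu, trapezoid rule (13)."""
    h = (hi - lo) / m
    total = 0.0
    for j in range(m + 1):
        mu = lo + j * h
        w = 0.5 if j in (0, m) else 1.0
        total += w * phi(mu / math.sqrt(A)) / math.sqrt(A) * phi(x - mu)
    return total * h

def tweedie(x, eps=1e-4):
    """x + l'(x), with l'(x) = d/dx log f(x) by central differences (14)."""
    lp = (math.log(marginal(x + eps)) - math.log(marginal(x - eps))) / (2 * eps)
    return x + lp

B = A / (A + 1)          # exact Bayes shrinker in (6)
est = tweedie(2.0)       # should agree with B * 2 = 1.0
```

With a normal prior the marginal is $N(0, A+1)$, so $l'(x) = -x/(A+1)$ and Tweedie's formula collapses to the shrinkage rule (6); the numerical estimate matches to several decimals.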
The *log polynomial class*[Footnote 2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn2) of marginal densities defines $f(x)$ by
$$\log\big(f_\beta(x)\big) = \beta_0 + \beta^\top c(x). \tag{16}$$
Here
$$c(x) = (x, x^2, \ldots, x^J)^\top \quad \text{and} \quad \beta = (\beta_1, \ldots, \beta_J)^\top, \tag{17}$$
with $\beta_0$ chosen to make $f_\beta(x)$ integrate to 1. The choice $J = 2$ gives normal marginals; larger values of *J* allow for marginal non-normality.
**Fig. 2** Prostate data: Tweedie's estimate of $E\{\mu \mid x\}$, 5 degrees of freedom; dashed curve is the James–Stein estimate
The choice $J = 5$ was applied to the prostate cancer data of Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1): Tweedie's formula ([14](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ14)) gave $\hat\mu(x) = E\{\mu \mid x\}$, graphed as the solid curve in Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2). It differs markedly from the James–Stein estimate ($J = 2$), the dashed line. At $x = 4$, for example, the $J = 5$ estimate is[Footnote 3](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn3)
$$E\{\mu \mid x = 4\} = 2.555, \tag{18}$$
compared to 0.901 for the James–Stein estimate.
The estimated curve $E\{\mu \mid x\}$ is *empirical Bayes* in the same sense as ([8](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ8)): the parameter vector $\beta$ was selected by maximum likelihood, as discussed next. With $J = 5$, the prior was able to adapt to the "fishing expedition" nature of such microarray studies, where we expect most of the genes to be null or close to null, with $\mu_i$ nearly zero (corresponding here to the flat part of the curve for *x* between $-2$ and 2) and, hopefully, a small proportion of interestingly large $\mu_i$s.
The sample size $n = 6033$ has much to do with Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2). The James–Stein rule (James & Stein, [1961](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR7 "James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob. (Vol. I, pp. 361–379). Berkeley: University of California Press.")) was usually considered in terms of small samples, perhaps $n \le 20$, for which there would be little hope of seeing the detail in Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2). The term "machine learning era" seems less fanciful when considering the scale of problems statisticians are now asked to deal with, as well as the tools they use to solve them.
It looks like it might be hard work computing Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2) but it's not. The histogram in Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1) has 97 bins, with centerpoints
$$vv = (-4.4, -4.3, \ldots, 5.1, 5.2). \tag{19}$$
Let $y_j$ be the count in bin *j*, that is, the number of the 6033 $x_i$ values falling into it, with the vector of counts being
$$yy = (y_1, \ldots, y_{97}). \tag{20}$$
Then the single R command

```r
llhat <- log(glm(yy ~ poly(vv, 5), poisson)$fit)    # (21)
```

provides a close approximation to the MLE of $\log f(x)$ in ([14](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ14)); numerical differentiation of `llhat` gives Tweedie's estimate. Section 3.4 of Efron ([2023](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR5 "Efron, B. (2023). Exponential Families in Theory and Practice. Cambridge: Cambridge University Press.")) shows why Poisson regression ([21](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ21)) is appropriate here.
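The `glm` call in (21) is Poisson regression of bin counts on a polynomial in the bin centers (sometimes called Lindsey's method). Below is a self-contained pure-Python sketch of the same computation (my own reimplementation, not the paper's code), using $J = 2$ on a coarser hypothetical 41-bin grid with idealized expected counts from a standard normal, so that the fitted log density is exactly quadratic and the answer is known: the names `vv`, `yy` here refer to this toy grid, not the paper's 97 bins.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lindsey(v, y, J, iters=40):
    """Poisson regression of counts y on poly(v, J): log mu = X beta, via IRLS."""
    p = J + 1
    X = [[vj ** k for k in range(p)] for vj in v]
    beta = [math.log(sum(y) / len(y))] + [0.0] * J
    for _ in range(iters):
        eta = [sum(Xj[k] * beta[k] for k in range(p)) for Xj in X]
        mu = [math.exp(e) for e in eta]
        z = [eta[j] + (y[j] - mu[j]) / mu[j] for j in range(len(v))]
        XtWX = [[sum(mu[j] * X[j][a] * X[j][b] for j in range(len(v)))
                 for b in range(p)] for a in range(p)]
        XtWz = [sum(mu[j] * X[j][a] * z[j] for j in range(len(v)))
                for a in range(p)]
        beta = solve(XtWX, XtWz)
    return beta

# Idealized counts: expected bin counts for 6033 draws from N(0, 1),
# so log y is exactly quadratic and J = 2 recovers it
delta = 0.2
vv = [-4.0 + delta * j for j in range(41)]
yy = [6033 * delta * math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in vv]
beta = lindsey(vv, yy, J=2)
# l'(x) = beta[1] + 2 * beta[2] * x; Tweedie's estimate x + l'(x) is then ~0,
# as it should be when the marginal density is standard normal
tweedie_at_2 = 2.0 + beta[1] + 2 * beta[2] * 2.0
```

The fit returns $\beta_2 \approx -1/2$ and $\beta_1 \approx 0$, i.e., it recovers $\log f(x) = \text{const} - x^2/2$; in practice the real histogram counts replace the idealized `yy`, and a larger `J` captures non-normal tails.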
The James–Stein theorem depends on the independence assumption in ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)), unlikely to be true in the microarray study, but the estimates ([2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ2)) have a certain marginal validity even under dependence. This is clearer from the empirical Bayes point of view. The Tweedie estimate $x_i + \hat l'(x_i)$ requires only that $\hat l'(x)$ be close to $l'(x)$, not that it be estimated from independent $x_i$s.[Footnote 4](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn4)
## 3 Shrinkage estimators
James and Stein's paper aroused excited interest in the statistics community when it arrived in 1961. Most of the excitement focused on the strict inadmissibility of the traditional maximum likelihood estimate demonstrated by the James–Stein rule. Other rules dominating the MLE were discovered, for instance the Bayes estimator of Strawderman ([1971](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR10 "Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Annals of Mathematical Statistics, 42(1), 385–388. https://doi.org/10.1214/aoms/1177693528")), which was itself admissible while rendering the MLE inadmissible.
Big new ideas can take a while to make their true impact felt. The James–Stein rule had an influential side effect on subsequent theory and practice in that it demonstrated, in an inarguable way, the virtues of *shrinkage estimation*: given an ensemble of problems, individual estimates are shrunk toward a central point. That is, a deliberate bias is introduced, pulling estimates away from their MLEs for the sake of better group performance.
Admissibility and inadmissibility aren't much in the air these days, while shrinkage estimation has gone on to play a major role in modern practice. A spectacular success story is the lasso (Tibshirani, [1996](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR11 "Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1), 267–288.")). Lasso shrinkage is extreme, pulling some (often most) of the coefficient estimates all the way back to zero.
Bayes and empirical Bayes rules tend to be strong shrinkers. Tweedie's estimate in Fig. [2](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig2) ($J = 5$) shrinks the estimate of $E\{\mu \mid x = 4\}$ from its MLE value 4 down to 2.555. For *x* between $-1$ and 1, the shrinkage is almost all the way to zero.
The reader may have been surprised to see that neither Tweedie's formula ([14](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ14)) for $E\{\mu_i \mid x_i\}$ nor its empirical version ([15](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ15)) requires estimation of the prior $\pi(\mu)$. This is a special property of the posterior expectation $E\{\mu_i \mid x_i\}$; it isn't available for, say, $\Pr\{\mu_i \ge 2 \mid x_i\}$, or most other Bayesian targets.
"Bayesian deconvolution" (Efron, [2016](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR4 "Efron, B. (2016). Empirical Bayes deconvolution estimates. Biometrika, 103(1), 1–20. https://doi.org/10.1093/biomet/asv068")) uses low-dimensional parametric modeling of $\pi(\mu)$ for general empirical Bayes computations. It was applied to finding a prior density $\pi(\mu)$ that would give the distribution of *x* seen in Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1), assuming the normal sampling model ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)). The deconvolution model for $\pi(\mu)$ used a delta function at $\mu = 0$ (for the "null" genes) and a natural spline function with four degrees of freedom for the non-null cases.
**Fig. 3** Empirical Bayes conditional density of $\mu$ given $\mu \ne 0$; $\Pr\{\mu = 0\}$ equals 0.825
The estimated prior[Footnote 5](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn5) $\hat\pi(\mu)$ is shown in Fig. [3](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig3); it put probability 0.825 on $\mu = 0$, while the conditional distribution given $\mu \ne 0$ was a moderately heavy-tailed version of $N(0, 1.33^2)$. Based on $\hat\pi(\mu)$ we can form estimates of *any* Bayesian target, for instance $\widehat{\Pr}\{\mu_i \ge 2 \mid x_i = 4\} = 0.80$. Figure [3](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig3) is a direct descendant of the James–Stein rule, now 60-plus years on.
A less-direct descendant, but still on the family tree, arrived in 1995. The *false discovery rate* paper by Benjamini and Hochberg (1995) concerned simultaneous hypothesis testing. Looking at Fig. [1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig1), which of the $n = 6033$ genes can confidently be labeled as non-null, that is, as having $\mu_i \ne 0$?
Suppose for convenience that the $x_i$s are ordered from smallest to largest. The right-sided significance level for testing $\mu_i = 0$ is
$$S_0(x_i) = 1 - \Phi(x_i), \tag{22}$$
where $\Phi$ is the standard normal cumulative distribution function. Of the 6033 genes, 401 had $S_0(x_i) \le 0.05$, the usual rejection level for individual testing, but even if *all* of the genes were actually null we would expect 302 such rejections, so individual testing can't be right. Benjamini and Hochberg proposed a novel simultaneous testing rule that safely controls the number of "false discoveries" (genes falsely labeled "non-null") while not being discouragingly strict. (My summary here won't give the BH rule its full due; see Chapter 4 of Efron ([2010](https://link.springer.com/article/10.1007/s42081-023-00209-y#ref-CR2 "Efron, B. (2010). Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction (Vol. 1). Cambridge: Cambridge University Press.")) for a more complete description.)
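The expected rejection count under the global null quoted above is one line of arithmetic:

```python
n = 6033
alpha_individual = 0.05
# expected number of genes with S0(x_i) <= 0.05 if every gene were null
expected_null_rejections = n * alpha_individual
```

Since 302 expected "discoveries" under the null dwarfs nothing relative to the 401 observed, individual 0.05-level testing clearly overstates the evidence.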
Let $\hat S(x)$ be the observed proportion of $x_i$s exceeding the value *x*, and define
$$\widehat{Fdr}(x) = \pi_0 S_0(x)/\hat S(x), \tag{23}$$
where $\pi_0$ is the proportion of null genes among all *n*.[Footnote 6](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fn6) For a fixed control level $\alpha$, such as $\alpha = 0.1$, the BH rule says to reject the null hypothesis $\mu_i = 0$ for those genes having
$$\widehat{Fdr}(x_i) \le \alpha. \tag{24}$$
The Benjamini–Hochberg theorem states that under independence assumptions like ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)), the expected proportion of false discoveries by rule ([24](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ24)) is $\alpha$.
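The rule (23)-(24) takes only a few lines to implement. Here is a sketch on made-up *z*-values (95 "null" values spread over $[-2, 2]$ plus five large ones; $\pi_0$ is set to 1, a conservative choice), not data from the paper:

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def fdr_hat(x, z, pi0=1.0):
    """Right-sided Fdr estimate (23): pi0 * S0(x) / S_hat(x)."""
    S0 = 1 - Phi(x)                              # significance level (22)
    S_hat = sum(zi >= x for zi in z) / len(z)    # observed exceedance proportion
    return pi0 * S0 / S_hat

# Hypothetical z-values: 95 nulls on a grid plus 5 clearly non-null genes
z = [-2 + 4 * j / 94 for j in range(95)] + [3.2, 3.5, 4.0, 4.5, 5.0]
alpha = 0.1
rejected = [zi for zi in z if fdr_hat(zi, z) <= alpha]   # BH-style rule (24)
```

Only the five planted values are rejected: at $z = 3.2$, $\widehat{Fdr} \approx 0.014$, while even the largest "null" value $z = 2$ has $\widehat{Fdr} \approx 0.38$, far above $\alpha$.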
**Fig. 4** Prostate data: left Fdr and right Fdr; dashes show 60 genes with $\widehat{Fdr} < 0.1$
Figure [4](https://link.springer.com/article/10.1007/s42081-023-00209-y#Fig4) shows $\widehat{Fdr}$ for the prostate cancer data, along with the left-sided Fdr estimate, where significance is defined by $S_0(x_i) = \Phi(x_i)$ rather than ([22](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ22)). I applied the BH rule with $\alpha = 0.1$, which labeled 60 genes as non-null, 32 on the left and 28 on the right. The BH theorem says that we can expect 6 of the 60 to actually be null.
The Fdr story has evolved very much along the lines of its James–Stein predecessor. Intense initial interest focused on the exact frequentist control of false discovery rates. The Bayes and empirical Bayes implications came later: as at ([5](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ5)), we assume that each $x_i$ is a realization of a random variable *x* given by
$$\mu \sim \pi(\mu) \quad \text{and} \quad x \mid \mu \sim p(x \mid \mu), \tag{25}$$
where $p(x \mid \mu)$ is a known probability kernel, which I'll take here to be the normal sampling model ([1](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ1)). Then if $S(x)$ is 1 minus the cdf of the marginal density ([13](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ13)), Bayes rule gives
$$\Pr\{\mu = 0 \mid x\} = \pi_0 S_0(x)/S(x). \tag{26}$$
Comparing ([26](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ26)) with ([23](https://link.springer.com/article/10.1007/s42081-023-00209-y#Equ23)) says that the BH rule amounts to labeling case *i* as non-null if its obvious empirical Bayes estimate of nullness is less than $\alpha$. This is less precise than the frequentist control theorem but, as with the James–Stein estimator, is more robust in not demanding independence among the $x_i$s. The family resemblance between JS and BH is through shrinkage, in the BH case the shrinkage of significance levels. For instance, $x_i = 3$ has individual significance level 0.001 against nullness, whereas $\widehat{Fdr} = 0.164$ for the prostate data, i.e., still about a 1/6 chance of gene *i* being null.
So what does machine learning have to do with the James–Stein estimator? Nothing to its birth but, as the articles in this volume show, a great deal to its downstream effects on statistical theory and practice. Charles Stein, who was a good applied statistician when he put his mind to it, might have enjoyed these developments, but maybe not; his heart was always with the mathematics.
## References
- Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. *Journal of the Royal Statistical Society Series B,* *57*(1), 289–300.
- Efron, B. (2010). *Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction* (Vol. 1). Cambridge: Cambridge University Press.
- Efron, B. (2011). Tweedie's formula and selection bias. *Journal of the American Statistical Association,* *106*(496), 1602–1614. <https://doi.org/10.1198/jasa.2011.tm11181>
- Efron, B. (2016). Empirical Bayes deconvolution estimates. *Biometrika,* *103*(1), 1–20. <https://doi.org/10.1093/biomet/asv068>
- Efron, B. (2023). *Exponential Families in Theory and Practice*. Cambridge: Cambridge University Press.
- Efron, B., & Morris, C. (1973). Stein's estimation rule and its competitors: An empirical Bayes approach. *Journal of the American Statistical Association,* *68*, 117–130.
- James, W., & Stein, C. (1961). Estimation with quadratic loss. In *Proc. 4th Berkeley Sympos. Math. Statist. and Prob.* (Vol. I, pp. 361–379). Berkeley: University of California Press.
- Narasimhan, B., & Efron, B. (2020). deconvolveR: A G-modeling program for deconvolution and empirical Bayes estimation. *Journal of Statistical Software,* *94*(11), 1–20. <https://doi.org/10.18637/jss.v094.i11>
- Robbins, H. (1956). An empirical Bayes approach to statistics. In *Proc. 3rd Berkeley Sympos. Math. Statist. and Prob.* (Vol. I, pp. 157–163). Berkeley: University of California Press.
- Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. *Annals of Mathematical Statistics,* *42*(1), 385–388. <https://doi.org/10.1214/aoms/1177693528>
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. *Journal of the Royal Statistical Society Series B,* *58*(1), 267–288.