ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.5 months ago (distributed domain, exempt) |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://en.wikipedia.org/wiki/Beta_distribution |
| Last Crawled | 2026-04-06 13:05:55 (15 days ago) |
| First Indexed | 2013-08-09 00:43:11 (12 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Beta distribution - Wikipedia |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | (extracted article text follows) |

Beta (infobox; figures: probability density function, cumulative distribution function)

| Property | Value |
|---|---|
| Notation | Beta(α, β) |
| Parameters | α > 0 shape (real), β > 0 shape (real) |
| Support | x ∈ [0, 1] or x ∈ (0, 1) |
| PDF | x^(α−1) (1 − x)^(β−1) / B(α, β), where B(α, β) = Γ(α)Γ(β)/Γ(α+β) and Γ is the Gamma function |
| CDF | I_x(α, β) (the regularized incomplete beta function) |
| Mean | α/(α + β); geometric mean e^(ψ(α) − ψ(α+β)), where ψ is the digamma function (see section: Geometric mean) |
| Median | I⁻¹_(1/2)(α, β); ≈ (α − 1/3)/(α + β − 2/3) for α, β > 1 |
| Mode | (α − 1)/(α + β − 2) for α, β > 1; any value in the domain for α = β = 1; no mode if α < 1 or β < 1 (density diverges at 0 for α ≤ 1, and at 1 if β ≤ 1) |
| Variance | αβ / ((α + β)²(α + β + 1)) (see trigamma function and see section: Geometric variance) |
| Skewness | 2(β − α)√(α + β + 1) / ((α + β + 2)√(αβ)) |
| Excess kurtosis | 6[(α − β)²(α + β + 1) − αβ(α + β + 2)] / (αβ(α + β + 2)(α + β + 3)) |
| Entropy | ln B(α, β) − (α − 1)ψ(α) − (β − 1)ψ(β) + (α + β − 2)ψ(α + β) |
| MGF | ₁F₁(α; α + β; t) |
| CF | ₁F₁(α; α + β; it) (see Confluent hypergeometric function) |
| Fisher information | see section: Fisher information matrix |
| Method of moments | α = μ(μ(1 − μ)/v − 1), β = (1 − μ)(μ(1 − μ)/v − 1) |
In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] or (0, 1) in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.

The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. The beta distribution is a suitable model for the random behavior of percentages and proportions.

In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial, and geometric distributions.

The formulation of the beta distribution discussed here is also known as the beta distribution of the first kind, whereas beta distribution of the second kind is an alternative name for the beta prime distribution. The generalization to multiple variables is called a Dirichlet distribution.
Probability density function
An animation of the beta distribution for different values of its parameters.
The probability density function (PDF) of the beta distribution, for 0 ≤ x ≤ 1 or 0 < x < 1, and shape parameters α, β > 0, is a power function of the variable x and of its reflection (1 − x) as follows:

f(x; α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β) = Γ(α + β) x^(α−1) (1 − x)^(β−1) / (Γ(α) Γ(β))

where Γ(z) is the gamma function. The beta function, B(α, β) = Γ(α)Γ(β)/Γ(α + β), is a normalization constant to ensure that the total probability is 1. In the above equations x is a realization (an observed value that actually occurred) of a random variable X.
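As a quick check of the definition above, the following is a minimal Python sketch (assuming NumPy and SciPy are available; the function name beta_pdf is illustrative) that assembles the PDF from the gamma-function form and compares it with SciPy's reference implementation:

```python
import numpy as np
from scipy.special import gamma, beta as beta_fn
from scipy.stats import beta as beta_dist

def beta_pdf(x, a, b):
    """PDF assembled from the section's definition: x^(a-1)(1-x)^(b-1)/B(a,b)."""
    return x**(a - 1) * (1 - x)**(b - 1) / beta_fn(a, b)

x = np.linspace(0.01, 0.99, 5)
print(beta_pdf(x, 2.0, 5.0))       # direct formula
print(beta_dist.pdf(x, 2.0, 5.0))  # SciPy reference; values agree
# The normalization constant B(a, b) equals Gamma(a)Gamma(b)/Gamma(a+b):
print(beta_fn(2.0, 5.0), gamma(2.0) * gamma(5.0) / gamma(7.0))
```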
Several authors, including N. L. Johnson and S. Kotz,[1] use the symbols p and q (instead of α and β) for the shape parameters of the beta distribution, reminiscent of the symbols traditionally used for the parameters of the Bernoulli distribution, because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters α and β approach zero.

In the following, a random variable X beta-distributed with parameters α and β will be denoted by:[2][3]

X ~ Beta(α, β)

Other notations for beta-distributed random variables used in the statistical literature are X ~ Be(α, β)[4] and X ~ β(α, β).[5]
Cumulative distribution function
CDF for symmetric beta distribution vs. x and α = β
CDF for skewed beta distribution vs. x and β = 5α
The cumulative distribution function is

F(x; α, β) = B(x; α, β)/B(α, β) = I_x(α, β)

where B(x; α, β) is the incomplete beta function and I_x(α, β) is the regularized incomplete beta function.

For positive integers α and β, the cumulative distribution function of a beta distribution can be expressed in terms of the cumulative distribution function of a binomial distribution with[6]

F(x; α, β) = Pr(Bin(α + β − 1, x) ≥ α)
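A minimal sketch of the binomial identity above (assuming SciPy; parameter values are arbitrary examples):

```python
from scipy.stats import beta, binom

# F(x; a, b) = P(Bin(a+b-1, x) >= a), for positive integers a and b
a, b, x = 3, 5, 0.4
lhs = beta.cdf(x, a, b)
rhs = 1 - binom.cdf(a - 1, a + b - 1, x)  # P(Bin(7, 0.4) >= 3)
print(lhs, rhs)  # both ~0.58, identical up to rounding
```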
Alternative parameterizations

Mean and sample size
The beta distribution may also be reparameterized in terms of its mean μ (0 < μ < 1) and the sum of the two shape parameters ν = α + β > 0 ([3] p. 83). Denoting by α_posterior and β_posterior the shape parameters of the posterior beta distribution resulting from applying Bayes' theorem to a binomial likelihood function and a prior probability, the interpretation of the addition of both shape parameters to be sample size = ν = α_posterior + β_posterior is only correct for the Haldane prior probability Beta(0,0). Specifically, for the Bayes (uniform) prior Beta(1,1) the correct interpretation would be sample size = α_posterior + β_posterior − 2, or ν = (sample size) + 2. For sample size much larger than 2, the difference between these two priors becomes negligible. (See section Bayesian inference for further details.) ν = α + β is referred to as the "sample size" of a beta distribution, but one should remember that it is, strictly speaking, the "sample size" of a binomial likelihood function only when using a Haldane Beta(0,0) prior in Bayes' theorem.

This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ θ ≤ 1) is drawn from a population-level beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters α and β via[3]

α = μν, β = (1 − μ)ν

Under this parametrization, one may place an uninformative prior probability over the mean, and a vague prior probability (such as an exponential or gamma distribution) over the positive reals for the sample size, if they are independent, and prior data and/or beliefs justify it.
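A minimal sketch of the conversion between the two parametrizations (plain Python; function names are illustrative):

```python
def mu_nu_to_alpha_beta(mu, nu):
    """Mean/sample-size parametrization: alpha = mu*nu, beta = (1 - mu)*nu."""
    assert 0 < mu < 1 and nu > 0
    return mu * nu, (1 - mu) * nu

def alpha_beta_to_mu_nu(alpha, beta):
    """Inverse conversion back to (mean, sample size)."""
    nu = alpha + beta
    return alpha / nu, nu

print(mu_nu_to_alpha_beta(0.25, 8))   # (2.0, 6.0)
print(alpha_beta_to_mu_nu(2.0, 6.0))  # (0.25, 8.0): round trip
```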
Mode and concentration
Concave beta distributions, which have α, β > 1, can be parametrized in terms of mode and "concentration". The mode, ω = (α − 1)/(α + β − 2), and concentration, κ = α + β, can be used to define the usual shape parameters as follows:[7]

α = ω(κ − 2) + 1, β = (1 − ω)(κ − 2) + 1

For the mode, 0 < ω < 1, to be well-defined, we need α, β > 1, or equivalently κ > 2. If instead we define the concentration as c = κ − 2, the condition simplifies to c > 0, and the beta density at α = ωc + 1 and β = (1 − ω)c + 1 can be written as:

f(x; ω, c) ∝ x^(ωc) (1 − x)^((1−ω)c)

where c directly scales the sufficient statistics, ln(x) and ln(1 − x). Note also that in the limit, c → 0 (κ → 2), the distribution becomes flat.
Mean and variance

Solving the system of (coupled) equations given in the above sections as the equations for the mean and the variance of the beta distribution in terms of the original parameters α and β, one can express the α and β parameters in terms of the mean (μ) and the variance (var):

ν = α + β = μ(1 − μ)/var − 1, where ν > 0 requires var < μ(1 − μ)
α = μν = μ(μ(1 − μ)/var − 1)
β = (1 − μ)ν = (1 − μ)(μ(1 − μ)/var − 1)

This parametrization of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters α and β. For example, by expressing the mode, skewness, excess kurtosis and differential entropy in terms of the mean and the variance.
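A minimal sketch inverting the mean/variance relations above (plain Python; function name illustrative), with a round-trip check:

```python
def mean_var_to_alpha_beta(mu, var):
    """Invert mean/variance to shape parameters; requires var < mu*(1-mu)."""
    assert 0 < var < mu * (1 - mu)
    nu = mu * (1 - mu) / var - 1          # nu = alpha + beta
    return mu * nu, (1 - mu) * nu

a, b = mean_var_to_alpha_beta(0.25, 0.02)
print(a, b)
# Round trip: mean and variance of Beta(a, b) recover the inputs
print(a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1)))  # 0.25, 0.02
```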
Four parameters

A beta distribution with the two shape parameters α and β is supported on the range [0,1] or (0,1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, a, and maximum c (c > a), values of the distribution,[1] by a linear transformation substituting the non-dimensional variable x in terms of the new variable y (with support [a, c] or (a, c)) and the parameters a and c:

x = (y − a)/(c − a)

The probability density function of the four parameter beta distribution is equal to the two parameter distribution, scaled by the range (c − a) (so that the total area under the density curve equals a probability of one), and with the "y" variable shifted and scaled as follows:

f(y; α, β, a, c) = f(x; α, β)/(c − a) = (y − a)^(α−1) (c − y)^(β−1) / ((c − a)^(α+β−1) B(α, β))

That a random variable Y is beta-distributed with four parameters α, β, a, and c will be denoted by:

Y ~ Beta(α, β, a, c)

Some measures of central location are scaled (by (c − a)) and shifted (by a), as follows:

mean(Y) = a + (c − a) α/(α + β)
mode(Y) = a + (c − a) (α − 1)/(α + β − 2), for α, β > 1
median(Y) = a + (c − a) I⁻¹_(1/2)(α, β)

Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.

The shape parameters of Y can be written in terms of its mean and variance as

α = ((μ_Y − a)/(c − a)) ((μ_Y − a)(c − μ_Y)/var(Y) − 1)
β = ((c − μ_Y)/(c − a)) ((μ_Y − a)(c − μ_Y)/var(Y) − 1)

The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (c − a), linearly for the mean deviation and nonlinearly for the variance:

(mean deviation around mean)(Y) = (c − a) × (mean deviation around mean)(X)
var(Y) = (c − a)² var(X)

Since the skewness and excess kurtosis are non-dimensional quantities (as moments centered on the mean and normalized by the standard deviation), they are independent of the parameters a and c, and therefore equal to the expressions given above in terms of X (with support [0,1] or (0,1)):

skewness(Y) = skewness(X), excess kurtosis(Y) = excess kurtosis(X)
Measures of central tendency

Mode
The mode of a beta distributed random variable X with α, β > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression:[1]

mode = (α − 1)/(α + β − 2)

When both parameters are less than one (α, β < 1), this is the anti-mode: the lowest point of the probability density curve.[8]
Letting α = β, the expression for the mode simplifies to 1/2, showing that for α = β > 1 the mode (resp. anti-mode when α, β < 1) is at the center of the distribution: it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases, for arbitrary values of α and β. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of α = 2, β = 1 (or α = 1, β = 2), the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at one end, where the value of the density function approaches infinity. For example, in the case α = β = 1/2, the beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends (x = 0, and x = 1) can be called modes or not:[9][2]

- Whether the ends are part of the domain of the density function
- Whether a singularity can ever be called a mode
- Whether cases with two maxima should be called bimodal

Mode for beta distribution for 1 ≤ α ≤ 5 and 1 ≤ β ≤ 5
Median

Median for beta distribution for 0 ≤ α ≤ 5 and 0 ≤ β ≤ 5
(Mean − median) for beta distribution versus alpha and beta from 0 to 2
The median of the beta distribution is the unique real number x = I⁻¹_(1/2)(α, β) for which the regularized incomplete beta function I_x(α, β) = 1/2. There is no general closed-form expression for the median of the beta distribution for arbitrary values of α and β. Closed-form expressions for particular values of the parameters α and β follow:[citation needed]

- For symmetric cases α = β, median = 1/2.
- For α = 1 and β > 0, median = 1 − 2^(−1/β)
- For β = 1 and α > 0, median = 2^(−1/α)

The following are the limits with one parameter finite (non-zero) and the other approaching these limits:[citation needed]

lim_(α→0) median = 0, lim_(α→∞) median = 1 (for finite β); lim_(β→0) median = 1, lim_(β→∞) median = 0 (for finite α)
A reasonable approximation of the value of the median of the beta distribution, for both α and β greater or equal to one, is given by the formula[10]

median ≈ (α − 1/3)/(α + β − 2/3), for α, β ≥ 1

When α, β ≥ 1, the relative error (the absolute error divided by the median) in this approximation is less than 4%, and for both α ≥ 2 and β ≥ 2 it is less than 1%. The absolute error divided by the difference between the mean and the mode is similarly small.
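A minimal sketch comparing the approximation above against a numerically exact median (assuming SciPy; beta.ppf inverts the regularized incomplete beta function):

```python
from scipy.stats import beta

def median_approx(a, b):
    """Closed-form approximation from the text, valid for a, b >= 1."""
    return (a - 1/3) / (a + b - 2/3)

for a, b in [(1.5, 1.5), (2, 5), (4, 3)]:
    exact = beta.ppf(0.5, a, b)   # exact median via the inverse CDF
    approx = median_approx(a, b)
    print(a, b, exact, approx, abs(exact - approx) / exact)  # small rel. error
```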
Mean

Mean for beta distribution for 0 ≤ α ≤ 5 and 0 ≤ β ≤ 5
The expected value (mean) (μ) of a beta distribution random variable X with two parameters α and β is a function of only the ratio β/α of these parameters:[1]

μ = E[X] = α/(α + β) = 1/(1 + β/α)
Letting α = β in the above expression one obtains μ = 1/2, showing that for α = β the mean is at the center of the distribution: it is symmetric. Also, the following limits can be obtained from the above expression:

lim_(β/α→0) μ = 1, lim_(β/α→∞) μ = 0

Therefore, for β/α → 0, or for α/β → ∞, the mean is located at the right end, x = 1. For these limit ratios, the beta distribution becomes a one-point degenerate distribution with a Dirac delta function spike at the right end, x = 1, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the right end, x = 1.

Similarly, for β/α → ∞, or for α/β → 0, the mean is located at the left end, x = 0. The beta distribution becomes a 1-point degenerate distribution with a Dirac delta function spike at the left end, x = 0, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the left end, x = 0. Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(α→0) μ = 0, lim_(α→∞) μ = 1 (for finite β); lim_(β→0) μ = 1, lim_(β→∞) μ = 0 (for finite α)
While for typical unimodal distributions (with centrally located modes, inflexion points at both sides of the mode, and longer tails) (with Beta(α, β) such that α, β > 2) it is known that the sample mean (as an estimate of location) is not as robust as the sample median, the opposite is the case for uniform or "U-shaped" bimodal distributions (with Beta(α, β) such that α, β ≤ 1), with the modes located at the ends of the distribution. As Mosteller and Tukey remark ([11] p. 207) "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (with Beta(α, β) such that α, β ≤ 1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs for example for random walks, since the probability for the time of the last visit to the origin in a random walk is distributed as the arcsine distribution Beta(1/2, 1/2):[5][12] the mean of a number of realizations of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).
Geometric mean

(Mean − GeometricMean) for beta distribution versus α and β from 0 to 2, showing the asymmetry between α and β for the geometric mean
Geometric means for beta distribution: purple = G(x), yellow = G(1 − x), smaller values of α and β in front
Geometric means for beta distribution: purple = G(x), yellow = G(1 − x), larger values of α and β in front
The logarithm of the geometric mean G_X of a distribution with random variable X is the arithmetic mean of ln(X), or, equivalently, its expected value:

ln G_X = E[ln X]

For a beta distribution, the expected value integral gives:

E[ln X] = ψ(α) − ψ(α + β)

where ψ is the digamma function.

Therefore, the geometric mean of a beta distribution with shape parameters α and β is the exponential of the digamma functions of α and β as follows:

G_X = e^(E[ln X]) = e^(ψ(α) − ψ(α+β))

While for a beta distribution with equal shape parameters α = β, it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: 0 < G_X < 1/2. The reason for this is that the logarithmic transformation strongly weights the values of X close to zero, as ln(X) strongly tends towards negative infinity as X approaches zero, while ln(X) flattens towards zero as X → 1.

Along a line α = β, the following limits apply:

lim_(α=β→0) G_X = 0, lim_(α=β→∞) G_X = 1/2
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(α→0) G_X = 0, lim_(α→∞) G_X = 1 (for finite β); lim_(β→0) G_X = 1, lim_(β→∞) G_X = 0 (for finite α)

The accompanying plot shows the difference between the mean and the geometric mean for shape parameters α and β from zero to 2. Besides the fact that the difference between them approaches zero as α and β approach infinity, and that the difference becomes large for values of α and β approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape parameters α and β. The difference between the geometric mean and the mean is larger for small values of α in relation to β than when exchanging the magnitudes of β and α.
N. L. Johnson and S. Kotz[1] suggest the logarithmic approximation to the digamma function ψ(α) ≈ ln(α − 1/2), which results in the following approximation to the geometric mean:

G_X ≈ (α − 1/2)/(α + β − 1/2), for α, β > 1

Numerical values for the relative error in this approximation follow: [(α = β = 1): 9.39%]; [(α = β = 2): 1.29%]; [(α = 2, β = 3): 1.51%]; [(α = 3, β = 2): 0.44%]; [(α = β = 3): 0.51%]; [(α = β = 4): 0.26%]; [(α = 3, β = 4): 0.55%]; [(α = 4, β = 3): 0.24%].
Similarly, one can calculate the value of shape parameters required for the geometric mean to equal 1/2. Given the value of the parameter β, what would be the value of the other parameter, α, required for the geometric mean to equal 1/2? The answer is that (for β > 1), the value of α required tends towards β + 1/2 as β → ∞. For example, all these couples have the same geometric mean of 1/2: [β = 1, α = 1.4427], [β = 2, α = 2.46958], [β = 3, α = 3.47943], [β = 4, α = 4.48449], [β = 5, α = 5.48756], [β = 10, α = 10.4938], [β = 100, α = 100.499].
The fundamental property of the geometric mean, which can be proven to be false for any other mean, is

G(X_i/Y_i) = G(X_i)/G(Y_i)

This makes the geometric mean the only correct mean when averaging normalized results, that is results that are presented as ratios to reference values.[13] This is relevant because the beta distribution is a suitable model for the random behavior of percentages, and it is particularly suitable to the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation, see section "Parameter estimation, maximum likelihood." Actually, when performing maximum likelihood estimation, besides the geometric mean G_X based on the random variable X, also another geometric mean appears naturally: the geometric mean based on the linear transformation (1 − X), the mirror-image of X, denoted by G_(1−X):

G_(1−X) = e^(E[ln(1−X)]) = e^(ψ(β) − ψ(α+β))
Along a line α = β, the following limits apply:

lim_(α=β→0) G_(1−X) = 0, lim_(α=β→∞) G_(1−X) = 1/2

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(α→0) G_(1−X) = 1, lim_(α→∞) G_(1−X) = 0 (for finite β); lim_(β→0) G_(1−X) = 0, lim_(β→∞) G_(1−X) = 1 (for finite α)

It has the following approximate value:

G_(1−X) ≈ (β − 1/2)/(α + β − 1/2), for α, β > 1

Although both G_X and G_(1−X) are asymmetric, in the case that both shape parameters are equal, α = β, the geometric means are equal: G_X = G_(1−X). This equality follows from the following symmetry displayed between both geometric means:

G_X(Beta(α, β)) = G_(1−X)(Beta(β, α))
Harmonic mean

Harmonic mean for beta distribution for 0 < α < 5 and 0 < β < 5
Harmonic mean for beta distribution versus α and β from 0 to 2
Harmonic means for beta distribution: purple = H(X), yellow = H(1 − X), smaller values of α and β in front
Harmonic means for beta distribution: purple = H(X), yellow = H(1 − X), larger values of α and β in front
The inverse of the harmonic mean (H_X) of a distribution with random variable X is the arithmetic mean of 1/X, or, equivalently, its expected value. Therefore, the harmonic mean (H_X) of a beta distribution with shape parameters α and β is:

H_X = 1/E[1/X] = (α − 1)/(α + β − 1), for α > 1 and β > 0

The harmonic mean (H_X) of a beta distribution with α < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter α less than unity.

Letting α = β in the above expression one obtains

H_X = (α − 1)/(2α − 1)

showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(α→1) H_X = 0, lim_(α→∞) H_X = 1 (for finite β); lim_(β→0) H_X = 1, lim_(β→∞) H_X = 0 (for finite α > 1)
The harmonic mean plays a role in maximum likelihood estimation for the four parameter case, in addition to the geometric mean. Actually, when performing maximum likelihood estimation for the four parameter case, besides the harmonic mean H_X based on the random variable X, also another harmonic mean appears naturally: the harmonic mean based on the linear transformation (1 − X), the mirror-image of X, denoted by H_(1−X):

H_(1−X) = 1/E[1/(1 − X)] = (β − 1)/(α + β − 1), for β > 1 and α > 0

The harmonic mean (H_(1−X)) of a beta distribution with β < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter β less than unity.

Letting α = β in the above expression one obtains

H_(1−X) = (β − 1)/(2β − 1)

showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(β→1) H_(1−X) = 0, lim_(β→∞) H_(1−X) = 1 (for finite α); lim_(α→0) H_(1−X) = 1, lim_(α→∞) H_(1−X) = 0 (for finite β > 1)

Although both H_X and H_(1−X) are asymmetric, in the case that both shape parameters are equal, α = β, the harmonic means are equal: H_X = H_(1−X). This equality follows from the following symmetry displayed between both harmonic means:

H_X(Beta(α, β)) = H_(1−X)(Beta(β, α))
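A minimal sketch checking the closed form for H_X against direct numerical integration of E[1/X] (assuming SciPy; parameter values are arbitrary examples):

```python
from scipy.integrate import quad
from scipy.stats import beta as beta_dist

def harmonic_mean(a, b):
    """H_X = (a - 1)/(a + b - 1), defined for a > 1."""
    return (a - 1) / (a + b - 1)

a, b = 3.0, 2.0
# 1/H_X is E[1/X]; compare the closed form with quadrature
inv_moment, _ = quad(lambda x: (1 / x) * beta_dist.pdf(x, a, b), 0, 1)
print(1 / inv_moment, harmonic_mean(a, b))  # both 0.5
```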
Measures of statistical dispersion

Variance
The variance (the second moment centered on the mean) of a beta distribution random variable X with parameters α and β is:[1][14]

var(X) = E[(X − μ)²] = αβ / ((α + β)²(α + β + 1))

Letting α = β in the above expression one obtains

var(X) = 1/(4(2α + 1))

showing that for α = β the variance decreases monotonically as α = β increases. Setting α = β = 0 in this expression, one finds the maximum variance var(X) = 1/4,[1] which only occurs approaching the limit, at α = β = 0.
The beta distribution may also be parametrized in terms of its mean μ (0 < μ < 1) and sample size ν = α + β (ν > 0) (see subsection Mean and sample size):

α = μν, β = (1 − μ)ν

Using this parametrization, one can express the variance in terms of the mean μ and the sample size ν as follows:

var(X) = μ(1 − μ)/(1 + ν)

Since ν = α + β > 0, it follows that var(X) < μ(1 − μ).

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

var(X) = 1/(4(1 + ν)), if μ = 1/2

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

lim_(ν→0) var(X) = μ(1 − μ), lim_(ν→∞) var(X) = 0
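A minimal sketch showing that the two variance expressions above agree (plain Python; function names illustrative):

```python
def beta_variance(a, b):
    """var(X) in the original (alpha, beta) parametrization."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def beta_variance_mu_nu(mu, nu):
    """Equivalent form in the mean/sample-size parametrization."""
    return mu * (1 - mu) / (1 + nu)

a, b = 2.0, 6.0
print(beta_variance(a, b))                      # 0.0208333...
print(beta_variance_mu_nu(a / (a + b), a + b))  # same value
```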
Geometric variance and covariance

log geometric variances vs. α and β
The logarithm of the geometric variance, ln(var_GX), of a distribution with random variable X is the second moment of the logarithm of X centered on the geometric mean of X, ln(G_X):

ln var_GX = E[(ln X − ln G_X)²] = E[(ln X − E[ln X])²] = var[ln X]

and therefore, the geometric variance is:

var_GX = e^(var[ln X])

In the Fisher information matrix, and the curvature of the log likelihood function, the logarithm of the geometric variance of the reflected variable 1 − X and the logarithm of the geometric covariance between X and 1 − X appear:

ln var_G(1−X) = var[ln(1 − X)], ln cov_G(X, 1−X) = cov[ln X, ln(1 − X)]

For a beta distribution, higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions. See the section § Moments of logarithmically transformed random variables. The variance of the logarithmic variables and covariance of ln X and ln(1 − X) are:

var[ln X] = ψ₁(α) − ψ₁(α + β)
var[ln(1 − X)] = ψ₁(β) − ψ₁(α + β)
cov[ln X, ln(1 − X)] = −ψ₁(α + β)

where the trigamma function, denoted ψ₁(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

ψ₁(α) = dψ(α)/dα = d² ln Γ(α)/dα²
Therefore,

ln var_GX = ψ₁(α) − ψ₁(α + β), ln var_G(1−X) = ψ₁(β) − ψ₁(α + β), ln cov_G(X, 1−X) = −ψ₁(α + β)

The accompanying plots show the log geometric variances and log geometric covariance versus the shape parameters α and β. The plots show that the log geometric variances and log geometric covariance are close to zero for shape parameters α and β greater than 2, and that the log geometric variances rapidly rise in value for shape parameter values α and β less than unity. The log geometric variances are positive for all values of the shape parameters. The log geometric covariance is negative for all values of the shape parameters, and it reaches large negative values for α and β less than unity.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(α→0) ln var_GX = +∞, lim_(α→∞) ln var_GX = 0 (for finite β); lim_(β→0) ln var_G(1−X) = +∞, lim_(β→∞) ln var_G(1−X) = 0 (for finite α)

Limits with two parameters varying:

ln cov_G(X, 1−X) → −∞ as α + β → 0, and ln cov_G(X, 1−X) → 0 as α + β → ∞

Although both ln(var_GX) and ln(var_G(1−X)) are asymmetric, when the shape parameters are equal, α = β, one has: ln(var_GX) = ln(var_G(1−X)). This equality follows from the following symmetry displayed between both log geometric variances:

ln var_GX(Beta(α, β)) = ln var_G(1−X)(Beta(β, α))

The log geometric covariance is symmetric:

ln cov_G(X, 1−X)(Beta(α, β)) = ln cov_G(X, 1−X)(Beta(β, α))
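A minimal sketch of the trigamma-based expressions above (assuming SciPy), illustrating how the values shrink toward zero as the shape parameters grow:

```python
from scipy.special import polygamma

def log_geometric_variances(a, b):
    """ln var_GX, ln var_G(1-X), and ln cov_G(X, 1-X) via the trigamma function."""
    trigamma = lambda z: polygamma(1, z)
    return (trigamma(a) - trigamma(a + b),
            trigamma(b) - trigamma(a + b),
            -trigamma(a + b))

print(log_geometric_variances(2.0, 3.0))
print(log_geometric_variances(20.0, 30.0))  # much closer to zero
```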
Mean absolute deviation around the mean

Ratio of mean abs. dev. to std. dev. for beta distribution with α and β ranging from 0 to 5
Ratio of mean abs. dev. to std. dev. for beta distribution with mean 0 ≤ μ ≤ 1 and sample size 0 < ν ≤ 10
The mean absolute deviation around the mean for the beta distribution with shape parameters α and β is:[9]

E[|X − E[X]|] = 2 α^α β^β / (B(α, β) (α + β)^(α+β+1))

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(α, β) distributions with α, β > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as overly weighted.
Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz[1] derived an approximation to this ratio for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for α = β = 1, and it decreases to zero as α → ∞, β → ∞). At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: √(2/π). For α = β = 1 this ratio equals √3/2, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.
Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

α = μν, β = (1 − μ)ν

one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

E[|X − E[X]|] = 2 μ^(μν) (1 − μ)^((1−μ)ν) / (ν B(μν, (1 − μ)ν))

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

E[|X − E[X]|] = 2^(1−ν) / (ν B(ν/2, ν/2))

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

lim_(ν→0) E[|X − E[X]|] = 2μ(1 − μ), lim_(ν→∞) E[|X − E[X]|] = 0; lim_(μ→0) E[|X − E[X]|] = lim_(μ→1) E[|X − E[X]|] = 0
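A minimal sketch checking the closed form for the mean absolute deviation against quadrature (assuming SciPy; parameter values are arbitrary examples):

```python
from scipy.integrate import quad
from scipy.special import beta as B
from scipy.stats import beta as beta_dist

def mean_abs_dev(a, b):
    """Closed form: 2 a^a b^b / (B(a,b) (a+b)^(a+b+1))."""
    return 2 * a**a * b**b / (B(a, b) * (a + b) ** (a + b + 1))

a, b = 2.0, 3.0
mu = a / (a + b)
numeric, _ = quad(lambda x: abs(x - mu) * beta_dist.pdf(x, a, b), 0, 1)
print(mean_abs_dev(a, b), numeric)  # both ~0.1659
```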
Mean absolute difference

The mean absolute difference for the beta distribution is:

MD = E[|X − Y|] = (4/(α + β)) B(α + β, α + β) / (B(α, α) B(β, β))

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

G = (2/α) B(α + β, α + β) / (B(α, α) B(β, β))
Skewness

Skewness for beta distribution as a function of variance and mean

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is[1]

γ₁ = E[(X − μ)³]/var(X)^(3/2) = 2(β − α)√(α + β + 1) / ((α + β + 2)√(αβ))

Letting α = β in the above expression one obtains γ₁ = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. Positive skew (right-tailed) for α < β, negative skew (left-tailed) for α > β.
Using the parametrization in terms of mean μ and sample size ν = α + β:

α = μν, β = (1 − μ)ν

one can express the skewness in terms of the mean μ and the sample size ν as follows:

γ₁ = 2(1 − 2μ)√(1 + ν) / ((2 + ν)√(μ(1 − μ)))

The skewness can also be expressed just in terms of the variance var and the mean μ as follows:

γ₁ = 2(1 − 2μ)√var / (μ(1 − μ) + var)

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance var, is useful for the method of moments estimation of four parameters:

γ₁² = 4(1 − 4 var(1 + ν)) / (var (2 + ν)²)

This expression correctly gives a skewness of zero for α = β, since in that case (see § Variance): var = 1/(4(1 + ν)).
For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

lim_(α=β→0) γ₁ = lim_(α=β→∞) γ₁ = 0

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

lim_(α→0) γ₁ = +∞, lim_(β→0) γ₁ = −∞; lim_(α→∞) γ₁ = −2/√β (for finite β), lim_(β→∞) γ₁ = 2/√α (for finite α)
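A minimal sketch cross-checking the two skewness expressions above (assuming NumPy; function names illustrative):

```python
import numpy as np

def skewness(a, b):
    """Skewness in the (alpha, beta) parametrization."""
    return 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))

def skewness_mu_var(mu, var):
    """Equivalent form using only the mean and the variance."""
    return 2 * (1 - 2 * mu) * np.sqrt(var) / (mu * (1 - mu) + var)

a, b = 2.0, 4.0
mu = a / (a + b)
var = a * b / ((a + b) ** 2 * (a + b + 1))
print(skewness(a, b), skewness_mu_var(mu, var))  # both ~0.468
```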
Kurtosis

Excess kurtosis for beta distribution as a function of variance and mean

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear.[15] Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it is much more sensitive to the signal generated by human footsteps than to other signals generated by vehicles, winds, noise, etc.[16]
Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping[17] use the symbol γ₂ for the excess kurtosis, but Abramowitz and Stegun[18] use different terminology. To prevent confusion[19] between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:[9][20]

excess kurtosis = kurtosis − 3 = E[(X − μ)⁴]/var(X)² − 3 = 6[(α − β)²(α + β + 1) − αβ(α + β + 2)] / (αβ(α + β + 2)(α + β + 3))
Letting α = β in the above expression one obtains

excess kurtosis = −6/(3 + 2α), for α = β

Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as {α = β} → 0, and approaching a maximum value of zero as {α = β} → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end x = 0 and x = 1, with nothing in between: a 2-point Bernoulli distribution with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of kurtosis as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution, is correct for all distributions including the beta distribution. The more that rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For α ≠ β, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for α → 0 for finite β, or for β → 0 for finite α) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends.
Using the parametrization in terms of mean μ and sample size ν = α + β:

α = μν, β = (1 − μ)ν

one can express the excess kurtosis in terms of the mean μ and the sample size ν as follows:

excess kurtosis = (6/(3 + ν)) ((1 − 2μ)²(1 + ν)/(μ(1 − μ)(2 + ν)) − 1)

The excess kurtosis can also be expressed in terms of just the following two parameters: the variance var, and the sample size ν as follows:

excess kurtosis = (6/(3 + ν)) ((1 − 4 var(1 + ν))/(var(2 + ν)) − 1), for var < μ(1 − μ)

and, in terms of the variance var and the mean μ, by substituting ν = μ(1 − μ)/var − 1 into the expression above.
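A minimal sketch cross-checking the direct excess-kurtosis formula against the skewness²/sample-size form used later for the Pearson bounds (plain Python; names illustrative):

```python
def excess_kurtosis(a, b):
    """Direct formula in the (alpha, beta) parametrization."""
    n = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    return n / (a * b * (a + b + 2) * (a + b + 3))

def excess_kurtosis_skew_nu(skew2, nu):
    """Equivalent form: (6/(3+nu)) * ((2+nu)/4 * skewness^2 - 1)."""
    return 6 / (3 + nu) * ((2 + nu) / 4 * skew2 - 1)

a, b = 2.0, 4.0
skew2 = (2 * (b - a) * (a + b + 1) ** 0.5 / ((a + b + 2) * (a * b) ** 0.5)) ** 2
print(excess_kurtosis(a, b), excess_kurtosis_skew_nu(skew2, a + b))  # both -0.375
```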
The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (μ = 1/2). This occurs for the symmetric case of α = β = 0, with zero skewness. At the limit, this is the 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end x = 0 and x = 1 and zero probability everywhere else. (A coin toss: one face of the coin being x = 0 and the other face being x = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them.

On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (μ = 0 or μ = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end.
Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows:

excess kurtosis = (6/(3 + ν)) ((2 + ν)/4 · (skewness)² − 1)

From this last expression, one can obtain the same limits published over a century ago by Karl Pearson[21] for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting α + β = ν = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region"). The limit of α + β = ν → ∞ determines Pearson's upper boundary.

Therefore:

(skewness)² − 2 < excess kurtosis < (3/2)(skewness)²
Values of ν = α + β such that ν ranges from zero to infinity, 0 < ν < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness.

For the symmetric case (α = β), the following limits apply:

lim_(α=β→0) excess kurtosis = −2, lim_(α=β→∞) excess kurtosis = 0

For the unsymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

lim_(α→0) excess kurtosis = +∞ (for finite β), lim_(β→0) excess kurtosis = +∞ (for finite α); lim_(α→∞) excess kurtosis = 6/β (for finite β), lim_(β→∞) excess kurtosis = 6/α (for finite α)
Characteristic function
Re(characteristic function), symmetric case α = β ranging from 25 to 0
Re(characteristic function), symmetric case α = β ranging from 0 to 25
Re(characteristic function), β = α + 1/2; α ranging from 25 to 0
Re(characteristic function), α = β + 1/2; β ranging from 25 to 0
Re(characteristic function), α = β + 1/2; β ranging from 0 to 25
The characteristic function is the Fourier transform of the probability density function. The characteristic function of the beta distribution is Kummer's confluent hypergeometric function (of the first kind):[1][18][22]

φ_X(t) = E[e^(itX)] = ₁F₁(α; α + β; it) = Σ_(k=0)^∞ ((α)_k/(α + β)_k) (it)^k/k!

where (a)_k = a(a + 1)···(a + k − 1) is the rising factorial. The value of the characteristic function for t = 0 is one:

φ_X(0) = ₁F₁(α; α + β; 0) = 1

Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable t:

Re[φ_X(−t)] = Re[φ_X(t)], Im[φ_X(−t)] = −Im[φ_X(t)]
The symmetric case α = β simplifies the characteristic function of the beta distribution to a Bessel function, since in the special case α + β = 2α the confluent hypergeometric function (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind I_(α−1/2)) using Kummer's second transformation as follows:

₁F₁(α; 2α; it) = e^(it/2) ₀F₁(; α + 1/2; −t²/16) = e^(it/2) (it/4)^(1/2−α) Γ(α + 1/2) I_(α−1/2)(it/2)

In the accompanying plots, the real part (Re) of the characteristic function of the beta distribution is displayed for symmetric (α = β) and skewed (α ≠ β) cases.
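A minimal sketch evaluating the characteristic function by direct quadrature of E[e^(itX)] (assuming SciPy), illustrating φ(0) = 1 and the conjugate symmetry stated above:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta as beta_dist

def cf(t, a, b):
    """Characteristic function by quadrature of cos/sin moments."""
    re, _ = quad(lambda x: np.cos(t * x) * beta_dist.pdf(x, a, b), 0, 1)
    im, _ = quad(lambda x: np.sin(t * x) * beta_dist.pdf(x, a, b), 0, 1)
    return complex(re, im)

print(cf(0.0, 2, 3))   # (1+0j): phi(0) = 1
print(cf(1.0, 2, 3))   # numerically equals 1F1(2; 5; i)
print(cf(-1.0, 2, 3))  # complex conjugate of cf(1.0, ...): the stated symmetry
```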
Moment generating function
It also follows[1][9] that the moment generating function is

M_X(α; β; t) = E[e^(tX)] = ₁F₁(α; α + β; t) = 1 + Σ_(k=1)^∞ (∏_(r=0)^(k−1) (α + r)/(α + β + r)) t^k/k!

In particular M_X(α; β; 0) = 1.

Using the moment generating function, the k-th raw moment is given by[1] the factor

∏_(r=0)^(k−1) (α + r)/(α + β + r)

multiplying the (exponential series) term t^k/k! in the series of the moment generating function:

E[X^k] = α^(k)/(α + β)^(k) = ∏_(r=0)^(k−1) (α + r)/(α + β + r)

where (x)^(k) is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

E[X^k] = ((α + k − 1)/(α + β + k − 1)) E[X^(k−1)]

Since the moment generating function has a positive radius of convergence,[citation needed] the beta distribution is determined by its moments.[23]
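A minimal sketch of the raw-moment recursion above (plain Python; function name illustrative):

```python
def raw_moments(a, b, kmax):
    """E[X^k] via the recursion E[X^k] = (a+k-1)/(a+b+k-1) * E[X^(k-1)]."""
    m, out = 1.0, []
    for k in range(1, kmax + 1):
        m *= (a + k - 1) / (a + b + k - 1)
        out.append(m)
    return out

print(raw_moments(2.0, 3.0, 3))
# [0.4, 0.2, 0.11428...]: E[X] = 2/5, E[X^2] = 1/5, E[X^3] = 4/35
```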
Moments of transformed random variables

Moments of linearly transformed, product and inverted random variables
One can also show the following expectations for a transformed random variable,[1] where the random variable X is Beta-distributed with parameters α and β: X ~ Beta(α, β). The expected value of the variable 1 − X is the mirror-symmetry of the expected value based on X:

E[1 − X] = β/(α + β)
E[X(1 − X)] = αβ/((α + β)(α + β + 1))

Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables X and 1 − X are identical, and the covariance on X(1 − X) is the negative of the variance:

var[1 − X] = var[X] = αβ/((α + β)²(α + β + 1)), cov[X, 1 − X] = −var[X]

These are the expected values for inverted variables (these are related to the harmonic means, see § Harmonic mean):

E[1/X] = (α + β − 1)/(α − 1), for α > 1
E[1/(1 − X)] = (α + β − 1)/(β − 1), for β > 1

The following transformation by dividing the variable X by its mirror-image (X/(1 − X)) results in the expected value of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):[1]

E[X/(1 − X)] = α/(β − 1), for β > 1
E[(1 − X)/X] = β/(α − 1), for α > 1

Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables:

var[1/X] = var[(1 − X)/X] = β(α + β − 1)/((α − 2)(α − 1)²), for α > 2

The following variance of the variable X divided by its mirror-image (X/(1 − X)) results in the variance of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI):[1]

var[X/(1 − X)] = var[1/(1 − X)] = α(α + β − 1)/((β − 2)(β − 1)²), for β > 2

The covariances are:

cov[1/X, 1/(1 − X)] = cov[(1 − X)/X, X/(1 − X)] = −(α + β − 1)/((α − 1)(β − 1)), for α, β > 1

These expectations and variances appear in the four-parameter Fisher information matrix (§ Fisher information).
Moments of logarithmically transformed random variables
Plot of logit(X) = ln(X/(1 − X)) (vertical axis) vs. X in the domain of 0 to 1 (horizontal axis). Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable.
Expected values for logarithmic transformations (useful for maximum likelihood estimates, see § Parameter estimation, Maximum likelihood) are discussed in this section. The following logarithmic linear transformations are related to the geometric means G_X and G_(1−X) (see § Geometric mean):

E[ln X] = ψ(α) − ψ(α + β)
E[ln(1 − X)] = ψ(β) − ψ(α + β)

where the digamma function ψ(α) is defined as the logarithmic derivative of the gamma function:[18]

ψ(α) = d ln Γ(α)/dα
Logit transformations are interesting,[24] as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:

E[ln(X/(1 − X))] = ψ(α) − ψ(β)
E[ln((1 − X)/X)] = ψ(β) − ψ(α)

Johnson[25] considered the distribution of the logit-transformed variable ln(X/(1 − X)), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support [0, 1] based on the original variable X to infinite support in both directions of the real line (−∞, +∞). The logit of a beta variate has the logistic-beta distribution.
Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows:

E[(ln X)²] = (ψ(α) − ψ(α + β))² + ψ₁(α) − ψ₁(α + β)
E[(ln(1 − X))²] = (ψ(β) − ψ(α + β))² + ψ₁(β) − ψ₁(α + β)
E[ln X · ln(1 − X)] = (ψ(α) − ψ(α + β))(ψ(β) − ψ(α + β)) − ψ₁(α + β)

therefore the variance of the logarithmic variables and covariance of ln(X) and ln(1 − X) are:

var[ln X] = ψ₁(α) − ψ₁(α + β)
var[ln(1 − X)] = ψ₁(β) − ψ₁(α + β)
cov[ln X, ln(1 − X)] = −ψ₁(α + β)

where the trigamma function, denoted ψ₁(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

ψ₁(α) = dψ(α)/dα = d² ln Γ(α)/dα²

The variances and covariance of the logarithmically transformed variables X and (1 − X) are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables X and (1 − X), as the logarithm approaches negative infinity for the variable approaching zero.

These logarithmic variances and covariance are the elements of the Fisher information matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation).

The variances of the log inverse variables are identical to the variances of the log variables:

var[ln(1/X)] = var[ln X], var[ln(1/(1 − X))] = var[ln(1 − X)]

It also follows that the variances of the logit-transformed variables are

var[logit(X)] = var[ln(X/(1 − X))] = ψ₁(α) + ψ₁(β)
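A minimal Monte Carlo sketch of the log-moment formulas above (assuming NumPy/SciPy; sample size and seed are arbitrary):

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.stats import beta as beta_dist

a, b = 2.0, 5.0
x = beta_dist.rvs(a, b, size=1_000_000, random_state=0)

print(np.mean(np.log(x)), digamma(a) - digamma(a + b))          # E[ln X]
print(np.var(np.log(x)), polygamma(1, a) - polygamma(1, a + b)) # var[ln X]
logit = np.log(x / (1 - x))
print(np.var(logit), polygamma(1, a) + polygamma(1, b))         # var[logit X]
```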
Quantities of information (entropy)
Given a beta distributed random variable, X ~ Beta(α, β), the differential entropy of X is (measured in nats),[26] the expected value of the negative of the logarithm of the probability density function:

h(X) = E[−ln f(x; α, β)] = ln B(α, β) − (α − 1)ψ(α) − (β − 1)ψ(β) + (α + β − 2)ψ(α + β)

where f(x; α, β) is the probability density function of the beta distribution:

f(x; α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β)

The digamma function ψ appears in the formula for the differential entropy as a consequence of Euler's integral formula for the harmonic numbers, which follows from the integral:

∫₀¹ (1 − x^(α−1))/(1 − x) dx = ψ(α) + γ

where γ is the Euler–Mascheroni constant.
The differential entropy of the beta distribution is negative for all values of α and β greater than zero, except at α = β = 1 (for which values the beta distribution is the same as the uniform distribution), where the differential entropy reaches its maximum value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.

For α or β approaching zero, the differential entropy approaches its minimum value of negative infinity. For (either or both) α or β approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) α or β approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either α or β approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), α = β, and they approach infinity simultaneously, the probability density becomes a spike (Dirac delta function) concentrated at the middle x = 1/2, and hence there is 100% probability at the middle x = 1/2 and zero probability everywhere else.
The (continuous case) differential entropy was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the discrete entropy.[27] It is known since then that the differential entropy may differ from the infinitesimal limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.
Given two beta distributed random variables, X₁ ~ Beta(α, β) and X₂ ~ Beta(α′, β′), the cross-entropy is (measured in nats)[28]

H(X₁, X₂) = ln B(α′, β′) − (α′ − 1)ψ(α) − (β′ − 1)ψ(β) + (α′ + β′ − 2)ψ(α + β)
The cross entropy has been used as an error metric to measure the distance between two hypotheses.[29][30] Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood[28] (see section on "Parameter estimation. Maximum likelihood estimation").
The relative entropy, or Kullback–Leibler divergence D_KL(X₁ || X₂), is a measure of the inefficiency of assuming that the distribution is X₂ ~ Beta(α′, β′) when the distribution is really X₁ ~ Beta(α, β). It is defined as follows (measured in nats):

D_KL(X₁ || X₂) = ln(B(α′, β′)/B(α, β)) + (α − α′)ψ(α) + (β − β′)ψ(β) + (α′ − α + β′ − β)ψ(α + β)

The relative entropy, or Kullback–Leibler divergence, is always non-negative. A few numerical examples follow:
- X₁ ~ Beta(1, 1) and X₂ ~ Beta(3, 3); D_KL(X₁ || X₂) = 0.598803; D_KL(X₂ || X₁) = 0.267864; h(X₁) = 0; h(X₂) = −0.267864
- X₁ ~ Beta(3, 0.5) and X₂ ~ Beta(0.5, 3); D_KL(X₁ || X₂) = 7.21574; D_KL(X₂ || X₁) = 7.21574; h(X₁) = −1.10805; h(X₂) = −1.10805
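A minimal sketch of the KL formula above (assuming SciPy), reproducing the numerical examples:

```python
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a2, b2):
    """D_KL(Beta(a1,b1) || Beta(a2,b2)) in nats, per the formula above."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(kl_beta(1, 1, 3, 3), kl_beta(3, 3, 1, 1))          # 0.598803, 0.267864
print(kl_beta(3, 0.5, 0.5, 3), kl_beta(0.5, 3, 3, 0.5))  # both 7.21574
```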
The Kullback–Leibler divergence is not symmetric, D_KL(X₁ || X₂) ≠ D_KL(X₂ || X₁), for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies h(X₁) ≠ h(X₂). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the second law of thermodynamics.

The Kullback–Leibler divergence is symmetric, D_KL(X₁ || X₂) = D_KL(X₂ || X₁), for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy h(X₁) = h(X₂).
The symmetry condition:

D_KL(Beta(α, β) || Beta(α′, β′)) = D_KL(Beta(β, α) || Beta(β′, α′))

follows from the above definitions and the mirror-symmetry f(x; α, β) = f(1 − x; β, α) enjoyed by the beta distribution.
Relationships between statistical measures

Mean, mode and median relationship
If 1 < α < β then mode ≤ median ≤ mean.[10] Expressing the mode (only for α, β > 1), and the mean in terms of α and β:

(α − 1)/(α + β − 2) ≤ median ≤ α/(α + β)
If 1 < β < α then the order of the inequalities is reversed. For α, β > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of x. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of x, for the (pathological) case of α = 1 and β = 1, for which values the beta distribution approaches the uniform distribution and the differential entropy approaches its maximum value, and hence maximum "disorder".

For example, for α = 1.0001 and β = 1.00000001:

- mode = 0.9999; PDF(mode) = 1.00010
- mean = 0.500025; PDF(mean) = 1.00003
- median = 0.500035; PDF(median) = 1.00003
- mean − mode = −0.499875
- mean − median = −9.65538 × 10⁻⁶

where PDF stands for the value of the probability density function.
Mean, geometric mean and harmonic mean relationship

Mean, median, geometric mean and harmonic mean for beta distribution with 0 < α = β < 5
It is known from the inequality of arithmetic and geometric means that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for α = β, both the mean and the median are exactly equal to 1/2, regardless of the value of α = β, and the mode is also equal to 1/2 for α = β > 1. However, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as α = β → ∞.
Kurtosis bounded by the square of the skewness

Beta distribution α and β parameters vs. excess kurtosis and squared skewness
As remarked by Feller,[5] in the Pearson system the beta probability density appears as type I (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). Karl Pearson showed, in Plate 1 of his paper[21] published in 1916, a graph with the kurtosis as the vertical axis (ordinate) and the square of the skewness as the horizontal axis (abscissa), in which a number of distributions were displayed.[31] The region occupied by the beta distribution is bounded by the following two lines in the (skewness², kurtosis) plane, or the (skewness², excess kurtosis) plane:

(skewness)² + 1 < kurtosis < (3/2)(skewness)² + 3

or, equivalently,

(skewness)² − 2 < excess kurtosis < (3/2)(skewness)²
At a time when there were no powerful digital computers, Karl Pearson accurately computed further boundaries,[32][21] for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters α and β close to zero. The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. Karl Pearson showed[21] that this upper boundary line (excess kurtosis − (3/2) skewness² = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, Egon Pearson, showed[31] that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis − (3/2) skewness² = 0) is shared with the noncentral chi-squared distribution. Karl Pearson[33] (Pearson 1895, pp. 357, 360, 373–376) also showed that the gamma distribution is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/k and the square of the skewness is 4/k, hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k".) Pearson later noted that the chi-squared distribution is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the chi-squared distribution the excess kurtosis is 12/k and the square of the skewness is 8/k, hence (excess kurtosis − (3/2) skewness² = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution X ~ χ²(k) is a special case of the gamma distribution, with parametrization X ~ Γ(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.
An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by α = 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both α and β approaching zero symmetrically, the excess kurtosis reaches its minimum value at −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis (ordinate). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)
Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal U-shaped distributions for which the parameters α and β approach zero and hence all the probability density is concentrated at the ends: x = 0, 1 with practically nothing in between them. Since for α ≈ β ≈ 0 the probability density is concentrated at the two ends x = 0 and x = 1, this "impossible boundary" is determined by a Bernoulli distribution, where the two only possible outcomes occur with respective probabilities p and q = 1 − p. For cases approaching this limit boundary with symmetry α = β, skewness ≈ 0, excess kurtosis ≈ −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are p ≈ q ≈ 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ≈ −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities p = β/(α + β) at the left end x = 0 and q = α/(α + β) at the right end x = 1.
All statements are conditional on α, β > 0.
Geometry of the probability density function

Inflection point location versus α and β, showing regions with one inflection point
Inflection point location versus α and β, showing region with two inflection points
For certain values of the shape parameters α and β, the probability density function has inflection points, at which the curvature changes sign. The position of these inflection points can be useful as a measure of the dispersion or spread of the distribution. The inflection points are the solutions of

x = (α − 1 ± √((α − 1)(β − 1)/(α + β − 3))) / (α + β − 2)

that lie inside the support. Defining the following quantity:

κ = √((α − 1)(β − 1)/(α + β − 3)) / (α + β − 2)

points of inflection occur,[1][8][9][20] depending on the value of the shape parameters α and β, as follows:
- (α > 2, β > 2) The distribution is bell-shaped (symmetric for α = β and skewed otherwise), with two inflection points, equidistant from the mode: x = mode ± κ
- (α = 2, β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: x = mode + κ
- (α > 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: x = mode − κ
- (1 < α < 2, β > 2, α + β > 2) The distribution is unimodal, positively skewed, right-tailed, with one inflection point, located to the right of the mode: x = mode + κ
- (0 < α < 1, 1 < β < 2) The distribution has a mode at the left end x = 0 and it is positively skewed, right-tailed. There is one inflection point, located to the right of the mode: the root of the general expression above lying inside (0, 1)
- (α > 2, 1 < β < 2) The distribution is unimodal, negatively skewed, left-tailed, with one inflection point, located to the left of the mode: x = mode − κ
- (1 < α < 2, 0 < β < 1) The distribution has a mode at the right end x = 1 and it is negatively skewed, left-tailed. There is one inflection point, located to the left of the mode: the root of the general expression above lying inside (0, 1)
There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (α, β < 1), upside-down-U-shaped (1 < α < 2, 1 < β < 2), reverse-J-shaped (α < 1, β > 2) or J-shaped (α > 2, β < 1).

The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus α and β (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines α = 1, β = 1, α = 2, and β = 2 because at these values the beta distribution changes from 2 modes, to 1 mode, to no mode.
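A minimal sketch verifying the two-inflection-point case numerically (assuming NumPy/SciPy; the grid-based second derivative is a rough check, not a precise root-finder):

```python
import numpy as np
from scipy.stats import beta as beta_dist

def inflection_points(a, b):
    """x = mode +/- kappa, per the text (a, b > 2 case)."""
    mode = (a - 1) / (a + b - 2)
    kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)
    return mode - kappa, mode + kappa

a, b = 4.0, 6.0
lo, hi = inflection_points(a, b)
# The second derivative of the PDF should change sign at lo and hi
x = np.linspace(0.01, 0.99, 9999)
d2 = np.gradient(np.gradient(beta_dist.pdf(x, a, b), x), x)
sign_changes = x[:-1][np.diff(np.sign(d2)) != 0]
print((lo, hi), sign_changes)  # the sign changes land near lo and hi
```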
Shapes

PDF for symmetric beta distribution vs. x and α = β from 0 to 30
PDF for symmetric beta distribution vs. x and α = β from 0 to 2
PDF for skewed beta distribution vs. x and β = 2.5, α from 0 to 9
PDF for skewed beta distribution vs. x and β = 5.5, α from 0 to 9
PDF for skewed beta distribution vs. x and β = 8, α from 0 to 10

The beta density function can take a wide variety of different shapes depending on the values of the two parameters α and β. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:
Symmetric (α = β):

- The density function is symmetric about 1/2 (blue & teal plots): median = mean = 1/2, skewness = 0, variance = 1/(4(2α + 1))
- α = β < 1: U-shaped (blue plot); bimodal: left mode = 0, right mode = 1, anti-mode = 1/2; 1/12 < var(X) < 1/4;[1] −2 < excess kurtosis(X) < −6/5
  - α = β = 1/2 is the arcsine distribution: var(X) = 1/8, excess kurtosis(X) = −3/2, CF = Rinc(t)[34]
  - α = β → 0 is a 2-point Bernoulli distribution with equal probability 1/2 at each Dirac delta function end x = 0 and x = 1 and zero probability everywhere else. A coin toss: one face of the coin being x = 0 and the other face being x = 1.
- α = β = 1 is the uniform [0, 1] distribution: no mode, var(X) = 1/12, excess kurtosis(X) = −6/5; the (negative anywhere else) differential entropy reaches its maximum value of zero; CF = Sinc(t)
- α = β > 1: symmetric unimodal, mode = 1/2; 0 < var(X) < 1/12;[1] −6/5 < excess kurtosis(X) < 0
  - α = β = 3/2 is a semi-elliptic [0, 1] distribution, see: Wigner semicircle distribution:[35] var(X) = 1/16, excess kurtosis(X) = −1, CF = 2 Jinc(t)
  - α = β = 2 is the parabolic [0, 1] distribution: var(X) = 1/20, excess kurtosis(X) = −6/7, CF = 3 Tinc(t)[36]
  - α = β > 2 is bell-shaped, with inflection points located to either side of the mode: 0 < var(X) < 1/20, −6/7 < excess kurtosis(X) < 0
  - α = β → ∞ is a 1-point degenerate distribution with a Dirac delta function spike at the midpoint x = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point x = 1/2.
Skewed (α ≠ β):

The density function is skewed. An interchange of parameter values yields the mirror image (the reverse) of the initial curve. Some more specific cases:

- α < 1, β < 1: U-shaped; positive skew for α < β, negative skew for α > β; bimodal: left mode = 0, right mode = 1, anti-mode = (α − 1)/(α + β − 2); 0 < median < 1; 0 < var(X) < 1/4
- α > 1, β > 1: unimodal (magenta & cyan plots); positive skew for α < β, negative skew for α > β; 0 < median < 1; 0 < var(X) < 1/12
- α < 1, β ≥ 1: reverse J-shaped with a right tail, positively skewed, strictly decreasing
- α ≥ 1, β < 1: J-shaped with a left tail, negatively skewed, strictly increasing
- α = 1, β > 1: positively skewed, strictly decreasing, a reverse J-shape
- α > 1, β = 1: negatively skewed, strictly increasing, a J-shape
Transformations

- If X ~ Beta(α, β) then 1 − X ~ Beta(β, α), the mirror-image symmetry.
- If X ~ Beta(α, β) then X/(1 − X) ~ β′(α, β), the beta prime distribution, also called "beta distribution of the second kind".
- If X ~ Beta(α, β), then ln(X/(1 − X)) has a generalized logistic distribution, with density σ(x)^α σ(−x)^β / B(α, β), where σ is the logistic sigmoid.
- If X ~ Beta(α, β) then (1 − X)/X ~ β′(β, α).
- If X and Y are independent, identically beta-distributed random variables, then the density of X − Y can be expressed piecewise (for positive and negative arguments) in terms of the hypergeometric function.[37]
- If X ~ Beta(n/2, m/2) then mX/(n(1 − X)) ~ F(n, m) (assuming n > 0 and m > 0), the Fisher–Snedecor F distribution.
- If X ~ Beta(α, β) with α = 1 + λ(m − min)/(max − min) and β = 1 + λ(max − m)/(max − min), then min + X(max − min) ~ PERT(min, max, m, λ), where PERT denotes a PERT distribution used in PERT analysis, and m = most likely value.[38] Traditionally[39] λ = 4 in PERT analysis.
- If X ~ Beta(1, β) then X ~ Kumaraswamy distribution with parameters (1, β).
- If X ~ Beta(α, 1) then X ~ Kumaraswamy distribution with parameters (α, 1).
- If X ~ Beta(α, 1) then −ln(X) ~ Exponential(α).
Special and limiting cases
[Plot: Example of eight realizations of a random walk in one dimension starting at 0; the probability for the time of the last visit to the origin is distributed as Beta(1/2, 1/2).]

- Beta(1/2, 1/2): The arcsine distribution probability density was proposed by Harold Jeffreys to represent uncertainty for a Bernoulli or a binomial distribution in Bayesian inference, and is now commonly referred to as the Jeffreys prior: p^(−1/2)(1 − p)^(−1/2). This distribution also appears in several random walk fundamental theorems.
- Beta(1, 1) ~ U(0, 1), the uniform distribution, with density 1 on that interval.
- Beta(n, 1) ~ maximum of n independent rvs. with U(0, 1), sometimes called a standard power function distribution, with density n·x^(n−1) on that interval.
- Beta(1, n) ~ minimum of n independent rvs. with U(0, 1), with density n(1 − x)^(n−1) on that interval.
- If X ~ Beta(3/2, 3/2) and r > 0 then 2rX − r ~ Wigner semicircle distribution.
- Beta(1/2, 1/2) is equivalent to the arcsine distribution. This distribution is also the Jeffreys prior probability for the Bernoulli and binomial distributions.
- Beta(1, β), scaled by β, approaches the exponential distribution: if X ~ Beta(1, β), then βX converges in distribution to Exponential(1) as β → ∞.
- More generally, if X ~ Beta(α, β), then βX converges in distribution to the gamma distribution Gamma(α, 1) as β → ∞.
- For large α = β, the beta distribution approaches the normal distribution. More precisely, if X_n ~ Beta(n, n), then √n(X_n − 1/2) converges in distribution to a normal distribution with mean 0 and variance 1/8 as n increases.
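These special and limiting cases lend themselves to quick numerical checks. The sketch below (Python, assuming NumPy and SciPy; illustrative only) verifies the order-statistics cases Beta(n, 1) and Beta(1, n) and the gamma limit:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Order statistics: the maximum of n U(0,1) variables is Beta(n, 1),
    # the minimum is Beta(1, n)
    n = 5
    u = rng.uniform(size=(100_000, n))
    print(stats.kstest(u.max(axis=1), stats.beta(n, 1).cdf).pvalue)
    print(stats.kstest(u.min(axis=1), stats.beta(1, n).cdf).pvalue)

    # Gamma limit: for X ~ Beta(alpha, b) with b large, b*X is close to Gamma(alpha, 1)
    alpha, b = 2.0, 500.0
    x = rng.beta(alpha, b, size=20_000)
    print(stats.kstest(b * x, stats.gamma(alpha).cdf).pvalue)  # large p-value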
Derived from other distributions
Combination with other distributions
If X ~ Beta(α, β) and Y ~ F(2β, 2α), then Pr(X ≤ α/(α + βx)) = Pr(Y ≥ x) for all x > 0.
Compounding with other distributions
- If p ~ Beta(α, β) and X ~ Bin(k, p) then X ~ beta-binomial distribution.
- If p ~ Beta(α, β) and X ~ NB(r, p) then X ~ beta negative binomial distribution.
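A short simulation (Python, assuming NumPy and SciPy) illustrates the first compounding: drawing p from a beta distribution and then X from Bin(k, p) reproduces the beta-binomial probability mass function:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    alpha, beta_, k = 2.0, 3.0, 10

    # Compound: draw p ~ Beta(alpha, beta), then X ~ Binomial(k, p)
    p = rng.beta(alpha, beta_, size=200_000)
    x = rng.binomial(k, p)

    # Compare the empirical pmf with the beta-binomial pmf
    emp = np.bincount(x, minlength=k + 1) / x.size
    bb = stats.betabinom(k, alpha, beta_).pmf(np.arange(k + 1))
    print(np.abs(emp - bb).max())  # small, on the order of 1e-3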
Statistical inference
Parameter estimation
Two unknown parameters
Two unknown parameters (α̂, β̂, of a beta distribution supported on the [0, 1] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let

x̄ = (1/N) Σᵢ Xᵢ

be the sample mean estimate and

v̄ = (1/(N − 1)) Σᵢ (Xᵢ − x̄)²

be the sample variance estimate. The method-of-moments estimates of the parameters are

α̂ = x̄( x̄(1 − x̄)/v̄ − 1 ),
β̂ = (1 − x̄)( x̄(1 − x̄)/v̄ − 1 ),

valid when v̄ < x̄(1 − x̄). When the distribution is required over a known interval other than [0, 1] with random variable X, say [a, c] with random variable Y, then replace x̄ with (ȳ − a)/(c − a) and v̄ with v̄_Y/(c − a)² in the above couple of equations for the shape parameters (see the "Four unknown parameters" section below),[41] where ȳ and v̄_Y are the sample mean and sample variance of Y.
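These estimates translate directly into code. The following minimal Python sketch (an illustration; the function name is ours) implements the method of moments on [0, 1]:

    import numpy as np

    def beta_mom(x):
        # Method-of-moments estimates for Beta(alpha, beta) on [0, 1]
        m, v = x.mean(), x.var(ddof=1)
        if not v < m * (1 - m):
            raise ValueError("sample variance too large for a beta model")
        common = m * (1 - m) / v - 1
        return m * common, (1 - m) * common

    # For a known support [a, c], rescale first: x = (y - a) / (c - a)
    rng = np.random.default_rng(3)
    print(beta_mom(rng.beta(2.0, 5.0, size=10_000)))  # close to (2, 5)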
Four unknown parameters
[Plot: solutions for the parameter estimates vs. (sample) excess kurtosis and (sample) squared skewness of the beta distribution.]
All four parameters (α̂, β̂, â, ĉ, of a beta distribution supported on the [a, c] interval, see section "Alternative parametrizations, Four parameters") can be estimated, using the method of moments developed by Karl Pearson, by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis).[1][42][43] The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see previous section "Kurtosis") as follows:

excess kurtosis = (6/(3 + ν))( ((2 + ν)/4)(skewness)² − 1 )
One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:[42]

ν̂ = α̂ + β̂ = 3( (sample excess kurtosis) − (sample skewness)² + 2 ) / ( (3/2)(sample skewness)² − (sample excess kurtosis) )

This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson[21]) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see § Kurtosis bounded by the square of the skewness).
The case of zero skewness can be immediately solved because for zero skewness α = β, and hence ν = 2α = 2β, therefore α̂ = β̂ = ν̂/2. (Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that ν̂, and therefore the sample shape parameters, is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero.)
For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters â, ĉ, the parameters α̂, β̂ can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters), resulting in the following solution:[42]

α̂, β̂ = (ν̂/2)( 1 ± 1/√(1 + 16(ν̂ + 1)/((ν̂ + 2)²(sample skewness)²)) )

where one should take α̂ as the larger of the two values for (negative) sample skewness < 0, and as the smaller of the two values for (positive) sample skewness > 0 (with β̂ taking the other value).
The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric: U-shaped for α = β < 1, uniform for α = β = 1, upside-down-U-shaped for 1 < α = β < 2, and bell-shaped for α = β > 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities β/(α + β) at the left end x = 0 and α/(α + β) at the right end x = 1. The two surfaces become further apart towards the rear edge, at which the shape parameters are quite different from each other. As remarked, for example, by Bowman and Shenton,[44] sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero, and hence ν approaches infinity as that line is approached. Bowman and Shenton[44] write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem arises for four-parameter estimation of very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See § Kurtosis bounded by the square of the skewness for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself,[45] this issue may not be of much practical importance, as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice. The usual skewed bell-shaped distributions that occur in practice do not have this parameter estimation problem.
The remaining two parameters â, ĉ can be determined using the sample mean and the sample variance using a variety of equations.[1][42] One alternative is to calculate the support interval range (ĉ − â) based on the sample variance and the sample kurtosis, by solving, in terms of the range, the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see § Kurtosis and § Alternative parametrizations, four parameters). Another alternative is to calculate the support interval range (ĉ − â) based on the sample variance and the sample skewness,[42] by solving, in terms of the range, the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters").[42] The remaining parameters can then be determined from the sample mean and the previously obtained parameters: â = ȳ − (α̂/ν̂)(ĉ − â), and finally ĉ = â + (ĉ − â).
In the above formulas one may take, for example, as estimates of the sample moments, the sample mean, the sample variance, the sample skewness G₁ and the sample excess kurtosis G₂. The estimators G₁ for sample skewness and G₂ for sample kurtosis are used by DAP/SAS, PSPP/SPSS, and Excel. However, they are not used by BMDP and (according to [46]) they were not used by MINITAB in 1998. Actually, Joanes and Gill in their 1998 study[46] concluded that the skewness and kurtosis estimators used in BMDP and in MINITAB (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in DAP/SAS and PSPP/SPSS, namely G₁ and G₂, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill[46]).
Maximum likelihood

Two unknown parameters
[Plots: max (joint log likelihood/N) for the beta distribution, with maxima at α = β = 2; max (joint log likelihood/N) for the beta distribution, with maxima at α = β ∈ {0.25, 0.5, 1, 2, 4, 6, 8}.]
As is also the case for maximum likelihood estimates for the gamma distribution, the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If X₁, ..., X_N are independent random variables each having a beta distribution, the joint log likelihood function for N iid observations is:

ln L(α, β | X) = (α − 1) Σᵢ ln Xᵢ + (β − 1) Σᵢ ln(1 − Xᵢ) − N ln B(α, β)

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters:

∂ ln L/∂α = Σᵢ ln Xᵢ − N(ψ(α) − ψ(α + β)) = 0
∂ ln L/∂β = Σᵢ ln(1 − Xᵢ) − N(ψ(β) − ψ(α + β)) = 0

where the digamma function, denoted ψ(α), is defined as the logarithmic derivative of the gamma function:[18]

ψ(α) = d ln Γ(α)/dα
To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative,

∂² ln L/∂α² < 0,   ∂² ln L/∂β² < 0

which, using the previous equations, is equivalent to:

ψ₁(α̂) − ψ₁(α̂ + β̂) > 0,   ψ₁(β̂) − ψ₁(α̂ + β̂) > 0

where the trigamma function, denoted ψ₁(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

ψ₁(α) = dψ(α)/dα

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:

var[ln X] = ψ₁(α) − ψ₁(α + β),   var[ln(1 − X)] = ψ₁(β) − ψ₁(α + β)

Therefore, the condition of negative curvature at a maximum is equivalent to the statements that these variances are positive. Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following logarithmic derivatives of the geometric means G_X and G_(1−X) are positive, since:

∂ ln G_X/∂α = ψ₁(α) − ψ₁(α + β) > 0,   ∂ ln G_(1−X)/∂β = ψ₁(β) − ψ₁(α + β) > 0
While these slopes are indeed positive, the other slopes, ∂ ln G_X/∂β and ∂ ln G_(1−X)/∂α, are negative. The slopes of the mean and the median with respect to α and β display similar sign behavior.
From the condition that at a maximum the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled maximum likelihood estimate equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates α̂, β̂ in terms of the (known) average of logarithms of the samples X₁, ..., X_N:[1]

ψ(α̂) − ψ(α̂ + β̂) = (1/N) Σᵢ ln Xᵢ = ln Ĝ_X
ψ(β̂) − ψ(α̂ + β̂) = (1/N) Σᵢ ln(1 − Xᵢ) = ln Ĝ_(1−X)

where we recognize ln Ĝ_X as the logarithm of the sample geometric mean and ln Ĝ_(1−X) as the logarithm of the sample geometric mean based on (1 − X), the mirror-image of X. For α̂ = β̂, it follows that Ĝ_X = Ĝ_(1−X).
These coupled equations containing digamma functions of the shape parameter estimates α̂, β̂ must be solved by numerical methods, as done, for example, by Beckman et al.[47] Gnanadesikan et al. give numerical solutions for a few cases.[48] N. L. Johnson and S. Kotz[1] suggest that for "not too small" shape parameter estimates α̂, β̂, the logarithmic approximation to the digamma function ψ(α̂) ≈ ln(α̂ − 1/2) may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

(α̂ − 1/2)/(α̂ + β̂ − 1/2) ≈ Ĝ_X,   (β̂ − 1/2)/(α̂ + β̂ − 1/2) ≈ Ĝ_(1−X)

which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution:

α̂ ≈ 1/2 + Ĝ_X/(2(1 − Ĝ_X − Ĝ_(1−X))),   β̂ ≈ 1/2 + Ĝ_(1−X)/(2(1 − Ĝ_X − Ĝ_(1−X)))

Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.
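A minimal Python sketch of this scheme (assuming SciPy; not part of the original article) computes the Johnson–Kotz initial values and then solves the coupled digamma equations numerically; SciPy's built-in scipy.stats.beta.fit gives the same maximum likelihood estimates when the support is fixed to [0, 1]:

    import numpy as np
    from scipy import stats
    from scipy.special import digamma
    from scipy.optimize import fsolve

    x = stats.beta(2.0, 5.0).rvs(10_000, random_state=5)
    lnGx, lnG1mx = np.log(x).mean(), np.log1p(-x).mean()

    # Johnson & Kotz initial values from the approximation psi(z) ~ ln(z - 1/2)
    Gx, G1mx = np.exp(lnGx), np.exp(lnG1mx)
    s = 1 / (2 * (1 - Gx - G1mx))
    a0, b0 = 0.5 + Gx * s, 0.5 + G1mx * s

    # Coupled equations: psi(a) - psi(a+b) = ln G_X, psi(b) - psi(a+b) = ln G_(1-X)
    def eqs(p):
        a, b = p
        return (digamma(a) - digamma(a + b) - lnGx,
                digamma(b) - digamma(a + b) - lnG1mx)

    print(fsolve(eqs, (a0, b0)))                    # near (2, 5)
    print(stats.beta.fit(x, floc=0, fscale=1)[:2])  # SciPy's MLE agrees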
When the distribution is required over a known interval other than [0, 1] with random variable X, say [a, c] with random variable Y, then replace ln(Xᵢ) in the first equation with ln((Yᵢ − a)/(c − a)), and replace ln(1 − Xᵢ) in the second equation with ln((c − Yᵢ)/(c − a)) (see "Alternative parametrizations, four parameters" section below).
If one of the shape parameters is known, the problem is considerably simplified. The following logit transformation can be used to solve for the unknown shape parameter (for skewed cases such that α̂ ≠ β̂; otherwise, if symmetric, both equal parameters are known when one is known):

ψ(α̂) − ψ(β̂) = (1/N) Σᵢ ln(Xᵢ/(1 − Xᵢ)) = ln Ĝ_X − ln Ĝ_(1−X)

This logit transformation is the logarithm of the transformation that divides the variable X by its mirror-image, X/(1 − X), resulting in the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the logit transformation ln(X/(1 − X)), studied by Johnson,[25] extends the finite support [0, 1] based on the original variable X to infinite support in both directions of the real line (−∞, +∞). If, for example, β̂ is known, the unknown parameter α̂ can be obtained in terms of the inverse[49] digamma function of the right hand side of this equation:

α̂ = ψ⁻¹( ψ(β̂) + (1/N) Σᵢ ln(Xᵢ/(1 − Xᵢ)) )

In particular, if one of the shape parameters has a value of unity, for example β = 1 (the power function distribution with bounded support [0, 1]), using the identity ψ(x + 1) = ψ(x) + 1/x in the equation ψ(α̂) − ψ(α̂ + 1) = ln Ĝ_X, the maximum likelihood estimator for the unknown parameter α̂ is,[1] exactly:

α̂ = −1/((1/N) Σᵢ ln Xᵢ) = −1/ln Ĝ_X

The beta has support [0, 1], therefore Ĝ_X < 1, and hence −ln Ĝ_X > 0, and therefore α̂ > 0.
In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample geometric mean, and of the sample geometric mean based on (1 − X), the mirror-image of X. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters α = β, the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters α = β depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on X and the geometric mean based on (1 − X), the maximum likelihood method is able to provide the best estimates for both parameters α = β, without need of employing the variance.
One can express the joint log likelihood per N iid observations in terms of the sufficient statistics (the sample geometric means) as follows:

(1/N) ln L(α, β | X) = (α − 1) ln Ĝ_X + (β − 1) ln Ĝ_(1−X) − ln B(α, β)
We can plot the joint log likelihood per N observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators α̂, β̂ correspond to the maxima of the likelihood function. See the accompanying graph, which shows that all the likelihood functions intersect at α = β = 1, corresponding to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances: these variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β, whereas for shape parameter values α, β > 1 the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the Cramér–Rao bound, since the Fisher information matrix components for the beta distribution are these logarithmic variances. The Cramér–Rao bound states that the variance of any unbiased estimator of α is bounded by the reciprocal of the Fisher information:

var(α̂) ≥ 1/(ψ₁(α) − ψ₁(α + β)),   var(β̂) ≥ 1/(ψ₁(β) − ψ₁(α + β))

so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.
Also one can express the joint log likelihood per N iid observations in terms of the digamma function expressions for the logarithms of the sample geometric means as follows:

(1/N) ln L(α, β | X) = (α − 1)(ψ(α̂) − ψ(α̂ + β̂)) + (β − 1)(ψ(β̂) − ψ(α̂ + β̂)) − ln B(α, β)

This expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per N iid observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters, with the cross-entropy defined as follows:

H = −(α − 1) ln Ĝ_X − (β − 1) ln Ĝ_(1−X) + ln B(α, β)
Four unknown parameters
The procedure is similar to the one followed in the two unknown parameter case. If Y₁, ..., Y_N are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for N iid observations is:

(1/N) ln L(α, β, a, c | Y) = (α − 1)(1/N) Σᵢ ln(Yᵢ − a) + (β − 1)(1/N) Σᵢ ln(c − Yᵢ) − ln B(α, β) − (α + β − 1) ln(c − a)

Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to the shape parameter and setting the expression equal to zero, yielding the maximum likelihood estimator of the shape parameters. These equations can be re-arranged as a system of four coupled equations (the first two equations are geometric means and the second two equations are harmonic means) in terms of the maximum likelihood estimates for the four parameters, with sample geometric means:

Ĝ_X = Πᵢ ((Yᵢ − â)/(ĉ − â))^(1/N),   Ĝ_(1−X) = Πᵢ ((ĉ − Yᵢ)/(ĉ − â))^(1/N)
The parameters â, ĉ are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/N). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for α̂, β̂ > 1, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is positive-definite only for α, β > 2 (for further discussion, see the section on the Fisher information matrix, four parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. The Fisher information components that represent the expectations of the curvature of the log likelihood function have singularities at α = 2 (for the component associated with the minimum a) and β = 2 (for the component associated with the maximum c) (for further discussion see the section on the Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the uniform distribution (Beta(1, 1, a, c)) and the arcsine distribution (Beta(1/2, 1/2, a, c)).
N. L. Johnson and S. Kotz[1] ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of a, c, α and β are required, the above procedure (for the two unknown parameter case, with X transformed as X = (Y − a)/(c − a)) can be repeated using a succession of trial values of a and c, until the pair (a, c) for which maximum likelihood (given a and c) is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
Fisher information matrix
Let a random variable X have a probability density f(x; α). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log likelihood function is called the score. The second moment of the score is called the Fisher information:

I(α) = E[ (∂/∂α ln L(α | X))² ]

The expectation of the score is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the variance of the score.
If the log likelihood function is twice differentiable with respect to the parameter α, and under certain regularity conditions,[50] then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):

I(α) = −E[ ∂²/∂α² ln L(α | X) ]

Thus, the Fisher information is the negative of the expectation of the second derivative with respect to the parameter α of the log likelihood function. Therefore, Fisher information is a measure of the curvature of the log likelihood function of α. A flatter log likelihood curve, with low curvature (and therefore high radius of curvature), has low Fisher information, while a log likelihood curve with large curvature (and therefore low radius of curvature) has high Fisher information. When the Fisher information matrix is computed at the estimates of the parameters ("the observed Fisher information matrix"), it is equivalent to the replacement of the true log likelihood surface by a Taylor's series approximation, taken as far as the quadratic terms.[51]
The word information, in the context of Fisher information, refers to information about the parameters: estimation, sufficiency and properties of variances of estimators. The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of a parameter α:

var(α̂) ≥ 1/I(α)

The precision to which one can estimate a parameter α is thus limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses of a parameter.[52]
When there are N parameters θ₁, ..., θ_N, then the Fisher information takes the form of an N×N positive semidefinite symmetric matrix, the Fisher information matrix, with typical element:

I_{i,j} = E[ (∂/∂θᵢ ln L)(∂/∂θⱼ ln L) ]

Under certain regularity conditions,[50] the Fisher information matrix may also be written in the following form, which is often more convenient for computation:

I_{i,j} = −E[ ∂²/∂θᵢ∂θⱼ ln L ]
With X₁, ..., X_N iid random variables, an N-dimensional "box" can be constructed with sides X₁, ..., X_N. Costa and Cover[53] show that the (Shannon) differential entropy h(X) is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
For X₁, ..., X_N independent random variables each having a beta distribution parametrized with shape parameters α and β, the joint log likelihood function per N iid observations is:

(1/N) ln L(α, β | X) = (α − 1)(1/N) Σᵢ ln Xᵢ + (β − 1)(1/N) Σᵢ ln(1 − Xᵢ) − ln B(α, β)

For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, one of these off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal). Aryal and Nadarajah[54] calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows.
Since the Fisher information matrix is symmetric, its components are equal to the log geometric variances and the log geometric covariance, which can be expressed in terms of trigamma functions, denoted ψ₁(α), the second of the polygamma functions, defined as the derivative of the digamma function:

I_{α,α} = var[ln X] = ψ₁(α) − ψ₁(α + β)
I_{β,β} = var[ln(1 − X)] = ψ₁(β) − ψ₁(α + β)
I_{α,β} = I_{β,α} = cov[ln X, ln(1 − X)] = −ψ₁(α + β)

These derivatives are also derived in § Two unknown parameters, and plots of the log likelihood function are also shown in that section. § Geometric variance and covariance contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. § Moments of logarithmically transformed random variables contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components I_{α,α} and I_{β,β} are shown in § Geometric variance.
The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:

det(I(α, β)) = ψ₁(α)ψ₁(β) − (ψ₁(α) + ψ₁(β))ψ₁(α + β)

From Sylvester's criterion (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is positive-definite (under the standard condition that the shape parameters are positive, α > 0 and β > 0).
[Plots: Fisher information I(a,a) for α = β, vs. range (c − a) and exponent α = β; Fisher information I(α,a) for α = β, vs. range (c − a) and exponent α = β.]
If Y₁, ..., Y_N are independent random variables each having a beta distribution with four parameters: the exponents α and β, and also a (the minimum of the distribution range) and c (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with probability density function:

f(y; α, β, a, c) = (y − a)^(α−1)(c − y)^(β−1) / ((c − a)^(α+β−1) B(α, β))

the joint log likelihood function per N iid observations is:

(1/N) ln L(α, β, a, c | Y) = (α − 1)(1/N) Σᵢ ln(Yᵢ − a) + (β − 1)(1/N) Σᵢ ln(c − Yᵢ) − ln B(α, β) − (α + β − 1) ln(c − a)
For the four parameter case, the Fisher information has 4×4 = 16 components: 4 diagonal and 12 off-diagonal. Since the Fisher information matrix is symmetric, half of the off-diagonal components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah[54] calculated Fisher's information matrix for the four parameter case as follows.
In the above expressions, the use of X instead of Y in the expressions var[ln(X)] = ln(var_GX) is not an error. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter X ~ Beta(α, β) parametrization because, when taking the partial derivatives with respect to the exponents (α, β) in the four parameter case, one obtains expressions identical to those for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum a and maximum c of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents α and β is the second derivative of the log of the beta function: ln(B(α, β)). This term is independent of the minimum a and maximum c of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.
The Fisher information for N i.i.d. samples is N times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas[28]). (Aryal and Nadarajah[54] take a single observation, N = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per N observations. Moreover, below, the erroneous expression for one of these components in Aryal and Nadarajah has been corrected.)
The lower two diagonal entries of the Fisher information matrix, with respect to the parameter a (the minimum of the distribution's range) and with respect to the parameter c (the maximum of the distribution's range), are only defined for exponents α > 2 and β > 2 respectively. The Fisher information matrix component for the minimum a approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component for the maximum c approaches infinity for exponent β approaching 2 from above.
The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum a and the maximum c, but only on the total range (c − a). Moreover, the components of the Fisher information matrix that depend on the range (c − a) depend only through its inverse (or the square of the inverse), such that the Fisher information decreases with increasing range (c − a).
The accompanying images show the Fisher information components I_{a,a} and I_{α,a}. Images for the Fisher information components I_{α,α} and I_{β,β} are shown in § Geometric variance. All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters.
The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter X ~ Beta(α, β) expectations of the transformed ratio ((1 − X)/X) and of its mirror image (X/(1 − X)), scaled by the range (c − a), which may be helpful for interpretation. These are also the expected values of the "inverted beta distribution" or beta prime distribution (also known as beta distribution of the second kind or Pearson's Type VI)[1] and its mirror image, scaled by the range (c − a). Also, the following Fisher information components can be expressed in terms of the harmonic (1/X) variances or of variances based on the ratio transformed variables ((1 − X)/X). See the section "Moments of linearly transformed, product and inverted random variables" for these expectations.
The determinant of Fisher's information matrix is of interest (for example for the calculation of Jeffreys prior probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters can be computed. Using Sylvester's criterion (checking whether the diagonal elements are all positive), and since the diagonal components I_{a,a} and I_{c,c} have singularities at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is positive-definite only for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell-shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2, 2, a, c)) and the uniform distribution (Beta(1, 1, a, c)), have Fisher information components (I_{a,a}, I_{c,c}) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter Wigner semicircle distribution (Beta(3/2, 3/2, a, c)) and arcsine distribution (Beta(1/2, 1/2, a, c)) have negative Fisher information determinants for the four-parameter case.
Bayesian inference

[Plot: Beta(1, 1), the uniform distribution probability density, proposed by Thomas Bayes to represent ignorance of prior probabilities in Bayesian inference.]
The use of beta distributions in Bayesian inference is due to the fact that they provide a family of conjugate prior probability distributions for binomial (including Bernoulli) and geometric distributions. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p.[24] Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace[55] in the course of treating the sunrise problem. It states that, given s successes in n conditionally independent Bernoulli trials with probability p, the estimate of the expected value in the next trial is (s + 1)/(n + 2). This estimate is the expected value of the posterior distribution over p, namely Beta(s + 1, n − s + 1), which is given by Bayes' rule if one assumes a uniform prior probability over p (i.e., Beta(1, 1)) and then observes that p generated s successes in n trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the sunrise problem ([56] p. 89) as "a travesty of the proper use of the principle". Keynes remarks ([57] Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable". Karl Pearson[58] showed that the probability that the next (n + 1) trials will be successes, after n successes in n trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ([59] p. 128) (crediting C. D. Broad[60]), Laplace's rule of succession establishes a high probability of success ((n + 1)/(n + 2)) in the next trial, but only a moderate probability (50%) that a further sample (n + 1) comparable in size will be equally successful. As pointed out by Perks,[61] "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following sections). According to Jaynes,[52] the main problem with the rule of succession is that it is not valid when s = 0 or s = n (see rule of succession, for an analysis of its validity).
BayesāLaplace prior probability (Beta(1,1))
The beta distribution achieves maximum differential entropy for Beta(1,1): the uniform probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by Thomas Bayes[62] as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt[55]) by Pierre-Simon Laplace, and hence it was also known as the "Bayes–Laplace rule" or the "Laplace rule" of "inverse probability" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near x = 0, for a distribution with initial support at x = 0) required particular attention. Keynes ([57] Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."
Haldane's prior probability (Beta(0,0))
[Plot: Beta(0, 0), the Haldane prior probability expressing total ignorance about prior information, where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure. As α, β → 0, the beta distribution approaches a two-point Bernoulli distribution with all probability density concentrated at each end, at 0 and 1, and nothing in between. A coin-toss: one face of the coin being at 0 and the other face being at 1.]
The Beta(0,0) distribution was proposed by J.B.S. Haldane,[63] who suggested that the prior probability representing complete uncertainty should be proportional to p^(−1)(1 − p)^(−1). The function p^(−1)(1 − p)^(−1) can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity for both parameters approaching zero, α, β → 0. Therefore, p^(−1)(1 − p)^(−1) divided by the Beta function approaches a 2-point Bernoulli distribution with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "improper prior" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner[64] points out that on the log-odds scale (the logit transformation ln(p/(1 − p))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the logit transformed variable ln(p/(1 − p)) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by Harold Jeffreys in the first edition (1939) of his book Theory of Probability ([59] p. 123). Jeffreys writes "Certainly if we take the Bayes–Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule dx/(x(1 − x)) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)
[Plot: Jeffreys prior probability for the beta distribution, the square root of the determinant of Fisher's information matrix, a function of the trigamma function ψ₁ of the shape parameters α, β.]
[Plot: posterior beta densities with samples having success = s, failure = f, s/(s + f) = 1/2, and s + f ∈ {3, 10, 50}, based on 3 different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with a sample size of 50 (with a more pronounced peak near p = 1/2). Significant differences appear for very small sample sizes (the flatter distribution for a sample size of 3).]
[Plot: posterior beta densities with samples having success = s, failure = f, s/(s + f) = 1/4, and s + f ∈ {3, 10, 50}, based on three different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with a sample size of 50 (with a more pronounced peak near p = 1/4). Significant differences appear for very small sample sizes (the very skewed distribution for the degenerate case of sample size = 3; in this degenerate and unlikely case the Haldane prior results in a reverse "J" shape with mode at p = 0 instead of p = 1/4). If there is sufficient sampling data, the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar posterior probability densities.]
[Plot: posterior beta densities with samples having success = s, failure = f, s/(s + f) = 1/4, and s + f ∈ {4, 12, 40}, based on three different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with a sample size of 40 (with a more pronounced peak near p = 1/4). Significant differences appear for very small sample sizes.]
Harold Jeffreys[59][65] proposed to use an uninformative prior probability measure that should be invariant under reparameterization: proportional to the square root of the determinant of Fisher's information matrix. For the Bernoulli distribution, this can be shown as follows: for a coin that is "heads" with probability p ∈ [0, 1] and is "tails" with probability 1 − p, for a given (H, T) ∈ {(0, 1), (1, 0)} the probability is p^H(1 − p)^T. Since T = 1 − H, the Bernoulli distribution is p^H(1 − p)^(1−H). Considering p as the only parameter, it follows that the log likelihood for the Bernoulli distribution is

ln L(p | H) = H ln p + (1 − H) ln(1 − p)

The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: p), therefore:

I(p) = E[ (∂/∂p ln L)² ] = E[ (H/p − (1 − H)/(1 − p))² ] = 1/(p(1 − p))
Similarly, for the binomial distribution with n Bernoulli trials, it can be shown that

I(p) = n/(p(1 − p))
Thus, for the Bernoulli and binomial distributions, Jeffreys prior is proportional to 1/√(p(1 − p)), which happens to be proportional to a beta distribution with domain variable x = p and shape parameters α = β = 1/2, the arcsine distribution:

Beta(1/2, 1/2) = 1/(π√(p(1 − p)))
It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes' theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in Bayes' theorem, the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to 1/√(p(1 − p)) for the Bernoulli and binomial distributions, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in § Fisher information matrix, is a function of the trigamma function ψ₁ of shape parameters α and β as follows:

√det(I(α, β)) = √( ψ₁(α)ψ₁(β) − (ψ₁(α) + ψ₁(β))ψ₁(α + β) )

As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the arcsine distribution Beta(1/2,1/2), a one-dimensional curve that looks like a basin as a function of the parameter p of the Bernoulli and binomial distributions. The walls of the basin are formed by p approaching the singularities at the ends p → 0 and p → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a 2-dimensional surface (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls), as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ∞ because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.
It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.
Jeffreys prior may be difficult to obtain analytically, and for some cases it just doesn't exist (even for simple distribution functions like the asymmetric triangular distribution). Berger, Bernardo and Sun, in a 2009 paper,[66] defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric triangular distribution. They cannot obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior Beta(1/2, 1/2) on θ, where θ is the vertex variable for the asymmetric triangular distribution with support [0, 1] (corresponding to the following parameter values in Wikipedia's article on the triangular distribution: vertex c = θ, left end a = 0, and right end b = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the Berger–Bernardo–Sun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and PERT analysis to describe the cost and duration of project tasks.
Clarke and Barron[67] prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter, and therefore Jeffreys prior is the most uninformative prior (measuring information as Shannon information). The proof rests on an examination of the Kullback–Leibler divergence between probability density functions for iid random variables.
Effect of different prior probability choices on the posterior beta distribution
If samples are drawn from the population of a random variable X that result in s successes and f failures in n Bernoulli trials n = s + f, then the likelihood function for parameters s and f given x = p (the notation x = p in the expressions below will emphasize that the domain x stands for the value of the parameter p in the binomial distribution) is the following binomial distribution:

L(s, f | x = p) = (n choose s) x^s (1 − x)^(n−s)
If beliefs about prior probability information are reasonably well approximated by a beta distribution with parameters α Prior and β Prior, then:

prior(x = p; α Prior, β Prior) = x^(α Prior − 1)(1 − x)^(β Prior − 1) / B(α Prior, β Prior)
According to Bayes' theorem for a continuous event space, the posterior probability density is given by the product of the prior probability and the likelihood function (given the evidence s and f = n − s), normalized so that the area under the curve equals one, as follows:

posterior(x = p | s, f) = x^(s + α Prior − 1)(1 − x)^(n − s + β Prior − 1) / B(s + α Prior, n − s + β Prior)

The binomial coefficient appears both in the numerator and the denominator of the posterior probability, and it does not depend on the integration variable x, hence it cancels out, and it is irrelevant to the final result. Similarly, the normalizing factor for the prior probability, the beta function B(α Prior, β Prior), cancels out and is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior, because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula, since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(s + α Prior, n − s + β Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.
The ratio s/n of the number of successes to the total number of trials is a sufficient statistic in the binomial case, which is relevant for the following results.
For the Bayes' prior probability (Beta(1,1)), the posterior probability is:

posterior(x = p | s, f) = x^s (1 − x)^(n−s) / B(s + 1, n − s + 1)

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior probability is:

posterior(x = p | s, f) = x^(s − 1/2) (1 − x)^(n − s − 1/2) / B(s + 1/2, n − s + 1/2)

and for the Haldane prior probability (Beta(0,0)), the posterior probability is:

posterior(x = p | s, f) = x^(s − 1) (1 − x)^(n − s − 1) / B(s, n − s)
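The conjugate update is a one-liner in practice. The following Python sketch (assuming SciPy) computes the posterior mean and variance under the three priors for an illustrative sample with s = 7 successes in n = 10 trials (the Haldane update requires 0 < s < n for a proper posterior):

    from scipy import stats

    s, n = 7, 10  # observed successes and total Bernoulli trials
    priors = {"Haldane Beta(0,0)":      (0.0, 0.0),
              "Jeffreys Beta(1/2,1/2)": (0.5, 0.5),
              "Bayes Beta(1,1)":        (1.0, 1.0)}

    for name, (a0, b0) in priors.items():
        post = stats.beta(s + a0, n - s + b0)  # conjugate update
        print(f"{name}: mean = {post.mean():.4f}, variance = {post.var():.5f}")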
From the above expressions it follows that for s/n = 1/2 all three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For s/n < 1/2, the means of the posterior probabilities, using the above priors, are such that: mean for Bayes prior > mean for Jeffreys prior > mean for Haldane prior. For s/n > 1/2 the order of these inequalities is reversed, such that the Haldane prior probability results in the largest posterior mean. The Haldane prior probability Beta(0,0) results in a posterior probability density with mean (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood. The Bayes prior probability Beta(1,1) results in a posterior probability density with mode identical to the ratio s/n (the maximum likelihood).
In the case that 100% of the trials have been successful, s = n, the Bayes prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (n + 1)/(n + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (n + 1/2)/(n + 1). Perks[61] (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2n + 2) trials. The Bayes–Laplace rule implies that we are about at the end of an average run or that we expect a failure once in (n + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."
Conversely, in the case that 100% of the trials have resulted in failure (s = 0), the Bayes prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(n + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(n + 1), which Perks[61] (p. 303) points out "is a much more reasonably remote result than the Bayes–Laplace result 1/(n + 2)".
Jaynes[52] questions (for the Haldane prior Beta(0,0)) the use of these formulas for the cases s = 0 or s = n, because the integrals do not converge (Beta(0,0) is an improper prior for s = 0 or s = n). In practice, the conditions 0 < s < n necessary for a mode to exist between both ends for the Bayes prior are usually met, and therefore the Bayes prior (as long as 0 < s < n) results in a posterior mode located between both ends of the domain.
As remarked in the section on the rule of succession, K. Pearson showed that after n successes in n trials the posterior probability (based on the Bayes Beta(1,1) distribution as the prior probability) that the next (n + 1) trials will all be successes is exactly 1/2, whatever the value of n. Based on the Haldane Beta(0,0) distribution as the prior probability, this posterior probability is 1 (absolute certainty that after n successes in n trials the next (n + 1) trials will all be successes). Perks[61] (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((n + 1/2)/(n + 1))((n + 3/2)/(n + 2))...((2n + 1/2)/(2n + 1)), which for n = 1, 2, 3 gives 15/24, 315/480, 9009/13440, rapidly approaching a limiting value of 1/√2 ≈ 0.7071 as n tends to infinity. Perks remarks that what is now known as the Jeffreys prior "is clearly more 'reasonable' than either the Bayes–Laplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
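This product is simple to evaluate; the short Python sketch below (an illustration, not from the original) reproduces Perks' values for n = 1, 2, 3 and shows the approach to the limit 1/√2:

    import math

    def perks_probability(n):
        # ((n + 1/2)/(n + 1)) * ((n + 3/2)/(n + 2)) * ... * ((2n + 1/2)/(2n + 1))
        p = 1.0
        for k in range(n, 2 * n + 1):
            p *= (k + 0.5) / (k + 1.0)
        return p

    for n in (1, 2, 3, 100, 10_000):
        print(n, perks_probability(n))  # 15/24, 315/480, 9009/13440, ...
    print(1 / math.sqrt(2))             # limiting value, ~0.7071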
Following are the variances of the posterior distribution obtained with these three prior probability distributions. For the Bayes' prior probability (Beta(1,1)), the posterior variance is:

variance = (s + 1)(n − s + 1) / ((n + 2)²(n + 3))

For the Jeffreys' prior probability (Beta(1/2,1/2)), the posterior variance is:

variance = (s + 1/2)(n − s + 1/2) / ((n + 1)²(n + 2))

and for the Haldane prior probability (Beta(0,0)), the posterior variance is:

variance = s(n − s) / (n²(n + 1))

So, as remarked by Silvey,[50] for large n, the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small n the Haldane Beta(0,0) prior results in the largest posterior variance, while the Bayes Beta(1,1) prior results in the more concentrated posterior. Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As n increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as n → ∞). Recalling the previous result that the Haldane prior probability Beta(0,0) results in a posterior probability density with mean (the expected value for the probability of success in the "next" trial) identical to the ratio s/n of the number of successes to the total number of trials, it follows from the above expression that also the Haldane prior Beta(0,0) results in a posterior with variance identical to the variance expressed in terms of the max. likelihood estimate s/n and sample size (in § Variance):

variance = μ(1 − μ)/(1 + ν)

with the mean μ = s/n and the sample size ν = n.
In Bayesian inference, using a Beta(αPrior, βPrior) prior distribution with a binomial likelihood is equivalent to adding (αPrior ā 1) pseudo-observations of "success" and (βPrior ā 1) pseudo-observations of "failure" to the actual number of successes and failures observed, then estimating the parameter p of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations, since for Beta(1,1) it follows that (αPrior ā 1) = 0 and (βPrior ā 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and an equal number of failures. This subtraction has the effect of smoothing out the posterior distribution. If the proportion of successes is not 50% (s/n ā  1/2), values of αPrior and βPrior less than 1 (and therefore negative (αPrior ā 1) and (βPrior ā 1)) favor sparsity, i.e. distributions where the parameter p is closer to either 0 or 1. In effect, values of αPrior and βPrior between 0 and 1, when operating together, function as a concentration parameter. A minimal sketch of this pseudo-observation bookkeeping follows.
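```python
# Sketch (hypothetical numbers): point estimate of p as the proportion of
# successes over real plus pseudo-observations. For a Beta(a0, b0) prior this
# is (s + a0 - 1)/(n + a0 + b0 - 2), i.e. the posterior mode; a0 = b0 = 1
# (uniform prior) recovers the plain proportion s/n.
def p_hat(s: int, n: int, a0: float, b0: float) -> float:
    return (s + a0 - 1) / (n + a0 + b0 - 2)

print(p_hat(7, 10, 1.0, 1.0))    # 0.700  uniform prior: no pseudo-counts
print(p_hat(7, 10, 0.5, 0.5))    # 0.722  Jeffreys: pushed away from 1/2
print(p_hat(7, 10, 2.0, 2.0))    # 0.667  a0, b0 > 1: pulled toward 1/2
```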
The accompanying plots show the posterior probability density functions for sample sizes n ā {3, 10, 50}, successes s ā {n/2, n/4} and Beta(αPrior, βPrior) ā {Beta(0,0), Beta(1/2,1/2), Beta(1,1)}. Also shown are the cases for n = {4, 12, 40}, success s = {n/4} and Beta(αPrior, βPrior) ā {Beta(0,0), Beta(1/2,1/2), Beta(1,1)}. The first plot shows the symmetric cases, for successes s ā {n/2}, with mean = mode = 1/2, and the second plot shows the skewed cases s ā {n/4}. The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near p = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution in the degenerate case of sample size = 3). Therefore, the skewed cases, with successes s = {n/4}, show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case a sample size of 3) and skewed distribution (in this example s ā {n/4}) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example n = 3 and hence s = 3/4 < 1, a degenerate value because s should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because s = 3/4 is not an integer, hence it violates the initial assumption of a binomial likelihood), and it is not an issue in generic cases of reasonable sample size (such that the condition 1 < s < n ā 1, necessary for a mode to exist between both ends, is fulfilled).
In Chapter 12 (p. 385) of his book, Jaynes[52] asserts that the Haldane prior Beta(0,0) describes a prior state of knowledge of complete ignorance, where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the Bayes (uniform) prior Beta(1,1) applies if one knows that both binary outcomes are possible. Jaynes states: "interpret the BayesāLaplace (Beta(1,1)) prior as describing not a state of complete ignorance, but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes[52] does not specifically discuss the Jeffreys prior Beta(1/2,1/2). (Jaynes' discussion of "Jeffreys prior" on pp. 181 and 423 and in Chapter 12 of his book[52] refers instead to the improper, un-normalized prior "1/p dp" introduced by Jeffreys in the 1939 edition of his book,[59] seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. "1/p" is Jeffreys' (1946) invariant prior for the exponential distribution, not for the Bernoulli or binomial distributions.) However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.
Similarly, Karl Pearson in his 1892 book The Grammar of Science[68][69] (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete-ignorance prior, and that it should be used only when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our experience of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient sampling data, and the posterior probability mode is not located at one of the extremes of the domain (x = 0 or x = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar posterior probability densities. Otherwise, as Gelman et al.[70] (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger[4] (p. 125) points out, "when different reasonable priors yield substantially different answers, can it be right to state that there is a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
Occurrence and applications

Order statistics
The beta distribution has an important application in the theory of order statistics. A basic result is that the distribution of the k-th smallest of a sample of size n from a continuous uniform distribution has a beta distribution.[40] This result is summarized as

$$U_{(k)} \sim \operatorname{Beta}(k,\, n+1-k).$$

From this, and application of the theory related to the probability integral transform, the distribution of any individual order statistic from any continuous distribution can be derived.[40]
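A quick Monte Carlo check of this result (a sketch, not from the article; NumPy and SciPy assumed):

```python
# Sketch: verify that the k-th smallest of n i.i.d. Uniform(0,1) variates
# follows Beta(k, n + 1 - k), whose mean is k/(n + 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 10, 3
# Sort each row of uniforms and take the k-th smallest (column k - 1).
samples = np.sort(rng.random((100_000, n)), axis=1)[:, k - 1]

print("empirical mean:     ", samples.mean())
print("Beta(k, n+1-k) mean:", stats.beta(k, n + 1 - k).mean())  # k/(n+1) = 3/11
```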
Subjective logic

In standard logic, propositions are considered to be either true or false. In contradistinction, subjective logic assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In subjective logic the a posteriori probability estimates of binary events can be represented by beta distributions.[71]
Wavelet analysis

A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including, but certainly not limited to, audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets are localized in both time and frequency, whereas the standard Fourier transform is only localized in frequency. Therefore, standard Fourier transforms are only applicable to stationary processes, while wavelets are applicable to non-stationary processes. Continuous wavelets can be constructed based on the beta distribution. Beta wavelets[72] can be viewed as a soft variety of Haar wavelets whose shape is fine-tuned by two shape parameters α and β.
Population genetics
The BaldingāNichols model is a two-parameter parametrization of the beta distribution used in population genetics.[73] It is a statistical description of the allele frequencies in the components of a sub-divided population:

$$p \sim \operatorname{Beta}\!\left(\mu\,\frac{1-F}{F},\; (1-\mu)\,\frac{1-F}{F}\right)$$

where μ is the (ancestral) mean allele frequency; here F is (Wright's) genetic distance between two populations.
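A minimal sampling sketch under the parametrization assumed above (the values of μ and F are hypothetical example inputs):

```python
# Sketch: sample allele frequencies under a Balding-Nichols-style
# parametrization, with shape parameters built from the ancestral allele
# frequency mu and Wright's genetic distance F.
import numpy as np

def balding_nichols(mu: float, F: float, size: int, rng=None):
    rng = rng or np.random.default_rng()
    nu = (1 - F) / F                  # total concentration alpha + beta
    return rng.beta(mu * nu, (1 - mu) * nu, size=size)

freqs = balding_nichols(mu=0.3, F=0.05, size=5)
print(freqs)                          # five simulated sub-population frequencies
```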
Project management: task cost and schedule modeling
The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is used extensively in PERT, the critical path method (CPM), Joint Cost Schedule Modeling (JCSM) and other project management/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:[39]

$$\mu(X) = \frac{a + 4b + c}{6}, \qquad \sigma(X) = \frac{c - a}{6}$$

where a is the minimum, c is the maximum, and b is the most likely value (the mode for α > 1 and β > 1).
The above estimate for the mean $\mu(X) = \frac{a + 4b + c}{6}$ is known as the PERT three-point estimation and it is exact for either of the following values of β (for arbitrary α within these ranges):

- β = α > 1 (symmetric case), with standard deviation $\sigma(X) = \frac{c-a}{2\sqrt{2\alpha+1}}$, skewness = 0, and excess kurtosis $= \frac{-6}{2\alpha+3}$, or
- β = 6 ā α for 5 > α > 1 (skewed case), with standard deviation $\sigma(X) = \frac{(c-a)\sqrt{\alpha(6-\alpha)}}{6\sqrt{7}}$, skewness $= \frac{(3-\alpha)\sqrt{7}}{2\sqrt{\alpha(6-\alpha)}}$, and excess kurtosis $= \frac{3(\alpha^{2}-6\alpha+7)}{\alpha(6-\alpha)}$.
The above estimate for the standard deviation Ļ(X) = (c ā a)/6 is exact for either of the following values of α and β:

- α = β = 4 (symmetric), with skewness = 0 and excess kurtosis = ā6/11;
- β = 6 ā α and $\alpha = 3 - \sqrt{2}$ (right-tailed, positive skew), with skewness $= 1/\sqrt{2}$ and excess kurtosis = 0;
- β = 6 ā α and $\alpha = 3 + \sqrt{2}$ (left-tailed, negative skew), with skewness $= -1/\sqrt{2}$ and excess kurtosis = 0.
Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.[74][75][76] A sketch comparing the shorthand estimates with the exact moments follows.
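```python
# Sketch (assuming a four-parameter beta on [a, c] whose mode b corresponds
# to alpha, beta > 1): compare the PERT shorthand estimates with the exact
# mean and standard deviation.
from math import sqrt

def pert_estimates(a: float, b: float, c: float):
    return (a + 4 * b + c) / 6, (c - a) / 6      # shorthand mean, std dev

def exact_moments(a: float, c: float, alpha: float, beta: float):
    mean = a + (c - a) * alpha / (alpha + beta)
    var = (c - a) ** 2 * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, sqrt(var)

# alpha = beta = 4: both shorthand formulas are exact.
a, c, alpha, beta = 0.0, 1.0, 4.0, 4.0
b = a + (c - a) * (alpha - 1) / (alpha + beta - 2)   # mode of the distribution
print(pert_estimates(a, b, c))        # (0.5, 0.1667)
print(exact_moments(a, c, alpha, beta))

# alpha = beta = 2: the mean estimate is still exact (symmetric case) but
# (c - a)/6 understates the true standard deviation.
print(pert_estimates(0.0, 0.5, 1.0))      # (0.5, 0.1667)
print(exact_moments(0.0, 1.0, 2.0, 2.0))  # (0.5, 0.2236)
```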
Random variate generation
If X and Y are independent, with $X \sim \Gamma(\alpha, \theta)$ and $Y \sim \Gamma(\beta, \theta)$, then

$$\frac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta).$$

So one algorithm for generating beta variates is to generate $\frac{X}{X+Y}$, where X is a gamma variate with parameters (α, 1) and Y is an independent gamma variate with parameters (β, 1).[77] In fact, here $\frac{X}{X+Y}$ and $X+Y$ are independent, and $X+Y \sim \Gamma(\alpha+\beta, \theta)$. If $Z \sim \Gamma(\gamma, \theta)$ and Z is independent of X and Y, then $\frac{X+Y}{X+Y+Z} \sim \operatorname{Beta}(\alpha+\beta, \gamma)$ and $\frac{X+Y}{X+Y+Z}$ is independent of $\frac{X}{X+Y}$. This shows that the product of independent $\operatorname{Beta}(\alpha, \beta)$ and $\operatorname{Beta}(\alpha+\beta, \gamma)$ random variables is a $\operatorname{Beta}(\alpha, \beta+\gamma)$ random variable.

Also, the k-th order statistic of n uniformly distributed variates is $\operatorname{Beta}(k, n+1-k)$, so an alternative if α and β are small integers is to generate α + β ā 1 uniform variates and choose the α-th smallest.[40]
Another way to generate the beta distribution is via a Pólya urn model. According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws a ball uniformly at random; after each draw, the ball is returned to the urn together with an additional ball of the same color. Asymptotically, the proportion of black balls in the urn is distributed according to the Beta(α, β) distribution, and each repetition of the experiment produces a different limiting value.

It is also possible to use inverse transform sampling, applying the inverse of the cumulative distribution function (the inverse of the regularized incomplete beta function) to uniform variates.
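Minimal sketches of the first two methods (NumPy assumed; the gamma-ratio method works for any α, β > 0, the order-statistic method only for integer parameters):

```python
# Sketches of two of the beta-variate generation methods described above.
import numpy as np

rng = np.random.default_rng(42)
alpha, beta = 2, 3

# 1. Ratio of independent gamma variates: X/(X+Y) ~ Beta(alpha, beta).
x = rng.gamma(alpha, 1.0, size=100_000)
y = rng.gamma(beta, 1.0, size=100_000)
beta_gamma = x / (x + y)

# 2. Order statistics (integer alpha, beta): the alpha-th smallest of
#    alpha + beta - 1 uniform variates is Beta(alpha, beta).
u = rng.random((100_000, alpha + beta - 1))
beta_order = np.sort(u, axis=1)[:, alpha - 1]

# Both empirical means should be close to alpha/(alpha + beta) = 0.4.
print(beta_gamma.mean(), beta_order.mean(), alpha / (alpha + beta))
```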
Normal approximation to the Beta distribution
A beta distribution $\operatorname{Beta}(\alpha, \beta)$ with $\alpha \approx \beta$ and both shape parameters large is approximately normal with mean $\frac{\alpha}{\alpha+\beta}$ and variance $\frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$. If $\alpha \neq \beta$, the normal approximation can be improved by taking the cube root of the logarithm of the reciprocal of $X$.[78][79]
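A rough numerical check of the approximation (a sketch with illustrative parameter values; SciPy assumed):

```python
# Sketch: compare the Beta(alpha, beta) CDF with the CDF of a normal
# distribution matched to the beta's mean and variance, for large, equal
# shape parameters.
from scipy import stats

alpha = beta_ = 50.0
mean = alpha / (alpha + beta_)
var = alpha * beta_ / ((alpha + beta_) ** 2 * (alpha + beta_ + 1))

for q in (0.40, 0.50, 0.60):
    exact = stats.beta(alpha, beta_).cdf(q)
    approx = stats.norm(mean, var ** 0.5).cdf(q)
    print(f"x={q:.2f}  beta CDF={exact:.5f}  normal approx={approx:.5f}")
```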
History

Thomas Bayes, in a posthumous paper[62] published in 1763 by Richard Price, obtained a beta distribution as the density of the probability of success in Bernoulli trials (see § Applications, Bayesian inference), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.
Karl Pearson analyzed the beta distribution as the solution Type I of the Pearson distributions.

The first systematic modern discussion of the beta distribution is probably due to Karl Pearson.[80][81] In Pearson's papers[21][33] the beta distribution is couched as a solution of a differential equation: Pearson's Type I distribution, to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution.
William P. Elderton in his 1906 monograph "Frequency curves and correlation"[42] further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case, and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." Elderton's monograph[42] provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendices, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.
As remarked by Bowman and Shenton,[44] "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood"[45] (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, 'more efficient values' of the curve constants".
David and Edwards's treatise on the history of statistics[82] cites the first modern treatment of the beta distribution, in 1911,[83] using the beta designation that has become standard, due to Corrado Gini, an Italian statistician, demographer, and sociologist, who developed the Gini coefficient. N. L. Johnson and S. Kotz, in their comprehensive and very informative monograph[84] on leading historical personalities in statistical sciences, credit Corrado Gini[85] as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."
References

1. Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1995). "Chapter 25: Beta Distributions". Continuous Univariate Distributions Vol. 2 (2nd ed.). Wiley. ISBN 978-0-471-58494-0.
2. Rose, Colin; Smith, Murray D. (2002). Mathematical Statistics with MATHEMATICA. Springer. ISBN 978-0387952345.
3. Kruschke, John K. (2011). Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Academic Press / Elsevier. p. 83. ISBN 978-0123814852.
4. Berger, James O. (2010). Statistical Decision Theory and Bayesian Analysis (2nd ed.). Springer. ISBN 978-1441930743.
5. Feller, William (1971). An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley. ISBN 978-0471257097.
6. Wadsworth, G. P. (1960). Introduction to Probability and Random Variables. New York: McGraw-Hill. p. 52.
7. Kruschke, John K. (2015). Doing Bayesian Data Analysis: A Tutorial with R, JAGS and Stan. Academic Press / Elsevier. ISBN 978-0-12-405888-0.
8. Wadsworth, George P.; Bryan, Joseph (1960). Introduction to Probability and Random Variables. McGraw-Hill.
9. Gupta, Arjun K., ed. (2004). Handbook of Beta Distribution and Its Applications. CRC Press. ISBN 978-0824753962.
10. Kerman, Jouni (2011). "A closed-form approximation for the median of the beta distribution". arXiv:1111.0433 [math.ST].
11. Mosteller, Frederick; Tukey, John (1977). Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley. ISBN 978-0201048544.
12. Feller, William (1968). An Introduction to Probability Theory and Its Applications, Vol. 1 (3rd ed.). Wiley. ISBN 978-0471257080.
13. Fleming, Philip J.; Wallace, John J. (March 1986). "How not to lie with statistics: the correct way to summarize benchmark results". Communications of the ACM. 29 (3): 218ā221.
14. "NIST/SEMATECH e-Handbook of Statistical Methods 1.3.6.6.17. Beta Distribution". National Institute of Standards and Technology, Information Technology Laboratory. April 2012. Retrieved May 31, 2016.
15. Oguamanam, D. C. D.; Martin, H. R.; Huissoon, J. P. (1995). "On the application of the beta distribution to gear damage analysis". Applied Acoustics. 45 (3): 247ā261. doi:10.1016/0003-682X(95)00001-P.
16. Liang, Zhiqiang; Wei, Jianming; Zhao, Junyu; Liu, Haitao; Li, Baoqing; Shen, Jie; Zheng, Chunlei (27 August 2008). "The Statistical Meaning of Kurtosis and Its New Application to Identification of Persons Based on Seismic Signals". Sensors. 8 (8): 5106ā5119. doi:10.3390/s8085106. PMC 3705491. PMID 27873804.
17. Kenney, J. F.; Keeping, E. S. (1951). Mathematics of Statistics, Part Two (2nd ed.). D. Van Nostrand Company.
18. Abramowitz, Milton; Stegun, Irene A. (1965). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover. ISBN 978-0-486-61272-0.
19. Weisstein, Eric W. "Kurtosis". MathWorld ā A Wolfram Web Resource. Retrieved 13 August 2012.
20. Panik, Michael J. (2005). Advanced Statistics from an Elementary Point of View. Academic Press. ISBN 978-0120884940.
21. Pearson, Karl (1916). "Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation". Philosophical Transactions of the Royal Society A. 216 (538ā548): 429ā457. doi:10.1098/rsta.1916.0009. JSTOR 91092.
22. Gradshteyn, Izrail Solomonovich; Ryzhik, Iosif Moiseevich; Geronimus, Yuri Veniaminovich; Tseytlin, Michail Yulyevich; Jeffrey, Alan (2015) [October 2014]. Zwillinger, Daniel; Moll, Victor Hugo (eds.). Table of Integrals, Series, and Products (8th ed.). Academic Press. ISBN 978-0-12-384933-5. LCCN 2014010276.
23. Billingsley, Patrick (1995). "Section 30: The Method of Moments". Probability and Measure (3rd ed.). Wiley-Interscience. ISBN 978-0-471-00710-4.
24. MacKay, David (2003). Information Theory, Inference and Learning Algorithms (1st ed.). Cambridge University Press. ISBN 978-0521642989.
25. Johnson, N. L. (1949). "Systems of frequency curves generated by methods of translation". Biometrika. 36 (1ā2): 149ā176. doi:10.1093/biomet/36.1-2.149. PMID 18132090.
26. Verdugo Lazo, A. C. G.; Rathie, P. N. (1978). "On the entropy of continuous probability distributions". IEEE Trans. Inf. Theory. 24 (1): 120ā122. doi:10.1109/TIT.1978.1055832.
27. Shannon, Claude E. (1948). "A Mathematical Theory of Communication". Bell System Technical Journal. 27 (4): 623ā656. doi:10.1002/j.1538-7305.1948.tb01338.x.
28. Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). Wiley-Interscience. ISBN 978-0471241959.
29. Plunkett, Kim; Elman, Jeffrey (1997). Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations. A Bradford Book. p. 166. ISBN 978-0262661058.
30. Nallapati, Ramesh (2006). The smoothed Dirichlet distribution: understanding cross-entropy ranking in information retrieval (Thesis). Computer Science Dept., University of Massachusetts Amherst.
31. Pearson, Egon S. (July 1969). "Some historical reflections traced through the development of the use of frequency curves". THEMIS Statistical Analysis Research Program, Technical Report 38. Office of Naval Research, Contract N000014-68-A-0515 (Project NR 042ā260).
32. Hahn, Gerald J.; Shapiro, S. (1994). Statistical Models in Engineering (Wiley Classics Library). Wiley-Interscience. ISBN 978-0471040651.
33. Pearson, Karl (1895). "Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material". Philosophical Transactions of the Royal Society. 186: 343ā414. doi:10.1098/rsta.1895.0010. JSTOR 90649.
34. Buchanan, K.; Rockway, J.; Sternberg, O.; Mai, N. N. (May 2016). "Sum-difference beamforming for radar applications using circularly tapered random arrays". 2016 IEEE Radar Conference (RadarConf). pp. 1ā5. doi:10.1109/RADAR.2016.7485289. ISBN 978-1-5090-0863-6.
35. Buchanan, K.; Flores, C.; Wheeland, S.; Jensen, J.; Grayson, D.; Huff, G. (May 2017). "Transmit beamforming for radar applications using circularly tapered random arrays". 2017 IEEE Radar Conference (RadarConf). pp. 0112ā0117. doi:10.1109/RADAR.2017.7944181. ISBN 978-1-4673-8823-8.
36. Buchanan, Kristopher Ryan (2014-05-29). "Theory and Applications of Aperiodic (Random) Phased Arrays".
37. Pham-Gia, T. (January 2000). "Distributions of the ratios of independent beta variables and applications". Communications in Statistics ā Theory and Methods. 29 (12): 2693ā2715. doi:10.1080/03610920008832632.
38. HerrerĆas-Velasco, JosĆ© Manuel; HerrerĆas-Pleguezuelo, Rafael; van Dorp, Johan RenĆ© (2011). "Revisiting the PERT mean and variance". European Journal of Operational Research. 210: 448ā451.
39. Malcolm, D. G.; Roseboom, J. H.; Clark, C. E.; Fazar, W. (SeptemberāOctober 1958). "Application of a Technique for Research and Development Program Evaluation". Operations Research. 7 (5): 646ā669. doi:10.1287/opre.7.5.646.
40. David, H. A.; Nagaraja, H. N. (2003). Order Statistics (3rd ed.). Wiley. p. 458. ISBN 0-471-38926-9.
41. "1.3.6.6.17. Beta Distribution". www.itl.nist.gov.
42. Elderton, William Palin (1906). Frequency-Curves and Correlation. Charles and Edwin Layton (London).
43. Elderton, William Palin; Johnson, Norman Lloyd (2009). Systems of Frequency Curves. Cambridge University Press. ISBN 978-0521093361.
44. Bowman, K. O.; Shenton, L. R. (2007). "The beta distribution, moment method, Karl Pearson and R. A. Fisher". Far East J. Theo. Stat. 23 (2): 133ā164.
45. Pearson, Karl (June 1936). "Method of moments and method of maximum likelihood". Biometrika. 28 (1/2): 34ā59. doi:10.2307/2334123. JSTOR 2334123.
46. Joanes, D. N.; Gill, C. A. (1998). "Comparing measures of sample skewness and kurtosis". The Statistician. 47 (1): 183ā189. doi:10.1111/1467-9884.00122.
47. Beckman, R. J.; Tietjen, G. L. (1978). "Maximum likelihood estimation for the beta distribution". Journal of Statistical Computation and Simulation. 7 (3ā4): 253ā258. doi:10.1080/00949657808810232.
48. Gnanadesikan, R.; Pinkham; Hughes (1967). "Maximum likelihood estimation of the parameters of the beta distribution from smallest order statistics". Technometrics. 9 (4): 607ā620. doi:10.2307/1266199. JSTOR 1266199.
49. Fackler, Paul. "Inverse Digamma Function (Matlab)". Harvard University School of Engineering and Applied Sciences. Retrieved 2012-08-18.
50. Silvey, S. D. (1975). Statistical Inference. Chapman and Hall. p. 40. ISBN 978-0412138201.
51. Edwards, A. W. F. (1992). Likelihood. The Johns Hopkins University Press. ISBN 978-0801844430.
52. Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press. ISBN 978-0521592710.
53. Costa, Max; Cover, Thomas (September 1983). On the similarity of the entropy power inequality and the BrunnāMinkowski inequality. Tech. Report 48, Dept. Statistics, Stanford University.
54. Aryal, Gokarna; Nadarajah, Saralees (2004). "Information matrix for beta distributions". Serdica Mathematical Journal (Bulgarian Academy of Science). 30: 513ā526.
55. Laplace, Pierre Simon, marquis de (1902). A Philosophical Essay on Probabilities. New York: J. Wiley; London: Chapman & Hall. ISBN 978-1-60206-328-0.
56. Cox, Richard T. (1961). Algebra of Probable Inference. The Johns Hopkins University Press. ISBN 978-0801869822.
57. Keynes, John Maynard (2010) [1921]. A Treatise on Probability: The Connection Between Philosophy and the History of Science. Wildside Press. ISBN 978-1434406965.
58. Pearson, Karl (1907). "On the Influence of Past Experience on Future Expectation". Philosophical Magazine. 6 (13): 365ā378.
59. Jeffreys, Harold (1998). Theory of Probability (3rd ed.). Oxford University Press. ISBN 978-0198503682.
60. Broad, C. D. (October 1918). "On the relation between induction and probability". Mind. 27 (New Series) (108): 389ā404. doi:10.1093/mind/XXVII.4.389. JSTOR 2249035.
61. Perks, Wilfred (January 1947). "Some observations on inverse probability including a new indifference rule". Journal of the Institute of Actuaries. 73 (2): 285ā334. doi:10.1017/S0020268100012270.
62. Bayes, Thomas; communicated by Richard Price (1763). "An Essay towards solving a Problem in the Doctrine of Chances". Philosophical Transactions of the Royal Society. 53: 370ā418. doi:10.1098/rstl.1763.0053. JSTOR 105741.
63. Haldane, J. B. S. (1932). "A note on inverse probability". Mathematical Proceedings of the Cambridge Philosophical Society. 28 (1): 55ā61. doi:10.1017/s0305004100010495.
64. Zellner, Arnold (1971). An Introduction to Bayesian Inference in Econometrics. Wiley-Interscience. ISBN 978-0471169376.
65. Jeffreys, Harold (September 1946). "An Invariant Form for the Prior Probability in Estimation Problems". Proceedings of the Royal Society A. 186 (1007): 453ā461. doi:10.1098/rspa.1946.0056. PMID 20998741.
66. Berger, James; Bernardo, Jose; Sun, Dongchu (2009). "The formal definition of reference priors". The Annals of Statistics. 37 (2): 905ā938. arXiv:0904.0156. doi:10.1214/07-AOS587.
67. Clarke, Bertrand S.; Barron, Andrew R. (1994). "Jeffreys' prior is asymptotically least favorable under entropy risk". Journal of Statistical Planning and Inference. 41: 37ā60. doi:10.1016/0378-3758(94)90153-8.
68. Pearson, Karl (1892). The Grammar of Science. Walter Scott, London.
69. Pearson, Karl (2009). The Grammar of Science. BiblioLife. ISBN 978-1110356119.
70. Gelman, A.; Carlin, J. B.; Stern, H. S.; Rubin, D. B. (2003). Bayesian Data Analysis. Chapman and Hall/CRC. ISBN 978-1584883883.
71. JĆøsang, Audun (2001). "A logic for uncertain probabilities". International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 9 (3): 279ā311. doi:10.1142/S0218488501000831. MR 1843261.
72. de Oliveira, H. M.; AraĆŗjo, G. A. A. (2005). "Compactly Supported One-cyclic Wavelets Derived from Beta Distributions". Journal of Communication and Information Systems. 20 (3): 27ā33.
73. Balding, David J.; Nichols, Richard A. (1995). "A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity". Genetica. 96 (1ā2): 3ā12. doi:10.1007/BF01441146. PMID 7607457.
74. Keefer, Donald L.; Verdini, William A. (1993). "Better Estimation of PERT Activity Time Parameters". Management Science. 39 (9): 1086ā1091.
75. Keefer, Donald L.; Bodily, Samuel E. (1983). "Three-point Approximations for Continuous Random Variables". Management Science. 29 (5): 595ā609.
76. "Defense Resource Management Institute ā Naval Postgraduate School". www.nps.edu.
77. van der Waerden, B. L. Mathematical Statistics. Springer. ISBN 978-3-540-04507-6.
78. Wise, M. E. (June 1960). "On normalizing the incomplete beta-function for fitting to dose-response curves". Biometrika. 47 (1/2): 173ā175.
79. Pratt, John W. (1968). "A Normal Approximation for Binomial, F, Beta, and Other Common, Related Tail Probabilities, II". Journal of the American Statistical Association. 63 (324): 1457ā1483. doi:10.2307/2285896. Accessed 21 Oct. 2025.
80. Yule, G. U.; Filon, L. N. G. (1936). "Karl Pearson. 1857ā1936". Obituary Notices of Fellows of the Royal Society. 2 (5): 72. doi:10.1098/rsbm.1936.0007. JSTOR 769130.
81. "Library and Archive catalogue". Sackler Digital Archive. Royal Society. Archived from the original on 2011-10-25. Retrieved 2011-07-01.
82. David, H. A.; Edwards, A. W. F. (2001). Annotated Readings in the History of Statistics (1st ed.). Springer. ISBN 978-0387988443.
83. Gini, Corrado (1911). "Considerazioni sulle probabilitĆ  posteriori e applicazioni al rapporto dei sessi nelle nascite umane". Studi Economico-Giuridici della UniversitĆ  di Cagliari. Anno III (reproduced in Metron 15, 133, 171, 1949): 5ā41.
84. Johnson, Norman L.; Kotz, Samuel, eds. (1997). Leading Personalities in Statistical Sciences: From the Seventeenth Century to the Present. Wiley. ISBN 978-0471163817.
85. "Biography of Corrado Gini". Metron Journal. Archived from the original on 2012-07-16. Retrieved 2012-08-18.
"Beta Distribution"
by Fiona Maclachlan, the
Wolfram Demonstrations Project
, 2007.
Beta DistributionĀ ā Overview and Example
, xycoon.com
Beta Distribution
, brighton-webs.co.uk
Beta Distribution Video
, exstrom.com
"Beta-distribution"
,
Encyclopedia of Mathematics
,
EMS Press
, 2001 [1994]
Weisstein, Eric W.
"Beta Distribution"
.
MathWorld
.
Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein |
| Markdown | [Jump to content](https://en.wikipedia.org/wiki/Beta_distribution#bodyContent)
Main menu
Main menu
move to sidebar
hide
Navigation
- [Main page](https://en.wikipedia.org/wiki/Main_Page "Visit the main page [z]")
- [Contents](https://en.wikipedia.org/wiki/Wikipedia:Contents "Guides to browsing Wikipedia")
- [Current events](https://en.wikipedia.org/wiki/Portal:Current_events "Articles related to current events")
- [Random article](https://en.wikipedia.org/wiki/Special:Random "Visit a randomly selected article [x]")
- [About Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:About "Learn about Wikipedia and how it works")
- [Contact us](https://en.wikipedia.org/wiki/Wikipedia:Contact_us "How to contact Wikipedia")
Contribute
- [Help](https://en.wikipedia.org/wiki/Help:Contents "Guidance on how to use and edit Wikipedia")
- [Learn to edit](https://en.wikipedia.org/wiki/Help:Introduction "Learn how to edit Wikipedia")
- [Community portal](https://en.wikipedia.org/wiki/Wikipedia:Community_portal "The hub for editors")
- [Recent changes](https://en.wikipedia.org/wiki/Special:RecentChanges "A list of recent changes to Wikipedia [r]")
- [Upload file](https://en.wikipedia.org/wiki/Wikipedia:File_upload_wizard "Add images or other media for use on Wikipedia")
- [Special pages](https://en.wikipedia.org/wiki/Special:SpecialPages "A list of all special pages [q]")
[  ](https://en.wikipedia.org/wiki/Main_Page)
[Search](https://en.wikipedia.org/wiki/Special:Search "Search Wikipedia [f]")
Appearance
- [Donate](https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en)
- [Create account](https://en.wikipedia.org/w/index.php?title=Special:CreateAccount&returnto=Beta+distribution "You are encouraged to create an account and log in; however, it is not mandatory")
- [Log in](https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Beta+distribution "You're encouraged to log in; however, it's not mandatory. [o]")
Personal tools
- [Donate](https://donate.wikimedia.org/?wmf_source=donate&wmf_medium=sidebar&wmf_campaign=en.wikipedia.org&uselang=en)
- [Create account](https://en.wikipedia.org/w/index.php?title=Special:CreateAccount&returnto=Beta+distribution "You are encouraged to create an account and log in; however, it is not mandatory")
- [Log in](https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Beta+distribution "You're encouraged to log in; however, it's not mandatory. [o]")
## Contents
move to sidebar
hide
- [(Top)](https://en.wikipedia.org/wiki/Beta_distribution)
- [1 Definitions](https://en.wikipedia.org/wiki/Beta_distribution#Definitions)
Toggle Definitions subsection
- [1\.1 Probability density function](https://en.wikipedia.org/wiki/Beta_distribution#Probability_density_function)
- [1\.2 Cumulative distribution function](https://en.wikipedia.org/wiki/Beta_distribution#Cumulative_distribution_function)
- [1\.3 Alternative parameterizations](https://en.wikipedia.org/wiki/Beta_distribution#Alternative_parameterizations)
- [1\.3.1 Two parameters](https://en.wikipedia.org/wiki/Beta_distribution#Two_parameters)
- [1\.3.1.1 Mean and sample size](https://en.wikipedia.org/wiki/Beta_distribution#Mean_and_sample_size)
- [1\.3.1.2 Mode and concentration](https://en.wikipedia.org/wiki/Beta_distribution#Mode_and_concentration)
- [1\.3.1.3 Mean and variance](https://en.wikipedia.org/wiki/Beta_distribution#Mean_and_variance)
- [1\.3.2 Four parameters](https://en.wikipedia.org/wiki/Beta_distribution#Four_parameters)
- [2 Properties](https://en.wikipedia.org/wiki/Beta_distribution#Properties)
Toggle Properties subsection
- [2\.1 Measures of central tendency](https://en.wikipedia.org/wiki/Beta_distribution#Measures_of_central_tendency)
- [2\.1.1 Mode](https://en.wikipedia.org/wiki/Beta_distribution#Mode)
- [2\.1.2 Median](https://en.wikipedia.org/wiki/Beta_distribution#Median)
- [2\.1.3 Mean](https://en.wikipedia.org/wiki/Beta_distribution#Mean)
- [2\.1.4 Geometric mean](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_mean)
- [2\.1.5 Harmonic mean](https://en.wikipedia.org/wiki/Beta_distribution#Harmonic_mean)
- [2\.2 Measures of statistical dispersion](https://en.wikipedia.org/wiki/Beta_distribution#Measures_of_statistical_dispersion)
- [2\.2.1 Variance](https://en.wikipedia.org/wiki/Beta_distribution#Variance)
- [2\.2.2 Geometric variance and covariance](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_variance_and_covariance)
- [2\.2.3 Mean absolute deviation around the mean](https://en.wikipedia.org/wiki/Beta_distribution#Mean_absolute_deviation_around_the_mean)
- [2\.2.4 Mean absolute difference](https://en.wikipedia.org/wiki/Beta_distribution#Mean_absolute_difference)
- [2\.3 Skewness](https://en.wikipedia.org/wiki/Beta_distribution#Skewness)
- [2\.4 Kurtosis](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis)
- [2\.5 Characteristic function](https://en.wikipedia.org/wiki/Beta_distribution#Characteristic_function)
- [2\.6 Other moments](https://en.wikipedia.org/wiki/Beta_distribution#Other_moments)
- [2\.6.1 Moment generating function](https://en.wikipedia.org/wiki/Beta_distribution#Moment_generating_function)
- [2\.6.2 Higher moments](https://en.wikipedia.org/wiki/Beta_distribution#Higher_moments)
- [2\.6.3 Moments of transformed random variables](https://en.wikipedia.org/wiki/Beta_distribution#Moments_of_transformed_random_variables)
- [2\.6.3.1 Moments of linearly transformed, product and inverted random variables](https://en.wikipedia.org/wiki/Beta_distribution#Moments_of_linearly_transformed,_product_and_inverted_random_variables)
- [2\.6.3.2 Moments of logarithmically transformed random variables](https://en.wikipedia.org/wiki/Beta_distribution#Moments_of_logarithmically_transformed_random_variables)
- [2\.7 Quantities of information (entropy)](https://en.wikipedia.org/wiki/Beta_distribution#Quantities_of_information_\(entropy\))
- [2\.8 Relationships between statistical measures](https://en.wikipedia.org/wiki/Beta_distribution#Relationships_between_statistical_measures)
- [2\.8.1 Mean, mode and median relationship](https://en.wikipedia.org/wiki/Beta_distribution#Mean,_mode_and_median_relationship)
- [2\.8.2 Mean, geometric mean and harmonic mean relationship](https://en.wikipedia.org/wiki/Beta_distribution#Mean,_geometric_mean_and_harmonic_mean_relationship)
- [2\.8.3 Kurtosis bounded by the square of the skewness](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis_bounded_by_the_square_of_the_skewness)
- [2\.9 Symmetry](https://en.wikipedia.org/wiki/Beta_distribution#Symmetry)
- [2\.10 Geometry of the probability density function](https://en.wikipedia.org/wiki/Beta_distribution#Geometry_of_the_probability_density_function)
- [2\.10.1 Inflection points](https://en.wikipedia.org/wiki/Beta_distribution#Inflection_points)
- [2\.10.2 Shapes](https://en.wikipedia.org/wiki/Beta_distribution#Shapes)
- [2\.10.2.1 Symmetric (*α* = *β*)](https://en.wikipedia.org/wiki/Beta_distribution#Symmetric_\(%CE%B1_=_%CE%B2\))
- [2\.10.2.2 Skewed (*α* ā *β*)](https://en.wikipedia.org/wiki/Beta_distribution#Skewed_\(%CE%B1_%E2%89%A0_%CE%B2\))
- [3 Related distributions](https://en.wikipedia.org/wiki/Beta_distribution#Related_distributions)
Toggle Related distributions subsection
- [3\.1 Transformations](https://en.wikipedia.org/wiki/Beta_distribution#Transformations)
- [3\.2 Special and limiting cases](https://en.wikipedia.org/wiki/Beta_distribution#Special_and_limiting_cases)
- [3\.3 Derived from other distributions](https://en.wikipedia.org/wiki/Beta_distribution#Derived_from_other_distributions)
- [3\.4 Combination with other distributions](https://en.wikipedia.org/wiki/Beta_distribution#Combination_with_other_distributions)
- [3\.5 Compounding with other distributions](https://en.wikipedia.org/wiki/Beta_distribution#Compounding_with_other_distributions)
- [3\.6 Generalisations](https://en.wikipedia.org/wiki/Beta_distribution#Generalisations)
- [4 Statistical inference](https://en.wikipedia.org/wiki/Beta_distribution#Statistical_inference)
Toggle Statistical inference subsection
- [4\.1 Parameter estimation](https://en.wikipedia.org/wiki/Beta_distribution#Parameter_estimation)
- [4\.1.1 Method of moments](https://en.wikipedia.org/wiki/Beta_distribution#Method_of_moments)
- [4\.1.1.1 Two unknown parameters](https://en.wikipedia.org/wiki/Beta_distribution#Two_unknown_parameters)
- [4\.1.1.2 Four unknown parameters](https://en.wikipedia.org/wiki/Beta_distribution#Four_unknown_parameters)
- [4\.1.2 Maximum likelihood](https://en.wikipedia.org/wiki/Beta_distribution#Maximum_likelihood)
- [4\.1.2.1 Two unknown parameters](https://en.wikipedia.org/wiki/Beta_distribution#Two_unknown_parameters_2)
- [4\.1.2.2 Four unknown parameters](https://en.wikipedia.org/wiki/Beta_distribution#Four_unknown_parameters_2)
- [4\.1.3 Fisher information matrix](https://en.wikipedia.org/wiki/Beta_distribution#Fisher_information_matrix)
- [4\.1.3.1 Two parameters](https://en.wikipedia.org/wiki/Beta_distribution#Two_parameters_2)
- [4\.1.3.2 Four parameters](https://en.wikipedia.org/wiki/Beta_distribution#Four_parameters_2)
- [4\.2 Bayesian inference](https://en.wikipedia.org/wiki/Beta_distribution#Bayesian_inference)
- [4\.2.1 Rule of succession](https://en.wikipedia.org/wiki/Beta_distribution#Rule_of_succession)
- [4\.2.2 BayesāLaplace prior probability (Beta(1,1))](https://en.wikipedia.org/wiki/Beta_distribution#Bayes%E2%80%93Laplace_prior_probability_\(Beta\(1,1\)\))
- [4\.2.3 Haldane's prior probability (Beta(0,0))](https://en.wikipedia.org/wiki/Beta_distribution#Haldane's_prior_probability_\(Beta\(0,0\)\))
- [4\.2.4 Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)](https://en.wikipedia.org/wiki/Beta_distribution#Jeffreys'_prior_probability_\(Beta\(1/2,1/2\)_for_a_Bernoulli_or_for_a_binomial_distribution\))
- [4\.2.5 Effect of different prior probability choices on the posterior beta distribution](https://en.wikipedia.org/wiki/Beta_distribution#Effect_of_different_prior_probability_choices_on_the_posterior_beta_distribution)
- [5 Occurrence and applications](https://en.wikipedia.org/wiki/Beta_distribution#Occurrence_and_applications)
Toggle Occurrence and applications subsection
- [5\.1 Order statistics](https://en.wikipedia.org/wiki/Beta_distribution#Order_statistics)
- [5\.2 Subjective logic](https://en.wikipedia.org/wiki/Beta_distribution#Subjective_logic)
- [5\.3 Wavelet analysis](https://en.wikipedia.org/wiki/Beta_distribution#Wavelet_analysis)
- [5\.4 Population genetics](https://en.wikipedia.org/wiki/Beta_distribution#Population_genetics)
- [5\.5 Project management: task cost and schedule modeling](https://en.wikipedia.org/wiki/Beta_distribution#Project_management:_task_cost_and_schedule_modeling)
- [6 Random variate generation](https://en.wikipedia.org/wiki/Beta_distribution#Random_variate_generation)
- [7 Normal approximation to the Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution#Normal_approximation_to_the_Beta_distribution)
- [8 History](https://en.wikipedia.org/wiki/Beta_distribution#History)
- [9 References](https://en.wikipedia.org/wiki/Beta_distribution#References)
- [10 External links](https://en.wikipedia.org/wiki/Beta_distribution#External_links)
Toggle the table of contents
# Beta distribution
27 languages
- [Ų§ŁŲ¹Ų±ŲØŁŲ©](https://ar.wikipedia.org/wiki/%D8%AA%D9%88%D8%B2%D9%8A%D8%B9_%D8%A8%D9%8A%D8%AA%D8%A7 "ŲŖŁŲ²ŁŲ¹ ŲØŁŲŖŲ§ ā Arabic")
- [ŠŠµŠ»Š°ŃŃŃŠŗŠ°Ń](https://be.wikipedia.org/wiki/%D0%91%D1%8D%D1%82%D0%B0-%D1%80%D0%B0%D0%B7%D0%BC%D0%B5%D1%80%D0%BA%D0%B0%D0%B2%D0%B0%D0%BD%D0%BD%D0%B5 "ŠŃŃŠ°-ŃŠ°Š·Š¼ŠµŃкаванне ā Belarusian")
- [CatalĆ ](https://ca.wikipedia.org/wiki/Distribuci%C3%B3_beta "Distribució beta ā Catalan")
- [ÄeÅ”tina](https://cs.wikipedia.org/wiki/Rozd%C4%9Blen%C3%AD_beta "RozdÄlenĆ beta ā Czech")
- [Deutsch](https://de.wikipedia.org/wiki/Beta-Verteilung "Beta-Verteilung ā German")
- [EspaƱol](https://es.wikipedia.org/wiki/Distribuci%C3%B3n_beta "Distribución beta ā Spanish")
- [ŁŲ§Ų±Ų³Ū](https://fa.wikipedia.org/wiki/%D8%AA%D9%88%D8%B2%DB%8C%D8%B9_%D8%A8%D8%AA%D8%A7 "ŲŖŁŲ²ŪŲ¹ ŲØŲŖŲ§ ā Persian")
- [Suomi](https://fi.wikipedia.org/wiki/Beta-jakauma "Beta-jakauma ā Finnish")
- [FranƧais](https://fr.wikipedia.org/wiki/Loi_b%C3%AAta "Loi bĆŖta ā French")
- [Galego](https://gl.wikipedia.org/wiki/Distribuci%C3%B3n_beta "Distribución beta ā Galician")
- [×¢×ר××Ŗ](https://he.wikipedia.org/wiki/%D7%94%D7%AA%D7%A4%D7%9C%D7%92%D7%95%D7%AA_%D7%91%D7%98%D7%90 "×תפ××××Ŗ ××× ā Hebrew")
- [Magyar](https://hu.wikipedia.org/wiki/B%C3%A9ta-eloszl%C3%A1s "BĆ©ta-eloszlĆ”s ā Hungarian")
- [Italiano](https://it.wikipedia.org/wiki/Distribuzione_Beta "Distribuzione Beta ā Italian")
- [ę„ę¬čŖ](https://ja.wikipedia.org/wiki/%E3%83%99%E3%83%BC%E3%82%BF%E5%88%86%E5%B8%83 "ćć¼ćæååø ā Japanese")
- [į„įį įį£įį](https://ka.wikipedia.org/wiki/%E1%83%91%E1%83%94%E1%83%A2%E1%83%90_%E1%83%92%E1%83%90%E1%83%9C%E1%83%90%E1%83%AC%E1%83%98%E1%83%9A%E1%83%94%E1%83%91%E1%83%90 "įįį¢į įįįįį¬įįįįį ā Georgian")
- [ķźµģ“](https://ko.wikipedia.org/wiki/%EB%B2%A0%ED%83%80_%EB%B6%84%ED%8F%AC "ė² ķ ė¶ķ¬ ā Korean")
- [Nederlands](https://nl.wikipedia.org/wiki/B%C3%A8taverdeling "BĆØtaverdeling ā Dutch")
- [Polski](https://pl.wikipedia.org/wiki/Rozk%C5%82ad_beta "RozkÅad beta ā Polish")
- [PortuguĆŖs](https://pt.wikipedia.org/wiki/Distribui%C3%A7%C3%A3o_beta "Distribuição beta ā Portuguese")
- [Š ŃŃŃŠŗŠøŠ¹](https://ru.wikipedia.org/wiki/%D0%91%D0%B5%D1%82%D0%B0-%D1%80%D0%B0%D1%81%D0%BF%D1%80%D0%B5%D0%B4%D0%B5%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5 "ŠŠµŃа-ŃŠ°ŃŠæŃŠµŠ“еление ā Russian")
- [SlovenÅ”Äina](https://sl.wikipedia.org/wiki/Porazdelitev_beta "Porazdelitev beta ā Slovenian")
- [Sunda](https://su.wikipedia.org/wiki/Sebaran_b%C3%A9ta "Sebaran bĆ©ta ā Sundanese")
- [Svenska](https://sv.wikipedia.org/wiki/Betaf%C3%B6rdelning "Betafƶrdelning ā Swedish")
- [Tagalog](https://tl.wikipedia.org/wiki/Distribusyong_Beta "Distribusyong Beta ā Tagalog")
- [TürkƧe](https://tr.wikipedia.org/wiki/Beta_da%C4%9F%C4%B1l%C4%B1m%C4%B1 "Beta daÄılımı ā Turkish")
- [Š£ŠŗŃŠ°ŃнŃŃŠŗŠ°](https://uk.wikipedia.org/wiki/%D0%91%D0%B5%D1%82%D0%B0-%D1%80%D0%BE%D0%B7%D0%BF%D0%BE%D0%B4%D1%96%D0%BB "ŠŠµŃа-ŃŠ¾Š·ŠæŠ¾Š“ŃŠ» ā Ukrainian")
- [äøę](https://zh.wikipedia.org/wiki/%CE%92%E5%88%86%E5%B8%83 "Īååø ā Chinese")
[Edit links](https://www.wikidata.org/wiki/Special:EntityPage/Q756254#sitelinks-wikipedia "Edit interlanguage links")
- [Article](https://en.wikipedia.org/wiki/Beta_distribution "View the content page [c]")
- [Talk](https://en.wikipedia.org/wiki/Talk:Beta_distribution "Discuss improvements to the content page [t]")
English
- [Read](https://en.wikipedia.org/wiki/Beta_distribution)
- [Edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit "Edit this page [e]")
- [View history](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=history "Past revisions of this page [h]")
Tools
Tools
move to sidebar
hide
Actions
- [Read](https://en.wikipedia.org/wiki/Beta_distribution)
- [Edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit "Edit this page [e]")
- [View history](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=history)
General
- [What links here](https://en.wikipedia.org/wiki/Special:WhatLinksHere/Beta_distribution "List of all English Wikipedia pages containing links to this page [j]")
- [Related changes](https://en.wikipedia.org/wiki/Special:RecentChangesLinked/Beta_distribution "Recent changes in pages linked from this page [k]")
- [Upload file](https://en.wikipedia.org/wiki/Wikipedia:File_Upload_Wizard "Upload files [u]")
- [Permanent link](https://en.wikipedia.org/w/index.php?title=Beta_distribution&oldid=1345309542 "Permanent link to this revision of this page")
- [Page information](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=info "More information about this page")
- [Cite this page](https://en.wikipedia.org/w/index.php?title=Special:CiteThisPage&page=Beta_distribution&id=1345309542&wpFormIdentifier=titleform "Information on how to cite this page")
- [Get shortened URL](https://en.wikipedia.org/w/index.php?title=Special:UrlShortener&url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FBeta_distribution)
Print/export
- [Download as PDF](https://en.wikipedia.org/w/index.php?title=Special:DownloadAsPdf&page=Beta_distribution&action=show-download-screen "Download this page as a PDF file")
- [Printable version](https://en.wikipedia.org/w/index.php?title=Beta_distribution&printable=yes "Printable version of this page [p]")
In other projects
- [Wikimedia Commons](https://commons.wikimedia.org/wiki/Category:Beta_distribution)
- [Wikidata item](https://www.wikidata.org/wiki/Special:EntityPage/Q756254 "Structured data on this page hosted by Wikidata [g]")
Appearance
move to sidebar
hide
From Wikipedia, the free encyclopedia
Probability distribution
Not to be confused with [Beta function](https://en.wikipedia.org/wiki/Beta_function "Beta function").
| Beta | |
|---|---|
| Probability density function[](https://en.wikipedia.org/wiki/File:Beta_distribution_pdf.svg "Probability density function for the beta distribution") | |
| Cumulative distribution function[](https://en.wikipedia.org/wiki/File:Beta_distribution_cdf.svg "Cumulative distribution function for the beta distribution") | |
| Notation | Beta(*α*, *β*) |
| [Parameters](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") | *α* \> 0 [shape](https://en.wikipedia.org/wiki/Shape_parameter "Shape parameter") ([real](https://en.wikipedia.org/wiki/Real_number "Real number")) *β* \> 0 [shape](https://en.wikipedia.org/wiki/Shape_parameter "Shape parameter") ([real](https://en.wikipedia.org/wiki/Real_number "Real number")) |
| [Support](https://en.wikipedia.org/wiki/Support_\(mathematics\) "Support (mathematics)") | x ā \[ 0 , 1 \] {\\displaystyle x\\in \[0,1\]\\!} ![{\\displaystyle x\\in \[0,1\]\\!}](https://wikimedia.org/api/rest_v1/media/math/render/svg/09601f74a28f3e2cad381be1a915ab0c02fe39c6) or x ā ( 0 , 1 ) {\\displaystyle x\\in (0,1)\\!}  |
In [probability theory](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), the **beta distribution** is a family of continuous [probability distributions](https://en.wikipedia.org/wiki/Probability_distribution "Probability distribution") defined on the interval \[0, 1\] or (0, 1) in terms of two positive [parameters](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter"), denoted by *alpha* (*α*) and *beta* (*β*), that appear as exponents of the variable and its complement to 1, respectively, and control the [shape](https://en.wikipedia.org/wiki/Shape_parameter "Shape parameter") of the distribution.
The beta distribution has been applied to model the behavior of [random variables](https://en.wikipedia.org/wiki/Random_variables "Random variables") limited to intervals of finite length in a wide variety of disciplines. The beta distribution is a suitable model for the random behavior of percentages and proportions.
In [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference"), the beta distribution is the [conjugate prior probability distribution](https://en.wikipedia.org/wiki/Conjugate_prior_distribution "Conjugate prior distribution") for the [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), [binomial](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution"), [negative binomial](https://en.wikipedia.org/wiki/Negative_binomial_distribution "Negative binomial distribution"), and [geometric](https://en.wikipedia.org/wiki/Geometric_distribution "Geometric distribution") distributions.
The formulation of the beta distribution discussed here is also known as the **beta distribution of the first kind**, whereas *beta distribution of the second kind* is an alternative name for the [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution"). The generalization to multiple variables is called a [Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution "Dirichlet distribution").
## Definitions
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=1 "Edit section: Definitions")\]
### Probability density function
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=2 "Edit section: Probability density function")\]
[](https://en.wikipedia.org/wiki/File:PDF_of_the_Beta_distribution.gif)
An animation of the beta distribution for different values of its parameters.
The [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") (PDF) of the beta distribution, for 0 ⤠x ⤠1 {\\displaystyle 0\\leq x\\leq 1}  or 0 \< x \< 1 {\\displaystyle 0\<x\<1} , and shape parameters α {\\displaystyle \\alpha } , β \> 0 {\\displaystyle \\beta \>0} , is a [power function](https://en.wikipedia.org/wiki/Power_function "Power function") of the variable x {\\displaystyle x}  and of its [reflection](https://en.wikipedia.org/wiki/Reflection_formula "Reflection formula") ( 1 ā x ) {\\displaystyle (1-x)}  as follows:
f ( x ; α , β ) \= c o n s t a n t ā
x α ā 1 ( 1 ā x ) β ā 1 \= x α ā 1 ( 1 ā x ) β ā 1 ā« 0 1 u α ā 1 ( 1 ā u ) β ā 1 d u \= Ī ( α \+ β ) Ī ( α ) Ī ( β ) x α ā 1 ( 1 ā x ) β ā 1 \= 1 B ( α , β ) x α ā 1 ( 1 ā x ) β ā 1 {\\displaystyle {\\begin{aligned}f(x;\\alpha ,\\beta )&=\\mathrm {constant} \\cdot x^{\\alpha -1}(1-x)^{\\beta -1}\\\\\[3pt\]&={\\frac {x^{\\alpha -1}(1-x)^{\\beta -1}}{\\displaystyle \\int \_{0}^{1}u^{\\alpha -1}(1-u)^{\\beta -1}\\,du}}\\\\\[6pt\]&={\\frac {\\Gamma (\\alpha +\\beta )}{\\Gamma (\\alpha )\\Gamma (\\beta )}}\\,x^{\\alpha -1}(1-x)^{\\beta -1}\\\\\[6pt\]&={\\frac {1}{\\mathrm {B} (\\alpha ,\\beta )}}x^{\\alpha -1}(1-x)^{\\beta -1}\\end{aligned}}} ![{\\displaystyle {\\begin{aligned}f(x;\\alpha ,\\beta )&=\\mathrm {constant} \\cdot x^{\\alpha -1}(1-x)^{\\beta -1}\\\\\[3pt\]&={\\frac {x^{\\alpha -1}(1-x)^{\\beta -1}}{\\displaystyle \\int \_{0}^{1}u^{\\alpha -1}(1-u)^{\\beta -1}\\,du}}\\\\\[6pt\]&={\\frac {\\Gamma (\\alpha +\\beta )}{\\Gamma (\\alpha )\\Gamma (\\beta )}}\\,x^{\\alpha -1}(1-x)^{\\beta -1}\\\\\[6pt\]&={\\frac {1}{\\mathrm {B} (\\alpha ,\\beta )}}x^{\\alpha -1}(1-x)^{\\beta -1}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/5fc18388353b219c482e8e35ca4aae808ab1be81)
where Ī ( z ) {\\displaystyle \\Gamma (z)}  is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function "Gamma function"). The [beta function](https://en.wikipedia.org/wiki/Beta_function "Beta function"), B {\\displaystyle \\mathrm {B} } , is a [normalization constant](https://en.wikipedia.org/wiki/Normalization_constant "Normalization constant") to ensure that the total probability is 1. In the above equations x {\\displaystyle x}  is a [realization](https://en.wikipedia.org/wiki/Realization_\(probability\) "Realization (probability)")āan observed value that actually occurredāof a [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") X {\\displaystyle X} .
Several authors, including [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz"),[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) use the symbols p {\\displaystyle p}  and q {\\displaystyle q}  (instead of α {\\displaystyle \\alpha }  and β {\\displaystyle \\beta } ) for the shape parameters of the beta distribution, reminiscent of the symbols traditionally used for the parameters of the [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters α {\\displaystyle \\alpha }  and β {\\displaystyle \\beta }  approach zero.
In the following, a random variable X {\\displaystyle X}  beta-distributed with parameters α {\\displaystyle \\alpha }  and β {\\displaystyle \\beta }  will be denoted by:[\[2\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Mathematical_Statistics_with_MATHEMATICA-2)[\[3\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kruschke2011-3)
X ⼠Beta ┠( α , β ) {\\displaystyle X\\sim \\operatorname {Beta} (\\alpha ,\\beta )} 
Other notations for beta-distributed random variables used in the statistical literature are X ⼠B e ( α , β ) {\\displaystyle X\\sim {\\mathcal {B}}e(\\alpha ,\\beta )} [\[4\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BergerDecisionTheory-4) and X ⼠β α , β {\\displaystyle X\\sim \\beta \_{\\alpha ,\\beta }} .[\[5\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Feller-5)
### Cumulative distribution function
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=3 "Edit section: Cumulative distribution function")\]
*Figure: CDF for symmetric beta distribution vs. $x$ and $\alpha = \beta$.*
*Figure: CDF for skewed beta distribution vs. $x$ and $\beta = 5\alpha$.*
The [cumulative distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function "Cumulative distribution function") is
$$F(x;\alpha,\beta) = \frac{\mathrm{B}(x;\alpha,\beta)}{\mathrm{B}(\alpha,\beta)} = I_x(\alpha,\beta)$$
where $\mathrm{B}(x;\alpha,\beta)$ is the [incomplete beta function](https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function "Beta function") and $I_x(\alpha,\beta)$ is the [regularized incomplete beta function](https://en.wikipedia.org/wiki/Regularized_incomplete_beta_function "Regularized incomplete beta function").
For positive integers *α* and *β*, the cumulative distribution function of a beta distribution can be expressed in terms of the cumulative distribution function of a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") as follows:[\[6\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-6)
$$F_{\text{beta}}(x;\alpha,\beta) = F_{\text{binomial}}(\beta-1;\,\alpha+\beta-1,\,1-x).$$
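This identity lends itself to a quick numerical check. The following minimal sketch compares both sides with SciPy; the particular values of *α*, *β*, and *x* are arbitrary illustrations, not from the source.

```python
# Numerical check of the beta-binomial CDF identity above.
from scipy.stats import beta, binom

a, b, x = 3, 5, 0.4  # the shape parameters must be positive integers here
lhs = beta.cdf(x, a, b)                   # F_beta(x; alpha, beta)
rhs = binom.cdf(b - 1, a + b - 1, 1 - x)  # F_binomial(beta-1; alpha+beta-1, 1-x)
print(lhs, rhs)  # both print approximately 0.580096
```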
### Alternative parameterizations
#### Two parameters
##### Mean and sample size
The beta distribution may also be reparameterized in terms of its mean *μ* (0 < *μ* < 1) and the sum of the two shape parameters *ν* = *α* + *β* > 0 ([\[3\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kruschke2011-3) p. 83). Denoting by *α*Posterior and *β*Posterior the shape parameters of the posterior beta distribution resulting from applying Bayes' theorem to a binomial likelihood function and a prior probability, the interpretation of the addition of both shape parameters to be sample size = *ν* = *α*Posterior + *β*Posterior is only correct for the Haldane prior probability Beta(0,0). Specifically, for the Bayes (uniform) prior Beta(1,1) the correct interpretation would be sample size = *α*Posterior + *β*Posterior − 2, or *ν* = (sample size) + 2. For sample size much larger than 2, the difference between these two priors becomes negligible. (See section [Bayesian inference](https://en.wikipedia.org/wiki/Beta_distribution#Bayesian_inference) for further details.) *ν* = *α* + *β* is referred to as the "sample size" of a beta distribution, but one should remember that it is, strictly speaking, the "sample size" of a binomial likelihood function only when using a Haldane Beta(0,0) prior in Bayes' theorem.
This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ *θ* ≤ 1) is drawn from a population-level beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters *α* and *β* via[\[3\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kruschke2011-3)

$$\alpha = \mu\nu, \qquad \beta = (1-\mu)\nu$$
Under this [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter"), one may place an [uninformative prior](https://en.wikipedia.org/wiki/Uninformative_prior "Uninformative prior") probability over the mean, and a vague prior probability (such as an [exponential](https://en.wikipedia.org/wiki/Exponential_distribution "Exponential distribution") or [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution")) over the positive reals for the sample size, if they are independent, and prior data and/or beliefs justify it.
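A minimal sketch of this reparametrization in Python follows; the helper names are illustrative, not from the literature.

```python
# Convert between the (mean, sample size) parametrization and (alpha, beta).
def mean_size_to_shapes(mu, nu):
    """alpha = mu * nu and beta = (1 - mu) * nu, for 0 < mu < 1 and nu > 0."""
    return mu * nu, (1.0 - mu) * nu

def shapes_to_mean_size(alpha, beta):
    nu = alpha + beta
    return alpha / nu, nu

print(mean_size_to_shapes(0.25, 8.0))  # (2.0, 6.0)
print(shapes_to_mean_size(2.0, 6.0))   # (0.25, 8.0)
```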
##### Mode and concentration
[Concave](https://en.wikipedia.org/wiki/Concave_function "Concave function") beta distributions, which have $\alpha, \beta > 1$, can be parametrized in terms of mode and "concentration". The mode, $\omega = \frac{\alpha-1}{\alpha+\beta-2}$, and concentration, $\kappa = \alpha+\beta$, can be used to define the usual shape parameters as follows:[\[7\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kruschke2015-7)

$$\begin{aligned}
\alpha &= \omega(\kappa-2) + 1 \\
\beta &= (1-\omega)(\kappa-2) + 1
\end{aligned}$$

For the mode, $0 < \omega < 1$, to be well-defined, we need $\alpha, \beta > 1$, or equivalently $\kappa > 2$. If instead we define the concentration as $c = \alpha + \beta - 2$, the condition simplifies to $c > 0$, and the beta density at $\alpha = 1 + c\omega$ and $\beta = 1 + c(1-\omega)$ can be written as:

$$f(x;\omega,c) = \frac{x^{c\omega}(1-x)^{c(1-\omega)}}{\mathrm{B}\bigl(1+c\omega,\,1+c(1-\omega)\bigr)}$$

where $c$ directly scales the [sufficient statistics](https://en.wikipedia.org/wiki/Sufficient_statistics "Sufficient statistics"), $\log(x)$ and $\log(1-x)$. Note also that in the limit $c \to 0$ the distribution becomes flat.
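The conversion from mode and concentration back to the shape parameters is a one-liner; the sketch below (with arbitrary illustrative values) also verifies that the stated mode is recovered.

```python
# Mode/concentration parametrization: alpha = omega*(kappa - 2) + 1, etc.
def mode_concentration_to_shapes(omega, kappa):
    assert 0 < omega < 1 and kappa > 2  # conditions stated above
    return omega * (kappa - 2) + 1, (1 - omega) * (kappa - 2) + 1

alpha, beta = mode_concentration_to_shapes(omega=0.25, kappa=10)
print(alpha, beta)                       # 3.0, 7.0
print((alpha - 1) / (alpha + beta - 2))  # recovers omega = 0.25
```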
##### Mean and variance
Solving the coupled equations given in the above sections for the mean and the variance of the beta distribution in terms of the original parameters *α* and *β*, one can express the *α* and *β* parameters in terms of the mean (*μ*) and the variance (var):
$$\begin{aligned}
\nu &= \alpha + \beta = \frac{\mu(1-\mu)}{\text{var}} - 1, \quad \text{where } \nu = (\alpha+\beta) > 0, \text{ therefore } \text{var} < \mu(1-\mu) \\
\alpha &= \mu\nu = \mu\left(\frac{\mu(1-\mu)}{\text{var}} - 1\right), \quad \text{if } \text{var} < \mu(1-\mu) \\
\beta &= (1-\mu)\nu = (1-\mu)\left(\frac{\mu(1-\mu)}{\text{var}} - 1\right), \quad \text{if } \text{var} < \mu(1-\mu).
\end{aligned}$$
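These formulas amount to a method-of-moments inversion; a minimal sketch, with example values chosen arbitrarily:

```python
# Recover (alpha, beta) from a given mean and variance, for var < mu*(1 - mu).
def mean_var_to_shapes(mu, var):
    assert 0 < mu < 1 and 0 < var < mu * (1 - mu)
    nu = mu * (1 - mu) / var - 1
    return mu * nu, (1 - mu) * nu

alpha, beta = mean_var_to_shapes(mu=0.25, var=0.01)
# Round trip against the closed-form mean and variance:
print(alpha / (alpha + beta))                                     # 0.25
print(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))  # 0.01
```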
This [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters *α* and *β*. For example, by expressing the mode, skewness, excess kurtosis and differential entropy in terms of the mean and the variance:
*Figures: the mode, skewness, excess kurtosis, and differential entropy of the beta distribution plotted against its mean and variance.*
#### Four parameters
A beta distribution with the two shape parameters *α* and *β* is supported on the range [0,1] or (0,1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, *a*, and maximum, *c* (*c* > *a*), values of the distribution,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) by a linear transformation substituting the non-dimensional variable *x* in terms of the new variable *y* (with support [*a*,*c*] or (*a*,*c*)) and the parameters *a* and *c*:
$$y = x(c-a) + a, \quad \text{therefore} \quad x = \frac{y-a}{c-a}.$$
The [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") of the four-parameter beta distribution is equal to the two-parameter distribution, scaled by the range (*c* − *a*) (so that the total area under the density curve equals a probability of one), and with the "y" variable shifted and scaled as follows:

$$\begin{aligned}
f(y;\alpha,\beta,a,c) = \frac{f(x;\alpha,\beta)}{c-a} &= \frac{\left(\frac{y-a}{c-a}\right)^{\alpha-1}\left(\frac{c-y}{c-a}\right)^{\beta-1}}{(c-a)\,\mathrm{B}(\alpha,\beta)} \\[1ex]
&= \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\,\mathrm{B}(\alpha,\beta)}.
\end{aligned}$$
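In SciPy this shift and scale is exactly the generic `loc`/`scale` mechanism of continuous distributions, with `loc = a` and `scale = c - a`; a brief sketch with illustrative parameter values:

```python
# Four-parameter beta density via SciPy's loc/scale machinery.
from scipy.stats import beta as beta_dist

alpha, b, a, c = 2.0, 3.0, -1.0, 4.0
y = 1.5
x = (y - a) / (c - a)  # map y back to the standard support [0, 1]
direct = beta_dist.pdf(x, alpha, b) / (c - a)
scaled = beta_dist.pdf(y, alpha, b, loc=a, scale=c - a)
print(direct, scaled)  # identical values: 0.3 0.3
```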
That a random variable *Y* is beta-distributed with four parameters *α*, *β*, *a*, and *c* will be denoted by:
$$Y \sim \operatorname{Beta}(\alpha, \beta, a, c).$$
Some measures of central location are scaled (by (*c* − *a*)) and shifted (by *a*), as follows:
$$\mu_Y = \mu_X(c-a) + a = \frac{\alpha}{\alpha+\beta}(c-a) + a = \frac{\alpha c + \beta a}{\alpha+\beta}$$

$$\operatorname{mode}(Y) = \operatorname{mode}(X)(c-a) + a = \frac{\alpha-1}{\alpha+\beta-2}(c-a) + a = \frac{(\alpha-1)c + (\beta-1)a}{\alpha+\beta-2}, \quad \text{if } \alpha, \beta > 1$$

$$\operatorname{median}(Y) = \operatorname{median}(X)(c-a) + a = I_{\frac{1}{2}}^{[-1]}(\alpha,\beta)\,(c-a) + a$$
Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.
The shape parameters of *Y* can be written in terms of its mean and variance as
$$\begin{aligned}
\alpha &= \frac{(a-\mu_Y)\left(a c - a\mu_Y - c\mu_Y + \mu_Y^2 + \sigma_Y^2\right)}{\sigma_Y^2 (c-a)} \\
\beta &= -\frac{(c-\mu_Y)\left(a c - a\mu_Y - c\mu_Y + \mu_Y^2 + \sigma_Y^2\right)}{\sigma_Y^2 (c-a)}
\end{aligned}$$
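A sketch implementing this inversion, checked by a round trip through the closed-form mean and variance of the four-parameter distribution given in this section (the example values are arbitrary):

```python
# Shape parameters of the four-parameter beta from the mean and variance of Y.
def shapes_from_mean_var(mu, var, a, c):
    k = a * c - a * mu - c * mu + mu**2 + var
    alpha = (a - mu) * k / (var * (c - a))
    beta = -(c - mu) * k / (var * (c - a))
    return alpha, beta

# Round trip for Beta(2, 3) stretched to [a, c] = [-1, 4]:
a, c, alpha, beta = -1.0, 4.0, 2.0, 3.0
mu = (alpha * c + beta * a) / (alpha + beta)
var = alpha * beta * (c - a) ** 2 / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(shapes_from_mean_var(mu, var, a, c))  # (2.0, 3.0)
```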
The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (*c* − *a*), linearly for the mean deviation and nonlinearly for the variance:
$$\text{(mean deviation around mean)}(Y) = \text{(mean deviation around mean)}(X)\,(c-a) = \frac{2\alpha^{\alpha}\beta^{\beta}}{\mathrm{B}(\alpha,\beta)\,(\alpha+\beta)^{\alpha+\beta+1}}(c-a)$$

$$\operatorname{var}(Y) = \operatorname{var}(X)(c-a)^2 = \frac{\alpha\beta(c-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
Since the [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") are non-dimensional quantities (as [moments](https://en.wikipedia.org/wiki/Moment_\(mathematics\) "Moment (mathematics)") centered on the mean and normalized by the [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation")), they are independent of the parameters *a* and *c*, and therefore equal to the expressions given above in terms of *X* (with support [0,1] or (0,1)):
$$\operatorname{skewness}(Y) = \operatorname{skewness}(X) = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.$$

$$\text{kurtosis excess}(Y) = \text{kurtosis excess}(X) = \frac{6\left[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}$$
## Properties
### Measures of central tendency
#### Mode
The [mode](https://en.wikipedia.org/wiki/Mode_\(statistics\) "Mode (statistics)") of a beta distributed [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* with *α*, *β* > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression:[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)

$$\frac{\alpha-1}{\alpha+\beta-2}.$$

When both parameters are less than one (*α*, *β* < 1), this is the anti-mode: the lowest point of the probability density curve.[\[8\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Wadsworth-8)
*Figure: mode for the beta distribution for 1 ≤ α ≤ 5 and 1 ≤ β ≤ 5.*

Letting *α* = *β*, the expression for the mode simplifies to 1/2, showing that for *α* = *β* > 1 the mode (resp. anti-mode when *α*, *β* < 1) is at the center of the distribution: it is symmetric in those cases. See the [Shapes](https://en.wikipedia.org/wiki/Beta_distribution#Shapes) section in this article for a full list of mode cases, for arbitrary values of *α* and *β*. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of *α* = 2, *β* = 1 (or *α* = 1, *β* = 2), the density function becomes a [right-triangle distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution") which is finite at both ends. In several other cases there is a [singularity](https://en.wikipedia.org/wiki/Mathematical_singularity "Mathematical singularity") at one end, where the value of the density function approaches infinity. For example, in the case *α* = *β* = 1/2, the beta distribution simplifies to become the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution"). There is debate among mathematicians about some of these cases, and whether the ends (*x* = 0 and *x* = 1) can be called *modes* or not,[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)[\[2\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Mathematical_Statistics_with_MATHEMATICA-2) centering on:
- Whether the ends are part of the [domain](https://en.wikipedia.org/wiki/Domain_of_a_function "Domain of a function") of the density function
- Whether a [singularity](https://en.wikipedia.org/wiki/Mathematical_singularity "Mathematical singularity") can ever be called a *mode*
- Whether cases with two maxima should be called *bimodal*
#### Median
*Figure: median for the beta distribution for 0 ≤ α ≤ 5 and 0 ≤ β ≤ 5.*

*Figure: (mean − median) for the beta distribution versus α and β from 0 to 2.*
The median of the beta distribution is the unique real number $x = I_{1/2}^{[-1]}(\alpha,\beta)$ for which the [regularized incomplete beta function](https://en.wikipedia.org/wiki/Regularized_incomplete_beta_function "Regularized incomplete beta function") satisfies $I_x(\alpha,\beta) = \tfrac{1}{2}$. There is no general [closed-form expression](https://en.wikipedia.org/wiki/Closed-form_expression "Closed-form expression") for the [median](https://en.wikipedia.org/wiki/Median "Median") of the beta distribution for arbitrary values of *α* and *β*. [Closed-form expressions](https://en.wikipedia.org/wiki/Closed-form_expression "Closed-form expression") for particular values of the parameters *α* and *β* follow:[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*]
- For symmetric cases *α* = *β*, median = 1/2.
- For *α* = 1 and *β* > 0, median $= 1 - 2^{-1/\beta}$ (this case is the [mirror-image](https://en.wikipedia.org/wiki/Mirror_image "Mirror image") of the [power function distribution](https://en.wikipedia.org/w/index.php?title=Power_function_distribution&action=edit&redlink=1 "Power function distribution (page does not exist)"))
- For *α* > 0 and *β* = 1, median $= 2^{-1/\alpha}$ (this case is the power function distribution[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9))
- For *α* = 3 and *β* = 2, median = 0.6142724318676105..., the real solution to the [quartic equation](https://en.wikipedia.org/wiki/Quartic_function "Quartic function") $1 - 8x^3 + 6x^4 = 0$ that lies in [0,1] (a numerical check follows this list).
- For *α* = 2 and *β* = 3, median = 0.38572756813238945... = 1 − median(Beta(3, 2))
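As promised above, a minimal numerical check of the Beta(3, 2) entry, using SciPy's root finder and the distribution's `median` method:

```python
# Check that median(Beta(3, 2)) solves the quartic 1 - 8x^3 + 6x^4 = 0 in [0, 1].
from scipy.stats import beta
from scipy.optimize import brentq

root = brentq(lambda x: 1 - 8 * x**3 + 6 * x**4, 0.5, 0.7)
print(root)                         # 0.6142724318676105...
print(beta.median(3, 2))            # agrees with the quartic root
print(beta.median(2, 3), 1 - root)  # mirror symmetry of the last two entries
```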
The following are the limits with one parameter finite (non-zero) and the other approaching these limits:\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
$$\lim_{\beta\to 0} \text{median} = \lim_{\alpha\to\infty} \text{median} = 1, \qquad \lim_{\alpha\to 0} \text{median} = \lim_{\beta\to\infty} \text{median} = 0.$$
A reasonable approximation of the value of the median of the beta distribution, for both *α* and *β* greater than or equal to one, is given by the formula[\[10\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kerman2011-10)
$$\text{median} \approx \frac{\alpha - \tfrac{1}{3}}{\alpha + \beta - \tfrac{2}{3}} \quad \text{for } \alpha, \beta \geq 1.$$
When *α*, *β* ≥ 1, the [relative error](https://en.wikipedia.org/wiki/Relative_error "Relative error") (the [absolute error](https://en.wikipedia.org/wiki/Approximation_error "Approximation error") divided by the median) in this approximation is less than 4%, and for both *α* ≥ 2 and *β* ≥ 2 it is less than 1%. The [absolute error](https://en.wikipedia.org/wiki/Approximation_error "Approximation error") divided by the difference between the mean and the mode is similarly small:
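The approximation is easy to probe numerically against the exact median; a short sketch over a few arbitrarily chosen (*α*, *β*) pairs:

```python
# Compare the median approximation above with the exact median from SciPy.
from scipy.stats import beta

for a, b in [(1, 1), (2, 5), (3, 2), (10, 3)]:
    exact = beta.median(a, b)
    approx = (a - 1 / 3) / (a + b - 2 / 3)
    print(a, b, exact, approx, abs(approx - exact) / exact)
```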
*Figures: |(median − approximation)/median| and |(median − approximation)/(mean − mode)| for the beta distribution, for 1 ≤ α ≤ 5 and 1 ≤ β ≤ 5.*
#### Mean
*Figure: mean for the beta distribution for 0 ≤ α ≤ 5 and 0 ≤ β ≤ 5.*
The [expected value](https://en.wikipedia.org/wiki/Expected_value "Expected value") (mean) (*μ*) of a beta distribution [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* with two parameters *α* and *β* is a function of only the ratio *β*/*α* of these parameters:[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
$$\begin{aligned}
\mu = \operatorname{E}[X] &= \int_0^1 x f(x;\alpha,\beta)\,dx \\
&= \int_0^1 x\,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}\,dx \\
&= \frac{\alpha}{\alpha+\beta} = \frac{1}{1 + \frac{\beta}{\alpha}}
\end{aligned}$$
Letting *α* = *β* in the above expression one obtains *μ* = 1/2, showing that for *α* = *β* the mean is at the center of the distribution: it is symmetric. Also, the following limits can be obtained from the above expression:
$$\lim_{\beta/\alpha \to 0} \mu = 1, \qquad \lim_{\beta/\alpha \to \infty} \mu = 0$$
Therefore, for *β*/*α* → 0, or for *α*/*β* → ∞, the mean is located at the right end, *x* = 1. For these limit ratios, the beta distribution becomes a one-point [degenerate distribution](https://en.wikipedia.org/wiki/Degenerate_distribution "Degenerate distribution") with a [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") spike at the right end, *x* = 1, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the right end, *x* = 1.
Similarly, for *β*/*α* → ∞, or for *α*/*β* → 0, the mean is located at the left end, *x* = 0. The beta distribution becomes a one-point [degenerate distribution](https://en.wikipedia.org/wiki/Degenerate_distribution "Degenerate distribution") with a [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") spike at the left end, *x* = 0, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the left end, *x* = 0. Following are the limits with one parameter finite (non-zero) and the other approaching these limits:
$$\lim_{\beta\to 0} \mu = \lim_{\alpha\to\infty} \mu = 1, \qquad \lim_{\alpha\to 0} \mu = \lim_{\beta\to\infty} \mu = 0$$
While for typical unimodal distributions (with centrally located modes, inflexion points at both sides of the mode, and longer tails) (with Beta(*α*, *β*) such that *α*, *β* > 2) it is known that the sample mean (as an estimate of location) is not as [robust](https://en.wikipedia.org/wiki/Robust_statistics "Robust statistics") as the sample median, the opposite is the case for uniform or "U-shaped" bimodal distributions (with Beta(*α*, *β*) such that *α*, *β* ≤ 1), with the modes located at the ends of the distribution. As Mosteller and Tukey remark ([\[11\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-MostellerTukey-11) p. 207) "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (with Beta(*α*, *β*) such that *α*, *β* ≤ 1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs for example for [random walks](https://en.wikipedia.org/wiki/Random_walk "Random walk"), since the probability for the time of the last visit to the origin in a random walk is distributed as the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") Beta(1/2, 1/2):[\[5\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Feller-5)[\[12\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-WillyFeller1-12) the mean of a number of [realizations](https://en.wikipedia.org/wiki/Realization_\(probability\) "Realization (probability)") of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).
#### Geometric mean
*Figure: (mean − geometric mean) for the beta distribution versus α and β from 0 to 2, showing the asymmetry between α and β for the geometric mean.*

*Figures: geometric means for the beta distribution, purple = G(X), yellow = G(1 − X), with smaller and larger values of α and β in front.*
The logarithm of the [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") $G_X$ of a distribution with [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* is the arithmetic mean of ln(*X*), or, equivalently, its expected value:
$$\ln G_X = \operatorname{E}[\ln X]$$
For a beta distribution, the expected value integral gives:
$$\begin{aligned}
\operatorname{E}[\ln X] &= \int_0^1 \ln x\, f(x;\alpha,\beta)\,dx \\[4pt]
&= \int_0^1 \ln x\,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}\,dx \\[4pt]
&= \frac{1}{\mathrm{B}(\alpha,\beta)} \int_0^1 \frac{\partial\, x^{\alpha-1}(1-x)^{\beta-1}}{\partial \alpha}\,dx \\[4pt]
&= \frac{1}{\mathrm{B}(\alpha,\beta)} \frac{\partial}{\partial \alpha} \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx \\[4pt]
&= \frac{1}{\mathrm{B}(\alpha,\beta)} \frac{\partial \mathrm{B}(\alpha,\beta)}{\partial \alpha} \\[4pt]
&= \frac{\partial \ln \mathrm{B}(\alpha,\beta)}{\partial \alpha} \\[4pt]
&= \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha} - \frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha} \\[4pt]
&= \psi(\alpha) - \psi(\alpha+\beta)
\end{aligned}$$
where *Ļ* is the [digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function").
Therefore, the geometric mean of a beta distribution with shape parameters *α* and *β* is the exponential of a difference of digamma functions:
$$G_X = e^{\operatorname{E}[\ln X]} = e^{\psi(\alpha) - \psi(\alpha+\beta)}$$
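A minimal sketch evaluating this expression with SciPy's digamma function and checking it against a Monte Carlo estimate of exp(E[ln X]); the shape parameters, sample size, and seed are arbitrary:

```python
# Geometric mean of Beta(alpha, beta): exp(psi(alpha) - psi(alpha + beta)).
import numpy as np
from scipy.special import digamma
from scipy.stats import beta

a, b = 2.0, 3.0
g_x = np.exp(digamma(a) - digamma(a + b))
samples = beta.rvs(a, b, size=200_000, random_state=0)
print(g_x, np.exp(np.log(samples).mean()))  # close agreement (about 0.3385)
```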
While for a beta distribution with equal shape parameters *α* = *β* it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: 0 < $G_X$ < 1/2. The reason for this is that the logarithmic transformation strongly weights the values of *X* close to zero, as ln(*X*) strongly tends towards negative infinity as *X* approaches zero, while ln(*X*) flattens towards zero as *X* → 1.
Along a line *α* = *β*, the following limits apply:
$$\lim_{\alpha=\beta\to 0} G_X = 0, \qquad \lim_{\alpha=\beta\to\infty} G_X = \tfrac{1}{2}$$
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:
$$\lim_{\beta\to 0} G_X = \lim_{\alpha\to\infty} G_X = 1, \qquad \lim_{\alpha\to 0} G_X = \lim_{\beta\to\infty} G_X = 0$$
The accompanying plot shows the difference between the mean and the geometric mean for shape parameters *α* and *β* from zero to 2. Besides the fact that the difference between them approaches zero as *α* and *β* approach infinity, and that the difference becomes large for values of *α* and *β* approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape parameters *α* and *β*. The difference between the geometric mean and the mean is larger for small values of *α* relative to *β* than in the mirrored case with the magnitudes of *α* and *β* exchanged.
[N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz")[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) suggest the logarithmic approximation to the digamma function $\psi(\alpha) \approx \ln(\alpha - 1/2)$, which results in the following approximation to the geometric mean:
$$G_X \approx \frac{\alpha - \tfrac{1}{2}}{\alpha + \beta - \tfrac{1}{2}} \quad \text{if } \alpha, \beta > 1.$$
Numerical values for the [relative error](https://en.wikipedia.org/wiki/Relative_error "Relative error") in this approximation follow: [(*α* = *β* = 1): 9.39%]; [(*α* = *β* = 2): 1.29%]; [(*α* = 2, *β* = 3): 1.51%]; [(*α* = 3, *β* = 2): 0.44%]; [(*α* = *β* = 3): 0.51%]; [(*α* = *β* = 4): 0.26%]; [(*α* = 3, *β* = 4): 0.55%]; [(*α* = 4, *β* = 3): 0.24%].
Similarly, one can calculate the value of the shape parameters required for the geometric mean to equal 1/2. Given the value of the parameter *β*, what would be the value of the other parameter, *α*, required for the geometric mean to equal 1/2? The answer is that (for *β* > 1) the value of *α* required tends towards *β* + 1/2 as *β* → ∞. For example, all these couples have the same geometric mean of 1/2: [*β* = 1, *α* = 1.4427], [*β* = 2, *α* = 2.46958], [*β* = 3, *α* = 3.47943], [*β* = 4, *α* = 4.48449], [*β* = 5, *α* = 5.48756], [*β* = 10, *α* = 10.4938], [*β* = 100, *α* = 100.499].
The fundamental property of the geometric mean, which can be proven to be false for any other mean, is
$$G\left(\frac{X_i}{Y_i}\right) = \frac{G(X_i)}{G(Y_i)}$$
This makes the geometric mean the only correct mean when averaging *normalized* results, that is, results that are presented as ratios to reference values.[\[13\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-13) This is relevant because the beta distribution is a suitable model for the random behavior of percentages and it is particularly suitable to the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation; see section "Parameter estimation, maximum likelihood." When performing maximum likelihood estimation, besides the [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") $G_X$ based on the random variable *X*, another geometric mean appears naturally: the [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") based on the linear transformation (1 − *X*), the mirror-image of *X*, denoted by $G_{1-X}$:
$$G_{1-X} = e^{\operatorname{E}[\ln(1-X)]} = e^{\psi(\beta) - \psi(\alpha+\beta)}$$
Along a line *α* = *β*, the following limits apply:
$$\lim_{\alpha=\beta\to 0} G_{1-X} = 0, \qquad \lim_{\alpha=\beta\to\infty} G_{1-X} = \tfrac{1}{2}$$
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:
$$\lim_{\beta\to 0} G_{1-X} = \lim_{\alpha\to\infty} G_{1-X} = 0, \qquad \lim_{\alpha\to 0} G_{1-X} = \lim_{\beta\to\infty} G_{1-X} = 1$$
It has the following approximate value:
$$G_{1-X} \approx \frac{\beta - \tfrac{1}{2}}{\alpha + \beta - \tfrac{1}{2}} \quad \text{if } \alpha, \beta > 1.$$
Although both $G_X$ and $G_{1-X}$ are asymmetric, in the case that both shape parameters are equal, *α* = *β*, the geometric means are equal: $G_X = G_{1-X}$. This equality follows from the following symmetry displayed between both geometric means:
$$G_X(\mathrm{B}(\alpha,\beta)) = G_{1-X}(\mathrm{B}(\beta,\alpha)).$$
#### Harmonic mean
*Figure: harmonic mean for the beta distribution for 0 < α < 5 and 0 < β < 5.*

*Figure: (mean − harmonic mean) for the beta distribution versus α and β from 0 to 2.*

*Figures: harmonic means for the beta distribution, purple = H(X), yellow = H(1 − X), with smaller and larger values of α and β in front.*
The inverse of the [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean "Harmonic mean") ($H_X$) of a distribution with [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* is the arithmetic mean of 1/*X*, or, equivalently, its expected value. Therefore, the [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean "Harmonic mean") ($H_X$) of a beta distribution with shape parameters *α* and *β* is:
$$\begin{aligned}
H_X &= \frac{1}{\operatorname{E}\left[\frac{1}{X}\right]} = \frac{1}{\int_0^1 \frac{f(x;\alpha,\beta)}{x}\,dx} = \frac{1}{\int_0^1 \frac{x^{\alpha-1}(1-x)^{\beta-1}}{x\,\mathrm{B}(\alpha,\beta)}\,dx} \\
&= \frac{\alpha-1}{\alpha+\beta-1} \quad \text{if } \alpha > 1 \text{ and } \beta > 0
\end{aligned}$$
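A quick numerical confirmation of the closed form, evaluating 1/E[1/X] by direct integration (illustrative values with *α* > 1):

```python
# Check H_X = (alpha - 1)/(alpha + beta - 1) against numerical integration.
from scipy.stats import beta
from scipy.integrate import quad

a, b = 3.0, 2.0
closed_form = (a - 1) / (a + b - 1)
e_inv_x, _ = quad(lambda x: beta.pdf(x, a, b) / x, 0, 1)  # E[1/X]
print(closed_form, 1 / e_inv_x)  # both 0.5
```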
The [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean "Harmonic mean") ($H_X$) of a beta distribution with *α* < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter *α* less than unity.
Letting *α* = *β* in the above expression one obtains
$$H_X = \frac{\alpha-1}{2\alpha-1},$$
showing that for *α* = *β* the harmonic mean ranges from 0, for *α* = *β* = 1, to 1/2, for *α* = *β* → ∞.
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:
$$\begin{aligned}
&\lim_{\alpha\to 0} H_X \text{ is undefined} \\
&\lim_{\alpha\to 1} H_X = \lim_{\beta\to\infty} H_X = 0 \\
&\lim_{\beta\to 0} H_X = \lim_{\alpha\to\infty} H_X = 1
\end{aligned}$$
The harmonic mean plays a role in maximum likelihood estimation for the four-parameter case, in addition to the geometric mean. When performing maximum likelihood estimation for the four-parameter case, besides the harmonic mean $H_X$ based on the random variable *X*, another harmonic mean appears naturally: the harmonic mean based on the linear transformation (1 − *X*), the mirror-image of *X*, denoted by $H_{1-X}$:
$$H_{1-X} = \frac{1}{\operatorname{E}\left[\frac{1}{1-X}\right]} = \frac{\beta-1}{\alpha+\beta-1} \quad \text{if } \beta > 1 \text{ and } \alpha > 0.$$

The [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean "Harmonic mean") ($H_{1-X}$) of a beta distribution with *β* < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter *β* less than unity.
Letting *α* = *β* in the above expression one obtains
$$H_{1-X} = \frac{\beta-1}{2\beta-1},$$
showing that for *α* = *β* the harmonic mean ranges from 0, for *α* = *β* = 1, to 1/2, for *α* = *β* → ∞.
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:
$$\begin{aligned}
&\lim_{\beta\to 0} H_{1-X} \text{ is undefined} \\
&\lim_{\beta\to 1} H_{1-X} = \lim_{\alpha\to\infty} H_{1-X} = 0 \\
&\lim_{\alpha\to 0} H_{1-X} = \lim_{\beta\to\infty} H_{1-X} = 1
\end{aligned}$$
Although both $H_X$ and $H_{1-X}$ are asymmetric, in the case that both shape parameters are equal, *α* = *β*, the harmonic means are equal: $H_X = H_{1-X}$. This equality follows from the following symmetry displayed between both harmonic means:
$$H_X(\mathrm{B}(\alpha,\beta)) = H_{1-X}(\mathrm{B}(\beta,\alpha)) \quad \text{if } \alpha, \beta > 1.$$
### Measures of statistical dispersion
#### Variance
The [variance](https://en.wikipedia.org/wiki/Variance "Variance") (the second moment centered on the mean) of a beta distribution [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* with parameters *α* and *β* is:[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[14\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-14)
$$\operatorname{var}(X) = \operatorname{E}\left[(X-\mu)^2\right] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
Letting *α* = *β* in the above expression one obtains
$$\operatorname{var}(X) = \frac{1}{4(2\beta+1)},$$
showing that for *α* = *β* the variance decreases monotonically as *α* = *β* increases. Setting *α* = *β* = 0 in this expression, one finds the maximum variance var(*X*) = 1/4,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) which is only reached in the limit, as *α* = *β* → 0.
The beta distribution may also be [parametrized](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") in terms of its mean *μ* (0 < *μ* < 1) and sample size *ν* = *α* + *β* (*ν* > 0) (see subsection [Mean and sample size](https://en.wikipedia.org/wiki/Beta_distribution#Mean_and_sample_size)):
$$\alpha = \mu\nu, \qquad \beta = (1-\mu)\nu, \qquad \text{where } \nu = (\alpha+\beta) > 0.$$
Using this [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter"), one can express the variance in terms of the mean *μ* and the sample size *ν* as follows:
$$\operatorname{var}(X) = \frac{\mu(1-\mu)}{1+\nu}$$

Since *ν* = *α* + *β* > 0, it follows that var(*X*) < *μ*(1 − *μ*).
For a symmetric distribution, the mean is at the middle of the distribution, *μ* = 1/2, and therefore:
$$\operatorname{var}(X) = \frac{1}{4(1+\nu)} \quad \text{if } \mu = \tfrac{1}{2}$$
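A minimal sketch evaluating the variance in both parametrizations, checked against SciPy (the example values are arbitrary):

```python
# Variance of Beta(alpha, beta) in both parametrizations.
from scipy.stats import beta

a, b = 2.0, 6.0
mu, nu = a / (a + b), a + b
print(a * b / ((a + b) ** 2 * (a + b + 1)))  # shape-parameter form: 0.0208333...
print(mu * (1 - mu) / (1 + nu))              # mean / sample-size form: same value
print(beta.var(a, b))                        # SciPy agrees
```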
Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:
$$\begin{aligned}
&\lim_{\beta\to 0}\operatorname{var}(X) = \lim_{\alpha\to 0}\operatorname{var}(X) = \lim_{\beta\to\infty}\operatorname{var}(X) = \lim_{\alpha\to\infty}\operatorname{var}(X) = 0 \\
&\lim_{\nu\to\infty}\operatorname{var}(X) = \lim_{\mu\to 0}\operatorname{var}(X) = \lim_{\mu\to 1}\operatorname{var}(X) = 0 \\
&\lim_{\nu\to 0}\operatorname{var}(X) = \mu(1-\mu)
\end{aligned}$$
*Figure: variance for the beta distribution for α and β ranging from 0 to 5.*
#### Geometric variance and covariance
*Figures: log geometric variances of the beta distribution vs. α and β (front and back views).*
The logarithm of the geometric variance, $\ln \operatorname{var}_{GX}$, of a distribution with [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* is the second moment of the logarithm of *X* centered on the geometric mean of *X*, $\ln G_X$:
$$\begin{aligned}
\ln \operatorname{var}_{GX} &= \operatorname{E}\left[(\ln X - \ln G_X)^2\right] \\
&= \operatorname{E}\left[(\ln X - \operatorname{E}[\ln X])^2\right] \\
&= \operatorname{E}\left[(\ln X)^2\right] - (\operatorname{E}[\ln X])^2 \\
&= \operatorname{var}[\ln X]
\end{aligned}$$
and therefore, the geometric variance is:
$$\operatorname{var}_{GX} = e^{\operatorname{var}[\ln X]}$$
In the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information") matrix, and the curvature of the log [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function"), the logarithm of the geometric variance of the [reflected](https://en.wikipedia.org/wiki/Reflection_formula "Reflection formula") variable 1 − *X* and the logarithm of the geometric covariance between *X* and 1 − *X* appear:
$$\begin{aligned}
\ln \operatorname{var}_{G(1-X)} &= \operatorname{E}\left[(\ln(1-X) - \ln G_{1-X})^2\right] \\
&= \operatorname{E}\left[(\ln(1-X) - \operatorname{E}[\ln(1-X)])^2\right] \\
&= \operatorname{E}\left[(\ln(1-X))^2\right] - (\operatorname{E}[\ln(1-X)])^2 \\
&= \operatorname{var}[\ln(1-X)] \\[1ex]
\operatorname{var}_{G(1-X)} &= e^{\operatorname{var}[\ln(1-X)]} \\[1ex]
\ln \operatorname{cov}_{G\,X,1-X} &= \operatorname{E}[(\ln X - \ln G_X)(\ln(1-X) - \ln G_{1-X})] \\
&= \operatorname{E}[(\ln X - \operatorname{E}[\ln X])(\ln(1-X) - \operatorname{E}[\ln(1-X)])] \\
&= \operatorname{E}\left[\ln X \ln(1-X)\right] - \operatorname{E}[\ln X]\operatorname{E}[\ln(1-X)] \\
&= \operatorname{cov}[\ln X, \ln(1-X)] \\[1ex]
\operatorname{cov}_{G\,X,(1-X)} &= e^{\operatorname{cov}[\ln X, \ln(1-X)]}
\end{aligned}$$
For a beta distribution, higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions. See the section [§ Moments of logarithmically transformed random variables](https://en.wikipedia.org/wiki/Beta_distribution#Moments_of_logarithmically_transformed_random_variables). The [variance](https://en.wikipedia.org/wiki/Variance "Variance") of the logarithmic variables and [covariance](https://en.wikipedia.org/wiki/Covariance "Covariance") of ln *X* and ln(1 − *X*) are:
$$\operatorname{var}[\ln X] = \psi_{1}(\alpha) - \psi_{1}(\alpha+\beta)$$

$$\operatorname{var}[\ln(1-X)] = \psi_{1}(\beta) - \psi_{1}(\alpha+\beta)$$

$$\operatorname{cov}[\ln X, \ln(1-X)] = -\psi_{1}(\alpha+\beta)$$
where the **[trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function")**, denoted *ψ*1(*α*), is the second of the [polygamma functions](https://en.wikipedia.org/wiki/Polygamma_function "Polygamma function"), and is defined as the derivative of the [digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function"):
$$\psi_{1}(\alpha) = \frac{d^{2}\ln\Gamma(\alpha)}{d\alpha^{2}} = \frac{d\psi(\alpha)}{d\alpha}.$$
Therefore,
$$\ln \operatorname{var}_{GX} = \operatorname{var}[\ln X] = \psi_{1}(\alpha) - \psi_{1}(\alpha+\beta)$$

$$\ln \operatorname{var}_{G(1-X)} = \operatorname{var}[\ln(1-X)] = \psi_{1}(\beta) - \psi_{1}(\alpha+\beta)$$

$$\ln \operatorname{cov}_{GX,1-X} = \operatorname{cov}[\ln X, \ln(1-X)] = -\psi_{1}(\alpha+\beta)$$
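As a quick numerical illustration of these trigamma identities (a sketch added here, not part of the closed-form results; the Beta(2, 3) parameters are an arbitrary choice), the closed forms can be compared with Monte Carlo estimates:

```python
import numpy as np
from scipy.special import polygamma  # polygamma(1, x) is the trigamma function psi_1

alpha, beta = 2.0, 3.0                       # arbitrary shape parameters
rng = np.random.default_rng(0)
x = rng.beta(alpha, beta, size=1_000_000)

# closed forms from the identities above
var_ln_x = polygamma(1, alpha) - polygamma(1, alpha + beta)
var_ln_1mx = polygamma(1, beta) - polygamma(1, alpha + beta)
cov_ln = -polygamma(1, alpha + beta)

# Monte Carlo estimates; each pair should agree to a few decimal places
print(var_ln_x, np.var(np.log(x)))
print(var_ln_1mx, np.var(np.log1p(-x)))
print(cov_ln, np.cov(np.log(x), np.log1p(-x))[0, 1])
```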
The accompanying plots show the log geometric variances and log geometric covariance versus the shape parameters *α* and *β*. The plots show that the log geometric variances and log geometric covariance are close to zero for shape parameters *α* and *β* greater than 2, and that the log geometric variances rapidly rise in value for shape parameter values *α* and *β* less than unity. The log geometric variances are positive for all values of the shape parameters. The log geometric covariance is negative for all values of the shape parameters, and it reaches large negative values for *α* and *β* less than unity.
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:
$$
\begin{aligned}
&\lim_{\alpha \to 0} \ln \operatorname{var}_{GX} = \lim_{\beta \to 0} \ln \operatorname{var}_{G(1-X)} = \infty \\
&\lim_{\beta \to 0} \ln \operatorname{var}_{GX} = \lim_{\alpha \to \infty} \ln \operatorname{var}_{GX} = \lim_{\alpha \to 0} \ln \operatorname{var}_{G(1-X)} = \lim_{\beta \to \infty} \ln \operatorname{var}_{G(1-X)} = 0 \\
&\lim_{\alpha \to \infty} \ln \operatorname{cov}_{GX,(1-X)} = \lim_{\beta \to \infty} \ln \operatorname{cov}_{GX,(1-X)} = 0 \\
&\lim_{\beta \to \infty} \ln \operatorname{var}_{GX} = \psi_{1}(\alpha) \\
&\lim_{\alpha \to \infty} \ln \operatorname{var}_{G(1-X)} = \psi_{1}(\beta) \\
&\lim_{\alpha \to 0} \ln \operatorname{cov}_{GX,(1-X)} = -\psi_{1}(\beta) \\
&\lim_{\beta \to 0} \ln \operatorname{cov}_{GX,(1-X)} = -\psi_{1}(\alpha)
\end{aligned}
$$
Limits with two parameters varying:
$$
\begin{aligned}
&\lim_{\alpha \to \infty}\left(\lim_{\beta \to \infty} \ln \operatorname{var}_{GX}\right) = \lim_{\beta \to \infty}\left(\lim_{\alpha \to \infty} \ln \operatorname{var}_{G(1-X)}\right) = \lim_{\alpha \to \infty}\left(\lim_{\beta \to 0} \ln \operatorname{cov}_{GX,(1-X)}\right) = \lim_{\beta \to \infty}\left(\lim_{\alpha \to 0} \ln \operatorname{cov}_{GX,(1-X)}\right) = 0 \\
&\lim_{\alpha \to \infty}\left(\lim_{\beta \to 0} \ln \operatorname{var}_{GX}\right) = \lim_{\beta \to \infty}\left(\lim_{\alpha \to 0} \ln \operatorname{var}_{G(1-X)}\right) = \infty \\
&\lim_{\alpha \to 0}\left(\lim_{\beta \to 0} \ln \operatorname{cov}_{GX,(1-X)}\right) = \lim_{\beta \to 0}\left(\lim_{\alpha \to 0} \ln \operatorname{cov}_{GX,(1-X)}\right) = -\infty
\end{aligned}
$$
Although both ln(var*GX*) and ln(var*G*(1 − *X*)) are asymmetric, when the shape parameters are equal, *α* = *β*, one has: ln(var*GX*) = ln(var*G*(1 − *X*)). This equality follows from the following symmetry displayed between both log geometric variances:
$$\ln \operatorname{var}_{GX}(\mathrm{B}(\alpha, \beta)) = \ln \operatorname{var}_{G(1-X)}(\mathrm{B}(\beta, \alpha)).$$
The log geometric covariance is symmetric:
$$\ln \operatorname{cov}_{GX,(1-X)}(\mathrm{B}(\alpha, \beta)) = \ln \operatorname{cov}_{GX,(1-X)}(\mathrm{B}(\beta, \alpha))$$
#### Mean absolute deviation around the mean
Figure: Ratio of mean absolute deviation to standard deviation for the beta distribution with *α* and *β* ranging from 0 to 5
Figure: Ratio of mean absolute deviation to standard deviation for the beta distribution with mean 0 ≤ *μ* ≤ 1 and sample size 0 < *ν* ≤ 10
The [mean absolute deviation](https://en.wikipedia.org/wiki/Mean_absolute_deviation "Mean absolute deviation") around the mean for the beta distribution with shape parameters *α* and *β* is:[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)
$$\operatorname{E}[|X - E[X]|] = \frac{2\alpha^{\alpha}\beta^{\beta}}{\mathrm{B}(\alpha, \beta)(\alpha+\beta)^{\alpha+\beta+1}}$$
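A short numerical sanity check of this closed form (a sketch, not from the original text; Beta(2, 5) is an arbitrary choice) against direct integration of |*x* − *μ*| weighted by the density:

```python
import numpy as np
from scipy.special import betaln
from scipy.integrate import quad
from scipy.stats import beta as beta_dist

def beta_mad(a, b):
    """Closed-form E|X - E[X]| for Beta(a, b), evaluated in log space for stability."""
    return np.exp(np.log(2) + a * np.log(a) + b * np.log(b)
                  - betaln(a, b) - (a + b + 1) * np.log(a + b))

a, b = 2.0, 5.0
mu = a / (a + b)
# integrand has a kink at the mean, so pass it as a breakpoint to quad
mad_numeric, _ = quad(lambda x: abs(x - mu) * beta_dist.pdf(x, a, b), 0, 1, points=[mu])
print(beta_mad(a, b), mad_numeric)  # the two values should agree closely
```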
The mean absolute deviation around the mean is a more [robust](https://en.wikipedia.org/wiki/Robust_statistics "Robust statistics") [estimator](https://en.wikipedia.org/wiki/Estimator "Estimator") of [statistical dispersion](https://en.wikipedia.org/wiki/Statistical_dispersion "Statistical dispersion") than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(*α*, *β*) distributions with *α*, *β* > 2, since it depends on the linear (absolute) deviations rather than the squared deviations from the mean. Very large deviations from the mean are therefore not as heavily weighted.
Using [Stirling's approximation](https://en.wikipedia.org/wiki/Stirling%27s_approximation "Stirling's approximation") to the [Gamma function](https://en.wikipedia.org/wiki/Gamma_function "Gamma function"), [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz")[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only about 3.5% for *α* = *β* = 1, and it decreases to zero as *α* → ∞, *β* → ∞):
$$
\begin{aligned}
\frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} &= \frac{\operatorname{E}[|X - E[X]|]}{\sqrt{\operatorname{var}(X)}} \\
&\approx \sqrt{\frac{2}{\pi}}\left(1 + \frac{7}{12(\alpha+\beta)} - \frac{1}{12\alpha} - \frac{1}{12\beta}\right), \text{ if } \alpha, \beta > 1.
\end{aligned}
$$
At the limit *α* → ∞, *β* → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: $\sqrt{\frac{2}{\pi}}$. For *α* = *β* = 1 this ratio equals $\frac{\sqrt{3}}{2}$, so that from *α* = *β* = 1 to *α*, *β* → ∞ the ratio decreases by 8.5%. For *α* = *β* = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from *α* = *β* = 0 to *α* = *β* = 1, and by 25% from *α* = *β* = 0 to *α*, *β* → ∞. However, for skewed beta distributions such that *α* → 0 or *β* → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.
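The approximation is easy to tabulate against the exact ratio; the following sketch (the parameter pairs are arbitrary choices) reproduces the roughly 3.5% relative error at *α* = *β* = 1 and the rapid convergence for larger parameters:

```python
import numpy as np
from scipy.special import betaln

def ratio_exact(a, b):
    # exact MAD / standard deviation from the closed forms above
    mad = np.exp(np.log(2) + a * np.log(a) + b * np.log(b)
                 - betaln(a, b) - (a + b + 1) * np.log(a + b))
    sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mad / sd

def ratio_approx(a, b):
    # Johnson & Kotz approximation, intended for a, b > 1
    return np.sqrt(2 / np.pi) * (1 + 7 / (12 * (a + b)) - 1 / (12 * a) - 1 / (12 * b))

for a, b in [(1, 1), (2, 3), (10, 10), (50, 20)]:
    print(a, b, ratio_exact(a, b), ratio_approx(a, b))
```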
Using the [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") in terms of mean *μ* and sample size *ν* = *α* + *β* \> 0:
*α* = *μν*, *β* = (1 − *μ*)*ν*
one can express the mean [absolute deviation](https://en.wikipedia.org/wiki/Absolute_deviation "Absolute deviation") around the mean in terms of the mean *μ* and the sample size *ν* as follows:
$$\operatorname{E}[|X - E[X]|] = \frac{2\mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\,\mathrm{B}(\mu\nu, (1-\mu)\nu)}$$
For a symmetric distribution, the mean is at the middle of the distribution, *μ* = 1/2, and therefore:
$$
\begin{aligned}
\operatorname{E}[|X - E[X]|] = \frac{2^{1-\nu}}{\nu\,\mathrm{B}(\tfrac{\nu}{2}, \tfrac{\nu}{2})} &= \frac{2^{1-\nu}\,\Gamma(\nu)}{\nu\,(\Gamma(\tfrac{\nu}{2}))^{2}} \\
\lim_{\nu \to 0}\left(\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|]\right) &= \tfrac{1}{2} \\
\lim_{\nu \to \infty}\left(\lim_{\mu \to \frac{1}{2}} \operatorname{E}[|X - E[X]|]\right) &= 0
\end{aligned}
$$
Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:
$$
\begin{aligned}
\lim_{\beta \to 0} \operatorname{E}[|X - E[X]|] &= \lim_{\alpha \to 0} \operatorname{E}[|X - E[X]|] = 0 \\
\lim_{\beta \to \infty} \operatorname{E}[|X - E[X]|] &= \lim_{\alpha \to \infty} \operatorname{E}[|X - E[X]|] = 0 \\
\lim_{\mu \to 0} \operatorname{E}[|X - E[X]|] &= \lim_{\mu \to 1} \operatorname{E}[|X - E[X]|] = 0 \\
\lim_{\nu \to 0} \operatorname{E}[|X - E[X]|] &= \sqrt{\mu(1-\mu)} \\
\lim_{\nu \to \infty} \operatorname{E}[|X - E[X]|] &= 0
\end{aligned}
$$
#### Mean absolute difference
The [mean absolute difference](https://en.wikipedia.org/wiki/Mean_absolute_difference "Mean absolute difference") for the beta distribution is:
$$
\begin{aligned}
\mathrm{MD} &= \int_{0}^{1}\int_{0}^{1} f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy \\[1ex]
&= \frac{4}{\alpha+\beta}\,\frac{B(\alpha+\beta, \alpha+\beta)}{B(\alpha,\alpha)\,B(\beta,\beta)}
\end{aligned}
$$
The [Gini coefficient](https://en.wikipedia.org/wiki/Gini_coefficient "Gini coefficient") for the beta distribution is half of the relative mean absolute difference:
$$\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{B(\alpha+\beta, \alpha+\beta)}{B(\alpha,\alpha)\,B(\beta,\beta)}$$
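Both quantities are straightforward to check numerically; a minimal sketch (not from the original text; Beta(2, 3) is an arbitrary choice) comparing the closed forms with Monte Carlo estimates:

```python
import numpy as np
from scipy.special import beta as B

a, b = 2.0, 3.0
md = 4 / (a + b) * B(a + b, a + b) / (B(a, a) * B(b, b))  # mean absolute difference
gini = md * (a + b) / (2 * a)                              # = MD / (2 * mean)

rng = np.random.default_rng(1)
x, y = rng.beta(a, b, 500_000), rng.beta(a, b, 500_000)
md_mc = np.mean(np.abs(x - y))
print(md, md_mc)                          # closed form vs Monte Carlo
print(gini, md_mc / (2 * a / (a + b)))    # Gini = half the relative MD
```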
### Skewness
Figure: Skewness for the beta distribution as a function of variance and mean
The [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
$$\gamma_{1} = \frac{\operatorname{E}\left[(X-\mu)^{3}\right]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.$$
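A one-line check of this closed form (illustrative; the parameters are arbitrary) against SciPy's skewness for the beta distribution:

```python
import numpy as np
from scipy.stats import beta as beta_dist

def beta_skewness(a, b):
    return 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))

a, b = 2.0, 5.0
print(beta_skewness(a, b), beta_dist.stats(a, b, moments='s'))  # should match
```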
Letting *α* = *β* in the above expression one obtains *γ*1 = 0, showing once again that for *α* = *β* the distribution is symmetric and hence the skewness is zero. The skewness is positive (right-tailed) for *α* < *β* and negative (left-tailed) for *α* > *β*.
Using the [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") in terms of mean *μ* and sample size *ν* = *α* + *β*:
$$
\begin{aligned}
\alpha &= \mu\nu, &\text{ where } \nu = (\alpha+\beta) > 0, \\
\beta &= (1-\mu)\nu, &\text{ where } \nu = (\alpha+\beta) > 0.
\end{aligned}
$$
one can express the skewness in terms of the mean *μ* and the sample size ν as follows:
$$\gamma_{1} = \frac{\operatorname{E}[(X-\mu)^{3}]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}.$$
The skewness can also be expressed just in terms of the variance *var* and the mean *μ* as follows:
$$\gamma_{1} = \frac{\operatorname{E}[(X-\mu)^{3}]}{(\operatorname{var}(X))^{3/2}} = \frac{2(1-2\mu)\sqrt{\operatorname{var}}}{\mu(1-\mu) + \operatorname{var}} \text{ if } \operatorname{var} < \mu(1-\mu)$$
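The three skewness expressions above are algebraically equivalent; a quick numerical consistency check (a sketch with arbitrary parameters, for which var < *μ*(1 − *μ*) holds as required):

```python
import numpy as np

a, b = 2.0, 5.0
mu, nu = a / (a + b), a + b
var = a * b / ((a + b) ** 2 * (a + b + 1))

g1_ab = 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))
g1_mu_nu = 2 * (1 - 2 * mu) * np.sqrt(1 + nu) / ((2 + nu) * np.sqrt(mu * (1 - mu)))
g1_mu_var = 2 * (1 - 2 * mu) * np.sqrt(var) / (mu * (1 - mu) + var)
print(g1_ab, g1_mu_nu, g1_mu_var)  # all three values agree
```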
The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (*μ* = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).
The following expression for the square of the skewness, in terms of the sample size *ν* = *α* + *β* and the variance var, is useful for the method of moments estimation of four parameters:
$$(\gamma_{1})^{2} = \frac{\left(\operatorname{E}[(X-\mu)^{3}]\right)^{2}}{(\operatorname{var}(X))^{3}} = \frac{4}{(2+\nu)^{2}}\left(\frac{1}{\operatorname{var}} - 4(1+\nu)\right)$$
This expression correctly gives a skewness of zero for *α* = *β*, since in that case (see [§ Variance](https://en.wikipedia.org/wiki/Beta_distribution#Variance)): $\operatorname{var} = \frac{1}{4(1+\nu)}$.
For the symmetric case (*α* = *β*), skewness = 0 over the whole range, and the following limits apply:
$$\lim_{\alpha=\beta \to 0}\gamma_{1} = \lim_{\alpha=\beta \to \infty}\gamma_{1} = \lim_{\nu \to 0}\gamma_{1} = \lim_{\nu \to \infty}\gamma_{1} = \lim_{\mu \to \frac{1}{2}}\gamma_{1} = 0$$
For the asymmetric cases (*α* ≠ *β*) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:
$$
\begin{aligned}
&\lim_{\alpha \to 0}\gamma_{1} = \lim_{\mu \to 0}\gamma_{1} = \infty \\
&\lim_{\beta \to 0}\gamma_{1} = \lim_{\mu \to 1}\gamma_{1} = -\infty \\
&\lim_{\alpha \to \infty}\gamma_{1} = -\frac{2}{\sqrt{\beta}}, \quad \lim_{\beta \to 0}\left(\lim_{\alpha \to \infty}\gamma_{1}\right) = -\infty, \quad \lim_{\beta \to \infty}\left(\lim_{\alpha \to \infty}\gamma_{1}\right) = 0 \\
&\lim_{\beta \to \infty}\gamma_{1} = \frac{2}{\sqrt{\alpha}}, \quad \lim_{\alpha \to 0}\left(\lim_{\beta \to \infty}\gamma_{1}\right) = \infty, \quad \lim_{\alpha \to \infty}\left(\lim_{\beta \to \infty}\gamma_{1}\right) = 0 \\
&\lim_{\nu \to 0}\gamma_{1} = \frac{1-2\mu}{\sqrt{\mu(1-\mu)}}, \quad \lim_{\mu \to 0}\left(\lim_{\nu \to 0}\gamma_{1}\right) = \infty, \quad \lim_{\mu \to 1}\left(\lim_{\nu \to 0}\gamma_{1}\right) = -\infty
\end{aligned}
$$
Figures: Skewness for the beta distribution with *α* and *β* ranging from 1 to 5, and from 0.1 to 5
### Kurtosis
Figure: Excess kurtosis for the beta distribution as a function of variance and mean
The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear.[\[15\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Oguamanam-15) Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it is much more sensitive to the signal generated by human footsteps than to other signals generated by vehicles, winds, noise, etc.[\[16\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Liang-16) Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping[\[17\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kenney_and_Keeping-17) use the symbol *γ*2 for the [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis"), but [Abramowitz and Stegun](https://en.wikipedia.org/wiki/Abramowitz_and_Stegun "Abramowitz and Stegun")[\[18\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Abramowitz-18) use different terminology. To prevent confusion[\[19\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Weisstein.Kurtosi-19) between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)[\[20\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Panik-20)
$$
\begin{aligned}
\text{excess kurtosis} &= \text{kurtosis} - 3 \\
&= \frac{\operatorname{E}[(X-\mu)^{4}]}{(\operatorname{var}(X))^{2}} - 3 \\
&= \frac{6[\alpha^{3} - \alpha^{2}(2\beta-1) + \beta^{2}(\beta+1) - 2\alpha\beta(\beta+2)]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)} \\
&= \frac{6[(\alpha-\beta)^{2}(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}.
\end{aligned}
$$
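As with the skewness, the closed form can be checked against SciPy (an illustrative sketch; the parameters are arbitrary):

```python
import numpy as np
from scipy.stats import beta as beta_dist

def beta_excess_kurtosis(a, b):
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den

a, b = 2.0, 5.0
# moments='k' returns the Fisher (excess) kurtosis
print(beta_excess_kurtosis(a, b), beta_dist.stats(a, b, moments='k'))  # should match
```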
Letting *α* = *β* in the above expression one obtains
$$\text{excess kurtosis} = -\frac{6}{3+2\alpha} \text{ if } \alpha = \beta.$$
Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of −2 at the limit as {*α* = *β*} → 0, and approaching a maximum value of zero as {*α* = *β*} → ∞. The value of −2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at each end *x* = 0 and *x* = 1, with nothing in between: a 2-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with equal probability 1/2 at each end (a coin toss: see the section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of [kurtosis](https://en.wikipedia.org/wiki/Kurtosis "Kurtosis") as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution is correct for all distributions, including the beta distribution. The more that rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For skewed beta distributions with *α* ≠ *β*, the excess kurtosis can reach unlimited positive values (particularly for *α* → 0 with finite *β*, or for *β* → 0 with finite *α*) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends.
Using the [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") in terms of mean *μ* and sample size *ν* = *α* + *β*:
$$
\begin{aligned}
\alpha &= \mu\nu, \text{ where } \nu = (\alpha+\beta) > 0 \\
\beta &= (1-\mu)\nu, \text{ where } \nu = (\alpha+\beta) > 0.
\end{aligned}
$$
one can express the excess kurtosis in terms of the mean *μ* and the sample size *ν* as follows:
$$\text{excess kurtosis} = \frac{6}{3+\nu}\left(\frac{(1-2\mu)^{2}(1+\nu)}{\mu(1-\mu)(2+\nu)} - 1\right)$$
The excess kurtosis can also be expressed in terms of just the following two parameters: the variance var, and the sample size *ν* as follows:
$$\text{excess kurtosis} = \frac{6}{(3+\nu)(2+\nu)}\left(\frac{1}{\operatorname{var}} - 6 - 5\nu\right) \text{ if } \operatorname{var} < \mu(1-\mu)$$
and, in terms of the variance *var* and the mean *μ* as follows:
$$\text{excess kurtosis} = \frac{6\operatorname{var}\,(1 - \operatorname{var} - 5\mu(1-\mu))}{(\operatorname{var} + \mu(1-\mu))(2\operatorname{var} + \mu(1-\mu))} \text{ if } \operatorname{var} < \mu(1-\mu)$$
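These reparametrized forms can be verified to agree with the (*α*, *β*) expression; a minimal numerical check (a sketch; the parameters are arbitrary and satisfy var < *μ*(1 − *μ*)):

```python
import numpy as np

a, b = 2.0, 5.0
mu, nu = a / (a + b), a + b
var = a * b / ((a + b) ** 2 * (a + b + 1))

k_ab = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2)) \
       / (a * b * (a + b + 2) * (a + b + 3))
k_mu_nu = 6 / (3 + nu) * ((1 - 2 * mu) ** 2 * (1 + nu) / (mu * (1 - mu) * (2 + nu)) - 1)
k_var_nu = 6 / ((3 + nu) * (2 + nu)) * (1 / var - 6 - 5 * nu)
k_var_mu = 6 * var * (1 - var - 5 * mu * (1 - mu)) \
           / ((var + mu * (1 - mu)) * (2 * var + mu * (1 - mu)))
print(k_ab, k_mu_nu, k_var_nu, k_var_mu)  # all four agree (-0.12 for these parameters)
```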
The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (−2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (*μ* = 1/2). This occurs in the symmetric limit *α* = *β* → 0, with zero skewness. At the limit, this is the 2-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with equal probability 1/2 at each [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") end *x* = 0 and *x* = 1 and zero probability everywhere else. (A coin toss: one face of the coin being *x* = 0 and the other face being *x* = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them.
On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (*μ* = 0 or *μ* = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end.
Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size ν as follows:
$$\text{excess kurtosis} = \frac{6}{3+\nu}\left(\frac{(2+\nu)}{4}(\text{skewness})^{2} - 1\right) \text{ if } (\text{skewness})^{2} - 2 < \text{excess kurtosis} < \frac{3}{2}(\text{skewness})^{2}$$
From this last expression, one can obtain the same limits published over a century ago by [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson")[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21) for the beta distribution (see the section below titled "Kurtosis bounded by the square of the skewness"). Taking the limit *α* + *β* = *ν* → 0 in the above expression yields Pearson's lower boundary: values of the skewness and excess kurtosis below the boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence Karl Pearson appropriately called the region below this boundary the "impossible region". The limit *α* + *β* = *ν* → ∞ determines Pearson's upper boundary.
$$
\begin{aligned}
&\lim_{\nu \to 0} \text{excess kurtosis} = (\text{skewness})^{2} - 2 \\
&\lim_{\nu \to \infty} \text{excess kurtosis} = \tfrac{3}{2}(\text{skewness})^{2}
\end{aligned}
$$
therefore:
$$(\text{skewness})^{2} - 2 < \text{excess kurtosis} < \tfrac{3}{2}(\text{skewness})^{2}$$
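The strict inequalities can be probed numerically over a grid of shape parameters (an illustrative sketch using SciPy's beta moments; the grid values are arbitrary):

```python
from scipy.stats import beta as beta_dist

# every (alpha, beta) pair must satisfy skew^2 - 2 < excess kurtosis < 1.5 * skew^2
for a in (0.1, 0.5, 1.0, 2.0, 10.0):
    for b in (0.3, 1.0, 4.0, 25.0):
        s = beta_dist.stats(a, b, moments='s')
        k = beta_dist.stats(a, b, moments='k')
        assert s ** 2 - 2 < k < 1.5 * s ** 2, (a, b)
print("all grid points lie strictly inside Pearson's band")
```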
Values of *ν* = *α* + *β* ranging from zero to infinity, 0 < *ν* < ∞, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness.
For the symmetric case (*α* = *β*), the following limits apply:
$$
\begin{aligned}
&\lim_{\alpha=\beta \to 0} \text{excess kurtosis} = -2 \\
&\lim_{\alpha=\beta \to \infty} \text{excess kurtosis} = 0 \\
&\lim_{\mu \to \frac{1}{2}} \text{excess kurtosis} = -\frac{6}{3+\nu}
\end{aligned}
$$
For the asymmetric cases (*α* ≠ *β*) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:
$$
\begin{aligned}
&\lim_{\alpha \to 0} \text{excess kurtosis} = \lim_{\beta \to 0} \text{excess kurtosis} = \lim_{\mu \to 0} \text{excess kurtosis} = \lim_{\mu \to 1} \text{excess kurtosis} = \infty \\
&\lim_{\alpha \to \infty} \text{excess kurtosis} = \frac{6}{\beta}, \quad \lim_{\beta \to 0}\left(\lim_{\alpha \to \infty} \text{excess kurtosis}\right) = \infty, \quad \lim_{\beta \to \infty}\left(\lim_{\alpha \to \infty} \text{excess kurtosis}\right) = 0 \\
&\lim_{\beta \to \infty} \text{excess kurtosis} = \frac{6}{\alpha}, \quad \lim_{\alpha \to 0}\left(\lim_{\beta \to \infty} \text{excess kurtosis}\right) = \infty, \quad \lim_{\alpha \to \infty}\left(\lim_{\beta \to \infty} \text{excess kurtosis}\right) = 0 \\
&\lim_{\nu \to 0} \text{excess kurtosis} = -6 + \frac{1}{\mu(1-\mu)}, \quad \lim_{\mu \to 0}\left(\lim_{\nu \to 0} \text{excess kurtosis}\right) = \infty, \quad \lim_{\mu \to 1}\left(\lim_{\nu \to 0} \text{excess kurtosis}\right) = \infty
\end{aligned}
$$
Figures: Excess kurtosis for the beta distribution with *α* and *β* ranging from 1 to 5, and from 0.1 to 5
### Characteristic function
Figures: Re(characteristic function) of the beta distribution for the symmetric case *α* = *β* (ranging from 0 to 25), and for the skewed cases *β* = *α* + 1/2 (*α* ranging from 0 to 25) and *α* = *β* + 1/2 (*β* ranging from 0 to 25)
The [characteristic function](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") is the [Fourier transform](https://en.wikipedia.org/wiki/Fourier_transform "Fourier transform") of the probability density function. The characteristic function of the beta distribution is [Kummer's confluent hypergeometric function](https://en.wikipedia.org/wiki/Confluent_hypergeometric_function "Confluent hypergeometric function") (of the first kind):[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[18\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Abramowitz-18)[\[22\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Zwillinger_2014-22)
$$
\begin{aligned}
\varphi_{X}(\alpha;\beta;t) &= \operatorname{E}\left[e^{itX}\right] \\
&= \int_{0}^{1} e^{itx} f(x;\alpha,\beta)\,dx \\
&= {}_{1}F_{1}(\alpha; \alpha+\beta; it) \\
&= \sum_{n=0}^{\infty} \frac{\alpha^{\overline{n}}(it)^{n}}{(\alpha+\beta)^{\overline{n}}\,n!} \\
&= 1 + \sum_{k=1}^{\infty} \left(\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}\right)\frac{(it)^{k}}{k!}
\end{aligned}
$$
where

$$x^{\overline{n}} = x(x+1)(x+2)\cdots(x+n-1)$$
is the [rising factorial](https://en.wikipedia.org/wiki/Rising_factorial "Rising factorial"). The value of the characteristic function at *t* = 0 is one:
$$\varphi_{X}(\alpha;\beta;0) = {}_{1}F_{1}(\alpha; \alpha+\beta; 0) = 1.$$
Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable *t*:
$$\operatorname{Re}\left[{}_{1}F_{1}(\alpha; \alpha+\beta; it)\right] = \operatorname{Re}\left[{}_{1}F_{1}(\alpha; \alpha+\beta; -it)\right]$$

$$\operatorname{Im}\left[{}_{1}F_{1}(\alpha; \alpha+\beta; it)\right] = -\operatorname{Im}\left[{}_{1}F_{1}(\alpha; \alpha+\beta; -it)\right]$$
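The hypergeometric representation can be checked numerically; since the argument is imaginary, a library with complex support such as mpmath is convenient (an illustrative sketch, not from the original text; the parameter values are arbitrary):

```python
import mpmath as mp

a, b, t = 2.0, 3.0, 1.5
# characteristic function as Kummer's 1F1 with imaginary argument
phi_hyp = mp.hyp1f1(a, a + b, 1j * t)
# direct numerical integration of exp(itx) against the Beta(a, b) density
phi_int = mp.quad(lambda x: mp.e ** (1j * t * x)
                  * x ** (a - 1) * (1 - x) ** (b - 1) / mp.beta(a, b), [0, 1])
print(phi_hyp)
print(phi_int)  # the two complex values agree
```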
The symmetric case *α* = *β* simplifies the characteristic function of the beta distribution to a [Bessel function](https://en.wikipedia.org/wiki/Bessel_function "Bessel function"), since in the special case *α* + *β* = 2*α* the [confluent hypergeometric function](https://en.wikipedia.org/wiki/Confluent_hypergeometric_function "Confluent hypergeometric function") (of the first kind) reduces to a Bessel function (the modified Bessel function of the first kind $I_{\alpha-\frac{1}{2}}$) using [Kummer's](https://en.wikipedia.org/wiki/Ernst_Kummer "Ernst Kummer") second transformation as follows:
$$
\begin{aligned}
{}_{1}F_{1}(\alpha; 2\alpha; it) &= e^{\frac{it}{2}}\,{}_{0}F_{1}\left(; \alpha+\tfrac{1}{2}; \frac{(it)^{2}}{16}\right) \\
&= e^{\frac{it}{2}}\left(\frac{it}{4}\right)^{\frac{1}{2}-\alpha}\Gamma\left(\alpha+\tfrac{1}{2}\right)I_{\alpha-\frac{1}{2}}\left(\frac{it}{2}\right).
\end{aligned}
$$
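Kummer's second transformation is easy to confirm numerically with mpmath (a sketch; the parameter values are arbitrary, and the branch factors in the two sides cancel consistently under the principal branch):

```python
import mpmath as mp

a, t = 1.7, 2.3
lhs = mp.hyp1f1(a, 2 * a, 1j * t)
rhs = mp.e ** (1j * t / 2) * mp.power(1j * t / 4, mp.mpf('0.5') - a) \
      * mp.gamma(a + mp.mpf('0.5')) * mp.besseli(a - mp.mpf('0.5'), 1j * t / 2)
print(lhs)
print(rhs)  # agrees with lhs
```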
In the accompanying plots, the [real part](https://en.wikipedia.org/wiki/Complex_number "Complex number") (Re) of the [characteristic function](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") of the beta distribution is displayed for symmetric (*α* = *β*) and skewed (*α* ≠ *β*) cases.
### Other moments
#### Moment generating function
It also follows[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9) that the [moment generating function](https://en.wikipedia.org/wiki/Moment_generating_function "Moment generating function") is
$$
\begin{aligned}
M_{X}(\alpha;\beta;t) &= \operatorname{E}\left[e^{tX}\right] \\[4pt]
&= \int_{0}^{1} e^{tx} f(x;\alpha,\beta)\,dx \\[4pt]
&= {}_{1}F_{1}(\alpha; \alpha+\beta; t) \\[4pt]
&= \sum_{n=0}^{\infty} \frac{\alpha^{\overline{n}}}{(\alpha+\beta)^{\overline{n}}}\frac{t^{n}}{n!} \\[4pt]
&= 1 + \sum_{k=1}^{\infty} \left(\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}\right)\frac{t^{k}}{k!}.
\end{aligned}
$$
In particular $M_{X}(\alpha;\beta;0) = 1$.
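For real *t* the MGF can be evaluated directly with SciPy's `hyp1f1`; a quick Monte Carlo comparison (illustrative; the parameters are arbitrary):

```python
import numpy as np
from scipy.special import hyp1f1

a, b, t = 2.0, 3.0, 0.7
rng = np.random.default_rng(2)
x = rng.beta(a, b, 1_000_000)
print(hyp1f1(a, a + b, t), np.exp(t * x).mean())  # should agree to a few decimals
```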
#### Higher moments
Using the [moment generating function](https://en.wikipedia.org/wiki/Moment_generating_function "Moment generating function"), the *k*-th [raw moment](https://en.wikipedia.org/wiki/Raw_moment "Raw moment") is given by[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) the factor

$$\prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}$$

multiplying the (exponential series) term $\frac{t^{k}}{k!}$ in the series of the [moment generating function](https://en.wikipedia.org/wiki/Moment_generating_function "Moment generating function"):

$$\operatorname{E}[X^{k}] = \frac{\alpha^{\overline{k}}}{(\alpha+\beta)^{\overline{k}}} = \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r}$$
where $x^{\overline{k}}$ is the rising factorial ([Pochhammer symbol](https://en.wikipedia.org/wiki/Pochhammer_symbol "Pochhammer symbol")), consistent with the notation above. It can also be written in a recursive form as
$$\operatorname{E}[X^{k}] = \frac{\alpha+k-1}{\alpha+\beta+k-1}\operatorname{E}[X^{k-1}].$$
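The recursion translates directly into code; a minimal sketch (not from the original text) comparing it with SciPy's moment computation:

```python
from scipy.stats import beta as beta_dist

def beta_raw_moment(a, b, k):
    """k-th raw moment of Beta(a, b) via E[X^k] = (a+k-1)/(a+b+k-1) * E[X^(k-1)]."""
    m = 1.0  # E[X^0] = 1
    for j in range(1, k + 1):
        m *= (a + j - 1) / (a + b + j - 1)
    return m

a, b = 2.0, 5.0
print([beta_raw_moment(a, b, k) for k in range(1, 5)])
print([beta_dist.moment(k, a, b) for k in range(1, 5)])  # should match
```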
Since the moment generating function $M_{X}(\alpha;\beta;\cdot)$ has a positive radius of convergence,\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\] the beta distribution is [determined by its moments](https://en.wikipedia.org/wiki/Moment_problem "Moment problem").[\[23\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-23)
#### Moments of transformed random variables
##### Moments of linearly transformed, product and inverted random variables
One can also show the following expectations for a transformed random variable,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) where the random variable *X* is beta-distributed with parameters *α* and *β*: *X* ~ Beta(*α*, *β*). The expected value of the variable 1 − *X* is the mirror image of the expected value based on *X*:
$$
\begin{aligned}
\operatorname{E}[1-X] &= \frac{\beta}{\alpha+\beta} \\
\operatorname{E}[X(1-X)] &= \operatorname{E}[(1-X)X] = \frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}
\end{aligned}
$$
Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on the variables *X* and 1 − *X* are identical, and the covariance of *X* and 1 − *X* is the negative of the variance:
$$\operatorname{var}[1-X] = \operatorname{var}[X] = -\operatorname{cov}[X, (1-X)] = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$
The expected values of the inverted variables are as follows (these are related to the harmonic means; see [§ Harmonic mean](https://en.wikipedia.org/wiki/Beta_distribution#Harmonic_mean)):
$$
\begin{aligned}
\operatorname{E}\left[\frac{1}{X}\right] &= \frac{\alpha+\beta-1}{\alpha-1} && \text{ if } \alpha > 1 \\
\operatorname{E}\left[\frac{1}{1-X}\right] &= \frac{\alpha+\beta-1}{\beta-1} && \text{ if } \beta > 1
\end{aligned}
$$
The following transformation, dividing the variable *X* by its mirror-image, *X*/(1 − *X*), results in the expected value of the "inverted beta distribution" or [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") (also known as the beta distribution of the second kind or [Pearson's Type VI](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution")):[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
$$
\begin{aligned}
\operatorname{E}\left[\frac{X}{1-X}\right] &= \frac{\alpha}{\beta-1} && \text{ if } \beta > 1 \\
\operatorname{E}\left[\frac{1-X}{X}\right] &= \frac{\beta}{\alpha-1} && \text{ if } \alpha > 1
\end{aligned}
$$
Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables:
$$
\begin{aligned}
\operatorname{var}\left[\frac{1}{X}\right] &= \operatorname{E}\left[\left(\frac{1}{X} - \operatorname{E}\left[\frac{1}{X}\right]\right)^{2}\right] = \operatorname{var}\left[\frac{1-X}{X}\right] \\
&= \operatorname{E}\left[\left(\frac{1-X}{X} - \operatorname{E}\left[\frac{1-X}{X}\right]\right)^{2}\right] = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(\alpha-1)^{2}} \text{ if } \alpha > 2
\end{aligned}
$$
The variance of the variable *X* divided by its mirror-image, *X*/(1 − *X*), is the variance of the "inverted beta distribution" or [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") (also known as the beta distribution of the second kind or [Pearson's Type VI](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution")):[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
$$
\begin{aligned}
\operatorname{var}\left[\frac{1}{1-X}\right] &= \operatorname{E}\left[\left(\frac{1}{1-X} - \operatorname{E}\left[\frac{1}{1-X}\right]\right)^{2}\right] = \operatorname{var}\left[\frac{X}{1-X}\right] \\[1ex]
&= \operatorname{E}\left[\left(\frac{X}{1-X} - \operatorname{E}\left[\frac{X}{1-X}\right]\right)^{2}\right] = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(\beta-1)^{2}} \text{ if } \beta > 2
\end{aligned}
$$
The covariances are:
$$\begin{aligned}\operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right] &= \operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X}\right] = \operatorname{cov}\left[\frac{1}{X},\frac{X}{1-X}\right]\\ &= \operatorname{cov}\left[\frac{1-X}{X},\frac{1}{1-X}\right] = \frac{\alpha+\beta-1}{(\alpha-1)(\beta-1)} \quad\text{if } \alpha,\beta > 1\end{aligned}$$

These expectations and variances appear in the four-parameter Fisher information matrix ([§ Fisher information](https://en.wikipedia.org/wiki/Beta_distribution#Fisher_information)).
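As a quick plausibility check, the closed-form mean and variance of the odds ratio *X*/(1 − *X*) quoted above can be compared against Monte Carlo estimates. This is a minimal sketch assuming NumPy is available; the parameter values are arbitrary illustrative choices satisfying the existence conditions:

```python
# Sketch: Monte Carlo check of E[X/(1-X)] = alpha/(beta-1) and
# var[X/(1-X)] = alpha(alpha+beta-1)/((beta-2)(beta-1)^2).
# The parameters below are arbitrary, chosen so that beta > 2 holds.
import numpy as np

rng = np.random.default_rng(0)
a, b = 3.0, 5.0
x = rng.beta(a, b, size=2_000_000)
odds = x / (1.0 - x)

print(odds.mean(), a / (b - 1))                                # ~0.75 both
print(odds.var(), a * (a + b - 1) / ((b - 2) * (b - 1) ** 2))  # ~0.4375 both
```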
##### Moments of logarithmically transformed random variables
Plot of logit(*X*) = ln(*X*/(1 − *X*)) (vertical axis) vs. *X* in the domain of 0 to 1 (horizontal axis). Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable
Expected values for [logarithmic transformations](https://en.wikipedia.org/wiki/Logarithm_transformation "Logarithm transformation") (useful for [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimates, see [§ Parameter estimation, Maximum likelihood](https://en.wikipedia.org/wiki/Beta_distribution#Parameter_estimation,_Maximum_likelihood)) are discussed in this section. The following logarithmic linear transformations are related to the geometric means $G_X$ and $G_{1-X}$ (see [§ Geometric Mean](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_Mean)):
$$\begin{aligned}\operatorname{E}[\ln X] &= \psi(\alpha)-\psi(\alpha+\beta) = -\operatorname{E}\left[\ln\frac{1}{X}\right],\\ \operatorname{E}[\ln(1-X)] &= \psi(\beta)-\psi(\alpha+\beta) = -\operatorname{E}\left[\ln\frac{1}{1-X}\right].\end{aligned}$$
where the **[digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function")** *ψ*(*α*) is defined as the [logarithmic derivative](https://en.wikipedia.org/wiki/Logarithmic_derivative "Logarithmic derivative") of the [gamma function](https://en.wikipedia.org/wiki/Gamma_function "Gamma function"):[\[18\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Abramowitz-18)
$$\psi(\alpha) = \frac{d}{d\alpha}\ln\Gamma(\alpha)$$
[Logit](https://en.wikipedia.org/wiki/Logit "Logit") transformations are interesting,[\[24\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-MacKay-24) as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:
$$\begin{aligned}\operatorname{E}\left[\ln\frac{X}{1-X}\right] &= \psi(\alpha)-\psi(\beta) = \operatorname{E}[\ln X]+\operatorname{E}\left[\ln\frac{1}{1-X}\right],\\ \operatorname{E}\left[\ln\frac{1-X}{X}\right] &= \psi(\beta)-\psi(\alpha) = -\operatorname{E}\left[\ln\frac{X}{1-X}\right].\end{aligned}$$
Johnson[\[25\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JohnsonLogInv-25) considered the distribution of the [logit](https://en.wikipedia.org/wiki/Logit "Logit")-transformed variable ln(*X*/(1 − *X*)), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support \[0, 1\] of the original variable *X* to infinite support in both directions of the real line, (−∞, +∞). The logit of a beta variate has the [logistic-beta distribution](https://en.wikipedia.org/wiki/Logistic-beta_distribution "Logistic-beta distribution").
Higher-order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher-order polygamma functions as follows:
$$\begin{aligned}\operatorname{E}\left[\ln^{2}(X)\right] &= (\psi(\alpha)-\psi(\alpha+\beta))^{2}+\psi_{1}(\alpha)-\psi_{1}(\alpha+\beta),\\ \operatorname{E}\left[\ln^{2}(1-X)\right] &= (\psi(\beta)-\psi(\alpha+\beta))^{2}+\psi_{1}(\beta)-\psi_{1}(\alpha+\beta),\\ \operatorname{E}\left[\ln(X)\ln(1-X)\right] &= (\psi(\alpha)-\psi(\alpha+\beta))(\psi(\beta)-\psi(\alpha+\beta))-\psi_{1}(\alpha+\beta).\end{aligned}$$
Therefore the [variance](https://en.wikipedia.org/wiki/Variance "Variance") of the logarithmic variables and the [covariance](https://en.wikipedia.org/wiki/Covariance "Covariance") of ln(*X*) and ln(1 − *X*) are:
$$\begin{aligned}\operatorname{cov}[\ln X,\ln(1-X)] &= \operatorname{E}\left[\ln X\ln(1-X)\right]-\operatorname{E}[\ln X]\operatorname{E}[\ln(1-X)] = -\psi_{1}(\alpha+\beta)\\[1ex] \operatorname{var}[\ln X] &= \operatorname{E}[\ln^{2}X]-(\operatorname{E}[\ln X])^{2} = \psi_{1}(\alpha)-\psi_{1}(\alpha+\beta) = \psi_{1}(\alpha)+\operatorname{cov}[\ln X,\ln(1-X)]\\[1ex] \operatorname{var}[\ln(1-X)] &= \operatorname{E}[\ln^{2}(1-X)]-(\operatorname{E}[\ln(1-X)])^{2} = \psi_{1}(\beta)-\psi_{1}(\alpha+\beta) = \psi_{1}(\beta)+\operatorname{cov}[\ln X,\ln(1-X)]\end{aligned}$$
where the **[trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function")**, denoted *Ļ*1(*α*), is the second of the [polygamma functions](https://en.wikipedia.org/wiki/Polygamma_function "Polygamma function"), and is defined as the derivative of the [digamma](https://en.wikipedia.org/wiki/Digamma "Digamma") function:
$$\psi_{1}(\alpha) = \frac{d^{2}\ln\Gamma(\alpha)}{d\alpha^{2}} = \frac{d\psi(\alpha)}{d\alpha}.$$
The variances and covariance of the logarithmically transformed variables *X* and (1 − *X*) differ, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables *X* and (1 − *X*): the logarithm approaches negative infinity as the variable approaches zero.
These logarithmic variances and covariance are the elements of the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information") matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation).
The variances of the log inverse variables are identical to the variances of the log variables:
$$\begin{aligned}\operatorname{var}\left[\ln\frac{1}{X}\right] &= \operatorname{var}[\ln X] = \psi_{1}(\alpha)-\psi_{1}(\alpha+\beta),\\ \operatorname{var}\left[\ln\frac{1}{1-X}\right] &= \operatorname{var}[\ln(1-X)] = \psi_{1}(\beta)-\psi_{1}(\alpha+\beta),\\ \operatorname{cov}\left[\ln\frac{1}{X},\,\ln\frac{1}{1-X}\right] &= \operatorname{cov}[\ln X,\ln(1-X)] = -\psi_{1}(\alpha+\beta).\end{aligned}$$
It also follows that the variances of the [logit](https://en.wikipedia.org/wiki/Logit "Logit")-transformed variables are
$$\operatorname{var}\left[\ln\frac{X}{1-X}\right] = \operatorname{var}\left[\ln\frac{1-X}{X}\right] = -\operatorname{cov}\left[\ln\frac{X}{1-X},\,\ln\frac{1-X}{X}\right] = \psi_{1}(\alpha)+\psi_{1}(\beta).$$
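The logarithmic and logit identities above lend themselves to a direct numerical spot-check. A minimal sketch assuming SciPy is available, with arbitrary illustrative parameters:

```python
# Sketch: compare sample moments of ln(X) and logit(X) against the
# digamma/trigamma closed forms given above.
import numpy as np
from scipy.special import digamma, polygamma  # polygamma(1, .) is trigamma

a, b = 2.0, 7.0
rng = np.random.default_rng(1)
x = rng.beta(a, b, size=2_000_000)

print(np.log(x).mean(), digamma(a) - digamma(a + b))           # E[ln X]
print(np.log(x).var(), polygamma(1, a) - polygamma(1, a + b))  # var[ln X]
logit = np.log(x / (1 - x))
print(logit.var(), polygamma(1, a) + polygamma(1, b))          # var[logit X]
```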
### Quantities of information (entropy)
Given a beta-distributed random variable, *X* ~ Beta(*α*, *β*), the [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") of *X* (measured in [nats](https://en.wikipedia.org/wiki/Nat_\(unit\) "Nat (unit)")) is the expected value of the negative of the logarithm of the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function"):[\[26\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-26)
$$\begin{aligned}h(X) &= \operatorname{E}\left[-\ln f(X;\alpha,\beta)\right]\\[4pt] &= \int_{0}^{1}-f(x;\alpha,\beta)\ln f(x;\alpha,\beta)\,dx\\[4pt] &= \ln\mathrm{B}(\alpha,\beta)-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2)\psi(\alpha+\beta)\end{aligned}$$
where *f*(*x*; *α*, *β*) is the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") of the beta distribution:
$$f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}$$
The [digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function") *ψ* appears in the formula for the differential entropy as a consequence of Euler's integral formula for the [harmonic numbers](https://en.wikipedia.org/wiki/Harmonic_number "Harmonic number"), which follows from the integral:
$$\int_{0}^{1}\frac{1-x^{\alpha-1}}{1-x}\,dx = \psi(\alpha)-\psi(1)$$
The [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") of the beta distribution is negative for all values of *α* and *β* greater than zero, except at *α* = *β* = 1 (for which values the beta distribution coincides with the [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)")), where the [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") reaches its [maximum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value of zero. It is to be expected that the maximum entropy occurs when the beta distribution equals the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.
For *α* or *β* approaching zero, the [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") approaches its [minimum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value of negative infinity. For (either or both) *α* or *β* approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) *α* or *β* approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either *α* or *β* approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), *α* = *β*, and they approach infinity simultaneously, the probability density becomes a spike ([Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function")) concentrated at the middle *x* = 1/2, and hence there is 100% probability at the middle *x* = 1/2 and zero probability everywhere else.
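The closed form for *h*(*X*) above can be cross-checked against SciPy's built-in entropy for a frozen beta distribution. A minimal sketch (the parameter pairs reuse values from the numerical examples further below):

```python
# Sketch: h(X) = ln B(a,b) - (a-1)psi(a) - (b-1)psi(b) + (a+b-2)psi(a+b),
# compared with scipy.stats.beta(a, b).entropy() (both in nats).
from scipy.special import betaln, digamma
from scipy.stats import beta

def beta_entropy(a, b):
    return (betaln(a, b) - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

for a, b in [(1, 1), (3, 3), (0.5, 3)]:
    print((a, b), beta_entropy(a, b), beta(a, b).entropy())
# Beta(1,1) gives 0; Beta(3,3) gives -0.267864...; Beta(0.5,3) gives -1.10805...
```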
Plots: differential entropy of the beta distribution for *α* and *β* ranging from 1 to 5, and from 0.1 to 5
The (continuous case) [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the [discrete entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy").[\[27\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-27) It has been known since then that the differential entropy may differ from the [infinitesimal](https://en.wikipedia.org/wiki/Infinitesimal "Infinitesimal") limit of the discrete entropy by an infinite offset; therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.
Given two beta distributed random variables, *X*1 ~ Beta(*α*, *β*) and *X*2 ~ Beta(*αā²*, *βā²*), the [cross-entropy](https://en.wikipedia.org/wiki/Cross-entropy "Cross-entropy") is (measured in nats)[\[28\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Cover_and_Thomas-28)
$$\begin{aligned}H(X_{1},X_{2}) &= \int_{0}^{1}-f(x;\alpha,\beta)\ln f(x;\alpha',\beta')\,dx\\[4pt] &= \ln\mathrm{B}(\alpha',\beta')-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta).\end{aligned}$$
The [cross entropy](https://en.wikipedia.org/wiki/Cross_entropy "Cross entropy") has been used as an error metric to measure the distance between two hypotheses.[\[29\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Plunkett-29)[\[30\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Nallapati-30) Its absolute value is minimal when the two distributions are identical. It is the information measure most closely related to the maximum log likelihood[\[28\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Cover_and_Thomas-28) (see section on "Parameter estimation. Maximum likelihood estimation").
The relative entropy, or [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence") *D*KL(*X*1 \|\| *X*2), is a measure of the inefficiency of assuming that the distribution is *X*2 ~ Beta(*αā²*, *βā²*) when the distribution is really *X*1 ~ Beta(*α*, *β*). It is defined as follows (measured in nats).
$$\begin{aligned}D_{\mathrm{KL}}(X_{1}\parallel X_{2}) &= \int_{0}^{1}f(x;\alpha,\beta)\,\ln\frac{f(x;\alpha,\beta)}{f(x;\alpha',\beta')}\,dx\\[4pt] &= \left(\int_{0}^{1}f(x;\alpha,\beta)\ln f(x;\alpha,\beta)\,dx\right)-\left(\int_{0}^{1}f(x;\alpha,\beta)\ln f(x;\alpha',\beta')\,dx\right)\\[4pt] &= -h(X_{1})+H(X_{1},X_{2})\\[4pt] &= \ln\frac{\mathrm{B}(\alpha',\beta')}{\mathrm{B}(\alpha,\beta)}+\left(\alpha-\alpha'\right)\psi(\alpha)+\left(\beta-\beta'\right)\psi(\beta)+\left(\alpha'-\alpha+\beta'-\beta\right)\psi(\alpha+\beta).\end{aligned}$$
The relative entropy, or [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence"), is always non-negative. A few numerical examples follow:
- *X*1 ~ Beta(1, 1) and *X*2 ~ Beta(3, 3); *D*KL(*X*1 \|\| *X*2) = 0.598803; *D*KL(*X*2 \|\| *X*1) = 0.267864; *h*(*X*1) = 0; *h*(*X*2) = −0.267864
- *X*1 ~ Beta(3, 0.5) and *X*2 ~ Beta(0.5, 3); *D*KL(*X*1 \|\| *X*2) = 7.21574; *D*KL(*X*2 \|\| *X*1) = 7.21574; *h*(*X*1) = −1.10805; *h*(*X*2) = −1.10805.
The [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence") is not symmetric, *D*KL(*X*1 \|\| *X*2) ≠ *D*KL(*X*2 \|\| *X*1), for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric but have different entropies, *h*(*X*1) ≠ *h*(*X*2). The value of the divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The entropy *h* of Beta(1, 1) is higher than that of Beta(3, 3) because the uniform distribution Beta(1, 1) has the maximum amount of disorder. The divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the KullbackāLeibler divergence is consistent with the [second law of thermodynamics](https://en.wikipedia.org/wiki/Second_law_of_thermodynamics "Second law of thermodynamics").
The [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence") is symmetric, *D*KL(*X*1 \|\| *X*2) = *D*KL(*X*2 \|\| *X*1), for the skewed cases Beta(3, 0.5) and Beta(0.5, 3), which have equal differential entropy *h*(*X*1) = *h*(*X*2).
The symmetry condition:
$$D_{\mathrm{KL}}(X_{1}\parallel X_{2}) = D_{\mathrm{KL}}(X_{2}\parallel X_{1}),\quad\text{if } h(X_{1}) = h(X_{2}),\text{ for (skewed) } \alpha \neq \beta$$
follows from the above definitions and the mirror-symmetry *f*(*x*; *α*, *β*) = *f*(1 − *x*; *β*, *α*) enjoyed by the beta distribution.
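The last line of the KullbackāLeibler derivation above translates directly into code; the following sketch (assuming SciPy) reproduces the numerical examples just quoted:

```python
# Sketch: D_KL(Beta(a1,b1) || Beta(a2,b2)) from the closed form above.
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a2, b2):
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

print(kl_beta(1, 1, 3, 3))      # 0.598803...
print(kl_beta(3, 3, 1, 1))      # 0.267864...
print(kl_beta(3, 0.5, 0.5, 3))  # 7.21574..., and the same value
print(kl_beta(0.5, 3, 3, 0.5))  # in the reverse direction
```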
### Relationships between statistical measures
#### Mean, mode and median relationship
If 1 < *α* < *β* then mode ≤ median ≤ mean.[\[10\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kerman2011-10) Expressing the mode (only for *α*, *β* > 1) and the mean in terms of *α* and *β*:
$$\frac{\alpha-1}{\alpha+\beta-2} \le \text{median} \le \frac{\alpha}{\alpha+\beta},$$
If 1 < *β* < *α* then the order of the inequalities is reversed. For *α*, *β* > 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of *x*. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of *x* in the ([pathological](https://en.wikipedia.org/wiki/Pathological_\(mathematics\) "Pathological (mathematics)")) case of *α* = 1 and *β* = 1, for which values the beta distribution approaches the uniform distribution and the [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") approaches its [maximum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value, and hence maximum "disorder".
For example, for *α* = 1.0001 and *β* = 1.00000001:
- mode = 0.9999; PDF(mode) = 1.00010
- mean = 0.500025; PDF(mean) = 1.00003
- median = 0.500035; PDF(median) = 1.00003
- mean − mode = −0.499875
- mean − median = −9.65538 × 10⁻⁶
where PDF stands for the value of the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function").
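A brief numerical illustration of the ordering mode ≤ median ≤ mean for 1 < *α* < *β*, with the median evaluated numerically by SciPy (the parameter pair is an arbitrary assumption):

```python
# Sketch: mode <= median <= mean for 1 < alpha < beta.
from scipy.stats import beta

a, b = 2.0, 5.0
mode = (a - 1) / (a + b - 2)             # closed form, valid for a, b > 1
dist = beta(a, b)
print(mode, dist.median(), dist.mean())  # 0.2  0.2644...  0.2857...
assert mode <= dist.median() <= dist.mean()
```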
Plots: mean − median difference and mean − mode difference for the beta distribution with *α* and *β* from 1 to 5
#### Mean, geometric mean and harmonic mean relationship
Mean, median, geometric mean and harmonic mean for the beta distribution with 0 < *α* = *β* < 5
It is known from the [inequality of arithmetic and geometric means](https://en.wikipedia.org/wiki/Inequality_of_arithmetic_and_geometric_means "Inequality of arithmetic and geometric means") that the geometric mean is lower than the mean; similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for *α* = *β*, both the mean and the median are exactly equal to 1/2, regardless of the value of *α* = *β*, and the mode is also equal to 1/2 for *α* = *β* > 1. However, the geometric and harmonic means are lower than 1/2, and they only approach this value asymptotically as *α* = *β* → ∞.
#### Kurtosis bounded by the square of the skewness
Beta distribution *α* and *β* parameters vs. excess kurtosis and squared skewness
As remarked by [Feller](https://en.wikipedia.org/wiki/William_Feller "William Feller"),[\[5\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Feller-5) in the [Pearson system](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution") the beta probability density appears as [type I](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution") (any difference between the beta distribution and Pearson's type I distribution is only superficial and makes no difference for the following discussion regarding the relationship between kurtosis and skewness). [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") showed, in Plate 1 of his paper published in 1916,[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21) a graph with the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis "Kurtosis") as the vertical axis ([ordinate](https://en.wikipedia.org/wiki/Ordinate "Ordinate")) and the square of the [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") as the horizontal axis ([abscissa](https://en.wikipedia.org/wiki/Abscissa "Abscissa")), in which a number of distributions were displayed.[\[31\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Egon-31) The region occupied by the beta distribution is bounded by the following two [lines](https://en.wikipedia.org/wiki/Line_\(geometry\) "Line (geometry)") in the (skewness², kurtosis) [plane](https://en.wikipedia.org/wiki/Cartesian_coordinate_system "Cartesian coordinate system"), or the (skewness², excess kurtosis) [plane](https://en.wikipedia.org/wiki/Cartesian_coordinate_system "Cartesian coordinate system"):
$$(\text{skewness})^{2}+1 < \text{kurtosis} < \frac{3}{2}(\text{skewness})^{2}+3$$
or, equivalently,
$$(\text{skewness})^{2}-2 < \text{excess kurtosis} < \frac{3}{2}(\text{skewness})^{2}$$
At a time when there were no powerful digital computers, [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") accurately computed further boundaries,[\[32\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Hahn_and_Shapiro-32)[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21) for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 − skewness² = 0) is produced by skewed "U-shaped" beta distributions with both shape parameters *α* and *β* close to zero. The upper boundary line (excess kurtosis − (3/2) skewness² = 0) is produced by extremely skewed distributions with a very large value of one parameter and a very small value of the other. [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") showed[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21) that this upper boundary line is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity) and can be bell-shaped or J-shaped. His son, [Egon Pearson](https://en.wikipedia.org/wiki/Egon_Pearson "Egon Pearson"), showed[\[31\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Egon-31) that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary is shared with the [noncentral chi-squared distribution](https://en.wikipedia.org/wiki/Noncentral_chi-squared_distribution "Noncentral chi-squared distribution"). Karl Pearson[\[33\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson1895-33) (Pearson 1895, pp. 357, 360, 373ā376) also showed that the [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution") is a Pearson type III distribution; hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/*k* and the square of the skewness is 4/*k*, so excess kurtosis − (3/2) skewness² = 0 is identically satisfied by the gamma distribution regardless of the value of the parameter *k*.) Pearson later noted that the [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution "Chi-squared distribution") is a special case of Pearson's type III and also shares this boundary line (as is apparent from the fact that for the [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution "Chi-squared distribution") the excess kurtosis is 12/*k* and the square of the skewness is 8/*k*, so excess kurtosis − (3/2) skewness² = 0 is again identically satisfied regardless of the value of the parameter *k*). This is to be expected, since the chi-squared distribution *X* ~ χ²(*k*) is a special case of the gamma distribution, with parametrization *X* ~ Γ(*k*/2, 1/2), where *k* is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.
An example of a beta distribution near the upper boundary (excess kurtosis − (3/2) skewness² = 0) is given by *α* = 0.1, *β* = 1000, for which the ratio (excess kurtosis)/(skewness²) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 − skewness² = 0) is given by *α* = 0.0001, *β* = 0.1, for which values the expression (excess kurtosis + 2)/(skewness²) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both *α* and *β* approaching zero symmetrically, the excess kurtosis reaches its minimum value of −2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis ([ordinate](https://en.wikipedia.org/wiki/Ordinate "Ordinate")). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards.)
Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 − skewness² = 0) cannot occur for any distribution, and hence [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal U-shaped distributions for which the parameters *α* and *β* approach zero and hence all the probability density is concentrated at the ends: *x* = 0, 1 with practically nothing in between them. Since for *α* and *β* approaching zero the probability density is concentrated at the two ends *x* = 0 and *x* = 1, this "impossible boundary" is determined by a [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), where the two only possible outcomes occur with respective probabilities *p* and *q* = 1 − *p*. For cases approaching this limit boundary with symmetry *α* = *β*, skewness → 0, excess kurtosis → −2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities approach *p* = *q* = 1/2. For cases approaching this limit boundary with skewness, excess kurtosis → −2 + skewness², and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities $p=\tfrac{\beta}{\alpha+\beta}$ at the left end *x* = 0 and $q=1-p=\tfrac{\alpha}{\alpha+\beta}$ at the right end *x* = 1.
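Both boundary inequalities can be verified mechanically for any admissible parameter pair. A sketch using SciPy's moment helpers; the first two pairs are the near-boundary examples quoted above, the rest are arbitrary:

```python
# Sketch: check (skewness^2 - 2) < excess kurtosis < (3/2) skewness^2.
# scipy's stats(moments='sk') returns skewness and *excess* kurtosis.
from scipy.stats import beta

for a, b in [(0.1, 1000), (0.0001, 0.1), (2, 5), (0.5, 0.5)]:
    skew, ex_kurt = beta(a, b).stats(moments='sk')
    assert skew**2 - 2 < ex_kurt < 1.5 * skew**2
    print((a, b), float(ex_kurt / skew**2) if skew != 0 else None)
```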
### Symmetry
All statements are conditional on *α*, *β* > 0 (a numerical spot-check of several of these symmetries follows the list):
- **Probability density function** [reflection symmetry](https://en.wikipedia.org/wiki/Symmetry "Symmetry")
$$f(x;\alpha,\beta) = f(1-x;\beta,\alpha)$$
- **Cumulative distribution function** [reflection symmetry](https://en.wikipedia.org/wiki/Symmetry "Symmetry") plus unitary [translation](https://en.wikipedia.org/wiki/Symmetry "Symmetry")
$$F(x;\alpha,\beta) = I_{x}(\alpha,\beta) = 1-F(1-x;\beta,\alpha) = 1-I_{1-x}(\beta,\alpha)$$
- **Mode** [reflection symmetry](https://en.wikipedia.org/wiki/Symmetry "Symmetry") plus unitary [translation](https://en.wikipedia.org/wiki/Symmetry "Symmetry")
$$\operatorname{mode}(\mathrm{B}(\alpha,\beta)) = 1-\operatorname{mode}(\mathrm{B}(\beta,\alpha)),\text{ if }\mathrm{B}(\beta,\alpha)\neq\mathrm{B}(1,1)$$
- **Median** [reflection symmetry](https://en.wikipedia.org/wiki/Symmetry "Symmetry") plus unitary [translation](https://en.wikipedia.org/wiki/Symmetry "Symmetry")
$$\operatorname{median}(\mathrm{B}(\alpha,\beta)) = 1-\operatorname{median}(\mathrm{B}(\beta,\alpha))$$
- **Mean** [reflection symmetry](https://en.wikipedia.org/wiki/Symmetry "Symmetry") plus unitary [translation](https://en.wikipedia.org/wiki/Symmetry "Symmetry")
$$\mu(\mathrm{B}(\alpha,\beta)) = 1-\mu(\mathrm{B}(\beta,\alpha))$$
- **Geometric means**: each is individually asymmetric; the following symmetry applies between the geometric mean based on *X* and the geometric mean based on its [reflection](https://en.wikipedia.org/wiki/Reflection_formula "Reflection formula") 1 − *X*
$$G_{X}(\mathrm{B}(\alpha,\beta)) = G_{1-X}(\mathrm{B}(\beta,\alpha))$$
- **Harmonic means**: each is individually asymmetric; the following symmetry applies between the harmonic mean based on *X* and the harmonic mean based on its [reflection](https://en.wikipedia.org/wiki/Reflection_formula "Reflection formula") 1 − *X*
$$H_{X}(\mathrm{B}(\alpha,\beta)) = H_{1-X}(\mathrm{B}(\beta,\alpha))\text{ if }\alpha,\beta>1.$$
- **Variance** symmetry
$$\operatorname{var}(\mathrm{B}(\alpha,\beta)) = \operatorname{var}(\mathrm{B}(\beta,\alpha))$$
- **Geometric variances**: each is individually asymmetric; the following symmetry applies between the log geometric variance based on *X* and the log geometric variance based on its [reflection](https://en.wikipedia.org/wiki/Reflection_formula "Reflection formula") 1 − *X*
$$\ln(\operatorname{var}_{GX}(\mathrm{B}(\alpha,\beta))) = \ln(\operatorname{var}_{G(1-X)}(\mathrm{B}(\beta,\alpha)))$$
- **Geometric covariance** symmetry
$$\ln\operatorname{cov}_{GX,(1-X)}(\mathrm{B}(\alpha,\beta)) = \ln\operatorname{cov}_{GX,(1-X)}(\mathrm{B}(\beta,\alpha))$$
- **Mean [absolute deviation](https://en.wikipedia.org/wiki/Absolute_deviation "Absolute deviation") around the mean** symmetry
$$\operatorname{E}[|X-E[X]|](\mathrm{B}(\alpha,\beta)) = \operatorname{E}[|X-E[X]|](\mathrm{B}(\beta,\alpha))$$
- **Skewness** [skew-symmetry](https://en.wikipedia.org/wiki/Symmetry_\(mathematics\) "Symmetry (mathematics)")
$$\operatorname{skewness}(\mathrm{B}(\alpha,\beta)) = -\operatorname{skewness}(\mathrm{B}(\beta,\alpha))$$
- **Excess kurtosis** symmetry
$$\text{excess kurtosis}(\mathrm{B}(\alpha,\beta)) = \text{excess kurtosis}(\mathrm{B}(\beta,\alpha))$$
- **Characteristic function** symmetry of [Real part](https://en.wikipedia.org/wiki/Real_part "Real part") (with respect to the origin of variable "*t*")
$$\text{Re}[{}_{1}F_{1}(\alpha;\alpha+\beta;it)] = \text{Re}[{}_{1}F_{1}(\alpha;\alpha+\beta;-it)]$$
- **Characteristic function** [skew-symmetry](https://en.wikipedia.org/wiki/Symmetry_\(mathematics\) "Symmetry (mathematics)") of [Imaginary part](https://en.wikipedia.org/wiki/Imaginary_part "Imaginary part") (with respect to the origin of variable "*t*")
$$\text{Im}[{}_{1}F_{1}(\alpha;\alpha+\beta;it)] = -\text{Im}[{}_{1}F_{1}(\alpha;\alpha+\beta;-it)]$$
- **Characteristic function** symmetry of [Absolute value](https://en.wikipedia.org/wiki/Absolute_value "Absolute value") (with respect to the origin of variable "*t*")
$$\text{Abs}[{}_{1}F_{1}(\alpha;\alpha+\beta;it)] = \text{Abs}[{}_{1}F_{1}(\alpha;\alpha+\beta;-it)]$$
- **Differential entropy** symmetry
$$h(\mathrm{B}(\alpha,\beta)) = h(\mathrm{B}(\beta,\alpha))$$
- **Relative entropy (also called [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence"))** symmetry
$$D_{\mathrm{KL}}(X_{1}\parallel X_{2}) = D_{\mathrm{KL}}(X_{2}\parallel X_{1}),\text{ if }h(X_{1})=h(X_{2}),\text{ for (skewed) }\alpha\neq\beta$$
- **Fisher information matrix** symmetry
$$\mathcal{I}_{i,j} = \mathcal{I}_{j,i}$$
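As promised above, here is a numerical spot-check of several of the listed symmetries; a minimal sketch in which the parameter pair is an arbitrary assumption:

```python
# Sketch: reflection symmetries of the PDF, CDF, mean and variance.
import numpy as np
from scipy.stats import beta

a, b = 2.0, 5.0
x = np.linspace(0.01, 0.99, 9)

assert np.allclose(beta(a, b).pdf(x), beta(b, a).pdf(1 - x))      # PDF
assert np.allclose(beta(a, b).cdf(x), 1 - beta(b, a).cdf(1 - x))  # CDF
assert np.isclose(beta(a, b).mean(), 1 - beta(b, a).mean())       # mean
assert np.isclose(beta(a, b).var(), beta(b, a).var())             # variance
print("reflection symmetries hold")
```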
### Geometry of the probability density function
#### Inflection points
Inflection point location versus α and β showing regions with one inflection point
Inflection point location versus α and β showing region with two inflection points
For certain values of the shape parameters α and β, the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") has [inflection points](https://en.wikipedia.org/wiki/Inflection_points "Inflection points"), at which the [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature") changes sign. The position of these inflection points can be useful as a measure of the [dispersion](https://en.wikipedia.org/wiki/Statistical_dispersion "Statistical dispersion") or spread of the distribution.
Defining the following quantity:
$$\kappa = \frac{\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}$$
Points of inflection occur,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[8\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Wadsworth-8)[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)[\[20\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Panik-20) depending on the value of the shape parameters *α* and *β*, as follows:
- (*α* \> 2, *β* \> 2) The distribution is bell-shaped (symmetric for *α* = *β* and skewed otherwise), with **two inflection points**, equidistant from the mode:
$$x = \text{mode} \pm \kappa = \frac{\alpha-1 \pm \sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}$$
- (*α* = 2, *β* \> 2) The distribution is unimodal, positively skewed, right-tailed, with **one inflection point**, located to the right of the mode:
$$x = \text{mode} + \kappa = \frac{2}{\beta}$$
- (*α* \> 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with **one inflection point**, located to the left of the mode:
$$x = \text{mode} - \kappa = 1 - \frac{2}{\alpha}$$
- (1 \< *α* \< 2, β \> 2, *α* + *β* \> 2) The distribution is unimodal, positively skewed, right-tailed, with **one inflection point**, located to the right of the mode:
$$x = \text{mode} + \kappa = \frac{\alpha-1+\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}$$
- (0 \< *α* \< 1, 1 \< *β* \< 2) The distribution has a mode at the left end *x* = 0 and it is positively skewed, right-tailed. There is **one inflection point**, located to the right of the mode:
$$x = \frac{\alpha-1+\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}$$
- (*α* > 2, 1 < *β* < 2) The distribution is unimodal, negatively skewed, left-tailed, with **one inflection point**, located to the left of the mode:
$$x = \text{mode} - \kappa = \frac{\alpha-1-\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}$$
- (1 \< *α* \< 2, 0 \< *β* \< 1) The distribution has a mode at the right end *x* = 1 and it is negatively skewed, left-tailed. There is **one inflection point**, located to the left of the mode:
$$x = \frac{\alpha-1-\sqrt{\frac{(\alpha-1)(\beta-1)}{\alpha+\beta-3}}}{\alpha+\beta-2}$$
There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped (*α*, *β* < 1), upside-down-U-shaped (1 < *α* < 2, 1 < *β* < 2), reverse-J-shaped (*α* < 1, *β* > 2), or J-shaped (*α* > 2, *β* < 1).
The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus *α* and *β* (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines *α* = 1, *β* = 1, *α* = 2, and *β* = 2 because at these values the beta distribution changes from two modes, to one mode, to no mode.
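For the bell-shaped case (*α*, *β* > 2), the two inflection points mode ± *κ* can be confirmed by checking that the second derivative of the density vanishes there. A sketch with illustrative parameters (the finite-difference step is an arbitrary choice):

```python
# Sketch: inflection points at mode +/- kappa for alpha, beta > 2,
# verified via a central finite-difference second derivative.
import numpy as np
from scipy.stats import beta

a, b = 4.0, 6.0
mode = (a - 1) / (a + b - 2)
kappa = np.sqrt((a - 1) * (b - 1) / (a + b - 3)) / (a + b - 2)

f = beta(a, b).pdf
h = 1e-4
for x0 in (mode - kappa, mode + kappa):
    second = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2
    print(x0, second)  # ~ 0 at each inflection point, up to finite-difference error
```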
#### Shapes
PDF for symmetric beta distribution vs. *x* and *α* = *β* from 0 to 30
PDF for symmetric beta distribution vs. x and *α* = *β* from 0 to 2
PDF for skewed beta distribution vs. *x* and *β* = 2.5*α* from 0 to 9
PDF for skewed beta distribution vs. x and *β* = 5.5*α* from 0 to 9
PDF for skewed beta distribution vs. x and *β* = 8*α* from 0 to 10
The beta density function can take a wide variety of different shapes depending on the values of the two parameters *α* and *β*. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for its wide application in modeling actual measurements:
##### Symmetric (*α* = *β*)
- the density function is [symmetric](https://en.wikipedia.org/wiki/Symmetry "Symmetry") about 1/2 (blue & teal plots).
- median = mean = 1/2.
- skewness = 0.
- variance = 1/(4(2*α* + 1))
- ***α* = *β* \< 1**
- U-shaped (blue plot).
- bimodal: left mode = 0, right mode = 1, anti-mode = 1/2
- 1/12 \< var(*X*) \< 1/4[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
- −2 < excess kurtosis(*X*) < −6/5
- *α* = *β* = 1/2 is the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution")
- var(*X*) = 1/8
- excess kurtosis(*X*) = −3/2
- CF = Rinc (t) [\[34\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-34)
- *α* = *β* → 0 is a 2-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with equal probability 1/2 at each [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") end *x* = 0 and *x* = 1 and zero probability everywhere else. A coin toss: one face of the coin being *x* = 0 and the other face being *x* = 1.
- $\lim_{\alpha=\beta\to 0}\operatorname{var}(X)=\tfrac{1}{4}$
- $\lim_{\alpha=\beta\to 0}\text{excess kurtosis}(X)=-2$ (a lower value than this is impossible for any distribution to reach)
- The [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") approaches a [minimum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value of −∞
- **α = β = 1**
- the [uniform \[0, 1\] distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)")
- no mode
- var(*X*) = 1/12
- excess kurtosis(*X*) = −6/5
- The [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") (negative for all other parameter values) reaches its [maximum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value of zero
- CF = Sinc (t)
- ***α* = *β* \> 1**
- symmetric [unimodal](https://en.wikipedia.org/wiki/Unimodal "Unimodal")
- mode = 1/2.
- 0 \< var(*X*) \< 1/12[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
- −6/5 < excess kurtosis(*X*) < 0
- *α* = *β* = 3/2 is a semi-elliptic \[0, 1\] distribution, see: [Wigner semicircle distribution](https://en.wikipedia.org/wiki/Wigner_semicircle_distribution "Wigner semicircle distribution")[\[35\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-35)
- var(*X*) = 1/16.
- excess kurtosis(*X*) = −1
- CF = 2 Jinc (t)
- *α* = *β* = 2 is the parabolic \[0, 1\] distribution
- var(*X*) = 1/20
- excess kurtosis(*X*) = −6/7
- CF = 3 Tinc (t) [\[36\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-36)
- *α* = *β* \> 2 is bell-shaped, with [inflection points](https://en.wikipedia.org/wiki/Inflection_point "Inflection point") located to either side of the mode
- 0 \< var(*X*) \< 1/20
- −6/7 < excess kurtosis(*X*) < 0
- *α* = *β* → ∞ is a 1-point [Degenerate distribution](https://en.wikipedia.org/wiki/Degenerate_distribution "Degenerate distribution") with a [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") spike at the midpoint *x* = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point *x* = 1/2.
- $\lim_{\alpha=\beta\to\infty}\operatorname{var}(X)=0$
- $\lim_{\alpha=\beta\to\infty}\text{excess kurtosis}(X)=0$
- The [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") approaches a [minimum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value of −∞
##### Skewed (*α* ā *β*)
The density function is [skewed](https://en.wikipedia.org/wiki/Skewness "Skewness"). An interchange of parameter values yields the [mirror image](https://en.wikipedia.org/wiki/Mirror_image "Mirror image") (the reverse) of the initial curve. Some more specific cases (a numerical check of a few of the closed forms follows the list):
- ***α* \< 1, *β* \< 1**
- U-shaped
- Positive skew for *α* \< *β*, negative skew for *α* \> *β*.
- bimodal: left mode = 0, right mode = 1, anti-mode $=\tfrac{\alpha-1}{\alpha+\beta-2}$
- 0 \< median \< 1.
- 0 \< var(*X*) \< 1/4
- ***α* \> 1, *β* \> 1**
- [unimodal](https://en.wikipedia.org/wiki/Unimodal "Unimodal") (magenta & cyan plots),
- Positive skew for *α* \< *β*, negative skew for *α* \> *β*.
- mode $=\tfrac{\alpha-1}{\alpha+\beta-2}$
- 0 \< median \< 1
- 0 \< var(*X*) \< 1/12
- ***α* < 1, *β* ≥ 1**
- reverse J-shaped with a right tail,
- positively skewed,
- strictly decreasing, [convex](https://en.wikipedia.org/wiki/Convex_function "Convex function")
- mode = 0
- 0 \< median \< 1/2.
- $0<\operatorname{var}(X)<\tfrac{-11+5\sqrt{5}}{2}$ (maximum variance occurs for $\alpha=\tfrac{-1+\sqrt{5}}{2},\ \beta=1$, that is, *α* = **Φ**, the [golden ratio conjugate](https://en.wikipedia.org/wiki/Golden_ratio "Golden ratio"))
- ***α* ≥ 1, *β* \< 1**
- J-shaped with a left tail,
- negatively skewed,
- strictly increasing, [convex](https://en.wikipedia.org/wiki/Convex_function "Convex function")
- mode = 1
- 1/2 \< median \< 1
- $0 < \operatorname{var}(X) < \tfrac{-11+5\sqrt{5}}{2}$ (maximum variance occurs for $\alpha = 1,\ \beta = \tfrac{-1+\sqrt{5}}{2}$, or *β* = **Φ** the [golden ratio conjugate](https://en.wikipedia.org/wiki/Golden_ratio "Golden ratio"))
- ***α* = 1, *β* \> 1**
- positively skewed,
- strictly decreasing (red plot),
- a reversed (mirror-image) [power function distribution](https://en.wikipedia.org/w/index.php?title=Power_function_distribution&action=edit&redlink=1 "Power function distribution (page does not exist)")
- mean = 1 / (*β* + 1)
- median = $1 - (1/2)^{1/\beta}$
- mode = 0
- α = 1, 1 \< β \< 2
- [concave](https://en.wikipedia.org/wiki/Concave_function "Concave function")
- $1 - \tfrac{1}{\sqrt{2}} < \text{median} < \tfrac{1}{2}$
- 1/18 \< var(*X*) \< 1/12.
- α = 1, β = 2
- a straight line with slope ā2, the right-[triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution") with right angle at the left end, at *x* = 0
- median = $1 - \tfrac{1}{\sqrt{2}}$
- var(*X*) = 1/18
- α = 1, β \> 2
- reverse J-shaped with a right tail,
- [convex](https://en.wikipedia.org/wiki/Convex_function "Convex function")
- $0 < \text{median} < 1 - \tfrac{1}{\sqrt{2}}$
- 0 \< var(*X*) \< 1/18
- **α \> 1, β = 1**
- negatively skewed,
- strictly increasing (green plot),
- the power function distribution[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)
- mean = α / (α + 1)
- median = $(1/2)^{1/\alpha}$
- mode = 1
- 2 \> α \> 1, β = 1
- [concave](https://en.wikipedia.org/wiki/Concave_function "Concave function")
- $\tfrac{1}{2} < \text{median} < \tfrac{1}{\sqrt{2}}$
- 1/18 \< var(*X*) \< 1/12
- α = 2, β = 1
- a straight line with slope +2, the right-[triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution") with right angle at the right end, at *x* = 1
- median = $\tfrac{1}{\sqrt{2}}$
- var(*X*) = 1/18
- α \> 2, β = 1
- J-shaped with a left tail, [convex](https://en.wikipedia.org/wiki/Convex_function "Convex function")
- $\tfrac{1}{\sqrt{2}} < \text{median} < 1$
- 0 \< var(*X*) \< 1/18
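The closed-form medians and variances catalogued above are easy to spot-check numerically. A minimal sketch (assuming Python with SciPy is available; the parameter choices are arbitrary) verifies the two triangular cases *α* = 2, *β* = 1 and *α* = 1, *β* = 2:

```python
# Spot-check of the triangular cases listed above (SciPy assumed available).
from scipy.stats import beta

print(beta.median(2, 1))   # ~0.70711 = 1/sqrt(2)
print(beta.var(2, 1))      # ~0.05556 = 1/18
print(beta.median(1, 2))   # ~0.29289 = 1 - 1/sqrt(2)
```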
## Related distributions
### Transformations
- If *X* ~ Beta(*α*, *β*) then 1 − *X* ~ Beta(*β*, *α*) ([mirror-image](https://en.wikipedia.org/wiki/Mirror_image "Mirror image") symmetry)
- If *X* ~ Beta(*α*, *β*) then $\tfrac{X}{1-X} \sim \beta'(\alpha, \beta)$, the [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution"), also called the "beta distribution of the second kind".
- If $X \sim \text{Beta}(\alpha, \beta)$, then $Y = \log \frac{X}{1-X}$ has a [generalized logistic distribution](https://en.wikipedia.org/wiki/Generalized_logistic_distribution "Generalized logistic distribution"), with density $\frac{\sigma(y)^{\alpha}\,\sigma(-y)^{\beta}}{B(\alpha,\beta)}$, where $\sigma$ is the [logistic sigmoid](https://en.wikipedia.org/wiki/Logistic_sigmoid "Logistic sigmoid").
- If *X* ~ Beta(*α*, *β*) then $\tfrac{1}{X} - 1 \sim \beta'(\beta, \alpha)$.
- If $X \sim \text{Beta}(\alpha_1, \beta_1)$ and $Y \sim \text{Beta}(\alpha_2, \beta_2)$ are independent, then $Z = \tfrac{X}{Y}$ has density $\tfrac{B(\alpha_1+\alpha_2,\,\beta_2)\,z^{\alpha_1-1}\,{}_2F_1(\alpha_1+\alpha_2,\,1-\beta_1;\,\alpha_1+\alpha_2+\beta_2;\,z)}{B(\alpha_1,\beta_1)\,B(\alpha_2,\beta_2)}$ for $0 < z \leq 1$, and $\tfrac{B(\alpha_1+\alpha_2,\,\beta_1)\,z^{-(\alpha_2+1)}\,{}_2F_1(\alpha_1+\alpha_2,\,1-\beta_2;\,\alpha_1+\alpha_2+\beta_1;\,\tfrac{1}{z})}{B(\alpha_1,\beta_1)\,B(\alpha_2,\beta_2)}$ for $z \geq 1$, where ${}_2F_1(a,b;c;x)$ is the [hypergeometric function](https://en.wikipedia.org/wiki/Hypergeometric_function "Hypergeometric function").[\[37\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pham-Gia2000-37)
- If *X* ~ Beta(*n*/2, *m*/2) then $\tfrac{mX}{n(1-X)} \sim F(n,m)$ (assuming *n* \> 0 and *m* \> 0), the [Fisher–Snedecor F distribution](https://en.wikipedia.org/wiki/F-distribution "F-distribution").
- If $X \sim \operatorname{Beta}\left(1 + \lambda\tfrac{m - \min}{\max - \min},\ 1 + \lambda\tfrac{\max - m}{\max - \min}\right)$ then min + *X*(max − min) ~ PERT(min, max, *m*, *λ*), where *PERT* denotes a [PERT distribution](https://en.wikipedia.org/wiki/PERT_distribution "PERT distribution") used in [PERT](https://en.wikipedia.org/wiki/PERT "PERT") analysis and *m* = most likely value.[\[38\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-NewPERT-38) Traditionally[\[39\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Malcolm-39) *λ* = 4 in PERT analysis.
- If *X* ~ Beta(1, *β*) then *X* ~ [Kumaraswamy distribution](https://en.wikipedia.org/wiki/Kumaraswamy_distribution "Kumaraswamy distribution") with parameters (1, *β*)
- If *X* ~ Beta(*α*, 1) then *X* ~ [Kumaraswamy distribution](https://en.wikipedia.org/wiki/Kumaraswamy_distribution "Kumaraswamy distribution") with parameters (*α*, 1)
- If *X* ~ Beta(*α*, 1) then −ln(*X*) ~ Exponential(*α*) (checked numerically in the sketch below)
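Several of these transformations are easy to verify by simulation. The following sketch (assuming NumPy and SciPy are available; the parameter values, seed and sample size are arbitrary) checks the mirror-image identity and the exponential transformation with a Kolmogorov–Smirnov test:

```python
# Monte Carlo checks of two transformations above (NumPy/SciPy assumed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, n = 2.5, 4.0, 100_000

x = rng.beta(a, 1.0, size=n)
# If X ~ Beta(alpha, 1) then -ln(X) ~ Exponential(alpha) (scale = 1/alpha).
print(stats.kstest(-np.log(x), "expon", args=(0, 1 / a)).pvalue)

y = rng.beta(a, b, size=n)
# If X ~ Beta(alpha, beta) then 1 - X ~ Beta(beta, alpha).
print(stats.kstest(1 - y, "beta", args=(b, a)).pvalue)
```

Large p-values (no rejection) are expected in both tests.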
### Special and limiting cases
Example of eight realizations of a random walk in one dimension starting at 0: the probability for the time of the last visit to the origin is distributed as Beta(1/2, 1/2)
Beta(1/2, 1/2): The [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") probability density was proposed by [Harold Jeffreys](https://en.wikipedia.org/wiki/Harold_Jeffreys "Harold Jeffreys") to represent uncertainty for a [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") or a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") in [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference"), and is now commonly referred to as [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior"): $p^{-1/2}(1-p)^{-1/2}$. This distribution also appears in several [random walk](https://en.wikipedia.org/wiki/Random_walk "Random walk") fundamental theorems
- Beta(1, 1) ~ [U(0, 1)](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") with density 1 on that interval.
- Beta(n, 1) ~ Maximum of *n* independent rvs. with [U(0, 1)](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)"), sometimes called the *standard power function distribution*, with density $nx^{n-1}$ on that interval.
- Beta(1, n) ~ Minimum of *n* independent rvs. with [U(0, 1)](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)"), with density $n(1-x)^{n-1}$ on that interval.
- If *X* ~ Beta(3/2, 3/2) and *r* \> 0 then 2*rX* − *r* ~ [Wigner semicircle distribution](https://en.wikipedia.org/wiki/Wigner_semicircle_distribution "Wigner semicircle distribution").
- Beta(1/2, 1/2) is equivalent to the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution"). This distribution is also [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") probability for the [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") and [binomial distributions](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution").
- $\lim_{n\to\infty} n \operatorname{Beta}(1,n) = \operatorname{Exponential}(1)$, the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution "Exponential distribution").
- $\lim_{n\to\infty} n \operatorname{Beta}(k,n) = \operatorname{Gamma}(k,1)$, the [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution").
- For large $n$, $\operatorname{Beta}(\alpha n, \beta n) \to \mathcal{N}\left(\frac{\alpha}{\alpha+\beta},\ \frac{\alpha\beta}{(\alpha+\beta)^3}\frac{1}{n}\right)$, the [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution "Normal distribution"). More precisely, if $X_n \sim \operatorname{Beta}(\alpha n, \beta n)$ then $\sqrt{n}\left(X_n - \tfrac{\alpha}{\alpha+\beta}\right)$ converges in distribution to a normal distribution with mean 0 and variance $\tfrac{\alpha\beta}{(\alpha+\beta)^3}$ as *n* increases (see the numerical check below).
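A small simulation sketch (assuming NumPy; the parameter choices and seed are arbitrary) compares the sample mean and variance of Beta(*αn*, *βn*) draws with the limiting values quoted above:

```python
# Checking the normal approximation Beta(a*n, b*n) ~ N(a/(a+b), a*b/((a+b)^3 n)).
import numpy as np

a, b, n = 2.0, 3.0, 200
mean = a / (a + b)                       # 0.4
var = a * b / ((a + b) ** 3 * n)         # 2.4e-4

rng = np.random.default_rng(1)
samples = rng.beta(a * n, b * n, size=100_000)
print(samples.mean(), mean)              # both ~0.4
print(samples.var(), var)                # both ~2.4e-4
```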
### Derived from other distributions
- The *k*th [order statistic](https://en.wikipedia.org/wiki/Order_statistic "Order statistic") of a sample of size *n* from the [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") is a beta random variable, *U*(*k*) ~ Beta(*k*, *n* + 1 − *k*).[\[40\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David1-40)
- [Gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution"): If *X* ~ Gamma(*α*, *θ*) and *Y* ~ Gamma(*β*, *θ*) are independent, then $\tfrac{X}{X+Y} \sim \operatorname{Beta}(\alpha, \beta)$ (see the sampling sketch at the end of this list).
- [Chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution "Chi-squared distribution"): If $X \sim \chi^2(\alpha)$ and $Y \sim \chi^2(\beta)$ are independent, then $\tfrac{X}{X+Y} \sim \operatorname{Beta}(\tfrac{\alpha}{2}, \tfrac{\beta}{2})$.
- The [power transformation](https://en.wikipedia.org/wiki/Power_transformation_\(statistics\) "Power transformation (statistics)") for the uniform distribution: If *X* ~ U(0, 1) and *α* \> 0 then $X^{1/\alpha}$ ~ Beta(*α*, 1).
- [Cauchy distribution](https://en.wikipedia.org/wiki/Cauchy_distribution "Cauchy distribution"): If *X* ~ Cauchy(0, 1) then $\tfrac{1}{1+X^2} \sim \operatorname{Beta}\left(\tfrac{1}{2}, \tfrac{1}{2}\right)$
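The gamma construction above is the standard way to sample beta variates. The following sketch (assuming NumPy and SciPy; the parameters and seed are arbitrary) confirms it with a Kolmogorov–Smirnov test:

```python
# If X ~ Gamma(a, theta) and Y ~ Gamma(b, theta) are independent,
# then X / (X + Y) ~ Beta(a, b). (NumPy/SciPy assumed available.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a, b, theta, n = 2.0, 5.0, 1.3, 100_000

x = rng.gamma(a, theta, size=n)
y = rng.gamma(b, theta, size=n)
print(stats.kstest(x / (x + y), "beta", args=(a, b)).pvalue)  # large p-value
```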

### Combination with other distributions
- If *X* ~ Beta(*α*, *β*) and *Y* ~ F(2*β*, 2*α*) then $\Pr\left(X \leq \tfrac{\alpha}{\alpha + \beta x}\right) = \Pr(Y \geq x)$ for all *x* \> 0.
### Compounding with other distributions
- If *p* ~ Beta(α, β) and *X* ~ Bin(*k*, *p*) then *X* ~ [beta-binomial distribution](https://en.wikipedia.org/wiki/Beta-binomial_distribution "Beta-binomial distribution")
- If *p* ~ Beta(α, β) and *X* ~ NB(*r*, *p*) then *X* ~ [beta negative binomial distribution](https://en.wikipedia.org/wiki/Beta_negative_binomial_distribution "Beta negative binomial distribution")
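Compounding is naturally expressed as two-stage sampling. A minimal sketch (assuming NumPy; the values of *α*, *β* and *k* are arbitrary) draws beta-binomial variates this way and checks their mean *kα*/(*α* + *β*):

```python
# Two-stage sampling of the beta-binomial compound (NumPy assumed available).
import numpy as np

rng = np.random.default_rng(3)
a, b, k, n = 2.0, 3.0, 10, 100_000

p = rng.beta(a, b, size=n)        # p ~ Beta(alpha, beta)
x = rng.binomial(k, p)            # X | p ~ Bin(k, p)  =>  X ~ BetaBin(k, a, b)
print(x.mean(), k * a / (a + b))  # both ~4.0
```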
### Generalisations
- The generalization to multiple variables, i.e. a [multivariate Beta distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution "Dirichlet distribution"), is called a [Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution "Dirichlet distribution"). Univariate marginals of the Dirichlet distribution have a beta distribution. The beta distribution is [conjugate](https://en.wikipedia.org/wiki/Conjugate_prior "Conjugate prior") to the binomial and Bernoulli distributions in exactly the same way as the [Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution "Dirichlet distribution") is conjugate to the [multinomial distribution](https://en.wikipedia.org/wiki/Multinomial_distribution "Multinomial distribution") and [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution "Categorical distribution").
- The [Pearson type I distribution](https://en.wikipedia.org/wiki/Pearson_distribution#The_Pearson_type_I_distribution "Pearson distribution") is identical to the beta distribution (except for arbitrary shifting and re-scaling that can also be accomplished with the four parameter parametrization of the beta distribution).
- The beta distribution is the special case of the [noncentral beta distribution](https://en.wikipedia.org/wiki/Noncentral_beta_distribution "Noncentral beta distribution") where $\lambda = 0$: $\operatorname{Beta}(\alpha, \beta) = \operatorname{NonCentralBeta}(\alpha, \beta, 0)$.
- The [generalized beta distribution](https://en.wikipedia.org/wiki/Generalized_beta_distribution "Generalized beta distribution") is a five-parameter distribution family which has the beta distribution as a special case.
- The [matrix variate beta distribution](https://en.wikipedia.org/wiki/Matrix_variate_beta_distribution "Matrix variate beta distribution") is a distribution for [positive-definite matrices](https://en.wikipedia.org/wiki/Positive-definite_matrices "Positive-definite matrices").
## Statistical inference
### Parameter estimation
#### Method of moments
##### Two unknown parameters
Two unknown parameters ($\hat{\alpha}, \hat{\beta}$, of a beta distribution supported in the \[0, 1\] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:
$$\text{sample mean}(X) = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
be the [sample mean](https://en.wikipedia.org/wiki/Sample_mean "Sample mean") estimate and
$$\text{sample variance}(X) = \bar{v} = \frac{1}{N-1}\sum_{i=1}^{N} \left(X_i - \bar{x}\right)^2$$
be the [sample variance](https://en.wikipedia.org/wiki/Sample_variance "Sample variance") estimate. The [method-of-moments](https://en.wikipedia.org/wiki/Method_of_moments_\(statistics\) "Method of moments (statistics)") estimates of the parameters are
$$\hat{\alpha} = \bar{x}\left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1\right), \qquad \hat{\beta} = (1 - \bar{x})\left(\frac{\bar{x}(1 - \bar{x})}{\bar{v}} - 1\right), \qquad \text{if } \bar{v} < \bar{x}(1 - \bar{x}).$$
When the distribution is required over a known interval other than \[0, 1\] with random variable *X*, say \[*a*, *c*\] with random variable *Y*, then replace $\bar{x}$ with $\frac{\bar{y}-a}{c-a}$ and $\bar{v}$ with $\frac{\bar{v}_Y}{(c-a)^2}$ in the above pair of equations for the shape parameters (see the "Four unknown parameters" section below),[\[41\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-41) where:
$$\text{sample mean}(Y) = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} Y_i, \qquad \text{sample variance}(Y) = \bar{v}_Y = \frac{1}{N-1}\sum_{i=1}^{N} \left(Y_i - \bar{y}\right)^2$$
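A direct implementation of these method-of-moments estimates is short. The sketch below (assuming NumPy; the function name and test values are ours, not from the literature) fits data on \[0, 1\] and, per the remark above, first rescales data given on \[*a*, *c*\]:

```python
# Method-of-moments estimates for a beta distribution (NumPy assumed).
import numpy as np

def beta_mom(x, a=0.0, c=1.0):
    """Return (alpha_hat, beta_hat); data on [a, c] are rescaled to [0, 1]."""
    x = (np.asarray(x) - a) / (c - a)
    xbar = x.mean()
    vbar = x.var(ddof=1)                  # sample variance with N - 1
    if vbar >= xbar * (1 - xbar):
        raise ValueError("sample variance too large for a beta fit")
    common = xbar * (1 - xbar) / vbar - 1
    return xbar * common, (1 - xbar) * common

rng = np.random.default_rng(4)
print(beta_mom(rng.beta(2.0, 5.0, size=50_000)))  # roughly (2.0, 5.0)
```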
##### Four unknown parameters
Solutions for the parameter estimates vs. (sample) excess kurtosis and (sample) squared skewness of the beta distribution
All four parameters ($\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}$, of a beta distribution supported in the \[*a*, *c*\] interval; see section ["Alternative parametrizations, Four parameters"](https://en.wikipedia.org/wiki/Beta_distribution#Four_parameters)) can be estimated, using the method of moments developed by [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson"), by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis).[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42)[\[43\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton_and_Johnson-43) The excess kurtosis was expressed in terms of the square of the skewness and the sample size ν = α + β (see previous section ["Kurtosis"](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis)) as follows:
$$\text{excess kurtosis} = \frac{6}{3+\nu}\left(\frac{2+\nu}{4}(\text{skewness})^2 - 1\right) \qquad \text{if } (\text{skewness})^2 - 2 < \text{excess kurtosis} < \tfrac{3}{2}(\text{skewness})^2$$
One can use this equation to solve for the sample size ν = α + β in terms of the square of the skewness and the excess kurtosis as follows:[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42)

$$\hat{\nu} = \hat{\alpha} + \hat{\beta} = 3\,\frac{(\text{sample excess kurtosis}) - (\text{sample skewness})^2 + 2}{\tfrac{3}{2}(\text{sample skewness})^2 - (\text{sample excess kurtosis})}$$

$$\text{if } (\text{sample skewness})^2 - 2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2$$
This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21)) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see [§ Kurtosis bounded by the square of the skewness](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis_bounded_by_the_square_of_the_skewness)):
The case of zero skewness can be solved immediately, because for zero skewness α = β, and hence ν = 2α = 2β; therefore α = β = ν/2:
$$\hat{\alpha} = \hat{\beta} = \frac{\hat{\nu}}{2} = \frac{\tfrac{3}{2}(\text{sample excess kurtosis}) + 3}{-(\text{sample excess kurtosis})} \qquad \text{if sample skewness} = 0 \text{ and } -2 < \text{sample excess kurtosis} < 0$$
(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from −2 to 0, so that $\hat{\nu}$ (and therefore the sample shape parameters) is positive, ranging from zero, when the shape parameters approach zero and the excess kurtosis approaches −2, to infinity, when the shape parameters approach infinity and the excess kurtosis approaches zero.)
For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters $\hat{a}, \hat{c}$, the parameters $\hat{\alpha}, \hat{\beta}$ can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):
$$(\text{sample skewness})^2 = \frac{4\left(\hat{\beta} - \hat{\alpha}\right)^2 \left(1 + \hat{\alpha} + \hat{\beta}\right)}{\hat{\alpha}\hat{\beta}\left(2 + \hat{\alpha} + \hat{\beta}\right)^2}$$

$$\text{sample excess kurtosis} = \frac{6}{3 + \hat{\alpha} + \hat{\beta}}\left(\frac{2 + \hat{\alpha} + \hat{\beta}}{4}(\text{sample skewness})^2 - 1\right)$$

$$\text{if } (\text{sample skewness})^2 - 2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2$$
resulting in the following solution:[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42)
$$\hat{\alpha}, \hat{\beta} = \frac{\hat{\nu}}{2}\left(1 \pm \frac{1}{\sqrt{1 + \dfrac{16(\hat{\nu} + 1)}{(\hat{\nu} + 2)^2(\text{sample skewness})^2}}}\right)$$

$$\text{if sample skewness} \neq 0 \text{ and } (\text{sample skewness})^2 - 2 < \text{sample excess kurtosis} < \tfrac{3}{2}(\text{sample skewness})^2$$
Here one should take the solutions as follows: $\hat{\alpha} > \hat{\beta}$ for (negative) sample skewness \< 0, and $\hat{\alpha} < \hat{\beta}$ for (positive) sample skewness \> 0.
The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness), and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness; along this right edge both parameters are equal and the distribution is symmetric: U-shaped for α = β \< 1, uniform for α = β = 1, upside-down-U-shaped for 1 \< α = β \< 2, and bell-shaped for α = β \> 2. The surfaces also meet at the front (lower) edge defined by the "impossible boundary" line (excess kurtosis + 2 − skewness² = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than at the other (with practically nothing in between), with probabilities $p = \tfrac{\beta}{\alpha+\beta}$ at the left end *x* = 0 and $q = 1 - p = \tfrac{\alpha}{\alpha+\beta}$ at the right end *x* = 1. The two surfaces become further apart towards the rear edge, where the surface parameters differ markedly from each other.

As remarked, for example, by Bowman and Shenton,[\[44\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BowmanShenton-44) sampling in the neighborhood of the line (sample excess kurtosis − (3/2)(sample skewness)² = 0) (the just-J-shaped portion of the rear edge where blue meets beige) "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero, and hence ν approaches infinity as that line is approached. Bowman and Shenton[\[44\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BowmanShenton-44) write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." The problem therefore concerns four-parameter estimation for very skewed distributions whose excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other. See [§ Kurtosis bounded by the square of the skewness](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis_bounded_by_the_square_of_the_skewness) for a numerical example and further comments about this rear-edge boundary line (sample excess kurtosis − (3/2)(sample skewness)² = 0). As remarked by Karl Pearson himself,[\[45\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson1936-45) this issue may not be of much practical importance, as the trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of the shape parameters that are unlikely to occur much in practice. The usual skewed bell-shaped distributions that occur in practice do not have this parameter-estimation problem.
The remaining two parameters $\hat{a}, \hat{c}$ can be determined using the sample mean and the sample variance, using a variety of equations.[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42) One alternative is to calculate the support interval range $(\hat{c} - \hat{a})$ based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range $(\hat{c} - \hat{a})$, the equation expressing the excess kurtosis in terms of the sample variance and the sample size ν (see [§ Kurtosis](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis) and [§ Alternative parametrizations, four parameters](https://en.wikipedia.org/wiki/Beta_distribution#Alternative_parametrizations,_four_parameters)):
$$\text{sample excess kurtosis} = \frac{6}{(3 + \hat{\nu})(2 + \hat{\nu})}\left(\frac{(\hat{c} - \hat{a})^2}{\text{(sample variance)}} - 6 - 5\hat{\nu}\right)$$
to obtain:
$$(\hat{c} - \hat{a}) = \sqrt{\text{(sample variance)}}\,\sqrt{6 + 5\hat{\nu} + \frac{(2 + \hat{\nu})(3 + \hat{\nu})}{6}\,\text{(sample excess kurtosis)}}$$
Another alternative is to calculate the support interval range $(\hat{c} - \hat{a})$ based on the sample variance and the sample skewness.[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42) For this purpose one can solve, in terms of the range $(\hat{c} - \hat{a})$, the equation expressing the squared skewness in terms of the sample variance and the sample size ν (see the sections titled "Skewness" and "Alternative parametrizations, four parameters"):
$$(\text{sample skewness})^2 = \frac{4}{(2 + \hat{\nu})^2}\left(\frac{(\hat{c} - \hat{a})^2}{\text{(sample variance)}} - 4(1 + \hat{\nu})\right)$$
to obtain:[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42)
$$(\hat{c} - \hat{a}) = \frac{\sqrt{\text{(sample variance)}}}{2}\,\sqrt{(2 + \hat{\nu})^2(\text{sample skewness})^2 + 16(1 + \hat{\nu})}$$
The remaining parameter can be determined from the sample mean and the previously obtained parameters $(\hat{c} - \hat{a}), \hat{\alpha}, \hat{\nu} = \hat{\alpha} + \hat{\beta}$:
$$\hat{a} = (\text{sample mean}) - \left(\frac{\hat{\alpha}}{\hat{\nu}}\right)(\hat{c} - \hat{a})$$
and finally, $\hat{c} = (\hat{c} - \hat{a}) + \hat{a}$.
In the above formulas one may take, for example, as estimates of the sample moments:
$$\begin{aligned}
\text{sample mean} &= \overline{y} = \frac{1}{N}\sum_{i=1}^{N} Y_i \\
\text{sample variance} &= \overline{v}_Y = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \overline{y})^2 \\
\text{sample skewness} &= G_1 = \frac{N}{(N-1)(N-2)}\,\frac{\sum_{i=1}^{N} (Y_i - \overline{y})^3}{\overline{v}_Y^{3/2}} \\
\text{sample excess kurtosis} &= G_2 = \frac{N(N+1)}{(N-1)(N-2)(N-3)}\,\frac{\sum_{i=1}^{N} (Y_i - \overline{y})^4}{\overline{v}_Y^{2}} - \frac{3(N-1)^2}{(N-2)(N-3)}
\end{aligned}$$
The estimators *G*1 for [sample skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") and *G*2 for [sample kurtosis](https://en.wikipedia.org/wiki/Kurtosis "Kurtosis") are used by [DAP](https://en.wikipedia.org/wiki/DAP_\(software\) "DAP (software)")/[SAS](https://en.wikipedia.org/wiki/SAS_System "SAS System"), [PSPP](https://en.wikipedia.org/wiki/PSPP "PSPP")/[SPSS](https://en.wikipedia.org/wiki/SPSS "SPSS"), and [Excel](https://en.wikipedia.org/wiki/Microsoft_Excel "Microsoft Excel"). However, they are not used by [BMDP](https://en.wikipedia.org/wiki/BMDP "BMDP") and (according to [\[46\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Joanes_and_Gill-46)) they were not used by [MINITAB](https://en.wikipedia.org/wiki/MINITAB "MINITAB") in 1998. Actually, Joanes and Gill in their 1998 study[\[46\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Joanes_and_Gill-46) concluded that the skewness and kurtosis estimators used in [BMDP](https://en.wikipedia.org/wiki/BMDP "BMDP") and in [MINITAB](https://en.wikipedia.org/wiki/MINITAB "MINITAB") (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in [DAP](https://en.wikipedia.org/wiki/DAP_\(software\) "DAP (software)")/[SAS](https://en.wikipedia.org/wiki/SAS_System "SAS System"), [PSPP](https://en.wikipedia.org/wiki/PSPP "PSPP")/[SPSS](https://en.wikipedia.org/wiki/SPSS "SPSS"), namely *G*1 and *G*2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill[\[46\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Joanes_and_Gill-46)).
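Assembled end to end, the four-parameter recipe above is only a few lines of code. The following sketch (assuming NumPy and SciPy, whose `skew`/`kurtosis` with `bias=False` compute *G*1 and *G*2; it handles only the non-zero-skewness branch, and the function name and test values are ours) estimates all four parameters:

```python
# Pearson's four-parameter method of moments, non-zero-skewness branch
# (NumPy/SciPy assumed; the zero-skew case has the simpler closed form above).
import numpy as np
from scipy import stats

def beta_mom4(y):
    """Return (alpha, beta, a, c) for a beta distribution on [a, c]."""
    mean, var = y.mean(), y.var(ddof=1)
    g1 = stats.skew(y, bias=False)        # sample skewness G1 (must be != 0)
    g2 = stats.kurtosis(y, bias=False)    # sample excess kurtosis G2
    if not (g1**2 - 2 < g2 < 1.5 * g1**2):
        raise ValueError("moments outside the beta-feasible region")
    nu = 3 * (g2 - g1**2 + 2) / (1.5 * g1**2 - g2)           # nu = alpha + beta
    root = 1 / np.sqrt(1 + 16 * (nu + 1) / ((nu + 2) ** 2 * g1**2))
    alpha, beta_ = nu / 2 * (1 + root), nu / 2 * (1 - root)  # alpha > beta
    if g1 > 0:                            # positive skew requires alpha < beta
        alpha, beta_ = beta_, alpha
    span = np.sqrt(var) * np.sqrt(6 + 5 * nu + (2 + nu) * (3 + nu) / 6 * g2)
    a = mean - (alpha / nu) * span
    return alpha, beta_, a, a + span

rng = np.random.default_rng(5)
y = 2 + 3 * rng.beta(2.0, 6.0, size=200_000)   # Beta(2, 6) on [2, 5]
print(beta_mom4(y))                            # roughly (2.0, 6.0, 2.0, 5.0)
```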
#### Maximum likelihood
##### Two unknown parameters
Max (joint log likelihood/*N*) for beta distribution maxima at *α* = *β* = 2
Max (joint log likelihood/*N*) for Beta distribution maxima at *α* = *β* ∈ {0.25, 0.5, 1, 2, 4, 6, 8}
As is also the case for [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimates for the [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution"), the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If *X*1, ..., *XN* are independent random variables each having a beta distribution, the joint log likelihood function for *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is:
$$\begin{aligned}
\ln \mathcal{L}(\alpha, \beta \mid X) &= \sum_{i=1}^{N} \ln \mathcal{L}_i(\alpha, \beta \mid X_i) \\
&= \sum_{i=1}^{N} \ln f(X_i; \alpha, \beta) \\
&= \sum_{i=1}^{N} \ln \frac{X_i^{\alpha-1}(1-X_i)^{\beta-1}}{\mathrm{B}(\alpha, \beta)} \\
&= (\alpha - 1)\sum_{i=1}^{N} \ln X_i + (\beta - 1)\sum_{i=1}^{N} \ln(1 - X_i) - N \ln \mathrm{B}(\alpha, \beta)
\end{aligned}$$
Finding the maximum with respect to a shape parameter involves taking the [partial derivative](https://en.wikipedia.org/wiki/Partial_derivative "Partial derivative") with respect to the shape parameter and setting the expression equal to zero yielding the [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimator of the shape parameters:
$$\frac{\partial \ln \mathcal{L}(\alpha, \beta \mid X)}{\partial \alpha} = \sum_{i=1}^{N} \ln X_i - N \frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \alpha} = 0$$

$$\frac{\partial \ln \mathcal{L}(\alpha, \beta \mid X)}{\partial \beta} = \sum_{i=1}^{N} \ln(1 - X_i) - N \frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \beta} = 0$$
where:
$$\begin{aligned}
\frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \alpha} &= -\frac{\partial \ln \Gamma(\alpha + \beta)}{\partial \alpha} + \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha} + \frac{\partial \ln \Gamma(\beta)}{\partial \alpha} = -\psi(\alpha + \beta) + \psi(\alpha) + 0 \\
\frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \beta} &= -\frac{\partial \ln \Gamma(\alpha + \beta)}{\partial \beta} + \frac{\partial \ln \Gamma(\alpha)}{\partial \beta} + \frac{\partial \ln \Gamma(\beta)}{\partial \beta} = -\psi(\alpha + \beta) + 0 + \psi(\beta)
\end{aligned}$$
since the **[digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function")**, denoted ψ(*α*), is defined as the [logarithmic derivative](https://en.wikipedia.org/wiki/Logarithmic_derivative "Logarithmic derivative") of the [gamma function](https://en.wikipedia.org/wiki/Gamma_function "Gamma function"):[\[18\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Abramowitz-18)
$$\psi(\alpha) = \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}$$
To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle point or a minimum), one must also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to each shape parameter is negative:
$$\frac{\partial^2 \ln \mathcal{L}(\alpha, \beta \mid X)}{\partial \alpha^2} = -N \frac{\partial^2 \ln \mathrm{B}(\alpha, \beta)}{\partial \alpha^2} < 0$$

$$\frac{\partial^2 \ln \mathcal{L}(\alpha, \beta \mid X)}{\partial \beta^2} = -N \frac{\partial^2 \ln \mathrm{B}(\alpha, \beta)}{\partial \beta^2} < 0$$
Using the previous equations, this is equivalent to:
$$\frac{\partial^2 \ln \mathrm{B}(\alpha, \beta)}{\partial \alpha^2} = \psi_1(\alpha) - \psi_1(\alpha + \beta) > 0$$

$$\frac{\partial^2 \ln \mathrm{B}(\alpha, \beta)}{\partial \beta^2} = \psi_1(\beta) - \psi_1(\alpha + \beta) > 0$$
where the **[trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function")**, denoted *ψ*1(*α*), is the second of the [polygamma functions](https://en.wikipedia.org/wiki/Polygamma_function "Polygamma function"), and is defined as the derivative of the [digamma](https://en.wikipedia.org/wiki/Digamma "Digamma") function:
$$\psi_1(\alpha) = \frac{\partial^2 \ln \Gamma(\alpha)}{\partial \alpha^2} = \frac{\partial \psi(\alpha)}{\partial \alpha}.$$
These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:
$$\operatorname{var}[\ln(X)] = \operatorname{E}[\ln^2(X)] - (\operatorname{E}[\ln(X)])^2 = \psi_1(\alpha) - \psi_1(\alpha + \beta)$$

$$\operatorname{var}[\ln(1 - X)] = \operatorname{E}[\ln^2(1 - X)] - (\operatorname{E}[\ln(1 - X)])^2 = \psi_1(\beta) - \psi_1(\alpha + \beta)$$
Therefore, the condition of negative curvature at a maximum is equivalent to the statements:
$$\operatorname{var}[\ln(X)] > 0, \qquad \operatorname{var}[\ln(1 - X)] > 0$$
Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following [logarithmic derivatives](https://en.wikipedia.org/wiki/Logarithmic_derivative "Logarithmic derivative") of the [geometric means](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") *GX* and *G(1āX)* are positive, since:
$$\psi_1(\alpha) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_X}{\partial \alpha} > 0, \qquad \psi_1(\beta) - \psi_1(\alpha + \beta) = \frac{\partial \ln G_{(1-X)}}{\partial \beta} > 0$$
While these slopes are indeed positive, the other slopes are negative:
$$\frac{\partial \ln G_X}{\partial \beta},\ \frac{\partial \ln G_{1-X}}{\partial \alpha} < 0.$$
The slopes of the mean and the median with respect to *α* and *β* display similar sign behavior.
From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled [maximum likelihood estimate](https://en.wikipedia.org/wiki/Maximum_likelihood_estimate "Maximum likelihood estimate") equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates $\hat{\alpha}, \hat{\beta}$ in terms of the (known) average of logarithms of the samples *X*1, ..., *XN*:[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
$$\begin{aligned}
\hat{\operatorname{E}}[\ln(X)] &= \psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}) = \frac{1}{N}\sum_{i=1}^{N} \ln X_i = \ln \hat{G}_X \\
\hat{\operatorname{E}}[\ln(1-X)] &= \psi(\hat{\beta}) - \psi(\hat{\alpha} + \hat{\beta}) = \frac{1}{N}\sum_{i=1}^{N} \ln(1 - X_i) = \ln \hat{G}_{1-X}
\end{aligned}$$
where we recognize $\ln \hat{G}_X$ as the logarithm of the sample [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") and $\ln \hat{G}_{1-X}$ as the logarithm of the sample [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") based on (1 − *X*), the mirror-image of *X*. For $\hat{\alpha} = \hat{\beta}$, it follows that $\hat{G}_X = \hat{G}_{1-X}$.
$$\hat{G}_X = \prod_{i=1}^{N} (X_i)^{1/N}, \qquad \hat{G}_{1-X} = \prod_{i=1}^{N} (1 - X_i)^{1/N}$$
These coupled equations containing [digamma functions](https://en.wikipedia.org/wiki/Digamma_function "Digamma function") of the shape parameter estimates $\hat{\alpha}, \hat{\beta}$ must be solved by numerical methods, as done, for example, by Beckman et al.[\[47\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-47) Gnanadesikan et al. give numerical solutions for a few cases.[\[48\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-48) [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz")[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) suggest that for "not too small" shape parameter estimates $\hat{\alpha}, \hat{\beta}$, the logarithmic approximation to the digamma function $\psi(\hat{\alpha}) \approx \ln(\hat{\alpha} - \tfrac{1}{2})$ may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:
$$\ln \frac{\hat{\alpha} - \frac{1}{2}}{\hat{\alpha} + \hat{\beta} - \frac{1}{2}} \approx \ln \hat{G}_X, \qquad \ln \frac{\hat{\beta} - \frac{1}{2}}{\hat{\alpha} + \hat{\beta} - \frac{1}{2}} \approx \ln \hat{G}_{1-X}$$
which leads to the following solution for the initial values (of the estimate shape parameters in terms of the sample geometric means) for an iterative solution:
$$\hat{\alpha} \approx \frac{1}{2} + \frac{\hat{G}_X}{2\left(1 - \hat{G}_X - \hat{G}_{1-X}\right)} \text{ if } \hat{\alpha} > 1, \qquad \hat{\beta} \approx \frac{1}{2} + \frac{\hat{G}_{1-X}}{2\left(1 - \hat{G}_X - \hat{G}_{1-X}\right)} \text{ if } \hat{\beta} > 1$$
Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.
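In practice this amounts to a few lines of numerical root-finding. The sketch below (assuming NumPy and SciPy; `fsolve` is one of several reasonable solvers, and the test parameters are arbitrary) starts from the closed-form initial values above and solves the coupled digamma equations:

```python
# Iterative maximum-likelihood fit of (alpha, beta) from the sufficient
# statistics ln G_X and ln G_{1-X} (NumPy/SciPy assumed available).
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve

def beta_mle(x):
    ln_gx = np.log(x).mean()              # ln of geometric mean of X
    ln_g1x = np.log1p(-x).mean()          # ln of geometric mean of 1 - X
    gx, g1x = np.exp(ln_gx), np.exp(ln_g1x)
    # closed-form initial values (intended for alpha, beta > 1)
    a0 = 0.5 + gx / (2 * (1 - gx - g1x))
    b0 = 0.5 + g1x / (2 * (1 - gx - g1x))
    def equations(p):
        a, b = p
        return (digamma(a) - digamma(a + b) - ln_gx,
                digamma(b) - digamma(a + b) - ln_g1x)
    return fsolve(equations, (a0, b0))

rng = np.random.default_rng(6)
print(beta_mle(rng.beta(3.0, 2.0, size=100_000)))  # roughly [3.0, 2.0]
```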
When the distribution is required over a known interval other than \[0, 1\] with random variable *X*, say \[*a*, *c*\] with random variable *Y*, then replace ln(*Xi*) in the first equation with
$$\ln \frac{Y_i - a}{c - a},$$
and replace ln(1 − *Xi*) in the second equation with
$$\ln \frac{c - Y_i}{c - a}$$
(see "Alternative parametrizations, four parameters" section below).
If one of the shape parameters is known, the problem is considerably simplified. The following [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformation can be used to solve for the unknown shape parameter (for skewed cases such that $\hat{\alpha} \neq \hat{\beta}$; otherwise, if symmetric, both (equal) parameters are known when one is known):
$$\hat{\operatorname{E}}\left[\ln \frac{X}{1-X}\right] = \psi(\hat{\alpha}) - \psi(\hat{\beta}) = \frac{1}{N}\sum_{i=1}^{N} \ln \frac{X_i}{1-X_i} = \ln \hat{G}_X - \ln \hat{G}_{1-X}$$
This [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformation is the logarithm of the transformation that divides the variable *X* by its mirror-image (*X*/(1 − *X*)), resulting in the "inverted beta distribution" or [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") (also known as beta distribution of the second kind or [Pearson's Type VI](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution")) with support \[0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformation $\ln \frac{X}{1-X}$, studied by Johnson,[\[25\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JohnsonLogInv-25) extends the finite support \[0, 1\] based on the original variable *X* to infinite support in both directions of the real line (−∞, +∞).
If, for example, $\hat{\beta}$ is known, the unknown parameter $\hat{\alpha}$ can be obtained in terms of the inverse[\[49\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-invpsi.m-49) digamma function of the right-hand side of this equation:
$$\psi(\hat{\alpha}) = \frac{1}{N}\sum_{i=1}^{N} \ln \frac{X_i}{1 - X_i} + \psi(\hat{\beta})$$

$$\hat{\alpha} = \psi^{-1}\left(\ln \hat{G}_X - \ln \hat{G}_{(1-X)} + \psi(\hat{\beta})\right)$$
In particular, if one of the shape parameters has a value of unity, for example for $\hat{\beta} = 1$ (the power function distribution with bounded support \[0, 1\]), using the identity ψ(*x* + 1) = ψ(*x*) + 1/*x* in the equation $\psi(\hat{\alpha}) - \psi(\hat{\alpha} + \hat{\beta}) = \ln \hat{G}_X$, the maximum likelihood estimator for the unknown parameter $\hat{\alpha}$ is,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) exactly:
$$\hat{\alpha} = -\frac{1}{\frac{1}{N}\sum_{i=1}^{N} \ln X_i} = -\frac{1}{\ln \hat{G}_X}$$
The beta has support \[0, 1\], therefore $\hat{G}_X < 1$, and hence $(-\ln \hat{G}_X) > 0$, and therefore $\hat{\alpha} > 0$.
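This closed-form special case is a one-line numerical check (a sketch assuming NumPy; the value 2.5 and the seed are arbitrary):

```python
# For beta = 1 the MLE is exact: alpha_hat = -1 / ln(geometric mean of X).
import numpy as np

rng = np.random.default_rng(7)
x = rng.beta(2.5, 1.0, size=100_000)   # power function distribution
print(-1 / np.log(x).mean())           # roughly 2.5
```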
In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") and of the sample [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") based on (1 − *X*), the mirror-image of *X*. One may ask: if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters *α* = *β*, the mean is exactly 1/2 regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters *α* = *β* depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on *X* and the geometric mean based on (1 − *X*), the maximum likelihood method is able to provide the best estimates for both parameters *α* = *β* without need of employing the variance.
One can express the joint log likelihood per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations in terms of the *[sufficient statistics](https://en.wikipedia.org/wiki/Sufficient_statistic "Sufficient statistic")* (the sample geometric means) as follows:
$$\frac{\ln \mathcal{L}(\alpha, \beta \mid X)}{N} = (\alpha - 1)\ln \hat{G}_X + (\beta - 1)\ln \hat{G}_{(1-X)} - \ln \mathrm{B}(\alpha, \beta).$$
We can plot the joint log likelihood per *N* observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators $\hat{\alpha}, \hat{\beta}$ correspond to the maxima of the likelihood function. See the accompanying graph that shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. Obviously, the maximum likelihood parameter estimation method for the beta distribution becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances
$$\frac{\partial^2 \ln\mathcal{L}(\alpha,\beta\mid X)}{\partial\alpha^2} = -\operatorname{var}[\ln X]$$

$$\frac{\partial^2 \ln\mathcal{L}(\alpha,\beta\mid X)}{\partial\beta^2} = -\operatorname{var}[\ln(1-X)]$$
These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the [CramĆ©rāRao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound "CramĆ©rāRao bound"), since the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information") matrix components for the beta distribution are these logarithmic variances. The [CramĆ©rāRao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound "CramĆ©rāRao bound") states that the [variance](https://en.wikipedia.org/wiki/Variance "Variance") of any *unbiased* estimator $\hat{\alpha}$ of α is bounded by the [reciprocal](https://en.wikipedia.org/wiki/Multiplicative_inverse "Multiplicative inverse") of the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information"):
$$\operatorname{var}(\hat{\alpha}) \geq \frac{1}{\operatorname{var}[\ln X]} \geq \frac{1}{\psi_1(\hat{\alpha}) - \psi_1(\hat{\alpha}+\hat{\beta})}$$

$$\operatorname{var}(\hat{\beta}) \geq \frac{1}{\operatorname{var}[\ln(1-X)]} \geq \frac{1}{\psi_1(\hat{\beta}) - \psi_1(\hat{\alpha}+\hat{\beta})}$$
so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.
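These trigamma-based lower bounds are straightforward to evaluate numerically; a small sketch using SciPy's `polygamma` (the plugged-in estimates below are illustrative values, not data from this article):

```python
from scipy.special import polygamma  # polygamma(1, x) is the trigamma function

def cramer_rao_bounds(alpha_hat, beta_hat):
    """Lower bounds on var(alpha_hat) and var(beta_hat), per observation,
    from the trigamma expressions for the logarithmic variances."""
    var_ln_x = polygamma(1, alpha_hat) - polygamma(1, alpha_hat + beta_hat)
    var_ln_1mx = polygamma(1, beta_hat) - polygamma(1, alpha_hat + beta_hat)
    return 1.0 / var_ln_x, 1.0 / var_ln_1mx

# The bounds grow as the shape estimates grow (the log variances shrink):
print(cramer_rao_bounds(0.5, 0.5))   # small shapes: tight bounds
print(cramer_rao_bounds(5.0, 5.0))   # large shapes: much larger bounds
```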
One can also express the joint log likelihood per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations in terms of the [digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function") expressions for the logarithms of the sample geometric means as follows:
$$\frac{\ln\mathcal{L}(\alpha,\beta\mid X)}{N} = (\alpha-1)(\psi(\hat{\alpha}) - \psi(\hat{\alpha}+\hat{\beta})) + (\beta-1)(\psi(\hat{\beta}) - \psi(\hat{\alpha}+\hat{\beta})) - \ln\mathrm{B}(\alpha,\beta)$$
This expression is identical to the negative of the cross-entropy (see the section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters:
$$\begin{aligned}\frac{\ln\mathcal{L}(\alpha,\beta\mid X)}{N} &= -H = -h - D_{\mathrm{KL}} \\ &= -\ln\mathrm{B}(\alpha,\beta) + (\alpha-1)\psi(\hat{\alpha}) + (\beta-1)\psi(\hat{\beta}) - (\alpha+\beta-2)\psi(\hat{\alpha}+\hat{\beta})\end{aligned}$$
with the cross-entropy defined as follows:
$$H = \int_0^1 -f(X;\hat{\alpha},\hat{\beta}) \ln(f(X;\alpha,\beta)) \,\mathrm{d}X$$
##### Four unknown parameters
The procedure is similar to the one followed in the two unknown parameter case. If $Y_1, \ldots, Y_N$ are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is:
$$\begin{aligned}\ln\mathcal{L}(\alpha,\beta,a,c\mid Y) &= \sum_{i=1}^N \ln\mathcal{L}_i(\alpha,\beta,a,c\mid Y_i) \\ &= \sum_{i=1}^N \ln f(Y_i;\alpha,\beta,a,c) \\ &= \sum_{i=1}^N \ln\frac{(Y_i-a)^{\alpha-1}(c-Y_i)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\mathrm{B}(\alpha,\beta)} \\ &= (\alpha-1)\sum_{i=1}^N \ln(Y_i-a) + (\beta-1)\sum_{i=1}^N \ln(c-Y_i) - N\ln\mathrm{B}(\alpha,\beta) - N(\alpha+\beta-1)\ln(c-a)\end{aligned}$$
Finding the maximum with respect to a shape parameter involves taking the partial derivative with respect to that parameter and setting the expression equal to zero, yielding the [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimator of the shape parameters:
$$\frac{\partial\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\alpha} = \sum_{i=1}^N \ln(Y_i-a) - N(-\psi(\alpha+\beta)+\psi(\alpha)) - N\ln(c-a) = 0$$

$$\frac{\partial\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\beta} = \sum_{i=1}^N \ln(c-Y_i) - N(-\psi(\alpha+\beta)+\psi(\beta)) - N\ln(c-a) = 0$$

$$\frac{\partial\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a} = -(\alpha-1)\sum_{i=1}^N \frac{1}{Y_i-a} + N(\alpha+\beta-1)\frac{1}{c-a} = 0$$

$$\frac{\partial\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial c} = (\beta-1)\sum_{i=1}^N \frac{1}{c-Y_i} - N(\alpha+\beta-1)\frac{1}{c-a} = 0$$
These equations can be rearranged as the following system of four coupled equations (the first two involving geometric means and the last two involving harmonic means) in terms of the maximum likelihood estimates for the four parameters $\hat{\alpha}, \hat{\beta}, \hat{a}, \hat{c}$:
$$\frac{1}{N}\sum_{i=1}^N \ln\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}} = \psi(\hat{\alpha}) - \psi(\hat{\alpha}+\hat{\beta}) = \ln\hat{G}_X$$

$$\frac{1}{N}\sum_{i=1}^N \ln\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}} = \psi(\hat{\beta}) - \psi(\hat{\alpha}+\hat{\beta}) = \ln\hat{G}_{1-X}$$

$$\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{Y_i-\hat{a}}} = \frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1} = \hat{H}_X$$

$$\frac{1}{\frac{1}{N}\sum_{i=1}^N \frac{\hat{c}-\hat{a}}{\hat{c}-Y_i}} = \frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1} = \hat{H}_{1-X}$$
with sample geometric means:
$$\hat{G}_X = \prod_{i=1}^N \left(\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}}\right)^{1/N}, \qquad \hat{G}_{(1-X)} = \prod_{i=1}^N \left(\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}}\right)^{1/N}$$
The parameters $\hat{a}, \hat{c}$ are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/*N*). This precludes, in general, a closed-form solution, even as an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four-parameter case. Furthermore, the expressions for the harmonic means are well defined only for $\hat{\alpha}, \hat{\beta} > 1$, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four-parameter case is [positive-definite](https://en.wikipedia.org/wiki/Positive-definite_matrix "Positive-definite matrix") only for α, β > 2 (for further discussion, see the section on the Fisher information matrix, four-parameter case), that is, for bell-shaped (symmetric or unsymmetric) beta distributions with inflection points located to either side of the mode. The following Fisher information components (which represent the expectations of the curvature of the log likelihood function) have [singularities](https://en.wikipedia.org/wiki/Mathematical_singularity "Mathematical singularity") at the following values:
$$\alpha = 2: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a^2}\right] = \mathcal{I}_{a,a}$$

$$\beta = 2: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial c^2}\right] = \mathcal{I}_{c,c}$$

$$\alpha = 1: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\alpha\,\partial a}\right] = \mathcal{I}_{\alpha,a}$$

$$\beta = 1: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\beta\,\partial c}\right] = \mathcal{I}_{\beta,c}$$
(for further discussion see the section on the Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well-known distributions belonging to the four-parameter beta distribution family, like the [uniform distribution](https://en.wikipedia.org/wiki/Continuous_uniform_distribution "Continuous uniform distribution") (Beta(1, 1, *a*, *c*)) and the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") (Beta(1/2, 1/2, *a*, *c*)). [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz")[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of *a*, *c*, α and β are required, the above procedure (for the two unknown parameter case, with *X* transformed as *X* = (*Y* āˆ’ *a*)/(*c* āˆ’ *a*)) can be repeated using a succession of trial values of *a* and *c*, until the pair (*a*, *c*) for which maximum likelihood (given *a* and *c*) is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
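A rough numerical sketch of that trial-value (profile likelihood) suggestion, assuming SciPy is available; the function names and the grid are illustrative, and the inner two-parameter fit uses the digamma equations from the two unknown parameter case:

```python
import numpy as np
from scipy.special import betaln, digamma
from scipy.optimize import fsolve

def two_param_mle(x):
    """Solve psi(a) - psi(a+b) = ln G_X, psi(b) - psi(a+b) = ln G_(1-X);
    parametrized through exp() to keep the shape parameters positive."""
    ln_gx, ln_g1x = np.mean(np.log(x)), np.mean(np.log(1 - x))
    def eqs(q):
        a, b = np.exp(q)
        return [digamma(a) - digamma(a + b) - ln_gx,
                digamma(b) - digamma(a + b) - ln_g1x]
    return np.exp(fsolve(eqs, [0.0, 0.0]))

def profile_loglik(y, a, c):
    """Log likelihood maximized over (alpha, beta) for trial endpoints
    a < min(y) and c > max(y), using X = (Y - a)/(c - a)."""
    x = (y - a) / (c - a)
    alpha, beta = two_param_mle(x)
    return (np.sum((alpha - 1) * np.log(x) + (beta - 1) * np.log(1 - x))
            - len(y) * (betaln(alpha, beta) + np.log(c - a)))

rng = np.random.default_rng(1)
y = 2 + 3 * rng.beta(3.0, 4.0, size=500)  # true support (2, 5)
trial_a = np.linspace(y.min() - 0.5, y.min() - 0.01, 8)
trial_c = np.linspace(y.max() + 0.01, y.max() + 0.5, 8)
best = max((profile_loglik(y, a, c), a, c) for a in trial_a for c in trial_c)
print("profile ML endpoints (a, c):", best[1:])
```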
#### Fisher information matrix
Let a random variable X have a probability density *f*(*x*;*α*). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function") is called the [score](https://en.wikipedia.org/wiki/Score_\(statistics\) "Score (statistics)"). The second moment of the score is called the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information"):
$$\mathcal{I}(\alpha) = \operatorname{E}\left[\left(\frac{\partial}{\partial\alpha} \ln\mathcal{L}(\alpha\mid X)\right)^2\right],$$
The [expectation](https://en.wikipedia.org/wiki/Expected_value "Expected value") of the [score](https://en.wikipedia.org/wiki/Score_\(statistics\) "Score (statistics)") is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the [variance](https://en.wikipedia.org/wiki/Variance "Variance") of the score.
If the log [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function") is twice differentiable with respect to the parameter α, and under certain regularity conditions,[\[50\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Silvey-50) then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):
$$\mathcal{I}(\alpha) = -\operatorname{E}\left[\frac{\partial^2}{\partial\alpha^2} \ln\mathcal{L}(\alpha\mid X)\right].$$
Thus, the Fisher information is the negative of the expectation of the second [derivative](https://en.wikipedia.org/wiki/Derivative "Derivative") with respect to the parameter α of the log [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function"). Therefore, Fisher information is a measure of the [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature") of the log likelihood function of α. A flat log likelihood curve, with low [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature") (and therefore high [radius of curvature](https://en.wikipedia.org/wiki/Radius_of_curvature_\(mathematics\) "Radius of curvature (mathematics)")), has low Fisher information, while a log likelihood curve with large [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature") (and therefore low [radius of curvature](https://en.wikipedia.org/wiki/Radius_of_curvature_\(mathematics\) "Radius of curvature (mathematics)")) has high Fisher information. When the Fisher information matrix is computed at the estimates of the parameters ("the observed Fisher information matrix"), it is equivalent to the replacement of the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms.[\[51\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-EdwardsLikelihood-51) The word information, in the context of Fisher information, refers to information about the parameters: estimation, sufficiency, and the properties of variances of estimators. The [CramĆ©rāRao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound "CramĆ©rāRao bound") states that the inverse of the Fisher information is a lower bound on the variance of any [estimator](https://en.wikipedia.org/wiki/Estimator "Estimator") of a parameter α:
var ┠\[ α ^ \] ℠1 I ( α ) . {\\displaystyle \\operatorname {var} \[{\\hat {\\alpha }}\]\\geq {\\frac {1}{{\\mathcal {I}}(\\alpha )}}.} ![{\\displaystyle \\operatorname {var} \[{\\hat {\\alpha }}\]\\geq {\\frac {1}{{\\mathcal {I}}(\\alpha )}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/d93d4983a2717258c52eb47d1562e849a3a66c5c)
The precision with which one can estimate a parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two [alternative hypotheses](https://en.wikipedia.org/wiki/Alternative_hypothesis "Alternative hypothesis") about a parameter.[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52)
When there are *N* parameters
$$\begin{bmatrix}\theta_1\\\theta_2\\\vdots\\\theta_N\end{bmatrix},$$
then the Fisher information takes the form of an *N*Ć*N* [positive semidefinite](https://en.wikipedia.org/wiki/Positive_semidefinite_matrix "Positive semidefinite matrix") [symmetric matrix](https://en.wikipedia.org/wiki/Symmetric_matrix "Symmetric matrix"), the Fisher information matrix, with typical element:
$$(\mathcal{I}(\theta))_{i,j} = \operatorname{E}\left[\frac{\partial\ln\mathcal{L}}{\partial\theta_i} \cdot \frac{\partial\ln\mathcal{L}}{\partial\theta_j}\right].$$
Under certain regularity conditions,[\[50\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Silvey-50) the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation:
$$(\mathcal{I}(\theta))_{i,j} = -\operatorname{E}\left[\frac{\partial^2\ln\mathcal{L}}{\partial\theta_i\,\partial\theta_j}\right].$$
With $X_1, \ldots, X_N$ [iid](https://en.wikipedia.org/wiki/Iid "Iid") random variables, an *N*-dimensional "box" can be constructed with sides $X_1, \ldots, X_N$. Costa and Cover[\[53\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-CostaCover-53) show that the (Shannon) differential entropy *h*(*X*) is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
##### Two parameters
For $X_1, \ldots, X_N$ independent random variables each having a beta distribution parametrized with shape parameters *α* and *β*, the joint log likelihood function for *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is:
$$\ln\mathcal{L}(\alpha,\beta\mid X) = (\alpha-1)\sum_{i=1}^N \ln X_i + (\beta-1)\sum_{i=1}^N \ln(1-X_i) - N\ln\mathrm{B}(\alpha,\beta)$$
therefore the joint log likelihood function per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is
$$\frac{1}{N}\ln\mathcal{L}(\alpha,\beta\mid X) = (\alpha-1)\frac{1}{N}\sum_{i=1}^N \ln X_i + (\beta-1)\frac{1}{N}\sum_{i=1}^N \ln(1-X_i) - \ln\mathrm{B}(\alpha,\beta).$$
For the two-parameter case, the Fisher information matrix has 4 components: 2 diagonal and 2 off-diagonal. Since the matrix is symmetric, only one of the off-diagonal components is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).

Aryal and Nadarajah[\[54\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Aryal-54) calculated Fisher's information matrix for the four-parameter case, from which the two-parameter case can be obtained as follows:
$$-\frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial\alpha^2} = \operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\alpha} = \operatorname{E}\left[-\frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial\alpha^2}\right] = \ln\operatorname{var}_{GX}$$

$$-\frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial\beta^2} = \operatorname{var}[\ln(1-X)] = \psi_1(\beta) - \psi_1(\alpha+\beta) = \mathcal{I}_{\beta,\beta} = \operatorname{E}\left[-\frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial\beta^2}\right] = \ln\operatorname{var}_{G(1-X)}$$

$$-\frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial\alpha\,\partial\beta} = \operatorname{cov}[\ln X, \ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\beta} = \operatorname{E}\left[-\frac{\partial^2\ln\mathcal{L}(\alpha,\beta\mid X)}{N\,\partial\alpha\,\partial\beta}\right] = \ln\operatorname{cov}_{G\,X,(1-X)}$$
Since the Fisher information matrix is symmetric:
$$\mathcal{I}_{\alpha,\beta} = \mathcal{I}_{\beta,\alpha} = \ln\operatorname{cov}_{G\,X,(1-X)}$$
The Fisher information components are equal to the log geometric variances and the log geometric covariance. Therefore, they can be expressed as **[trigamma functions](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function")**, denoted $\psi_1(\alpha)$, the second of the [polygamma functions](https://en.wikipedia.org/wiki/Polygamma_function "Polygamma function"), defined as the derivative of the [digamma](https://en.wikipedia.org/wiki/Digamma "Digamma") function:
$$\psi_1(\alpha) = \frac{d^2\ln\Gamma(\alpha)}{d\alpha^2} = \frac{d\,\psi(\alpha)}{d\alpha}.$$
These derivatives are also derived in [§ Two unknown parameters](https://en.wikipedia.org/wiki/Beta_distribution#Two_unknown_parameters), and plots of the log likelihood function are also shown in that section. [§ Geometric variance and covariance](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_variance_and_covariance) contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. [§ Moments of logarithmically transformed random variables](https://en.wikipedia.org/wiki/Beta_distribution#Moments_of_logarithmically_transformed_random_variables) contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components $\mathcal{I}_{\alpha,\alpha}, \mathcal{I}_{\beta,\beta}$ and $\mathcal{I}_{\alpha,\beta}$ are shown in [§ Geometric variance](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_variance).
The determinant of Fisher's information matrix is of interest (for example for the calculation of [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:
$$\begin{aligned}\det(\mathcal{I}(\alpha,\beta)) &= \mathcal{I}_{\alpha,\alpha}\mathcal{I}_{\beta,\beta} - \mathcal{I}_{\alpha,\beta}\mathcal{I}_{\alpha,\beta} \\ &= (\psi_1(\alpha) - \psi_1(\alpha+\beta))(\psi_1(\beta) - \psi_1(\alpha+\beta)) - (-\psi_1(\alpha+\beta))(-\psi_1(\alpha+\beta)) \\ &= \psi_1(\alpha)\psi_1(\beta) - (\psi_1(\alpha) + \psi_1(\beta))\psi_1(\alpha+\beta) \\ \lim_{\alpha\to 0}\det(\mathcal{I}(\alpha,\beta)) &= \lim_{\beta\to 0}\det(\mathcal{I}(\alpha,\beta)) = \infty \\ \lim_{\alpha\to\infty}\det(\mathcal{I}(\alpha,\beta)) &= \lim_{\beta\to\infty}\det(\mathcal{I}(\alpha,\beta)) = 0\end{aligned}$$
From [Sylvester's criterion](https://en.wikipedia.org/wiki/Sylvester%27s_criterion "Sylvester's criterion") (checking whether the leading principal minors are all positive), it follows that the Fisher information matrix for the two-parameter case is [positive-definite](https://en.wikipedia.org/wiki/Positive-definite_matrix "Positive-definite matrix") (under the standard condition that the shape parameters are positive: *α* > 0 and *β* > 0).
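As a quick numerical check (a sketch, not part of the article; the function name is illustrative), the two-parameter Fisher information matrix, its determinant, and its positive-definiteness can be assembled directly from SciPy's trigamma function:

```python
import numpy as np
from scipy.special import polygamma  # polygamma(1, x) is the trigamma psi_1

def fisher_info_2param(alpha, beta):
    """Fisher information matrix of Beta(alpha, beta), per observation."""
    t_a, t_b, t_ab = (polygamma(1, v) for v in (alpha, beta, alpha + beta))
    return np.array([[t_a - t_ab, -t_ab],
                     [-t_ab,      t_b - t_ab]])

I = fisher_info_2param(2.0, 3.0)
# Determinant agrees with psi_1(a) psi_1(b) - (psi_1(a) + psi_1(b)) psi_1(a+b):
print(np.linalg.det(I))
print(np.all(np.linalg.eigvalsh(I) > 0))  # positive-definite for a, b > 0
```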
##### Four parameters
Figure: Fisher Information *I*(*a*,*a*) for *α* = *β*, vs. range (*c* āˆ’ *a*) and exponent *α* = *β*
Figure: Fisher Information *I*(*α*,*a*) for *α* = *β*, vs. range (*c* āˆ’ *a*) and exponent *α* = *β*
If $Y_1, \ldots, Y_N$ are independent random variables each having a beta distribution with four parameters: the exponents *α* and *β*, and also *a* (the minimum of the distribution range) and *c* (the maximum of the distribution range) (see the section "Alternative parametrizations", "Four parameters"), with [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function"):
$$f(y;\alpha,\beta,a,c) = \frac{f(x;\alpha,\beta)}{c-a} = \frac{\left(\frac{y-a}{c-a}\right)^{\alpha-1}\left(\frac{c-y}{c-a}\right)^{\beta-1}}{(c-a)\mathrm{B}(\alpha,\beta)} = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\mathrm{B}(\alpha,\beta)}.$$
the joint log likelihood function per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is:
$$\frac{1}{N}\ln\mathcal{L}(\alpha,\beta,a,c\mid Y) = \frac{\alpha-1}{N}\sum_{i=1}^N \ln(Y_i-a) + \frac{\beta-1}{N}\sum_{i=1}^N \ln(c-Y_i) - \ln\mathrm{B}(\alpha,\beta) - (\alpha+\beta-1)\ln(c-a)$$
For the four-parameter case, the Fisher information matrix has 4 Ɨ 4 = 16 components, of which 12 are off-diagonal and 4 are diagonal. Since the matrix is symmetric, only half of the off-diagonal components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah[\[54\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Aryal-54) calculated Fisher's information matrix for the four-parameter case as follows:
$$-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\alpha^2} = \operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\alpha} = \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\alpha^2}\right] = \ln\operatorname{var}_{GX}$$

$$-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\beta^2} = \operatorname{var}[\ln(1-X)] = \psi_1(\beta) - \psi_1(\alpha+\beta) = \mathcal{I}_{\beta,\beta} = \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\beta^2}\right] = \ln\operatorname{var}_{G(1-X)}$$

$$-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\alpha\,\partial\beta} = \operatorname{cov}[\ln X, \ln(1-X)] = -\psi_1(\alpha+\beta) = \mathcal{I}_{\alpha,\beta} = \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\alpha\,\partial\beta}\right] = \ln\operatorname{cov}_{G\,X,(1-X)}$$
In the above expressions, the use of *X* instead of *Y* in the expressions $\operatorname{var}[\ln(X)] = \ln(\operatorname{var}_{GX})$ is *not an error*. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two-parameter *X* ~ Beta(*α*, *β*) parametrization because, when taking the partial derivatives with respect to the exponents (*α*, *β*) in the four-parameter case, one obtains expressions identical to those for the two-parameter case: these terms of the four-parameter Fisher information matrix are independent of the minimum *a* and maximum *c* of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents *α* and *β* is the second derivative of the log of the beta function: ln(B(*α*, *β*)). This term is independent of the minimum *a* and maximum *c* of the distribution's range, and double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.
The Fisher information for *N* [i.i.d.](https://en.wikipedia.org/wiki/I.i.d. "I.i.d.") samples is *N* times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas[\[28\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Cover_and_Thomas-28)). (Aryal and Nadarajah[\[54\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Aryal-54) take a single observation, *N* = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per *N* observations. Moreover, below, the erroneous expression for $\mathcal{I}_{a,a}$ in Aryal and Nadarajah has been corrected.)
$$\begin{aligned}\alpha > 2: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a^2}\right] &= \mathcal{I}_{a,a} = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial c^2}\right] &= \mathcal{I}_{c,c} = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a\,\partial c}\right] &= \mathcal{I}_{a,c} = \frac{\alpha+\beta-1}{(c-a)^2} \\ \alpha > 1: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\alpha\,\partial a}\right] &= \mathcal{I}_{\alpha,a} = \frac{\beta}{(\alpha-1)(c-a)} \\ \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\alpha\,\partial c}\right] &= \mathcal{I}_{\alpha,c} = \frac{1}{c-a} \\ \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\beta\,\partial a}\right] &= \mathcal{I}_{\beta,a} = -\frac{1}{c-a} \\ \beta > 1: \quad \operatorname{E}\left[-\frac{1}{N}\frac{\partial^2\ln\mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial\beta\,\partial c}\right] &= \mathcal{I}_{\beta,c} = -\frac{\alpha}{(\beta-1)(c-a)}\end{aligned}$$
The lower two diagonal entries of the Fisher information matrix, with respect to the parameter *a* (the minimum of the distribution's range), $\mathcal{I}_{a,a}$, and with respect to the parameter *c* (the maximum of the distribution's range), $\mathcal{I}_{c,c}$, are only defined for exponents α > 2 and β > 2 respectively. The component $\mathcal{I}_{a,a}$ for the minimum *a* approaches infinity as the exponent α approaches 2 from above, and the component $\mathcal{I}_{c,c}$ for the maximum *c* approaches infinity as the exponent β approaches 2 from above.
The Fisher information matrix for the four-parameter case does not depend on the individual values of the minimum *a* and the maximum *c*, but only on the total range (*c* āˆ’ *a*). Moreover, the components of the Fisher information matrix that depend on the range (*c* āˆ’ *a*) depend on it only through its inverse (or the square of the inverse), such that the Fisher information decreases with increasing range (*c* āˆ’ *a*).
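A brief sketch (illustrative function name, hypothetical parameter values) that assembles the components above into the full 4Ɨ4 matrix and confirms the dependence on *a* and *c* only through the range *c* āˆ’ *a*:

```python
import numpy as np
from scipy.special import polygamma

def fisher_info_4param(alpha, beta, a, c):
    """4x4 Fisher information matrix per observation, in parameter order
    (alpha, beta, a, c); valid for alpha > 2 and beta > 2."""
    r = c - a  # the matrix depends on a and c only through this range
    t_a, t_b, t_ab = (polygamma(1, v) for v in (alpha, beta, alpha + beta))
    I_aa = beta * (alpha + beta - 1) / ((alpha - 2) * r**2)
    I_cc = alpha * (alpha + beta - 1) / ((beta - 2) * r**2)
    I_ac = (alpha + beta - 1) / r**2
    I_alpha_a = beta / ((alpha - 1) * r)
    I_beta_c = -alpha / ((beta - 1) * r)
    return np.array([
        [t_a - t_ab, -t_ab,      I_alpha_a, 1 / r   ],
        [-t_ab,      t_b - t_ab, -1 / r,    I_beta_c],
        [I_alpha_a,  -1 / r,     I_aa,      I_ac    ],
        [1 / r,      I_beta_c,   I_ac,      I_cc    ],
    ])

I = fisher_info_4param(3.0, 3.0, a=0.0, c=1.0)
print(np.allclose(I, fisher_info_4param(3.0, 3.0, a=10.0, c=11.0)))  # True
print(np.all(np.linalg.eigvalsh(I) > 0))  # positive-definite when alpha, beta > 2
```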
The accompanying images show the Fisher information components $\mathcal{I}_{a,a}$ and $\mathcal{I}_{\alpha,a}$. Images for the Fisher information components $\mathcal{I}_{\alpha,\alpha}$ and $\mathcal{I}_{\beta,\beta}$ are shown in [§ Geometric variance](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_variance). All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters.
The following four-parameter beta distribution Fisher information components can be expressed in terms of the two-parameter (*X* ~ Beta(α, β)) expectations of the transformed ratio ((1 āˆ’ *X*)/*X*) and of its mirror image (*X*/(1 āˆ’ *X*)), scaled by the range (*c* āˆ’ *a*), which may be helpful for interpretation:
$$\mathcal{I}_{\alpha,a} = \frac{\operatorname{E}\left[\frac{1-X}{X}\right]}{c-a} = \frac{\beta}{(\alpha-1)(c-a)} \quad \text{if } \alpha > 1$$

$$\mathcal{I}_{\beta,c} = -\frac{\operatorname{E}\left[\frac{X}{1-X}\right]}{c-a} = -\frac{\alpha}{(\beta-1)(c-a)} \quad \text{if } \beta > 1$$
These are also the expected values of the "inverted beta distribution" or [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") (also known as beta distribution of the second kind or [Pearson's Type VI](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution")) [\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) and its mirror image, scaled by the range (*c* ā *a*).
Also, the following Fisher information components can be expressed in terms of the harmonic (1/*X*) variances or of variances based on the ratio-transformed variables ((1 āˆ’ *X*)/*X*) as follows:
$$\begin{aligned}\alpha > 2: \quad \mathcal{I}_{a,a} &= \operatorname{var}\left[\frac{1}{X}\right]\left(\frac{\alpha-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{1-X}{X}\right]\left(\frac{\alpha-1}{c-a}\right)^2 = \frac{\beta(\alpha+\beta-1)}{(\alpha-2)(c-a)^2} \\ \beta > 2: \quad \mathcal{I}_{c,c} &= \operatorname{var}\left[\frac{1}{1-X}\right]\left(\frac{\beta-1}{c-a}\right)^2 = \operatorname{var}\left[\frac{X}{1-X}\right]\left(\frac{\beta-1}{c-a}\right)^2 = \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(c-a)^2} \\ \mathcal{I}_{a,c} &= \operatorname{cov}\left[\frac{1}{X},\frac{1}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = \operatorname{cov}\left[\frac{1-X}{X},\frac{X}{1-X}\right]\frac{(\alpha-1)(\beta-1)}{(c-a)^2} = \frac{\alpha+\beta-1}{(c-a)^2}\end{aligned}$$
See section "Moments of linearly transformed, product and inverted random variables" for these expectations.
The determinant of Fisher's information matrix is of interest (for example for the calculation of [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is:
$$\begin{aligned}\det(\mathcal{I}(\alpha,\beta,a,c)) = {} & -\mathcal{I}_{a,c}^2\mathcal{I}_{\alpha,a}\mathcal{I}_{\alpha,\beta} + \mathcal{I}_{a,a}\mathcal{I}_{a,c}\mathcal{I}_{\alpha,c}\mathcal{I}_{\alpha,\beta} + \mathcal{I}_{a,c}^2\mathcal{I}_{\alpha,\beta}^2 - \mathcal{I}_{a,a}\mathcal{I}_{c,c}\mathcal{I}_{\alpha,\beta}^2 \\ & -\mathcal{I}_{a,c}\mathcal{I}_{\alpha,a}\mathcal{I}_{\alpha,c}\mathcal{I}_{\beta,a} + \mathcal{I}_{a,c}^2\mathcal{I}_{\alpha,\alpha}\mathcal{I}_{\beta,a} + 2\mathcal{I}_{c,c}\mathcal{I}_{\alpha,a}\mathcal{I}_{\alpha,\beta}\mathcal{I}_{\beta,a} \\ & -2\mathcal{I}_{a,c}\mathcal{I}_{\alpha,c}\mathcal{I}_{\alpha,\beta}\mathcal{I}_{\beta,a} + \mathcal{I}_{\alpha,c}^2\mathcal{I}_{\beta,a}^2 - \mathcal{I}_{c,c}\mathcal{I}_{\alpha,\alpha}\mathcal{I}_{\beta,a}^2 + \mathcal{I}_{a,c}\mathcal{I}_{\alpha,a}^2\mathcal{I}_{\beta,c} \\ & -\mathcal{I}_{a,a}\mathcal{I}_{a,c}\mathcal{I}_{\alpha,\alpha}\mathcal{I}_{\beta,c} - \mathcal{I}_{a,c}\mathcal{I}_{\alpha,a}\mathcal{I}_{\alpha,\beta}\mathcal{I}_{\beta,c} + \mathcal{I}_{a,a}\mathcal{I}_{\alpha,c}\mathcal{I}_{\alpha,\beta}\mathcal{I}_{\beta,c} \\ & -\mathcal{I}_{\alpha,a}\mathcal{I}_{\alpha,c}\mathcal{I}_{\beta,a}\mathcal{I}_{\beta,c} + \mathcal{I}_{a,c}\mathcal{I}_{\alpha,\alpha}\mathcal{I}_{\beta,a}\mathcal{I}_{\beta,c} - \mathcal{I}_{c,c}\mathcal{I}_{\alpha,a}^2\mathcal{I}_{\beta,\beta} \\ & +2\mathcal{I}_{a,c}\mathcal{I}_{\alpha,a}\mathcal{I}_{\alpha,c}\mathcal{I}_{\beta,\beta} - \mathcal{I}_{a,a}\mathcal{I}_{\alpha,c}^2\mathcal{I}_{\beta,\beta} - \mathcal{I}_{a,c}^2\mathcal{I}_{\alpha,\alpha}\mathcal{I}_{\beta,\beta} + \mathcal{I}_{a,a}\mathcal{I}_{c,c}\mathcal{I}_{\alpha,\alpha}\mathcal{I}_{\beta,\beta} \quad \text{if } \alpha,\beta > 2\end{aligned}$$
Using [Sylvester's criterion](https://en.wikipedia.org/wiki/Sylvester%27s_criterion "Sylvester's criterion") (checking whether the leading principal minors are all positive), and since the diagonal components $\mathcal{I}_{a,a}$ and $\mathcal{I}_{c,c}$ have [singularities](https://en.wikipedia.org/wiki/Mathematical_singularity "Mathematical singularity") at α = 2 and β = 2, it follows that the Fisher information matrix for the four-parameter case is [positive-definite](https://en.wikipedia.org/wiki/Positive-definite_matrix "Positive-definite matrix") for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell-shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well-known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2, 2, *a*, *c*)) and the [uniform distribution](https://en.wikipedia.org/wiki/Continuous_uniform_distribution "Continuous uniform distribution") (Beta(1, 1, *a*, *c*)), have Fisher information components ($\mathcal{I}_{a,a}, \mathcal{I}_{c,c}, \mathcal{I}_{\alpha,a}, \mathcal{I}_{\beta,c}$) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two-parameter case). The four-parameter [Wigner semicircle distribution](https://en.wikipedia.org/wiki/Wigner_semicircle_distribution "Wigner semicircle distribution") (Beta(3/2, 3/2, *a*, *c*)) and [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") (Beta(1/2, 1/2, *a*, *c*)) have negative Fisher information determinants for the four-parameter case.
### Bayesian inference
Main article: [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference")
Beta(1,1): The [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") probability density was proposed by [Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes "Thomas Bayes") to represent ignorance of prior probabilities in [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference").
Beta distributions are used in [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference") because they provide a family of [conjugate prior probability distributions](https://en.wikipedia.org/wiki/Conjugate_prior_distribution "Conjugate prior distribution") for [binomial](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") (including [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution")) and [geometric distributions](https://en.wikipedia.org/wiki/Geometric_distribution "Geometric distribution"). The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value *p*:[\[24\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-MacKay-24)
{\\displaystyle P(p;\\alpha ,\\beta )={\\frac {p^{\\alpha -1}(1-p)^{\\beta -1}}{\\mathrm {B} (\\alpha ,\\beta )}}.}
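For readers who want to evaluate this density numerically, the following sketch uses SciPy's `beta.pdf`; the parameter values are arbitrary illustrations.

```python
# Evaluate the density P(p; alpha, beta) above at a few points.
from scipy.stats import beta

alpha_, beta_ = 2.0, 5.0          # arbitrary shape parameters
for p in (0.1, 0.3, 0.5):
    print(p, beta.pdf(p, alpha_, beta_))
```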
Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
#### Rule of succession
Main article: [Rule of succession](https://en.wikipedia.org/wiki/Rule_of_succession "Rule of succession")
A classic application of the beta distribution is the [rule of succession](https://en.wikipedia.org/wiki/Rule_of_succession "Rule of succession"), introduced in the 18th century by [Pierre-Simon Laplace](https://en.wikipedia.org/wiki/Pierre-Simon_Laplace "Pierre-Simon Laplace")[\[55\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Laplace-55) in the course of treating the [sunrise problem](https://en.wikipedia.org/wiki/Sunrise_problem "Sunrise problem"). It states that, given *s* successes in *n* [conditionally independent](https://en.wikipedia.org/wiki/Conditional_independence "Conditional independence") [Bernoulli trials](https://en.wikipedia.org/wiki/Bernoulli_trial "Bernoulli trial") with probability *p*, the estimate of the expected value in the next trial is {\\displaystyle {\\frac {s+1}{n+2}}}. This estimate is the expected value of the posterior distribution over *p*, namely Beta(*s*+1, *n*ā*s*+1), which is given by [Bayes' rule](https://en.wikipedia.org/wiki/Bayes%27_rule "Bayes' rule") if one assumes a uniform prior probability over *p* (i.e., Beta(1, 1)) and then observes that *p* generated *s* successes in *n* trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the [sunrise problem](https://en.wikipedia.org/wiki/Sunrise_problem "Sunrise problem") ([\[56\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-CoxRT-56) p. 89) as "a travesty of the proper use of the principle". Keynes remarks ([\[57\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-KeynesTreatise-57) Ch. XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable". Karl Pearson[\[58\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-PearsonRuleSuccession-58) showed that the probability that the next (*n* + 1) trials will all be successes, after *n* successes in *n* trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ([\[59\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jeffreys-59) p. 128) (crediting [C. D. Broad](https://en.wikipedia.org/wiki/C._D._Broad "C. D. Broad")[\[60\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BroadMind-60)), Laplace's rule of succession establishes a high probability of success ((*n*+1)/(*n*+2)) in the next trial, but only a moderate probability (50%) that a further sample (*n*+1) comparable in size will be equally successful. As pointed out by Perks,[\[61\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Perks-61) "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the following [§ Bayesian inference](https://en.wikipedia.org/wiki/Beta_distribution#Bayesian_inference) subsections).
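A minimal numerical sketch of the rule of succession, assuming arbitrary counts *s* and *n*: the posterior mean of Beta(*s*+1, *n*ā*s*+1) reproduces (*s*+1)/(*n*+2).

```python
# Laplace's rule of succession: uniform Beta(1,1) prior updated with
# s successes in n trials gives posterior Beta(s+1, n-s+1).
from scipy.stats import beta

s, n = 7, 10                        # arbitrary successes, trials
posterior = beta(s + 1, n - s + 1)
print(posterior.mean())             # 8/12 = 0.666...
print((s + 1) / (n + 2))            # same value, the rule of succession
```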
According to Jaynes,[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) the main problem with the rule of succession is that it is not valid when s=0 or s=n (see [rule of succession](https://en.wikipedia.org/wiki/Rule_of_succession "Rule of succession"), for an analysis of its validity).
#### BayesāLaplace prior probability (Beta(1,1))
The beta distribution achieves maximum differential entropy for Beta(1,1): the [uniform](https://en.wikipedia.org/wiki/Uniform_density "Uniform density") probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by [Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes "Thomas Bayes")[\[62\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-ThomasBayes-62) as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt[\[55\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Laplace-55)) by [Pierre-Simon Laplace](https://en.wikipedia.org/wiki/Pierre-Simon_Laplace "Pierre-Simon Laplace"), and hence it was also known as the "BayesāLaplace rule" or the "Laplace rule" of "[inverse probability](https://en.wikipedia.org/wiki/Inverse_probability "Inverse probability")" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near *x* = 0, for a distribution with initial support at *x* = 0) required particular attention. Keynes ([\[57\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-KeynesTreatise-57) Ch. XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)) that all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."
#### Haldane's prior probability (Beta(0,0))
Beta(0,0): The Haldane prior probability expressing total ignorance about prior information, where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure. As α, β → 0, the beta distribution approaches a two-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with all probability density concentrated at each end, at 0 and 1, and nothing in between. A coin toss: one face of the coin being at 0 and the other face being at 1.
The Beta(0,0) distribution was proposed by [J.B.S. Haldane](https://en.wikipedia.org/wiki/J.B.S._Haldane "J.B.S. Haldane"),[\[63\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-63) who suggested that the prior probability representing complete uncertainty should be proportional to *p*^ā1(1 ā *p*)^ā1. The function *p*^ā1(1 ā *p*)^ā1 can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The beta function (in the denominator of the beta distribution) approaches infinity as both parameters approach zero, α, β → 0. Therefore, *p*^ā1(1 ā *p*)^ā1 divided by the beta function approaches a 2-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "[improper prior](https://en.wikipedia.org/wiki/Improper_prior "Improper prior")" because its integral from 0 to 1 diverges (owing to the singularities at each end), so it cannot be normalized to 1. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner[\[64\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Zellner-64) points out that on the [log-odds](https://en.wikipedia.org/wiki/Log-odds "Log-odds") scale (the [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformation {\\displaystyle \\log(p/(1-p))}), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the [logit](https://en.wikipedia.org/wiki/Logit "Logit")-transformed variable ln(*p*/(1 ā *p*)) (with domain (āā, ā)) is equivalent to the Haldane prior on the domain \[0, 1\] was pointed out by [Harold Jeffreys](https://en.wikipedia.org/wiki/Harold_Jeffreys "Harold Jeffreys") in the first edition (1939) of his book Theory of Probability ([\[59\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jeffreys-59) p. 123). Jeffreys writes "Certainly if we take the BayesāLaplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d*x*/(*x*(1 ā *x*)) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
#### Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)
Main article: [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior")
[Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") probability for the beta distribution: the square root of the determinant of [Fisher's information](https://en.wikipedia.org/wiki/Fisher%27s_information "Fisher's information") matrix, {\\displaystyle \\scriptstyle {\\sqrt {\\det({\\mathcal {I}}(\\alpha ,\\beta ))}}={\\sqrt {\\psi \_{1}(\\alpha )\\psi \_{1}(\\beta )-(\\psi \_{1}(\\alpha )+\\psi \_{1}(\\beta ))\\psi \_{1}(\\alpha +\\beta )}}}, is a function of the [trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function") ψ1 of shape parameters α, β
[](https://en.wikipedia.org/wiki/File:Beta_distribution_for_3_different_prior_probability_functions_-_J._Rodal.png)
Posterior Beta densities for samples with *s* successes and *f* failures, where *s*/(*s* + *f*) = 1/2 and *s* + *f* ā {3,10,50}, based on three different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with sample size of 50 (with a more pronounced peak near *p* = 1/2). Significant differences appear for very small sample sizes (the flatter distribution for sample size of 3)
[](https://en.wikipedia.org/wiki/File:Beta_distribution_for_3_different_prior_probability_functions,_skewed_case_-_J._Rodal.png)
Posterior Beta densities for samples with *s* successes and *f* failures, where *s*/(*s* + *f*) = 1/4 and *s* + *f* ā {3,10,50}, based on three different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with sample size of 50 (with a more pronounced peak near *p* = 1/4). Significant differences appear for very small sample sizes (the very skewed distribution for the degenerate case of sample size = 3; in this degenerate and unlikely case the Haldane prior results in a reverse "J" shape with mode at *p* = 0 instead of *p* = 1/4). If there is sufficient [sampling data](https://en.wikipedia.org/wiki/Sample_\(statistics\) "Sample (statistics)"), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar [*posterior* probability](https://en.wikipedia.org/wiki/Posterior_probability "Posterior probability") densities.
[](https://en.wikipedia.org/wiki/File:Beta_distribution_for_3_different_prior_probability_functions,_skewed_case_sample_size_%3D_\(4,12,40\)_-_J._Rodal.png)
Posterior Beta densities for samples with *s* successes and *f* failures, where *s*/(*s* + *f*) = 1/4 and *s* + *f* ā {4,12,40}, based on three different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with sample size of 40 (with a more pronounced peak near *p* = 1/4). Significant differences appear for very small sample sizes
[Harold Jeffreys](https://en.wikipedia.org/wiki/Harold_Jeffreys "Harold Jeffreys")[\[59\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jeffreys-59)[\[65\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JeffreysPRIOR-65) proposed to use an [uninformative prior](https://en.wikipedia.org/wiki/Uninformative_prior "Uninformative prior") probability measure that should be [invariant under reparameterization](https://en.wikipedia.org/wiki/Parametrization_invariance "Parametrization invariance"): proportional to the square root of the [determinant](https://en.wikipedia.org/wiki/Determinant "Determinant") of [Fisher's information](https://en.wikipedia.org/wiki/Fisher%27s_information "Fisher's information") matrix. For the [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), this can be shown as follows: for a coin that is "heads" with probability *p* ā \[0, 1\] and is "tails" with probability 1 ā *p*, for a given (H,T) ā {(0,1), (1,0)} the probability is *p*^*H*(1 ā *p*)^*T*. Since *T* = 1 ā *H*, the [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") is *p*^*H*(1 ā *p*)^(1 ā *H*). Considering *p* as the only parameter, it follows that the log likelihood for the Bernoulli distribution is
{\\displaystyle \\ln {\\mathcal {L}}(p\\mid H)=H\\ln p+(1-H)\\ln(1-p).}
The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: *p*), therefore:
{\\displaystyle {\\begin{aligned}{\\sqrt {{\\mathcal {I}}(p)}}&={\\sqrt {\\operatorname {E} \\!\\left\[\\left({\\frac {d}{dp}}\\ln {\\mathcal {L}}(p\\mid H)\\right)^{2}\\right\]}}\\\\\[6pt\]&={\\sqrt {\\operatorname {E} \\!\\left\[\\left({\\frac {H}{p}}-{\\frac {1-H}{1-p}}\\right)^{2}\\right\]}}\\\\\[6pt\]&={\\sqrt {p^{1}(1-p)^{0}\\left({\\frac {1}{p}}-{\\frac {0}{1-p}}\\right)^{2}+p^{0}(1-p)^{1}\\left({\\frac {0}{p}}-{\\frac {1}{1-p}}\\right)^{2}}}\\\\&={\\frac {1}{\\sqrt {p(1-p)}}}.\\end{aligned}}}
Similarly, for the [Binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") with *n* [Bernoulli trials](https://en.wikipedia.org/wiki/Bernoulli_trials "Bernoulli trials"), it can be shown that
{\\displaystyle {\\sqrt {{\\mathcal {I}}(p)}}={\\sqrt {\\frac {n}{p(1-p)}}}.}
Thus, for the [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") and [binomial distributions](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution"), [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") is proportional to {\\displaystyle \\scriptstyle {\\frac {1}{\\sqrt {p(1-p)}}}}, which happens to be proportional to a beta distribution with domain variable *x* = *p* and shape parameters α = β = 1/2, the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution"):
{\\displaystyle \\operatorname {Beta} ({\\tfrac {1}{2}},{\\tfrac {1}{2}})={\\frac {1}{\\pi {\\sqrt {p(1-p)}}}}.}
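The proportionality can be checked numerically: the following sketch compares SciPy's Beta(1/2,1/2) density with 1/(Ļā(p(1āp))) at a few arbitrary grid points.

```python
# Numerical check: the Beta(1/2,1/2) density equals 1/(pi*sqrt(p(1-p))),
# i.e. the normalized Jeffreys prior for a Bernoulli/binomial parameter p.
import math
from scipy.stats import beta

for p in (0.1, 0.25, 0.5, 0.9):
    lhs = beta.pdf(p, 0.5, 0.5)
    rhs = 1.0 / (math.pi * math.sqrt(p * (1 - p)))
    print(p, lhs, rhs)  # the two columns agree
```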
It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes' theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem "Bayes' theorem"), the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to {\\textstyle {\\frac {1}{\\sqrt {p(1-p)}}}} for the Bernoulli and binomial distributions, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the square root of the determinant of Fisher's information for the beta distribution, which, as shown in [§ Fisher information matrix](https://en.wikipedia.org/wiki/Beta_distribution#Fisher_information_matrix), is a function of the [trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function") ψ1 of shape parameters α and β as follows:
{\\displaystyle {\\begin{aligned}{\\sqrt {\\det({\\mathcal {I}}(\\alpha ,\\beta ))}}&={\\sqrt {\\psi \_{1}(\\alpha )\\psi \_{1}(\\beta )-(\\psi \_{1}(\\alpha )+\\psi \_{1}(\\beta ))\\psi \_{1}(\\alpha +\\beta )}}\\\\\\lim \_{\\alpha \\to 0}{\\sqrt {\\det({\\mathcal {I}}(\\alpha ,\\beta ))}}&=\\lim \_{\\beta \\to 0}{\\sqrt {\\det({\\mathcal {I}}(\\alpha ,\\beta ))}}=\\infty \\\\\\lim \_{\\alpha \\to \\infty }{\\sqrt {\\det({\\mathcal {I}}(\\alpha ,\\beta ))}}&=\\lim \_{\\beta \\to \\infty }{\\sqrt {\\det({\\mathcal {I}}(\\alpha ,\\beta ))}}=0\\end{aligned}}}
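A short sketch of this expression, using SciPy's `polygamma(1, x)` for the trigamma function ψ1; the (α, β) pairs are arbitrary.

```python
# sqrt(det I(alpha, beta)) for the beta distribution, via the trigamma
# function psi_1 = polygamma(1, .).
import math
from scipy.special import polygamma

def sqrt_det_fisher(a: float, b: float) -> float:
    t = lambda x: float(polygamma(1, x))  # trigamma psi_1
    return math.sqrt(t(a) * t(b) - (t(a) + t(b)) * t(a + b))

for a, b in [(0.5, 0.5), (1.0, 1.0), (2.0, 5.0)]:
    print(a, b, sqrt_det_fisher(a, b))
```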
As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") Beta(1/2,1/2), a one-dimensional *curve* that looks like a basin as a function of the parameter *p* of the Bernoulli and binomial distributions. The walls of the basin are formed by *p* approaching the singularities at the ends *p* → 0 and *p* → 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a *2-dimensional surface* (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β → 0. It has no walls for α, β → ā because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.
It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.
Jeffreys prior may be difficult to obtain analytically, and in some cases it does not exist (even for simple distribution functions like the asymmetric [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution")). Berger, Bernardo and Sun, in a 2009 paper,[\[66\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BergerBernardoSun-66) defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution"). They could not obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior
{\\displaystyle \\operatorname {Beta} ({\\tfrac {1}{2}},{\\tfrac {1}{2}})\\sim {\\frac {1}{\\sqrt {\\theta (1-\\theta )}}}}
where θ is the vertex variable for the asymmetric triangular distribution with support \[0, 1\] (corresponding to the following parameter values in Wikipedia's article on the [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution"): vertex *c* = *θ*, left end *a* = 0, and right end *b* = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact BergerāBernardoāSun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the BergerāBernardoāSun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and [PERT](https://en.wikipedia.org/wiki/PERT "PERT") analysis to describe the cost and duration of project tasks.
Clarke and Barron[\[67\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-67) prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's [mutual information](https://en.wikipedia.org/wiki/Mutual_information "Mutual information") between a sample of size *n* and the parameter, and therefore *Jeffreys prior is the most uninformative prior* (measuring information as Shannon information). The proof rests on an examination of the [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence") between probability density functions for [iid](https://en.wikipedia.org/wiki/Iid "Iid") random variables.
#### Effect of different prior probability choices on the posterior beta distribution
If samples are drawn from the population of a random variable *X* that result in *s* successes and *f* failures in *n* [Bernoulli trials](https://en.wikipedia.org/wiki/Bernoulli_trial "Bernoulli trial") (*n* = *s* + *f*), then the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function") for parameters *s* and *f* given *x* = *p* (the notation *x* = *p* in the expressions below emphasizes that the domain *x* stands for the value of the parameter *p* in the binomial distribution) is the following [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution"):
{\\displaystyle {\\mathcal {L}}(s,f\\mid x=p)={s+f \\choose s}x^{s}(1-x)^{f}={n \\choose s}x^{s}(1-x)^{n-s}.}
If beliefs about [prior probability](https://en.wikipedia.org/wiki/Prior_probability "Prior probability") information are reasonably well approximated by a beta distribution with parameters *α* Prior and *β* Prior, then:
{\\displaystyle {\\operatorname {PriorProbability} }(x=p;\\alpha \\operatorname {Prior} ,\\beta \\operatorname {Prior} )={\\frac {x^{\\alpha \\operatorname {Prior} -1}(1-x)^{\\beta \\operatorname {Prior} -1}}{\\mathrm {B} (\\alpha \\operatorname {Prior} ,\\beta \\operatorname {Prior} )}}}
According to [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem "Bayes' theorem") for a continuous event space, the [posterior probability](https://en.wikipedia.org/wiki/Posterior_probability "Posterior probability") density is given by the product of the [prior probability](https://en.wikipedia.org/wiki/Prior_probability "Prior probability") and the likelihood function (given the evidence *s* and *f* = *n* ā *s*), normalized so that the area under the curve equals one, as follows:
{\\displaystyle {\\begin{aligned}&{\\text{posterior probability density}}(x=p\\mid s,n-s)\\\\\[6pt\]={}&{\\frac {\\operatorname {priorprobabilitydensity} (x=p;\\alpha \\operatorname {prior} ,\\beta \\operatorname {prior} ){\\mathcal {L}}(s,f\\mid x=p)}{\\int \_{0}^{1}{\\text{prior probability density}}(x=p;\\alpha \\operatorname {prior} ,\\beta \\operatorname {prior} ){\\mathcal {L}}(s,f\\mid x=p)\\,dx}}\\\\\[6pt\]={}&{\\frac {{n \\choose s}x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}/\\mathrm {B} (\\alpha \\operatorname {prior} ,\\beta \\operatorname {prior} )}{\\int \_{0}^{1}\\left({n \\choose s}x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}/\\mathrm {B} (\\alpha \\operatorname {prior} ,\\beta \\operatorname {prior} )\\right)\\,dx}}\\\\\[6pt\]={}&{\\frac {x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}}{\\int \_{0}^{1}\\left(x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}\\right)\\,dx}}\\\\\[6pt\]={}&{\\frac {x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}}{\\mathrm {B} (s+\\alpha \\operatorname {prior} ,n-s+\\beta \\operatorname {prior} )}}.\\end{aligned}}}
The [binomial coefficient](https://en.wikipedia.org/wiki/Binomial_coefficient "Binomial coefficient")
{\\displaystyle {s+f \\choose s}={n \\choose s}={\\frac {(s+f)!}{s!f!}}={\\frac {n!}{s!(n-s)!}}}
appears in both the numerator and the denominator of the posterior probability, and since it does not depend on the integration variable *x*, it cancels out and is irrelevant to the final result. Similarly, the normalizing factor for the prior probability, the beta function B(αPrior, βPrior), cancels out and is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior
{\\displaystyle x^{\\alpha \\operatorname {prior} -1}(1-x)^{\\beta \\operatorname {prior} -1}}
because the normalizing factors all cancel out. Several authors (including Jeffreys himself) thus use an un-normalized prior formula since the normalization constant cancels out. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(*s* + *α* Prior, *n* ā *s* + *β* Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.
The ratio *s*/*n* of the number of successes to the total number of trials is a [sufficient statistic](https://en.wikipedia.org/wiki/Sufficient_statistic "Sufficient statistic") in the binomial case, which is relevant for the following results.
For the **Bayes'** prior probability (Beta(1,1)), the posterior probability is:
{\\displaystyle \\operatorname {posteriorprobability} (p=x\\mid s,f)={\\frac {x^{s}(1-x)^{n-s}}{\\mathrm {B} (s+1,n-s+1)}},{\\text{ with mean }}={\\frac {s+1}{n+2}},{\\text{ (and mode}}={\\frac {s}{n}}{\\text{ if }}0\<s\<n).}
For the **Jeffreys'** prior probability (Beta(1/2,1/2)), the posterior probability is:
{\\displaystyle \\operatorname {posteriorprobability} (p=x\\mid s,f)={x^{s-{\\tfrac {1}{2}}}(1-x)^{n-s-{\\frac {1}{2}}} \\over \\mathrm {B} (s+{\\tfrac {1}{2}},n-s+{\\tfrac {1}{2}})},{\\text{ with mean}}={\\frac {s+{\\tfrac {1}{2}}}{n+1}},{\\text{ (and mode}}={\\frac {s-{\\tfrac {1}{2}}}{n-1}}{\\text{ if }}{\\tfrac {1}{2}}\<s\<n-{\\tfrac {1}{2}}).}
and for the **Haldane** prior probability (Beta(0,0)), the posterior probability is:
{\\displaystyle \\operatorname {posteriorprobability} (p=x\\mid s,f)={\\frac {x^{s-1}(1-x)^{n-s-1}}{\\mathrm {B} (s,n-s)}},{\\text{ with mean}}={\\frac {s}{n}},{\\text{ (and mode}}={\\frac {s-1}{n-2}}{\\text{ if }}1\<s\<n-1).}
From the above expressions it follows that for *s*/*n* = 1/2 all three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For *s*/*n* \< 1/2, the means of the posterior probabilities, using these priors, are ordered: mean for Bayes prior \> mean for Jeffreys prior \> mean for Haldane prior. For *s*/*n* \> 1/2 the order of these inequalities is reversed, so that the Haldane prior probability results in the largest posterior mean. The *Haldane* prior probability Beta(0,0) results in a posterior probability density with *mean* (the expected value for the probability of success in the "next" trial) identical to the ratio *s*/*n* of the number of successes to the total number of trials; therefore, the Haldane prior results in a posterior with expected value in the next trial equal to the maximum likelihood estimate. The *Bayes* prior probability Beta(1,1) results in a posterior probability density with *mode* identical to the ratio *s*/*n* (the maximum likelihood).
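A quick numerical check of this ordering, as a sketch with arbitrary counts *s* = 3 and *n* = 10 (so *s*/*n* \< 1/2):

```python
# For s successes in n trials and a Beta(a0, b0) prior, the posterior is
# Beta(s + a0, n - s + b0); compare means and modes across the three priors.
s, n = 3, 10
priors = {"Bayes (1,1)": (1.0, 1.0),
          "Jeffreys (1/2,1/2)": (0.5, 0.5),
          "Haldane (0,0)": (0.0, 0.0)}
for name, (a0, b0) in priors.items():
    a, b = s + a0, n - s + b0
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None
    print(name, "mean:", mean, "mode:", mode)
# Output ordering: Bayes mean 0.333 > Jeffreys mean 0.318 > Haldane mean 0.3,
# and the Bayes posterior mode equals s/n = 0.3, as stated above.
```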
In the case that 100% of the trials have been successful (*s* = *n*), the *Bayes* prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (*n* + 1)/(*n* + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (*n* + 1/2)/(*n* + 1). Perks[\[61\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Perks-61) (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2*n* + 2) trials. The BayesāLaplace rule implies that we are about at the end of an average run or that we expect a failure once in (*n* + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."
Conversely, in the case that 100% of the trials have resulted in failure (*s* = 0), the *Bayes* prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(*n* + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(*n* + 1), which, as Perks[\[61\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Perks-61) (p. 303) points out, "is a much more reasonably remote result than the BayesāLaplace result 1/(*n* + 2)".
Jaynes[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) questions (for the Haldane prior Beta(0,0)) the use of these formulas for the cases *s* = 0 or *s* = *n*, because the integrals do not converge (Beta(0,0) is an improper prior for *s* = 0 or *s* = *n*). In practice, the condition 0 \< *s* \< *n* necessary for a mode to exist between both ends for the Bayes prior is usually met, and therefore the Bayes prior (as long as 0 \< *s* \< *n*) results in a posterior mode located between both ends of the domain.
As remarked in the section on the rule of succession, K. Pearson showed that after *n* successes in *n* trials the posterior probability (based on the Bayes Beta(1,1) distribution as the prior probability) that the next (*n* + 1) trials will all be successes is exactly 1/2, whatever the value of *n*. Based on the Haldane Beta(0,0) distribution as the prior probability, this posterior probability is 1 (absolute certainty that after *n* successes in *n* trials the next (*n* + 1) trials will all be successes). Perks[\[61\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Perks-61) (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((*n* + 1/2)/(*n* + 1))((*n* + 3/2)/(*n* + 2))ā¦((2*n* + 1/2)/(2*n* + 1)), which for *n* = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of {\\displaystyle 1/{\\sqrt {2}}=0.70710678\\ldots } as *n* tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the BayesāLaplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
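The following sketch computes Perks's product directly and shows the approach to 1/ā2; it assumes the product runs over k = n, ā¦, 2n, which reproduces the quoted values 15/24, 315/480 and 9009/13440.

```python
# Perks's probability that the next n+1 trials all succeed, after n
# successes in n trials, under the Jeffreys prior:
# prod_{k=n}^{2n} (k + 1/2)/(k + 1), tending to 1/sqrt(2).
import math

def perks_probability(n: int) -> float:
    p = 1.0
    for k in range(n, 2 * n + 1):
        p *= (k + 0.5) / (k + 1.0)
    return p

for n in (1, 2, 3, 100):
    print(n, perks_probability(n))   # 0.625, 0.65625, 0.67031..., ...
print(1 / math.sqrt(2))              # limiting value 0.7071...
```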
Following are the variances of the posterior distribution obtained with these three prior probability distributions:
for the **Bayes'** prior probability (Beta(1,1)), the posterior variance is:
{\\displaystyle {\\text{variance}}={\\frac {(n-s+1)(s+1)}{(3+n)(2+n)^{2}}},{\\text{ which for }}s={\\frac {n}{2}}{\\text{ results in variance}}={\\frac {1}{12+4n}}}
for the **Jeffreys'** prior probability (Beta(1/2,1/2)), the posterior variance is:
{\\displaystyle {\\text{variance}}={\\frac {(n-s+{\\frac {1}{2}})(s+{\\frac {1}{2}})}{(2+n)(1+n)^{2}}},{\\text{ which for }}s={\\frac {n}{2}}{\\text{ results in var}}={\\frac {1}{8+4n}}}
and for the **Haldane** prior probability (Beta(0,0)), the posterior variance is:
{\\displaystyle {\\text{variance}}={\\frac {(n-s)s}{(1+n)n^{2}}},{\\text{ which for }}s={\\frac {n}{2}}{\\text{ results in variance}}={\\frac {1}{4+4n}}}
So, as remarked by Silvey,[\[50\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Silvey-50) for large *n*, the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small *n* the Haldane Beta(0,0) prior results in the largest posterior variance, while the Bayes Beta(1,1) prior results in the most concentrated posterior. The Jeffreys prior Beta(1/2,1/2) results in a posterior variance in between the other two. As *n* increases, the variance rapidly decreases so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as *n* → ā). Recalling the previous result that the *Haldane* prior probability Beta(0,0) results in a posterior probability density with *mean* (the expected value for the probability of success in the "next" trial) identical to the ratio *s*/*n* of the number of successes to the total number of trials, it follows from the above expression that also the *Haldane* prior Beta(0,0) results in a posterior with *variance* identical to the variance expressed in terms of the maximum likelihood estimate *s*/*n* and sample size (see [§ Variance](https://en.wikipedia.org/wiki/Beta_distribution#Variance)):
{\\displaystyle {\\text{variance}}={\\frac {\\mu (1-\\mu )}{1+\\nu }}={\\frac {(n-s)s}{(1+n)n^{2}}}}
with the mean *μ* = *s*/*n* and the sample size *ν* = *n*.
In Bayesian inference, using a beta [prior distribution](https://en.wikipedia.org/wiki/Prior_distribution "Prior distribution") Beta(*α*Prior, *β*Prior) for the parameter of a binomial distribution is equivalent to adding (*α*Prior ā 1) pseudo-observations of "success" and (*β*Prior ā 1) pseudo-observations of "failure" to the actual number of successes and failures observed, and then estimating the parameter *p* of the binomial distribution by the proportion of successes over both real and pseudo-observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations, since for Beta(1,1) it follows that (*α*Prior ā 1) = 0 and (*β*Prior ā 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each, and the Jeffreys prior Beta(1/2,1/2) subtracts half a pseudo-observation of success and an equal number of failures. This subtraction has the effect of [smoothing](https://en.wikipedia.org/wiki/Smoothing "Smoothing") out the posterior distribution. If the proportion of successes is not 50% (*s*/*n* ā  1/2), values of *α*Prior and *β*Prior less than 1 (and therefore negative (*α*Prior ā 1) and (*β*Prior ā 1)) favor sparsity, i.e. distributions where the parameter *p* is closer to either 0 or 1. In effect, values of *α*Prior and *β*Prior between 0 and 1, when operating together, function as a [concentration parameter](https://en.wikipedia.org/wiki/Concentration_parameter "Concentration parameter").
The accompanying plots show the posterior probability density functions for sample sizes *n* ā {3,10,50}, successes *s* ā {*n*/2, *n*/4} and Beta(*α*Prior,*β*Prior) ā {Beta(0,0), Beta(1/2,1/2), Beta(1,1)}. Also shown are the cases for *n* ā {4,12,40}, success *s* = *n*/4 and Beta(*α*Prior,*β*Prior) ā {Beta(0,0), Beta(1/2,1/2), Beta(1,1)}. The first plot shows the symmetric cases, for successes *s* = *n*/2, with mean = mode = 1/2, and the second plot shows the skewed cases *s* = *n*/4. The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near *p* = 1/2). Significant differences appear for very small sample sizes (in particular for the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes *s* = *n*/4, show a larger effect from the choice of prior, at small sample size, than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most "peaky" and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peak distribution. The Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For very small sample size (in this case for a sample size of 3) and skewed distribution (in this example *s* = *n*/4) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example *n* = 3 and hence *s* = 3/4 \< 1, a degenerate value because *s* should be greater than unity in order for the posterior of the Haldane prior to have a mode located between the ends, and because *s* = 3/4 is not an integer, hence violating the initial assumption of a binomial distribution for the likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 \< *s* \< *n* ā 1, necessary for a mode to exist between both ends, is fulfilled).
In Chapter 12 (p. 385) of his book, Jaynes[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) asserts that the *Haldane prior* Beta(0,0) describes a *prior state of knowledge of complete ignorance*, where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the *Bayes (uniform) prior Beta(1,1) applies if* one knows that *both binary outcomes are possible*. Jaynes states: "*interpret the BayesāLaplace (Beta(1,1)) prior as describing not a state of complete ignorance*, but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) does not specifically discuss Jeffreys prior Beta(1/2,1/2) (Jaynes's discussion of "Jeffreys prior" on pp. 181, 423 and in Chapter 12 of his book[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) refers instead to the improper, un-normalized prior "1/*p* *dp*" introduced by Jeffreys in the 1939 edition of his book,[\[59\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jeffreys-59) seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. *"1/p" is Jeffreys' (1946) invariant prior for the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution "Exponential distribution"), not for the Bernoulli or binomial distributions*). However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.
Similarly, [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") in his 1892 book [The Grammar of Science](https://en.wikipedia.org/wiki/The_Grammar_of_Science "The Grammar of Science")[\[68\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-PearsonGrammar-68)[\[69\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-PearsnGrammar2009-69) (p. 144 of the 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete-ignorance prior, and that it should be used only when prior information justified the decision to "distribute our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our *experience* of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient [sampling data](https://en.wikipedia.org/wiki/Sample_\(statistics\) "Sample (statistics)"), *and the posterior probability mode is not located at one of the extremes of the domain* (*x* = 0 or *x* = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar [*posterior* probability](https://en.wikipedia.org/wiki/Posterior_probability "Posterior probability") densities. Otherwise, as Gelman et al.[\[70\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Gelman-70) (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger[\[4\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BergerDecisionTheory-4) (p. 125) points out "when different reasonable priors yield substantially different answers, can it be right to state that there *is* a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
## Occurrence and applications
### Order statistics
Main article: [Order statistic](https://en.wikipedia.org/wiki/Order_statistic "Order statistic")
The beta distribution has an important application in the theory of [order statistics](https://en.wikipedia.org/wiki/Order_statistic "Order statistic"). A basic result is that the distribution of the *k*th smallest of a sample of size *n* from a continuous [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") has a beta distribution.[\[40\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David1-40) This result is summarized as
{\\displaystyle U\_{(k)}\\sim \\operatorname {Beta} (k,n+1-k).}
From this, and application of the theory related to the [probability integral transform](https://en.wikipedia.org/wiki/Probability_integral_transform "Probability integral transform"), the distribution of any individual order statistic from any [continuous distribution](https://en.wikipedia.org/wiki/Continuous_distribution "Continuous distribution") can be derived.[\[40\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David1-40)
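A simulation sketch of this result (the sample size *n* and rank *k* below are arbitrary): the empirical mean of the *k*th order statistic of *n* uniform draws matches the Beta(*k*, *n*+1ā*k*) mean *k*/(*n*+1).

```python
# The k-th smallest of n iid Uniform(0,1) draws follows Beta(k, n+1-k).
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
print(samples.mean())   # empirical mean of the 3rd order statistic
print(k / (n + 1))      # Beta(k, n+1-k) mean = k/(n+1) = 0.2727...
```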
### Subjective logic
Main article: [Subjective logic](https://en.wikipedia.org/wiki/Subjective_logic "Subjective logic")
In standard logic, propositions are considered to be either true or false. In contradistinction, [subjective logic](https://en.wikipedia.org/wiki/Subjective_logic "Subjective logic") assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In [subjective logic](https://en.wikipedia.org/wiki/Subjective_logic "Subjective logic") the [a posteriori](https://en.wikipedia.org/wiki/A_posteriori "A posteriori") probability estimates of binary events can be represented by beta distributions.[\[71\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-J01-71)
### Wavelet analysis
Main article: [Beta wavelet](https://en.wikipedia.org/wiki/Beta_wavelet "Beta wavelet")
A [wavelet](https://en.wikipedia.org/wiki/Wavelet "Wavelet") is a wave-like [oscillation](https://en.wikipedia.org/wiki/Oscillation "Oscillation") with an [amplitude](https://en.wikipedia.org/wiki/Amplitude "Amplitude") that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including, but certainly not limited to, audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for [signal processing](https://en.wikipedia.org/wiki/Signal_processing "Signal processing"). Wavelets are localized in both time and [frequency](https://en.wikipedia.org/wiki/Frequency "Frequency") whereas the standard [Fourier transform](https://en.wikipedia.org/wiki/Fourier_transform "Fourier transform") is only localized in frequency. Therefore, standard Fourier transforms are only applicable to [stationary processes](https://en.wikipedia.org/wiki/Stationary_process "Stationary process"), while [wavelets](https://en.wikipedia.org/wiki/Wavelet "Wavelet") are applicable to non-[stationary processes](https://en.wikipedia.org/wiki/Stationary_process "Stationary process"). Continuous wavelets can be constructed based on the beta distribution. [Beta wavelets](https://en.wikipedia.org/wiki/Beta_wavelet "Beta wavelet")[\[72\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-wavelet_oliveira-72) can be viewed as a soft variety of [Haar wavelets](https://en.wikipedia.org/wiki/Haar_wavelet "Haar wavelet") whose shape is fine-tuned by two shape parameters α and β.
### Population genetics
Main article: [BaldingāNichols model](https://en.wikipedia.org/wiki/Balding%E2%80%93Nichols_model "BaldingāNichols model")
Further information: [F-statistics](https://en.wikipedia.org/wiki/F-statistics "F-statistics"), [Fixation index](https://en.wikipedia.org/wiki/Fixation_index "Fixation index"), and [Coefficient of relationship](https://en.wikipedia.org/wiki/Coefficient_of_relationship "Coefficient of relationship")
The [BaldingāNichols model](https://en.wikipedia.org/wiki/Balding%E2%80%93Nichols_model "BaldingāNichols model") is a two-parameter [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") of the beta distribution used in [population genetics](https://en.wikipedia.org/wiki/Population_genetics "Population genetics").[\[73\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Balding-73) It is a statistical description of the [allele frequencies](https://en.wikipedia.org/wiki/Allele_frequencies "Allele frequencies") in the components of a sub-divided population:
{\\displaystyle {\\begin{aligned}\\alpha &=\\mu \\nu ,\\\\\\beta &=(1-\\mu )\\nu ,\\end{aligned}}} where {\\displaystyle \\nu =\\alpha +\\beta ={\\frac {1-F}{F}}} and {\\displaystyle 0\<F\<1}; here *F* is (Wright's) genetic distance between two populations.
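A small helper sketch for this parametrization; the function name and example values are illustrative only.

```python
# Balding-Nichols: map the mean allele frequency mu and Wright's F
# to the beta shape parameters (alpha, beta).
def balding_nichols(mu: float, F: float) -> tuple[float, float]:
    assert 0 < mu < 1 and 0 < F < 1
    nu = (1 - F) / F          # nu = alpha + beta
    return mu * nu, (1 - mu) * nu

print(balding_nichols(mu=0.3, F=0.1))  # (2.7, 6.3)
```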
### Project management: task cost and schedule modeling
The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution"), is used extensively in [PERT](https://en.wikipedia.org/wiki/PERT "PERT"), [critical path method](https://en.wikipedia.org/wiki/Critical_path_method "Critical path method") (CPM), Joint Cost Schedule Modeling (JCSM) and other [project management](https://en.wikipedia.org/wiki/Project_management "Project management")/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the [mean](https://en.wikipedia.org/wiki/Mean "Mean") and [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation") of the beta distribution:[\[39\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Malcolm-39)
$$\mu(X) = \frac{a + 4b + c}{6}, \qquad \sigma(X) = \frac{c - a}{6},$$
where *a* is the minimum, *c* is the maximum, and *b* is the most likely value (the [mode](https://en.wikipedia.org/wiki/Mode_\(statistics\) "Mode (statistics)") for *α* \> 1 and *β* \> 1).
The above estimate for the [mean](https://en.wikipedia.org/wiki/Mean "Mean"), $\mu(X) = \frac{a + 4b + c}{6}$, is known as the [PERT](https://en.wikipedia.org/wiki/PERT "PERT") [three-point estimation](https://en.wikipedia.org/wiki/Three-point_estimation "Three-point estimation"); it is exact for either of the following values of *β* (for arbitrary *α* within these ranges):
- *β* = *α* > 1 (symmetric case) with [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation") $\sigma(X) = \frac{c - a}{2\sqrt{1 + 2\alpha}}$, [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") = 0, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") $= \frac{-6}{3 + 2\alpha}$
or
- *β* = 6 − *α* for 5 > *α* > 1 (skewed case) with [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation") $\sigma(X) = \frac{(c - a)\sqrt{\alpha(6 - \alpha)}}{6\sqrt{7}}$, [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") $= \frac{(3 - \alpha)\sqrt{7}}{2\sqrt{\alpha(6 - \alpha)}}$, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") $= \frac{21}{\alpha(6 - \alpha)} - 3$
The above estimate for the [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation"), *σ*(*X*) = (*c* − *a*)/6, is exact for either of the following values of *α* and *β*:

- *α* = *β* = 4 (symmetric) with [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") = 0, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") = −6/11.
- *β* = 6 − *α* and $\alpha = 3 - \sqrt{2}$ (right-tailed, positive skew) with [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") $= \frac{1}{\sqrt{2}}$, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") = 0
- *β* = 6 − *α* and $\alpha = 3 + \sqrt{2}$ (left-tailed, negative skew) with [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") $= \frac{-1}{\sqrt{2}}$, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") = 0
Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.[\[74\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-74)[\[75\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-75)[\[76\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-76)
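To make the tradeoff concrete, here is a minimal sketch (function names are hypothetical) comparing the shorthand PERT formulas with the exact moments of a beta distribution rescaled to [*a*, *c*]; for *α* = *β* = 4 both shorthand estimates are exact, as stated above:

```python
import math

def pert_estimates(a, b, c):
    """Shorthand PERT estimates: mean = (a + 4b + c)/6, sd = (c - a)/6."""
    return (a + 4 * b + c) / 6.0, (c - a) / 6.0

def beta_moments(a, c, alpha, beta):
    """Exact mean and standard deviation of a beta distribution rescaled to [a, c]."""
    mean = a + (c - a) * alpha / (alpha + beta)
    var = (c - a) ** 2 * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, math.sqrt(var)

# alpha = beta = 4: one of the cases where both shorthand formulas are exact.
alpha = beta = 4.0
a, c = 2.0, 14.0                                      # task takes 2 to 14 days
b = a + (c - a) * (alpha - 1) / (alpha + beta - 2)    # mode b = 8 days
print(pert_estimates(a, b, c))          # (8.0, 2.0)
print(beta_moments(a, c, alpha, beta))  # (8.0, 2.0)
```

Re-running the comparison with shapes far from these special cases (e.g. *α* = 0.5, *β* = 8) shows the kind of discrepancy the error figures above refer to.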
## Random variate generation
Further information: [Non-uniform random variate generation](https://en.wikipedia.org/wiki/Non-uniform_random_variate_generation "Non-uniform random variate generation")
If *X* and *Y* are independent, with $X \sim \Gamma(\alpha, \theta)$ and $Y \sim \Gamma(\beta, \theta)$, then

$$\frac{X}{X + Y} \sim \mathrm{B}(\alpha, \beta).$$
So one algorithm for generating beta variates is to generate $\frac{X}{X + Y}$, where *X* is a [gamma variate](https://en.wikipedia.org/wiki/Gamma_distribution#Random_variate_generation "Gamma distribution") with parameters (α, 1) and *Y* is an independent gamma variate with parameters (β, 1).[\[77\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-77) In fact, here $\frac{X}{X + Y}$ and $X + Y$ are independent, and $X + Y \sim \Gamma(\alpha + \beta, \theta)$. If $Z \sim \Gamma(\gamma, \theta)$ is independent of $X$ and $Y$, then $\frac{X + Y}{X + Y + Z} \sim \mathrm{B}(\alpha + \beta, \gamma)$ and $\frac{X + Y}{X + Y + Z}$ is independent of $\frac{X}{X + Y}$. This shows that the product of independent $\mathrm{B}(\alpha, \beta)$ and $\mathrm{B}(\alpha + \beta, \gamma)$ random variables is a $\mathrm{B}(\alpha, \beta + \gamma)$ random variable.
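For instance, a minimal NumPy sketch of this gamma-ratio construction (the function name is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def beta_via_gammas(alpha, beta, size):
    # X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1) independent,
    # then X/(X + Y) ~ Beta(alpha, beta).
    x = rng.gamma(alpha, 1.0, size)
    y = rng.gamma(beta, 1.0, size)
    return x / (x + y)

samples = beta_via_gammas(2.0, 5.0, 10_000)
print(samples.mean())  # close to alpha/(alpha + beta) = 2/7 ā 0.286
```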
Also, the *k*th [order statistic](https://en.wikipedia.org/wiki/Order_statistic "Order statistic") of *n* [uniformly distributed](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") variates is $\mathrm{B}(k, n + 1 - k)$, so an alternative if *α* and *β* are small integers is to generate *α* + *β* − 1 uniform variates and choose the *α*-th smallest.[\[40\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David1-40)
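A sketch of this order-statistic method for integer shape parameters (hypothetical helper, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

def beta_via_order_statistic(alpha, beta):
    # Beta(alpha, beta) with integer shapes is the alpha-th smallest
    # of n = alpha + beta - 1 independent uniform variates.
    n = alpha + beta - 1
    u = rng.uniform(size=n)
    return np.sort(u)[alpha - 1]

sample = beta_via_order_statistic(2, 3)  # one Beta(2, 3) draw from 4 uniforms
```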
Another way to generate beta variates is via the [Pólya urn model](https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model "Pólya urn model"): one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement, adding after each draw an extra ball of the same color as the ball just drawn. Asymptotically, the proportion of black balls in the urn is distributed according to the Beta distribution, and each repetition of the experiment produces a different value.
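A small simulation sketch of the urn scheme (integer α and β assumed for the initial ball counts; the function name is hypothetical):

```python
import random

def polya_urn_proportion(alpha, beta, draws=10_000, seed=None):
    """One run of the Polya urn: start with alpha 'black' and beta 'white'
    balls, add a ball of the drawn colour after each draw, and return the
    final proportion of black balls."""
    rng = random.Random(seed)
    black, white = alpha, beta
    for _ in range(draws):
        if rng.random() < black / (black + white):
            black += 1
        else:
            white += 1
    return black / (black + white)

# Each full repetition is (asymptotically) one draw from Beta(2, 5):
values = [polya_urn_proportion(2, 5, seed=s) for s in range(5)]
```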
It is also possible to use [inverse transform sampling](https://en.wikipedia.org/wiki/Inverse_transform_sampling "Inverse transform sampling").
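Since the beta CDF is the regularized incomplete beta function, inverse transform sampling amounts to pushing uniform variates through its inverse; a brief sketch assuming SciPy is available:

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(4)
u = rng.uniform(size=10_000)
samples = beta_dist.ppf(u, 2.0, 5.0)  # inverse CDF of Beta(2, 5) applied to uniforms
```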
## Normal approximation to the Beta distribution
A beta distribution $\mathrm{B}(\alpha, \beta)$ with $\alpha \approx \beta$ and $\alpha, \beta \gg 1$ is approximately normal with mean $1/2$ and variance $1/(4(2\alpha + 1))$. If $\alpha \geq \beta$, the normal approximation can be improved by taking the cube root of the logarithm of the reciprocal of the $\mathrm{B}(\alpha, \beta)$ variate.[\[78\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-78)[\[79\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-79)
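A quick numerical check of the stated approximation (SciPy assumed; note that for *α* = *β* the variance formula 1/(4(2*α* + 1)) is in fact the exact beta variance):

```python
from scipy.stats import beta, norm

alpha = 40.0                                      # alpha = beta, both >> 1
sd = (1.0 / (4.0 * (2.0 * alpha + 1.0))) ** 0.5   # normal-approximation sd
for x in (0.45, 0.50, 0.55):
    # Compare the Beta(40, 40) CDF against N(1/2, sd^2)
    print(beta.cdf(x, alpha, alpha), norm.cdf(x, loc=0.5, scale=sd))
# the two CDFs agree to roughly three decimal places
```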
## History
[Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes "Thomas Bayes"), in a posthumous paper [\[62\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-ThomasBayes-62) published in 1763 by [Richard Price](https://en.wikipedia.org/wiki/Richard_Price "Richard Price"), obtained a beta distribution as the density of the probability of success in Bernoulli trials (see [§ Applications, Bayesian inference](https://en.wikipedia.org/wiki/Beta_distribution#Applications,_Bayesian_inference)), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.
*Figure: [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson"), who analyzed the beta distribution as Type I of the Pearson distributions.*
The first systematic modern discussion of the beta distribution is probably due to [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson").[\[80\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-80)[\[81\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-rscat-81) In Pearson's papers[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21)[\[33\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson1895-33) the beta distribution is couched as a solution of a differential equation: [Pearson's Type I distribution](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution"), to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by a proper choice of parameters). Indeed, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. [William P. Elderton](https://en.wikipedia.org/wiki/William_Palin_Elderton "William Palin Elderton") in his 1906 monograph "Frequency curves and correlation"[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42) further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat", horizontal, and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." The same monograph provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, among them one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.
As remarked by Bowman and Shenton,[\[44\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BowmanShenton-44) "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood"[\[45\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson1936-45) (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, 'more efficient values' of the curve constants".
David and Edwards's treatise on the history of statistics[\[82\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David_History-82) attributes the first modern treatment of the beta distribution (in 1911,[\[83\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-83) using the beta designation that has since become standard) to [Corrado Gini](https://en.wikipedia.org/wiki/Corrado_Gini "Corrado Gini"), an Italian [statistician](https://en.wikipedia.org/wiki/Statistician "Statistician"), [demographer](https://en.wikipedia.org/wiki/Demography "Demography"), and [sociologist](https://en.wikipedia.org/wiki/Sociology "Sociology"), who developed the [Gini coefficient](https://en.wikipedia.org/wiki/Gini_coefficient "Gini coefficient"). [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz"), in their comprehensive and very informative monograph[\[84\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-84) on leading historical personalities in statistical sciences, credit Gini[\[85\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-85) as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."
## References
1. Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1995). "Chapter 25: Beta Distributions". *Continuous Univariate Distributions Vol. 2* (2nd ed.). Wiley. ISBN 978-0-471-58494-0.
2. Rose, Colin; Smith, Murray D. (2002). *Mathematical Statistics with MATHEMATICA*. Springer. ISBN 978-0387952345.
3. Kruschke, John K. (2011). *Doing Bayesian Data Analysis: A Tutorial with R and BUGS*. Academic Press / Elsevier. p. 83. ISBN 978-0123814852.
4. Berger, James O. (2010). *Statistical Decision Theory and Bayesian Analysis* (2nd ed.). Springer. ISBN 978-1441930743.
5. Feller, William (1971). [*An Introduction to Probability Theory and Its Applications, Vol. 2*](https://archive.org/details/introductiontopr00fell). Wiley. ISBN 978-0471257097.
6. Wadsworth, G. P. (1960). [*Introduction to Probability and Random Variables*](https://archive.org/details/introductiontopr0000wads). New York: McGraw-Hill. p. 52.
7. Kruschke, John K. (2015). *Doing Bayesian Data Analysis: A Tutorial with R, JAGS and Stan*. Academic Press / Elsevier. ISBN 978-0-12-405888-0.
8. Wadsworth, George P.; Bryan, Joseph (1960). [*Introduction to Probability and Random Variables*](https://archive.org/details/introductiontopr0000wads). McGraw-Hill.
9. Gupta, Arjun K., ed. (2004). *Handbook of Beta Distribution and Its Applications*. CRC Press. ISBN 978-0824753962.
10. Kerman, Jouni (2011). "A closed-form approximation for the median of the beta distribution". [arXiv:1111.0433](https://arxiv.org/abs/1111.0433) [math.ST].
11. Mosteller, Frederick; Tukey, John (1977). [*Data Analysis and Regression: A Second Course in Statistics*](https://archive.org/details/dataanalysisregr0000most). Addison-Wesley. Bibcode:1977dars.book.....M. ISBN 978-0201048544.
12. Feller, William (1968). *An Introduction to Probability Theory and Its Applications*. Vol. 1 (3rd ed.). Wiley. ISBN 978-0471257080.
13. Fleming, Philip J.; Wallace, John J. (March 1986). "How not to lie with statistics: the correct way to summarize benchmark results". *Communications of the ACM*. **29** (3): 218–221.
14. ["NIST/SEMATECH e-Handbook of Statistical Methods 1.3.6.6.17. Beta Distribution"](http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm). *National Institute of Standards and Technology Information Technology Laboratory*. April 2012. Retrieved May 31, 2016.
15. Oguamanam, D. C. D.; Martin, H. R.; Huissoon, J. P. (1995). "On the application of the beta distribution to gear damage analysis". *Applied Acoustics*. **45** (3): 247–261. doi:10.1016/0003-682X(95)00001-P.
16. Liang, Zhiqiang; Wei, Jianming; Zhao, Junyu; Liu, Haitao; Li, Baoqing; Shen, Jie; Zheng, Chunlei (27 August 2008). ["The Statistical Meaning of Kurtosis and Its New Application to Identification of Persons Based on Seismic Signals"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3705491). *Sensors*. **8** (8): 5106–5119. Bibcode:2008Senso...8.5106L. doi:10.3390/s8085106. PMC 3705491. PMID 27873804.
17. Kenney, J. F.; Keeping, E. S. (1951). *Mathematics of Statistics, Part Two* (2nd ed.). D. Van Nostrand Company.
18. Abramowitz, Milton; Stegun, Irene A. (1965). [*Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables*](https://archive.org/details/handbookofmathe000abra). Dover. ISBN 978-0-486-61272-0.
19. Weisstein, Eric W. ["Kurtosis"](http://mathworld.wolfram.com/Kurtosis.html). MathWorld--A Wolfram Web Resource. Retrieved 13 August 2012.
20. Panik, Michael J. (2005). *Advanced Statistics from an Elementary Point of View*. Academic Press. ISBN 978-0120884940.
21. Pearson, Karl (1916). ["Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation"](https://doi.org/10.1098%2Frsta.1916.0009). *Philosophical Transactions of the Royal Society A*. **216** (538–548): 429–457. Bibcode:1916RSPTA.216..429P. doi:10.1098/rsta.1916.0009. JSTOR 91092.
22. Gradshteyn, Izrail Solomonovich; Ryzhik, Iosif Moiseevich; Geronimus, Yuri Veniaminovich; Tseytlin, Michail Yulyevich; Jeffrey, Alan (2015) [October 2014]. Zwillinger, Daniel; Moll, Victor Hugo (eds.). *Table of Integrals, Series, and Products*. Translated by Scripta Technica, Inc. (8th ed.). Academic Press. ISBN 978-0-12-384933-5. LCCN 2014010276.
23. Billingsley, Patrick (1995). "Section 30: The Method of Moments". *Probability and Measure* (3rd ed.). Wiley-Interscience. ISBN 978-0-471-00710-4.
24. MacKay, David (2003). *Information Theory, Inference and Learning Algorithms* (1st ed.). Cambridge University Press. Bibcode:2003itil.book.....M. ISBN 978-0521642989.
25. Johnson, N. L. (1949). ["Systems of frequency curves generated by methods of translation"](http://dml.cz/bitstream/handle/10338.dmlcz/135506/Kybernetika_39-2003-1_3.pdf) (PDF). *Biometrika*. **36** (1–2): 149–176. doi:10.1093/biomet/36.1-2.149. hdl:10338.dmlcz/135506. PMID 18132090.
26. Verdugo Lazo, A. C. G.; Rathie, P. N. (1978). "On the entropy of continuous probability distributions". *IEEE Trans. Inf. Theory*. **24** (1): 120–122. doi:10.1109/TIT.1978.1055832.
27. Shannon, Claude E. (1948). "A Mathematical Theory of Communication". *Bell System Technical Journal*. **27** (4): 623–656. doi:10.1002/j.1538-7305.1948.tb01338.x.
28. Cover, Thomas M.; Thomas, Joy A. (2006). *Elements of Information Theory* (2nd ed.). Wiley-Interscience. ISBN 978-0471241959.
29. Plunkett, Kim; Elman, Jeffrey (1997). [*Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations (Neural Network Modeling and Connectionism)*](https://archive.org/details/exercisesinrethi0000plun). A Bradford Book. p. 166. ISBN 978-0262661058.
30. Nallapati, Ramesh (2006). [*The smoothed dirichlet distribution: understanding cross-entropy ranking in information retrieval*](http://maroo.cs.umass.edu/pub/web/getpdf.php?id=679) (Thesis). Computer Science Dept., University of Massachusetts Amherst.
31. Pearson, Egon S. (July 1969). "Some historical reflections traced through the development of the use of frequency curves". *THEMIS Statistical Analysis Research Program, Technical Report 38*. Office of Naval Research, Contract N000014-68-A-0515 (Project NR 042–260).
32. Hahn, Gerald J.; Shapiro, S. (1994). *Statistical Models in Engineering* (Wiley Classics Library). Wiley-Interscience. ISBN 978-0471040651.
33. Pearson, Karl (1895). ["Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material"](https://doi.org/10.1098%2Frsta.1895.0010). *Philosophical Transactions of the Royal Society*. **186**: 343–414. Bibcode:1895RSPTA.186..343P. doi:10.1098/rsta.1895.0010. JSTOR 90649.
34. Buchanan, K.; Rockway, J.; Sternberg, O.; Mai, N. N. (May 2016). ["Sum-difference beamforming for radar applications using circularly tapered random arrays"](https://zenodo.org/record/1279364). *2016 IEEE Radar Conference (RadarConf)*. pp. 1–5. doi:10.1109/RADAR.2016.7485289. ISBN 978-1-5090-0863-6. S2CID 32525626.
35. Buchanan, K.; Flores, C.; Wheeland, S.; Jensen, J.; Grayson, D.; Huff, G. (May 2017). "Transmit beamforming for radar applications using circularly tapered random arrays". *2017 IEEE Radar Conference (RadarConf)*. pp. 0112–0117. doi:10.1109/RADAR.2017.7944181. ISBN 978-1-4673-8823-8. S2CID 38429370.
36. Buchanan, Kristopher Ryan (2014-05-29). ["Theory and Applications of Aperiodic (Random) Phased Arrays"](http://oaktrust.library.tamu.edu/handle/1969.1/157918) (Thesis).
37. Pham-Gia, T. (January 2000). ["Distributions of the ratios of independent beta variables and applications"](https://doi.org/10.1080/03610920008832632). *Communications in Statistics - Theory and Methods*. **29** (12): 2693–2715. doi:10.1080/03610920008832632. ISSN 0361-0926. Retrieved 13 November 2024.
38. Herrerías-Velasco, José Manuel; Herrerías-Pleguezuelo, Rafael; van Dorp, Johan René (2011). "Revisiting the PERT mean and variance". *European Journal of Operational Research*. **210**: 448–451.
39. Malcolm, D. G.; Roseboom, J. H.; Clark, C. E.; Fazar, W. (September–October 1958). "Application of a Technique for Research and Development Program Evaluation". *Operations Research*. **7** (5): 646–669. doi:10.1287/opre.7.5.646. ISSN 0030-364X.
40. David, H. A.; Nagaraja, H. N. (2003). *Order Statistics* (3rd ed.). Wiley, New Jersey. p. 458. ISBN 0-471-38926-9.
41. ["1.3.6.6.17. Beta Distribution"](https://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm). *www.itl.nist.gov*.
42. Elderton, William Palin (1906). [*Frequency-Curves and Correlation*](https://archive.org/details/frequencycurvesc00elderich). Charles and Edwin Layton (London).
43. Elderton, William Palin; Johnson, Norman Lloyd (2009). *Systems of Frequency Curves*. Cambridge University Press. ISBN 978-0521093361.
44. Bowman, K. O.; Shenton, L. R. (2007). ["The beta distribution, moment method, Karl Pearson and R.A. Fisher"](http://www.csm.ornl.gov/~bowman/fjts232.pdf) (PDF). *Far East J. Theo. Stat*. **23** (2): 133–164.
45. Pearson, Karl (June 1936). "Method of moments and method of maximum likelihood". *Biometrika*. **28** (1/2): 34–59. doi:10.2307/2334123. JSTOR 2334123.
46. Joanes, D. N.; Gill, C. A. (1998). "Comparing measures of sample skewness and kurtosis". *The Statistician*. **47** (Part 1): 183–189. doi:10.1111/1467-9884.00122.
47. Beckman, R. J.; Tietjen, G. L. (1978). "Maximum likelihood estimation for the beta distribution". *Journal of Statistical Computation and Simulation*. **7** (3–4): 253–258. doi:10.1080/00949657808810232.
48. Gnanadesikan, R.; Pinkham; Hughes (1967). "Maximum likelihood estimation of the parameters of the beta distribution from smallest order statistics". *Technometrics*. **9** (4): 607–620. doi:10.2307/1266199. JSTOR 1266199.
49. Fackler, Paul. ["Inverse Digamma Function (Matlab)"](http://hips.seas.harvard.edu/content/inverse-digamma-function-matlab). Harvard University School of Engineering and Applied Sciences. Retrieved 2012-08-18.
50. Silvey, S. D. (1975). *Statistical Inference*. Chapman and Hall. p. 40. ISBN 978-0412138201.
51. Edwards, A. W. F. (1992). *Likelihood*. The Johns Hopkins University Press. ISBN 978-0801844430.
52. Jaynes, E. T. (2003). *Probability Theory: The Logic of Science*. Cambridge University Press. ISBN 978-0521592710.
53. Costa, Max; Cover, Thomas (September 1983). [*On the similarity of the entropy power inequality and the Brunn Minkowski inequality*](https://isl.stanford.edu/people/cover/papers/transIT/0837cost.pdf) (PDF). Tech. Report 48, Dept. Statistics, Stanford University.
54. Aryal, Gokarna; Nadarajah, Saralees (2004). ["Information matrix for beta distributions"](http://www.math.bas.bg/serdica/2004/2004-513-526.pdf) (PDF). *Serdica Mathematical Journal (Bulgarian Academy of Science)*. **30**: 513–526.
55. Laplace, Pierre Simon, marquis de (1902). [*A Philosophical Essay on Probabilities*](https://archive.org/details/philosophicaless00lapliala). New York: J. Wiley; London: Chapman & Hall. ISBN 978-1-60206-328-0.
56. Cox, Richard T. (1961). *Algebra of Probable Inference*. The Johns Hopkins University Press. ISBN 978-0801869822.
57. Keynes, John Maynard (2010) [1921]. *A Treatise on Probability: The Connection Between Philosophy and the History of Science*. Wildside Press. ISBN 978-1434406965.
58. Pearson, Karl (1907). "On the Influence of Past Experience on Future Expectation". *Philosophical Magazine*. **6** (13): 365–378.
59. Jeffreys, Harold (1998). *Theory of Probability* (3rd ed.). Oxford University Press. ISBN 978-0198503682.
60. Broad, C. D. (October 1918). "On the relation between induction and probability". *MIND, A Quarterly Review of Psychology and Philosophy*. **27** (New Series) (108): 389–404. doi:10.1093/mind/XXVII.4.389. JSTOR 2249035.
61. Perks, Wilfred (January 1947). ["Some observations on inverse probability including a new indifference rule"](https://web.archive.org/web/20140112111032/http://www.actuaries.org.uk/research-and-resources/documents/some-observations-inverse-probability-including-new-indifference-ru). *Journal of the Institute of Actuaries*. **73** (2): 285–334. doi:10.1017/S0020268100012270. Archived from the original on 2014-01-12. Retrieved 2012-09-19.
62. Bayes, Thomas; communicated by Richard Price (1763). ["An Essay towards solving a Problem in the Doctrine of Chances"](https://doi.org/10.1098%2Frstl.1763.0053). *Philosophical Transactions of the Royal Society*. **53**: 370–418. doi:10.1098/rstl.1763.0053. JSTOR 105741.
63. Haldane, J. B. S. (1932). "A note on inverse probability". *Mathematical Proceedings of the Cambridge Philosophical Society*. **28** (1): 55–61. Bibcode:1932PCPS...28...55H. doi:10.1017/s0305004100010495. S2CID 122773707.
64. Zellner, Arnold (1971). *An Introduction to Bayesian Inference in Econometrics*. Wiley-Interscience. ISBN 978-0471169376.
65. Jeffreys, Harold (September 1946). ["An Invariant Form for the Prior Probability in Estimation Problems"](https://doi.org/10.1098%2Frspa.1946.0056). *Proceedings of the Royal Society*. A 24. **186** (1007): 453–461. Bibcode:1946RSPSA.186..453J. doi:10.1098/rspa.1946.0056. PMID 20998741.
66. Berger, James; Bernardo, Jose; Sun, Dongchu (2009). ["The formal definition of reference priors"](http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdfview_1&handle=euclid.aos/1236693154). *The Annals of Statistics*. **37** (2): 905–938. arXiv:0904.0156. doi:10.1214/07-AOS587. S2CID 3221355.
67. Clarke, Bertrand S.; Barron, Andrew R. (1994). ["Jeffreys' prior is asymptotically least favorable under entropy risk"](http://www.stat.yale.edu/~arb4/publications_files/jeffery's%20prior.pdf) (PDF). *Journal of Statistical Planning and Inference*. **41**: 37–60. doi:10.1016/0378-3758(94)90153-8.
68. Pearson, Karl (1892). [*The Grammar of Science*](https://books.google.com/books?id=IvdsEcFwcnsC&q=grammar+of+science&pg=PR19). Walter Scott, London.
69. Pearson, Karl (2009). *The Grammar of Science*. BiblioLife. ISBN 978-1110356119.
70. Gelman, A.; Carlin, J. B.; Stern, H. S.; Rubin, D. B. (2003). *Bayesian Data Analysis*. Chapman and Hall/CRC. ISBN 978-1584883883.
71. Jøsang, Audun (2001). ["A logic for uncertain probabilities"](https://scholar.archive.org/work/nilorkzfvjccjir72m75zk3pgy). *International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems*. **9** (3): 279–311. doi:10.1142/S0218488501000831. MR 1843261.
72. de Oliveira, H. M.; Araújo, G. A. A. (2005). "Compactly Supported One-cyclic Wavelets Derived from Beta Distributions". *Journal of Communication and Information Systems*. **20** (3): 27–33.
73. Balding, David J.; Nichols, Richard A. (1995). "A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity". *Genetica*. **96** (1–2): 3–12. Springer. doi:10.1007/BF01441146. PMID 7607457. S2CID 30680826.
74. Keefer, Donald L.; Verdini, William A. (1993). "Better Estimation of PERT Activity Time Parameters". *Management Science*. **39** (9): 1086–1091.
75. Keefer, Donald L.; Bodily, Samuel E. (1983). "Three-point Approximations for Continuous Random Variables". *Management Science*. **29** (5): 595–609.
76. ["Defense Resource Management Institute - Naval Postgraduate School"](https://www.nps.edu/web/drmi/). *www.nps.edu*.
77. van der Waerden, B. L. *Mathematical Statistics*. Springer. ISBN 978-3-540-04507-6.
78. Wise, M. E. (June 1960). "On normalizing the incomplete beta-function for fitting to dose-response curves". *Biometrika*. **47** (1/2): 173–175.
79. Pratt, John W. (1968). "A Normal Approximation for Binomial, F, Beta, and Other Common, Related Tail Probabilities, II". *Journal of the American Statistical Association*. **63** (324): 1457–1483. doi:10.2307/2285896.
80. Yule, G. U.; Filon, L. N. G. (1936). "Karl Pearson. 1857–1936". *Obituary Notices of Fellows of the Royal Society*. **2** (5): 72. doi:10.1098/rsbm.1936.0007. JSTOR 769130.
81. ["Library and Archive catalogue"](https://web.archive.org/web/20111025030931/http://www2.royalsociety.org/DServe/dserve.exe?dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=Show.tcl&dsqDb=Persons&dsqPos=0&dsqSearch=%28%28text%29%3D%27%20%20Pearson%3A%20Karl%20%281857%20-%201936%29%20%20%27%29). *Sackler Digital Archive*. Royal Society. Archived from the original on 2011-10-25. Retrieved 2011-07-01.
82. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-David_History_82-0)**
David, H. A. and A.W.F. Edwards (2001). *Annotated Readings in the History of Statistics*. Springer; 1 edition. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0387988443](https://en.wikipedia.org/wiki/Special:BookSources/978-0387988443 "Special:BookSources/978-0387988443")
.
83. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-83)**
Gini, Corrado (1911). "Considerazioni Sulle ProbabilitĆ Posteriori e Applicazioni al Rapporto dei Sessi Nelle Nascite Umane". *Studi Economico-Giuridici della UniversitĆ de Cagliari*. Anno III (reproduced in Metron 15, 133, 171, 1949): 5ā41\.
84. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-84)**
Johnson, Norman L. and Samuel Kotz, ed. (1997). *Leading Personalities in Statistical Sciences: From the Seventeenth Century to the Present (Wiley Series in Probability and Statistics*. Wiley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0471163817](https://en.wikipedia.org/wiki/Special:BookSources/978-0471163817 "Special:BookSources/978-0471163817")
.
85. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-85)**
Metron journal. ["Biography of Corrado Gini"](https://web.archive.org/web/20120716202225/http://www.metronjournal.it/storia/ginibio.htm). Metron Journal. Archived from [the original](http://www.metronjournal.it/storia/ginibio.htm) on 2012-07-16. Retrieved 2012-08-18.
## External links
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=76 "Edit section: External links")\]
[](https://en.wikipedia.org/wiki/File:Commons-logo.svg)
Wikimedia Commons has media related to [Beta distribution](https://commons.wikimedia.org/wiki/Category:Beta_distribution "commons:Category:Beta distribution").
- ["Beta Distribution"](http://demonstrations.wolfram.com/BetaDistribution/) by Fiona Maclachlan, the [Wolfram Demonstrations Project](https://en.wikipedia.org/wiki/Wolfram_Demonstrations_Project "Wolfram Demonstrations Project"), 2007.
- [Beta Distribution ā Overview and Example](http://www.xycoon.com/beta.htm), xycoon.com
- [Beta Distribution](https://web.archive.org/web/20120829140915/http://www.brighton-webs.co.uk/distributions/beta.htm), brighton-webs.co.uk
- [Beta Distribution Video](http://www.exstrom.com/blog/snark/posts/dancingbeta.html), exstrom.com
- ["Beta-distribution"](https://www.encyclopediaofmath.org/index.php?title=Beta-distribution), *[Encyclopedia of Mathematics](https://en.wikipedia.org/wiki/Encyclopedia_of_Mathematics "Encyclopedia of Mathematics")*, [EMS Press](https://en.wikipedia.org/wiki/European_Mathematical_Society "European Mathematical Society"), 2001 \[1994\]
- [Weisstein, Eric W.](https://en.wikipedia.org/wiki/Eric_W._Weisstein "Eric W. Weisstein") ["Beta Distribution"](https://mathworld.wolfram.com/BetaDistribution.html). *[MathWorld](https://en.wikipedia.org/wiki/MathWorld "MathWorld")*.
- [Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein](https://www.youtube.com/watch?v=UZjlBQbV1KU)
| [v](https://en.wikipedia.org/wiki/Template:Probability_distributions "Template:Probability distributions") [t](https://en.wikipedia.org/wiki/Template_talk:Probability_distributions "Template talk:Probability distributions") [e](https://en.wikipedia.org/wiki/Special:EditPage/Template:Probability_distributions "Special:EditPage/Template:Probability distributions")[Probability distributions](https://en.wikipedia.org/wiki/Probability_distribution "Probability distribution") ([list](https://en.wikipedia.org/wiki/List_of_probability_distributions "List of probability distributions")) | |
|---|---|
| Discrete univariate | |
| | |
| with finite support | [Benford](https://en.wikipedia.org/wiki/Benford%27s_law "Benford's law") [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") [Beta-binomial](https://en.wikipedia.org/wiki/Beta-binomial_distribution "Beta-binomial distribution") [Binomial](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") [Categorical](https://en.wikipedia.org/wiki/Categorical_distribution "Categorical distribution") [Hypergeometric](https://en.wikipedia.org/wiki/Hypergeometric_distribution "Hypergeometric distribution") [Negative](https://en.wikipedia.org/wiki/Negative_hypergeometric_distribution "Negative hypergeometric distribution") [Poisson binomial](https://en.wikipedia.org/wiki/Poisson_binomial_distribution "Poisson binomial distribution") [Rademacher](https://en.wikipedia.org/wiki/Rademacher_distribution "Rademacher distribution") [Soliton](https://en.wikipedia.org/wiki/Soliton_distribution "Soliton distribution") [Discrete uniform](https://en.wikipedia.org/wiki/Discrete_uniform_distribution "Discrete uniform distribution") [Zipf](https://en.wikipedia.org/wiki/Zipf%27s_law "Zipf's law") [ZipfāMandelbrot](https://en.wikipedia.org/wiki/Zipf%E2%80%93Mandelbrot_law "ZipfāMandelbrot law") |
| with infinite support | [Beta negative binomial](https://en.wikipedia.org/wiki/Beta_negative_binomial_distribution "Beta negative binomial distribution") [Borel](https://en.wikipedia.org/wiki/Borel_distribution "Borel distribution") [ConwayāMaxwellāPoisson](https://en.wikipedia.org/wiki/Conway%E2%80%93Maxwell%E2%80%93Poisson_distribution "ConwayāMaxwellāPoisson distribution") [Discrete phase-type](https://en.wikipedia.org/wiki/Discrete_phase-type_distribution "Discrete phase-type distribution") [Delaporte](https://en.wikipedia.org/wiki/Delaporte_distribution "Delaporte distribution") [Extended negative binomial](https://en.wikipedia.org/wiki/Extended_negative_binomial_distribution "Extended negative binomial distribution") [FloryāSchulz](https://en.wikipedia.org/wiki/Flory%E2%80%93Schulz_distribution "FloryāSchulz distribution") [GaussāKuzmin](https://en.wikipedia.org/wiki/Gauss%E2%80%93Kuzmin_distribution "GaussāKuzmin distribution") [Geometric](https://en.wikipedia.org/wiki/Geometric_distribution "Geometric distribution") [Logarithmic](https://en.wikipedia.org/wiki/Logarithmic_distribution "Logarithmic distribution") [Mixed Poisson](https://en.wikipedia.org/wiki/Mixed_Poisson_distribution "Mixed Poisson distribution") [Negative binomial](https://en.wikipedia.org/wiki/Negative_binomial_distribution "Negative binomial distribution") [Panjer](https://en.wikipedia.org/wiki/\(a,b,0\)_class_of_distributions "(a,b,0) class of distributions") [Parabolic fractal](https://en.wikipedia.org/wiki/Parabolic_fractal_distribution "Parabolic fractal distribution") [Poisson](https://en.wikipedia.org/wiki/Poisson_distribution "Poisson distribution") [Skellam](https://en.wikipedia.org/wiki/Skellam_distribution "Skellam distribution") [YuleāSimon](https://en.wikipedia.org/wiki/Yule%E2%80%93Simon_distribution "YuleāSimon distribution") [Zeta](https://en.wikipedia.org/wiki/Zeta_distribution "Zeta distribution") |
| Continuous univariate | |
| | |
| supported on a bounded interval | [Arcsine](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") [ARGUS](https://en.wikipedia.org/wiki/ARGUS_distribution "ARGUS distribution") [BaldingāNichols](https://en.wikipedia.org/wiki/Balding%E2%80%93Nichols_model "BaldingāNichols model") [Bates](https://en.wikipedia.org/wiki/Bates_distribution "Bates distribution") [Beta]() [Generalized](https://en.wikipedia.org/wiki/Generalized_beta_distribution "Generalized beta distribution") [Beta rectangular](https://en.wikipedia.org/wiki/Beta_rectangular_distribution "Beta rectangular distribution") [Continuous Bernoulli](https://en.wikipedia.org/wiki/Continuous_Bernoulli_distribution "Continuous Bernoulli distribution") [IrwināHall](https://en.wikipedia.org/wiki/Irwin%E2%80%93Hall_distribution "IrwināHall distribution") [Kumaraswamy](https://en.wikipedia.org/wiki/Kumaraswamy_distribution "Kumaraswamy distribution") [Logit-normal](https://en.wikipedia.org/wiki/Logit-normal_distribution "Logit-normal distribution") [Noncentral beta](https://en.wikipedia.org/wiki/Noncentral_beta_distribution "Noncentral beta distribution") [PERT](https://en.wikipedia.org/wiki/PERT_distribution "PERT distribution") [Power function](https://en.wikipedia.org/w/index.php?title=Power_function_distribution&action=edit&redlink=1 "Power function distribution (page does not exist)") [Raised cosine](https://en.wikipedia.org/wiki/Raised_cosine_distribution "Raised cosine distribution") [Reciprocal](https://en.wikipedia.org/wiki/Reciprocal_distribution "Reciprocal distribution") [Triangular](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution") [U-quadratic](https://en.wikipedia.org/wiki/U-quadratic_distribution "U-quadratic distribution") [Uniform](https://en.wikipedia.org/wiki/Continuous_uniform_distribution "Continuous uniform distribution") [Wigner semicircle](https://en.wikipedia.org/wiki/Wigner_semicircle_distribution "Wigner semicircle distribution") |
| supported on a semi-infinite interval | [Benini](https://en.wikipedia.org/wiki/Benini_distribution "Benini distribution") [Benktander 1st kind](https://en.wikipedia.org/wiki/Benktander_type_I_distribution "Benktander type I distribution") [Benktander 2nd kind](https://en.wikipedia.org/wiki/Benktander_type_II_distribution "Benktander type II distribution") [Beta prime](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") [Burr](https://en.wikipedia.org/wiki/Burr_distribution "Burr distribution") [Chi](https://en.wikipedia.org/wiki/Chi_distribution "Chi distribution") [Chi-squared](https://en.wikipedia.org/wiki/Chi-squared_distribution "Chi-squared distribution") [Noncentral](https://en.wikipedia.org/wiki/Noncentral_chi-squared_distribution "Noncentral chi-squared distribution") [Inverse](https://en.wikipedia.org/wiki/Inverse-chi-squared_distribution "Inverse-chi-squared distribution") [Scaled](https://en.wikipedia.org/wiki/Scaled_inverse_chi-squared_distribution "Scaled inverse chi-squared distribution") [Dagum](https://en.wikipedia.org/wiki/Dagum_distribution "Dagum distribution") [Davis](https://en.wikipedia.org/wiki/Davis_distribution "Davis distribution") [Erlang](https://en.wikipedia.org/wiki/Erlang_distribution "Erlang distribution") [Hyper](https://en.wikipedia.org/wiki/Hyper-Erlang_distribution "Hyper-Erlang distribution") [Exponential](https://en.wikipedia.org/wiki/Exponential_distribution "Exponential distribution") [Hyperexponential](https://en.wikipedia.org/wiki/Hyperexponential_distribution "Hyperexponential distribution") [Hypoexponential](https://en.wikipedia.org/wiki/Hypoexponential_distribution "Hypoexponential distribution") [Logarithmic](https://en.wikipedia.org/wiki/Exponential-logarithmic_distribution "Exponential-logarithmic distribution") [*F*](https://en.wikipedia.org/wiki/F-distribution "F-distribution") [Noncentral](https://en.wikipedia.org/wiki/Noncentral_F-distribution "Noncentral F-distribution") [Folded normal](https://en.wikipedia.org/wiki/Folded_normal_distribution "Folded normal distribution") [FrĆ©chet](https://en.wikipedia.org/wiki/Fr%C3%A9chet_distribution "FrĆ©chet distribution") [Gamma](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution") [Generalized](https://en.wikipedia.org/wiki/Generalized_gamma_distribution "Generalized gamma distribution") [Inverse](https://en.wikipedia.org/wiki/Inverse-gamma_distribution "Inverse-gamma distribution") [gamma/Gompertz](https://en.wikipedia.org/wiki/Gamma/Gompertz_distribution "Gamma/Gompertz distribution") [Gompertz](https://en.wikipedia.org/wiki/Gompertz_distribution "Gompertz distribution") [Shifted](https://en.wikipedia.org/wiki/Shifted_Gompertz_distribution "Shifted Gompertz distribution") [Half-logistic](https://en.wikipedia.org/wiki/Half-logistic_distribution "Half-logistic distribution") [Half-normal](https://en.wikipedia.org/wiki/Half-normal_distribution "Half-normal distribution") [Hotelling's *T*\-squared](https://en.wikipedia.org/wiki/Hotelling%27s_T-squared_distribution "Hotelling's T-squared distribution") [HartmanāWatson](https://en.wikipedia.org/wiki/Hartman%E2%80%93Watson_distribution "HartmanāWatson distribution") [Inverse Gaussian](https://en.wikipedia.org/wiki/Inverse_Gaussian_distribution "Inverse Gaussian distribution") [Generalized](https://en.wikipedia.org/wiki/Generalized_inverse_Gaussian_distribution "Generalized inverse Gaussian distribution") [Kolmogorov](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test "KolmogorovāSmirnov 
test") [LĆ©vy](https://en.wikipedia.org/wiki/L%C3%A9vy_distribution "LĆ©vy distribution") [Log-Cauchy](https://en.wikipedia.org/wiki/Log-Cauchy_distribution "Log-Cauchy distribution") [Log-Laplace](https://en.wikipedia.org/wiki/Log-Laplace_distribution "Log-Laplace distribution") [Log-logistic](https://en.wikipedia.org/wiki/Log-logistic_distribution "Log-logistic distribution") [Log-normal](https://en.wikipedia.org/wiki/Log-normal_distribution "Log-normal distribution") [Log-t](https://en.wikipedia.org/wiki/Log-t_distribution "Log-t distribution") [Lomax](https://en.wikipedia.org/wiki/Lomax_distribution "Lomax distribution") [Matrix-exponential](https://en.wikipedia.org/wiki/Matrix-exponential_distribution "Matrix-exponential distribution") [MaxwellāBoltzmann](https://en.wikipedia.org/wiki/Maxwell%E2%80%93Boltzmann_distribution "MaxwellāBoltzmann distribution") [MaxwellāJüttner](https://en.wikipedia.org/wiki/Maxwell%E2%80%93J%C3%BCttner_distribution "MaxwellāJüttner distribution") [Mittag-Leffler](https://en.wikipedia.org/wiki/Mittag-Leffler_distribution "Mittag-Leffler distribution") [Nakagami](https://en.wikipedia.org/wiki/Nakagami_distribution "Nakagami distribution") [Pareto](https://en.wikipedia.org/wiki/Pareto_distribution "Pareto distribution") [Phase-type](https://en.wikipedia.org/wiki/Phase-type_distribution "Phase-type distribution") [Poly-Weibull](https://en.wikipedia.org/wiki/Poly-Weibull_distribution "Poly-Weibull distribution") [Rayleigh](https://en.wikipedia.org/wiki/Rayleigh_distribution "Rayleigh distribution") [Relativistic BreitāWigner](https://en.wikipedia.org/wiki/Relativistic_Breit%E2%80%93Wigner_distribution "Relativistic BreitāWigner distribution") [Rice](https://en.wikipedia.org/wiki/Rice_distribution "Rice distribution") [Truncated normal](https://en.wikipedia.org/wiki/Truncated_normal_distribution "Truncated normal distribution") [type-2 Gumbel](https://en.wikipedia.org/wiki/Type-2_Gumbel_distribution "Type-2 Gumbel distribution") [Weibull](https://en.wikipedia.org/wiki/Weibull_distribution "Weibull distribution") [Discrete](https://en.wikipedia.org/wiki/Discrete_Weibull_distribution "Discrete Weibull distribution") [Wilks's lambda](https://en.wikipedia.org/wiki/Wilks%27s_lambda_distribution "Wilks's lambda distribution") |
| supported on the whole real line | [Cauchy](https://en.wikipedia.org/wiki/Cauchy_distribution "Cauchy distribution") [Exponential power](https://en.wikipedia.org/wiki/Generalized_normal_distribution#Version_1 "Generalized normal distribution") [Fisher's *z*](https://en.wikipedia.org/wiki/Fisher%27s_z-distribution "Fisher's z-distribution") [Kaniadakis Īŗ-Gaussian](https://en.wikipedia.org/wiki/Kaniadakis_Gaussian_distribution "Kaniadakis Gaussian distribution") [Gaussian *q*](https://en.wikipedia.org/wiki/Gaussian_q-distribution "Gaussian q-distribution") [Generalized hyperbolic](https://en.wikipedia.org/wiki/Generalised_hyperbolic_distribution "Generalised hyperbolic distribution") [Generalized logistic (logistic-beta)](https://en.wikipedia.org/wiki/Generalized_logistic_distribution "Generalized logistic distribution") [Generalized normal](https://en.wikipedia.org/wiki/Generalized_normal_distribution "Generalized normal distribution") [Geometric stable](https://en.wikipedia.org/wiki/Geometric_stable_distribution "Geometric stable distribution") [Gumbel](https://en.wikipedia.org/wiki/Gumbel_distribution "Gumbel distribution") [Holtsmark](https://en.wikipedia.org/wiki/Holtsmark_distribution "Holtsmark distribution") [Hyperbolic secant](https://en.wikipedia.org/wiki/Hyperbolic_secant_distribution "Hyperbolic secant distribution") [Johnson's *SU*](https://en.wikipedia.org/wiki/Johnson%27s_SU-distribution "Johnson's SU-distribution") [Landau](https://en.wikipedia.org/wiki/Landau_distribution "Landau distribution") [Laplace](https://en.wikipedia.org/wiki/Laplace_distribution "Laplace distribution") [Asymmetric](https://en.wikipedia.org/wiki/Asymmetric_Laplace_distribution "Asymmetric Laplace distribution") [Logistic](https://en.wikipedia.org/wiki/Logistic_distribution "Logistic distribution") [Noncentral *t*](https://en.wikipedia.org/wiki/Noncentral_t-distribution "Noncentral t-distribution") [Normal (Gaussian)](https://en.wikipedia.org/wiki/Normal_distribution "Normal distribution") [Normal-inverse Gaussian](https://en.wikipedia.org/wiki/Normal-inverse_Gaussian_distribution "Normal-inverse Gaussian distribution") [Skew normal](https://en.wikipedia.org/wiki/Skew_normal_distribution "Skew normal distribution") [Slash](https://en.wikipedia.org/wiki/Slash_distribution "Slash distribution") [Stable](https://en.wikipedia.org/wiki/Stable_distribution "Stable distribution") [Student's *t*](https://en.wikipedia.org/wiki/Student%27s_t-distribution "Student's t-distribution") [TracyāWidom](https://en.wikipedia.org/wiki/Tracy%E2%80%93Widom_distribution "TracyāWidom distribution") [Variance-gamma](https://en.wikipedia.org/wiki/Variance-gamma_distribution "Variance-gamma distribution") [Voigt](https://en.wikipedia.org/wiki/Voigt_profile "Voigt profile") |
| with support whose type varies | [Generalized chi-squared](https://en.wikipedia.org/wiki/Generalized_chi-squared_distribution "Generalized chi-squared distribution") [Generalized extreme value](https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution "Generalized extreme value distribution") [Generalized Pareto](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution "Generalized Pareto distribution") [MarchenkoāPastur](https://en.wikipedia.org/wiki/Marchenko%E2%80%93Pastur_distribution "MarchenkoāPastur distribution") [Kaniadakis *Īŗ*\-exponential](https://en.wikipedia.org/wiki/Kaniadakis_Exponential_distribution "Kaniadakis Exponential distribution") [Kaniadakis *Īŗ*\-Gamma](https://en.wikipedia.org/wiki/Kaniadakis_Gamma_distribution "Kaniadakis Gamma distribution") [Kaniadakis *Īŗ*\-Weibull](https://en.wikipedia.org/wiki/Kaniadakis_Weibull_distribution "Kaniadakis Weibull distribution") [Kaniadakis *Īŗ*\-Logistic](https://en.wikipedia.org/wiki/Kaniadakis_Logistic_distribution "Kaniadakis Logistic distribution") [Kaniadakis *Īŗ*\-Erlang](https://en.wikipedia.org/wiki/Kaniadakis_Erlang_distribution "Kaniadakis Erlang distribution") [*q*\-exponential](https://en.wikipedia.org/wiki/Q-exponential_distribution "Q-exponential distribution") [*q*\-Gaussian](https://en.wikipedia.org/wiki/Q-Gaussian_distribution "Q-Gaussian distribution") [*q*\-Weibull](https://en.wikipedia.org/wiki/Q-Weibull_distribution "Q-Weibull distribution") [Shifted log-logistic](https://en.wikipedia.org/wiki/Shifted_log-logistic_distribution "Shifted log-logistic distribution") [Tukey lambda](https://en.wikipedia.org/wiki/Tukey_lambda_distribution "Tukey lambda distribution") |
| Mixed univariate | |
| | |
| continuous- discrete | [Rectified Gaussian](https://en.wikipedia.org/wiki/Rectified_Gaussian_distribution "Rectified Gaussian distribution") |
| [Multivariate (joint)](https://en.wikipedia.org/wiki/Joint_probability_distribution "Joint probability distribution") | *Discrete:* [Ewens](https://en.wikipedia.org/wiki/Ewens%27s_sampling_formula "Ewens's sampling formula") [Multinomial](https://en.wikipedia.org/wiki/Multinomial_distribution "Multinomial distribution") [Dirichlet](https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution "Dirichlet-multinomial distribution") [Negative](https://en.wikipedia.org/wiki/Negative_multinomial_distribution "Negative multinomial distribution") *Continuous:* [Dirichlet](https://en.wikipedia.org/wiki/Dirichlet_distribution "Dirichlet distribution") [Generalized](https://en.wikipedia.org/wiki/Generalized_Dirichlet_distribution "Generalized Dirichlet distribution") [Multivariate Laplace](https://en.wikipedia.org/wiki/Multivariate_Laplace_distribution "Multivariate Laplace distribution") [Multivariate normal](https://en.wikipedia.org/wiki/Multivariate_normal_distribution "Multivariate normal distribution") [Multivariate stable](https://en.wikipedia.org/wiki/Multivariate_stable_distribution "Multivariate stable distribution") [Multivariate *t*](https://en.wikipedia.org/wiki/Multivariate_t-distribution "Multivariate t-distribution") [Normal-gamma](https://en.wikipedia.org/wiki/Normal-gamma_distribution "Normal-gamma distribution") [Inverse](https://en.wikipedia.org/wiki/Normal-inverse-gamma_distribution "Normal-inverse-gamma distribution") *[Matrix-valued:](https://en.wikipedia.org/wiki/Random_matrix "Random matrix")* [LKJ](https://en.wikipedia.org/wiki/Lewandowski-Kurowicka-Joe_distribution "Lewandowski-Kurowicka-Joe distribution") [Matrix beta](https://en.wikipedia.org/wiki/Matrix_variate_beta_distribution "Matrix variate beta distribution") [Matrix *F*](https://en.wikipedia.org/wiki/Matrix_F-distribution "Matrix F-distribution") [Matrix normal](https://en.wikipedia.org/wiki/Matrix_normal_distribution "Matrix normal distribution") [Matrix *t*](https://en.wikipedia.org/wiki/Matrix_t-distribution "Matrix t-distribution") [Matrix gamma](https://en.wikipedia.org/wiki/Matrix_gamma_distribution "Matrix gamma distribution") [Inverse](https://en.wikipedia.org/wiki/Inverse_matrix_gamma_distribution "Inverse matrix gamma distribution") [Wishart](https://en.wikipedia.org/wiki/Wishart_distribution "Wishart distribution") [Normal](https://en.wikipedia.org/wiki/Normal-Wishart_distribution "Normal-Wishart distribution") [Inverse](https://en.wikipedia.org/wiki/Inverse-Wishart_distribution "Inverse-Wishart distribution") [Normal-inverse](https://en.wikipedia.org/wiki/Normal-inverse-Wishart_distribution "Normal-inverse-Wishart distribution") [Complex](https://en.wikipedia.org/wiki/Complex_Wishart_distribution "Complex Wishart distribution") [Uniform distribution on a Stiefel manifold](https://en.wikipedia.org/wiki/Uniform_distribution_on_a_Stiefel_manifold "Uniform distribution on a Stiefel manifold") |
| [Directional](https://en.wikipedia.org/wiki/Directional_statistics "Directional statistics") | *Univariate (circular) [directional](https://en.wikipedia.org/wiki/Directional_statistics "Directional statistics")* [Circular uniform](https://en.wikipedia.org/wiki/Circular_uniform_distribution "Circular uniform distribution") [Univariate von Mises](https://en.wikipedia.org/wiki/Von_Mises_distribution "Von Mises distribution") [Wrapped normal](https://en.wikipedia.org/wiki/Wrapped_normal_distribution "Wrapped normal distribution") [Wrapped Cauchy](https://en.wikipedia.org/wiki/Wrapped_Cauchy_distribution "Wrapped Cauchy distribution") [Wrapped exponential](https://en.wikipedia.org/wiki/Wrapped_exponential_distribution "Wrapped exponential distribution") [Wrapped asymmetric Laplace](https://en.wikipedia.org/wiki/Wrapped_asymmetric_Laplace_distribution "Wrapped asymmetric Laplace distribution") [Wrapped LĆ©vy](https://en.wikipedia.org/wiki/Wrapped_L%C3%A9vy_distribution "Wrapped LĆ©vy distribution") *Bivariate (spherical)* [Kent](https://en.wikipedia.org/wiki/Kent_distribution "Kent distribution") *Bivariate (toroidal)* [Bivariate von Mises](https://en.wikipedia.org/wiki/Bivariate_von_Mises_distribution "Bivariate von Mises distribution") *Multivariate* [von MisesāFisher](https://en.wikipedia.org/wiki/Von_Mises%E2%80%93Fisher_distribution "Von MisesāFisher distribution") [Bingham](https://en.wikipedia.org/wiki/Bingham_distribution "Bingham distribution") |
| [Degenerate](https://en.wikipedia.org/wiki/Degenerate_distribution "Degenerate distribution") and [singular](https://en.wikipedia.org/wiki/Singular_distribution "Singular distribution") | *Degenerate* [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") *Singular* [Cantor](https://en.wikipedia.org/wiki/Cantor_distribution "Cantor distribution") |
| Families | [Circular](https://en.wikipedia.org/wiki/Circular_distribution "Circular distribution") [Compound Poisson](https://en.wikipedia.org/wiki/Compound_Poisson_distribution "Compound Poisson distribution") [Elliptical](https://en.wikipedia.org/wiki/Elliptical_distribution "Elliptical distribution") [Exponential](https://en.wikipedia.org/wiki/Exponential_family "Exponential family") [Natural exponential](https://en.wikipedia.org/wiki/Natural_exponential_family "Natural exponential family") [Locationāscale](https://en.wikipedia.org/wiki/Location%E2%80%93scale_family "Locationāscale family") [Maximum entropy](https://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution "Maximum entropy probability distribution") [Mixture](https://en.wikipedia.org/wiki/Mixture_distribution "Mixture distribution") [Pearson](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution") [Tweedie](https://en.wikipedia.org/wiki/Tweedie_distribution "Tweedie distribution") [Wrapped](https://en.wikipedia.org/wiki/Wrapped_distribution "Wrapped distribution") |
|  [Category](https://en.wikipedia.org/wiki/Category:Probability_distributions "Category:Probability distributions") [](https://en.wikipedia.org/wiki/File:Commons-logo.svg "Commons page") [Commons](https://commons.wikimedia.org/wiki/Category:Probability_distributions "commons:Category:Probability distributions") | |

Retrieved from "<https://en.wikipedia.org/w/index.php?title=Beta_distribution&oldid=1345309542>"
[Categories](https://en.wikipedia.org/wiki/Help:Category "Help:Category"):
- [Continuous distributions](https://en.wikipedia.org/wiki/Category:Continuous_distributions "Category:Continuous distributions")
- [Factorial and binomial topics](https://en.wikipedia.org/wiki/Category:Factorial_and_binomial_topics "Category:Factorial and binomial topics")
- [Conjugate prior distributions](https://en.wikipedia.org/wiki/Category:Conjugate_prior_distributions "Category:Conjugate prior distributions")
- [Exponential family distributions](https://en.wikipedia.org/wiki/Category:Exponential_family_distributions "Category:Exponential family distributions")
Hidden categories:
- [CS1 maint: multiple names: authors list](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list "Category:CS1 maint: multiple names: authors list")
- [CS1: long volume value](https://en.wikipedia.org/wiki/Category:CS1:_long_volume_value "Category:CS1: long volume value")
- [CS1 errors: ISBN date](https://en.wikipedia.org/wiki/Category:CS1_errors:_ISBN_date "Category:CS1 errors: ISBN date")
- [Articles with short description](https://en.wikipedia.org/wiki/Category:Articles_with_short_description "Category:Articles with short description")
- [Short description is different from Wikidata](https://en.wikipedia.org/wiki/Category:Short_description_is_different_from_Wikidata "Category:Short description is different from Wikidata")
- [All articles with unsourced statements](https://en.wikipedia.org/wiki/Category:All_articles_with_unsourced_statements "Category:All articles with unsourced statements")
- [Articles with unsourced statements from February 2013](https://en.wikipedia.org/wiki/Category:Articles_with_unsourced_statements_from_February_2013 "Category:Articles with unsourced statements from February 2013")
- [Articles with unsourced statements from December 2024](https://en.wikipedia.org/wiki/Category:Articles_with_unsourced_statements_from_December_2024 "Category:Articles with unsourced statements from December 2024")
- [Commons category link from Wikidata](https://en.wikipedia.org/wiki/Category:Commons_category_link_from_Wikidata "Category:Commons category link from Wikidata")
- This page was last edited on 25 March 2026, at 12:43 (UTC).
- Text is available under the [Creative Commons Attribution-ShareAlike 4.0 License](https://en.wikipedia.org/wiki/Wikipedia:Text_of_the_Creative_Commons_Attribution-ShareAlike_4.0_International_License "Wikipedia:Text of the Creative Commons Attribution-ShareAlike 4.0 International License"); additional terms may apply. By using this site, you agree to the [Terms of Use](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Terms_of_Use "foundation:Special:MyLanguage/Policy:Terms of Use") and [Privacy Policy](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Privacy_policy "foundation:Special:MyLanguage/Policy:Privacy policy"). WikipediaĀ® is a registered trademark of the [Wikimedia Foundation, Inc.](https://wikimediafoundation.org/), a non-profit organization.
- [Privacy policy](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Privacy_policy)
- [About Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:About)
- [Disclaimers](https://en.wikipedia.org/wiki/Wikipedia:General_disclaimer)
- [Contact Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Contact_us)
- [Legal & safety contacts](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Legal:Wikimedia_Foundation_Legal_and_Safety_Contact_Information)
- [Code of Conduct](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Universal_Code_of_Conduct)
- [Developers](https://developer.wikimedia.org/)
- [Statistics](https://stats.wikimedia.org/#/en.wikipedia.org)
- [Cookie statement](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Cookie_statement)
- [Mobile view](https://en.wikipedia.org/w/index.php?title=Beta_distribution&mobileaction=toggle_view_mobile)
- [](https://www.wikimedia.org/)
- [](https://www.mediawiki.org/)
Search
Toggle the table of contents
Beta distribution
27 languages
[Add topic](https://en.wikipedia.org/wiki/Beta_distribution) |
| Readable Markdown | | Beta | |
|---|---|
| Probability density function[](https://en.wikipedia.org/wiki/File:Beta_distribution_pdf.svg "Probability density function for the beta distribution") | |
| Cumulative distribution function[](https://en.wikipedia.org/wiki/File:Beta_distribution_cdf.svg "Cumulative distribution function for the beta distribution") | |
| Notation | Beta(*α*, *β*) |
| [Parameters](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") | *α* \> 0 [shape](https://en.wikipedia.org/wiki/Shape_parameter "Shape parameter") ([real](https://en.wikipedia.org/wiki/Real_number "Real number")) *β* \> 0 [shape](https://en.wikipedia.org/wiki/Shape_parameter "Shape parameter") ([real](https://en.wikipedia.org/wiki/Real_number "Real number")) |
| [Support](https://en.wikipedia.org/wiki/Support_\(mathematics\) "Support (mathematics)") | ![{\\displaystyle x\\in \[0,1\]\\!}](https://wikimedia.org/api/rest_v1/media/math/render/svg/09601f74a28f3e2cad381be1a915ab0c02fe39c6) or  |
In [probability theory](https://en.wikipedia.org/wiki/Probability_theory "Probability theory") and [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), the **beta distribution** is a family of continuous [probability distributions](https://en.wikipedia.org/wiki/Probability_distribution "Probability distribution") defined on the interval \[0, 1\] or (0, 1) in terms of two positive [parameters](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter"), denoted by *alpha* (*α*) and *beta* (*β*), that appear as exponents of the variable and its complement to 1, respectively, and control the [shape](https://en.wikipedia.org/wiki/Shape_parameter "Shape parameter") of the distribution.
The beta distribution has been applied to model the behavior of [random variables](https://en.wikipedia.org/wiki/Random_variables "Random variables") limited to intervals of finite length in a wide variety of disciplines. The beta distribution is a suitable model for the random behavior of percentages and proportions.
In [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference"), the beta distribution is the [conjugate prior probability distribution](https://en.wikipedia.org/wiki/Conjugate_prior_distribution "Conjugate prior distribution") for the [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), [binomial](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution"), [negative binomial](https://en.wikipedia.org/wiki/Negative_binomial_distribution "Negative binomial distribution"), and [geometric](https://en.wikipedia.org/wiki/Geometric_distribution "Geometric distribution") distributions.
The formulation of the beta distribution discussed here is also known as the **beta distribution of the first kind**, whereas *beta distribution of the second kind* is an alternative name for the [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution"). The generalization to multiple variables is called a [Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution "Dirichlet distribution").
### Probability density function
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=2 "Edit section: Probability density function")\]
[](https://en.wikipedia.org/wiki/File:PDF_of_the_Beta_distribution.gif)
An animation of the beta distribution for different values of its parameters.
The [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") (PDF) of the beta distribution, for  or , and shape parameters , , is a [power function](https://en.wikipedia.org/wiki/Power_function "Power function") of the variable  and of its [reflection](https://en.wikipedia.org/wiki/Reflection_formula "Reflection formula")  as follows:
![{\\displaystyle {\\begin{aligned}f(x;\\alpha ,\\beta )&=\\mathrm {constant} \\cdot x^{\\alpha -1}(1-x)^{\\beta -1}\\\\\[3pt\]&={\\frac {x^{\\alpha -1}(1-x)^{\\beta -1}}{\\displaystyle \\int \_{0}^{1}u^{\\alpha -1}(1-u)^{\\beta -1}\\,du}}\\\\\[6pt\]&={\\frac {\\Gamma (\\alpha +\\beta )}{\\Gamma (\\alpha )\\Gamma (\\beta )}}\\,x^{\\alpha -1}(1-x)^{\\beta -1}\\\\\[6pt\]&={\\frac {1}{\\mathrm {B} (\\alpha ,\\beta )}}x^{\\alpha -1}(1-x)^{\\beta -1}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/5fc18388353b219c482e8e35ca4aae808ab1be81)
where  is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function "Gamma function"). The [beta function](https://en.wikipedia.org/wiki/Beta_function "Beta function"), , is a [normalization constant](https://en.wikipedia.org/wiki/Normalization_constant "Normalization constant") to ensure that the total probability is 1. In the above equations  is a [realization](https://en.wikipedia.org/wiki/Realization_\(probability\) "Realization (probability)")āan observed value that actually occurredāof a [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") .
Several authors, including [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz"),[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) use the symbols  and  (instead of  and ) for the shape parameters of the beta distribution, reminiscent of the symbols traditionally used for the parameters of the [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters  and  approach zero.
In the following, a random variable  beta-distributed with parameters  and  will be denoted by:[\[2\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Mathematical_Statistics_with_MATHEMATICA-2)[\[3\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kruschke2011-3)

Other notations for beta-distributed random variables used in the statistical literature are [\[4\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BergerDecisionTheory-4) and .[\[5\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Feller-5)
### Cumulative distribution function
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=3 "Edit section: Cumulative distribution function")\]
[](https://en.wikipedia.org/wiki/File:CDF_for_symmetric_Beta_distribution_vs._x_and_alpha%3Dbeta_-_J._Rodal.jpg)
CDF for symmetric beta distribution vs. *x* and *α* = *β*
[](https://en.wikipedia.org/wiki/File:CDF_for_skewed_Beta_distribution_vs._x_and_beta%3D_5_alpha_-_J._Rodal.jpg)
CDF for skewed beta distribution vs. *x* and *β* = 5*α*
The [cumulative distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function "Cumulative distribution function") is

where  is the [incomplete beta function](https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function "Beta function") and  is the [regularized incomplete beta function](https://en.wikipedia.org/wiki/Regularized_incomplete_beta_function "Regularized incomplete beta function").
For positive integers *α* and *β*, the cumulative distribution function of a beta distribution can be expressed in terms of the cumulative distribution function of a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") with[\[6\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-6)

### Alternative parameterizations
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=4 "Edit section: Alternative parameterizations")\]
##### Mean and sample size
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=6 "Edit section: Mean and sample size")\]
The beta distribution may also be reparameterized in terms of its mean *μ* (0 \< *μ* \< 1) and the sum of the two shape parameters *ν* = *α* + *β* \> 0([\[3\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kruschke2011-3) p. 83). Denoting by αPosterior and βPosterior the shape parameters of the posterior beta distribution resulting from applying Bayes' theorem to a binomial likelihood function and a prior probability, the interpretation of the addition of both shape parameters to be sample size = *ν* = *α*Ā·Posterior + *β*Ā·Posterior is only correct for the Haldane prior probability Beta(0,0). Specifically, for the Bayes (uniform) prior Beta(1,1) the correct interpretation would be sample size = *α*Ā·Posterior + *β* Posterior ā 2, or *ν* = (sample size) + 2. For sample size much larger than 2, the difference between these two priors becomes negligible. (See section [Bayesian inference](https://en.wikipedia.org/wiki/Beta_distribution#Bayesian_inference) for further details.) *ν* = *α* + *β* is referred to as the "sample size" of a beta distribution, but one should remember that it is, strictly speaking, the "sample size" of a binomial likelihood function only when using a Haldane Beta(0,0) prior in Bayes' theorem.
This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ⤠*θ* ⤠1) is drawn from a population-level beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters *α* and *β* via[\[3\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kruschke2011-3)
*α* = *μν*, *β* = (1 ā *μ*)*ν*
Under this [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter"), one may place an [uninformative prior](https://en.wikipedia.org/wiki/Uninformative_prior "Uninformative prior") probability over the mean, and a vague prior probability (such as an [exponential](https://en.wikipedia.org/wiki/Exponential_distribution "Exponential distribution") or [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution")) over the positive reals for the sample size, if they are independent, and prior data and/or beliefs justify it.
##### Mode and concentration
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=7 "Edit section: Mode and concentration")\]
[Concave](https://en.wikipedia.org/wiki/Concave_function "Concave function") beta distributions, which have , can be parametrized in terms of mode and "concentration". The mode, , and concentration, , can be used to define the usual shape parameters as follows:[\[7\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kruschke2015-7)  For the mode, , to be well-defined, we need , or equivalently . If instead we define the concentration as , the condition simplifies to  and the beta density at  and  can be written as:  where  directly scales the [sufficient statistics](https://en.wikipedia.org/wiki/Sufficient_statistics "Sufficient statistics"),  and . Note also that in the limit, , the distribution becomes flat.
Solving the system of (coupled) equations given in the above sections as the equations for the mean and the variance of the beta distribution in terms of the original parameters *α* and *β*, one can express the *α* and *β* parameters in terms of the mean (*μ*) and the variance (var):

This [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters *α* and *β*. For example, by expressing the mode, skewness, excess kurtosis and differential entropy in terms of the mean and the variance:
[](https://en.wikipedia.org/wiki/File:Mode_Beta_Distribution_for_both_alpha_and_beta_greater_than_1_-_J._Rodal.jpg) [](https://en.wikipedia.org/wiki/File:Mode_Beta_Distribution_for_both_alpha_and_beta_greater_than_1_-_another_view_-_J._Rodal.jpg) [](https://en.wikipedia.org/wiki/File:Skewness_Beta_Distribution_for_mean_full_range_and_variance_between_0.05_and_0.25_-_Dr._J._Rodal.jpg) [](https://en.wikipedia.org/wiki/File:Skewness_Beta_Distribution_for_mean_and_variance_both_full_range_-_J._Rodal.jpg) [](https://en.wikipedia.org/wiki/File:Excess_Kurtosis_Beta_Distribution_with_mean_for_full_range_and_variance_from_0.05_to_0.25_-_J._Rodal.jpg) [](https://en.wikipedia.org/wiki/File:Excess_Kurtosis_Beta_Distribution_with_mean_and_variance_for_full_range_-_J._Rodal.jpg) [](https://en.wikipedia.org/wiki/File:Differential_Entropy_Beta_Distribution_with_mean_from_0.2_to_0.8_and_variance_from_0.01_to_0.09_-_J._Rodal.jpg) [](https://en.wikipedia.org/wiki/File:Differential_Entropy_Beta_Distribution_with_mean_from_0.3_to_0.7_and_variance_from_0_to_0.2_-_J._Rodal.jpg)
A beta distribution with the two shape parameters *α* and *β* is supported on the range \[0,1\] or (0,1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, *a*, and maximum *c* (*c* \> *a*), values of the distribution,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) by a linear transformation substituting the non-dimensional variable *x* in terms of the new variable *y* (with support \[*a*,*c*\] or (*a*,*c*)) and the parameters *a* and *c*:

The [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") of the four parameter beta distribution is equal to the two parameter distribution, scaled by the range (*c* ā *a*), (so that the total area under the density curve equals a probability of one), and with the "y" variable shifted and scaled as follows: ![{\\displaystyle {\\begin{aligned}f(y;\\alpha ,\\beta ,a,c)={\\frac {f(x;\\alpha ,\\beta )}{c-a}}&={\\frac {\\left({\\frac {y-a}{c-a}}\\right)^{\\alpha -1}\\left({\\frac {c-y}{c-a}}\\right)^{\\beta -1}}{(c-a)B(\\alpha ,\\beta )}}\\\\\[1ex\]&={\\frac {(y-a)^{\\alpha -1}(c-y)^{\\beta -1}}{(c-a)^{\\alpha +\\beta -1}B(\\alpha ,\\beta )}}.\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ebfbb9c4da37593762747522d2d91a4ca72e0011)
That a random variable *Y* is beta-distributed with four parameters *α*, *β*, *a*, and *c* will be denoted by:

Some measures of central location are scaled (by (*c* ā *a*)) and shifted (by *a*), as follows:
![{\\displaystyle {\\begin{aligned}\\mu \_{Y}&=\\mu \_{X}(c-a)+a\\\\\[1ex\]&={\\frac {\\alpha }{\\alpha +\\beta }}\\left(c-a\\right)+a={\\frac {\\alpha c+\\beta a}{\\alpha +\\beta }}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/86a8b4fe30b5075b8c038d5b5b3e1f6ee8e5963f)
![{\\displaystyle {\\begin{aligned}{\\text{mode}}(Y)&={\\text{mode}}(X)(c-a)+a\\\\\[1ex\]&={\\frac {\\alpha -1}{\\alpha +\\beta -2}}\\left(c-a\\right)+a\\\\\[1ex\]&={\\frac {(\\alpha -1)c+(\\beta -1)a}{\\alpha +\\beta -2}}\\ ,&{\\text{ if }}\\alpha ,\\,\\beta \>1\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/768c42362dbb2d2904c218dcfc6df1de62b5f635)
![{\\displaystyle {\\begin{aligned}{\\text{median}}(Y)&={\\text{median}}(X)(c-a)+a\\\\\[1ex\]&=I\_{\\frac {1}{2}}^{\[-1\]}(\\alpha ,\\beta )\\left(c-a\\right)+a\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9b039773a1da5743c1e27109447176e3cfe1925e)
Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.
The shape parameters of *Y* can be written in term of its mean and variance as

The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (*c* ā *a*), linearly for the mean deviation and nonlinearly for the variance:
![{\\displaystyle {\\begin{aligned}&{\\text{(mean deviation around mean)}}(Y)\\\\\[1ex\]&=({\\text{(mean deviation around mean)}}(X))(c-a)\\\\&={\\frac {2\\alpha ^{\\alpha }\\beta ^{\\beta }}{\\mathrm {B} (\\alpha ,\\beta )(\\alpha +\\beta )^{\\alpha +\\beta +1}}}(c-a)\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/769fc1182a10805b989db4ae5c207769240dc3b5) 
Since the [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") are non-dimensional quantities (as [moments](https://en.wikipedia.org/wiki/Moment_\(mathematics\) "Moment (mathematics)") centered on the mean and normalized by the [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation")), they are independent of the parameters *a* and *c*, and therefore equal to the expressions given above in terms of *X* (with support \[0,1\] or (0,1)):

![{\\displaystyle {\\text{kurtosis excess}}(Y)={\\text{kurtosis excess}}(X)={\\frac {6\\left\[(\\alpha -\\beta )^{2}(\\alpha +\\beta +1)-\\alpha \\beta (\\alpha +\\beta +2)\\right\]}{\\alpha \\beta (\\alpha +\\beta +2)(\\alpha +\\beta +3)}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/76f8b524c994e9cdaf8317555457b4369ab2271e)
### Measures of central tendency
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=11 "Edit section: Measures of central tendency")\]
The [mode](https://en.wikipedia.org/wiki/Mode_\(statistics\) "Mode (statistics)") of a beta distributed [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* with *α*, *β* \> 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression:[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)

When both parameters are less than one (*α*, *β* \< 1), this is the anti-mode: the lowest point of the probability density curve.[\[8\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Wadsworth-8)
Letting *α* = *β*, the expression for the mode simplifies to 1/2, showing that for *α* = *β* \> 1 the mode (resp. anti-mode when *α*, *β* \< 1), is at the center of the distribution: it is symmetric in those cases. See [Shapes](https://en.wikipedia.org/wiki/Beta_distribution#Shapes) section in this article for a full list of mode cases, for arbitrary values of *α* and *β*. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of *α* = 2, *β* = 1 (or *α* = 1, *β* = 2), the density function becomes a [right-triangle distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution") which is finite at both ends. In several other cases there is a [singularity](https://en.wikipedia.org/wiki/Mathematical_singularity "Mathematical singularity") at one end, where the value of the density function approaches infinity. For example, in the case *α* = *β* = 1/2, the beta distribution simplifies to become the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution"). There is debate among mathematicians about some of these cases and whether the ends (*x* = 0, and *x* = 1) can be called *modes* or not.[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)[\[2\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Mathematical_Statistics_with_MATHEMATICA-2)
[](https://en.wikipedia.org/wiki/File:Mode_Beta_Distribution_for_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg)
Mode for beta distribution for 1 ⤠*α* ⤠5 and 1 ⤠β ⤠5
- Whether the ends are part of the [domain](https://en.wikipedia.org/wiki/Domain_of_a_function "Domain of a function") of the density function
- Whether a [singularity](https://en.wikipedia.org/wiki/Mathematical_singularity "Mathematical singularity") can ever be called a *mode*
- Whether cases with two maxima should be called *bimodal*
[](https://en.wikipedia.org/wiki/File:Median_Beta_Distribution_for_alpha_and_beta_from_0_to_5_-_J._Rodal.jpg)
Median for beta distribution for 0 ⤠*α* ⤠5 and 0 ⤠*β* ⤠5
[](https://en.wikipedia.org/wiki/File:\(Mean_-_Median\)_for_Beta_distribution_versus_alpha_and_beta_from_0_to_2_-_J._Rodal.jpg)
(Meanāmedian) for beta distribution versus alpha and beta from 0 to 2
The median of the beta distribution is the unique real number ![{\\displaystyle x=I\_{1/2}^{\[-1\]}(\\alpha ,\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/d7510f94efa49f254eb3924678b527a6fd22d0fc) for which the [regularized incomplete beta function](https://en.wikipedia.org/wiki/Regularized_incomplete_beta_function "Regularized incomplete beta function") . There is no general [closed-form expression](https://en.wikipedia.org/wiki/Closed-form_expression "Closed-form expression") for the [median](https://en.wikipedia.org/wiki/Median "Median") of the beta distribution for arbitrary values of *α* and *β*. [Closed-form expressions](https://en.wikipedia.org/wiki/Closed-form_expression "Closed-form expression") for particular values of the parameters *α* and *β* follow:\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
The following are the limits with one parameter finite (non-zero) and the other approaching these limits:\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]

A reasonable approximation of the value of the median of the beta distribution, for both α and β greater or equal to one, is given by the formula[\[10\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kerman2011-10)

When *α*, *β* ℠1, the [relative error](https://en.wikipedia.org/wiki/Relative_error "Relative error") (the [absolute error](https://en.wikipedia.org/wiki/Approximation_error "Approximation error") divided by the median) in this approximation is less than 4% and for both *α* ℠2 and *β* ℠2 it is less than 1%. The [absolute error](https://en.wikipedia.org/wiki/Approximation_error "Approximation error") divided by the difference between the mean and the mode is similarly small:
[![Abs\[(Median-Appr.)/Median\] for beta distribution for 1 ⤠α ⤠5 and 1 ⤠β ⤠5](https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/Relative_Error_for_Approximation_to_Median_of_Beta_Distribution_for_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg/330px-Relative_Error_for_Approximation_to_Median_of_Beta_Distribution_for_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg)](https://en.wikipedia.org/wiki/File:Relative_Error_for_Approximation_to_Median_of_Beta_Distribution_for_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg "Abs[(Median-Appr.)/Median] for beta distribution for 1 ⤠α ⤠5 and 1 ⤠β ⤠5")[![Abs\[(Median-Appr.)/(Mean-Mode)\] for beta distribution for 1 ⤠α ⤠5 and 1 ⤠β ⤠5](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e8/Error_in_Median_Apprx._relative_to_Mean-Mode_distance_for_Beta_Distribution_with_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg/330px-Error_in_Median_Apprx._relative_to_Mean-Mode_distance_for_Beta_Distribution_with_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg)](https://en.wikipedia.org/wiki/File:Error_in_Median_Apprx._relative_to_Mean-Mode_distance_for_Beta_Distribution_with_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg "Abs[(Median-Appr.)/(Mean-Mode)] for beta distribution for 1 ⤠α ⤠5 and 1 ⤠β ⤠5")
Mean for beta distribution for 0 ⤠*α* ⤠5 and 0 ⤠*β* ⤠5
The [expected value](https://en.wikipedia.org/wiki/Expected_value "Expected value") (mean) (*μ*) of a beta distribution [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* with two parameters *α* and *β* is a function of only the ratio *β*/*α* of these parameters:[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
![{\\displaystyle {\\begin{aligned}\\mu =\\operatorname {E} \[X\]&=\\int \_{0}^{1}xf(x;\\alpha ,\\beta )\\,dx\\\\&=\\int \_{0}^{1}x\\,{\\frac {x^{\\alpha -1}(1-x)^{\\beta -1}}{\\mathrm {B} (\\alpha ,\\beta )}}\\,dx\\\\&={\\frac {\\alpha }{\\alpha +\\beta }}\\\\&={\\frac {1}{1+{\\frac {\\beta }{\\alpha }}}}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e9137834d9d47360ed6c23550c6236fed5fd35f7)
Letting *α* = *β* in the above expression one obtains *μ* = 1/2, showing that for *α* = *β* the mean is at the center of the distribution: it is symmetric. Also, the following limits can be obtained from the above expression:
lim_(*β*/*α* ā 0) *μ* = 1, lim_(*β*/*α* ā ā) *μ* = 0.
Therefore, for *β*/*α* ā 0, or for *α*/*β* ā ā, the mean is located at the right end, *x* = 1. For these limit ratios, the beta distribution becomes a one-point [degenerate distribution](https://en.wikipedia.org/wiki/Degenerate_distribution "Degenerate distribution") with a [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") spike at the right end, *x* = 1, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the right end, *x* = 1.
Similarly, for *β*/*α* ā ā, or for *α*/*β* ā 0, the mean is located at the left end, *x* = 0. The beta distribution becomes a one-point [degenerate distribution](https://en.wikipedia.org/wiki/Degenerate_distribution "Degenerate distribution") with a [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") spike at the left end, *x* = 0, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the left end, *x* = 0. Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(*α* ā 0) *μ* = 0, lim_(*α* ā ā) *μ* = 1, lim_(*β* ā 0) *μ* = 1, lim_(*β* ā ā) *μ* = 0.
While for typical unimodal distributions (with centrally located modes, inflexion points at both sides of the mode, and longer tails) (with Beta(*α*, *β*) such that *α*, *β* \> 2) it is known that the sample mean (as an estimate of location) is not as [robust](https://en.wikipedia.org/wiki/Robust_statistics "Robust statistics") as the sample median, the opposite is the case for uniform or "U-shaped" bimodal distributions (with Beta(*α*, *β*) such that *α*, *β* ⤠1), with the modes located at the ends of the distribution. As Mosteller and Tukey remark ([\[11\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-MostellerTukey-11) p. 207) "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (with Beta(*α*, *β*) such that *α*, *β* ⤠1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs for example for [random walks](https://en.wikipedia.org/wiki/Random_walk "Random walk"), since the probability for the time of the last visit to the origin in a random walk is distributed as the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") Beta(1/2, 1/2):[\[5\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Feller-5)[\[12\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-WillyFeller1-12) the mean of a number of [realizations](https://en.wikipedia.org/wiki/Realization_\(probability\) "Realization (probability)") of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).
(Mean ā GeometricMean) for beta distribution versus *α* and *β* from 0 to 2, showing the asymmetry between *α* and *β* for the geometric mean
Geometric means for beta distribution Purple = *G*(*x*), Yellow = *G*(1 ā *x*), smaller values *α* and *β* in front
Geometric means for beta distribution. purple = *G*(*x*), yellow = *G*(1 ā *x*), larger values *α* and *β* in front
The logarithm of the [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") *GX* of a distribution with [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* is the arithmetic mean of ln(*X*), or, equivalently, its expected value:
![{\\displaystyle \\ln G\_{X}=\\operatorname {E} \[\\ln X\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/64b67cb73b90bc0e09ba41003b44f84b6e1d3feb)
For a beta distribution, the expected value integral gives:
![{\\displaystyle {\\begin{aligned}\\operatorname {E} \[\\ln X\]&=\\int \_{0}^{1}\\ln x\\,f(x;\\alpha ,\\beta )\\,dx\\\\\[4pt\]&=\\int \_{0}^{1}\\ln x\\,{\\frac {x^{\\alpha -1}(1-x)^{\\beta -1}}{\\mathrm {B} (\\alpha ,\\beta )}}\\,dx\\\\\[4pt\]&={\\frac {1}{\\mathrm {B} (\\alpha ,\\beta )}}\\,\\int \_{0}^{1}{\\frac {\\partial x^{\\alpha -1}(1-x)^{\\beta -1}}{\\partial \\alpha }}\\,dx\\\\\[4pt\]&={\\frac {1}{\\mathrm {B} (\\alpha ,\\beta )}}{\\frac {\\partial }{\\partial \\alpha }}\\int \_{0}^{1}x^{\\alpha -1}(1-x)^{\\beta -1}\\,dx\\\\\[4pt\]&={\\frac {1}{\\mathrm {B} (\\alpha ,\\beta )}}{\\frac {\\partial \\mathrm {B} (\\alpha ,\\beta )}{\\partial \\alpha }}\\\\\[4pt\]&={\\frac {\\partial \\ln \\mathrm {B} (\\alpha ,\\beta )}{\\partial \\alpha }}\\\\\[4pt\]&={\\frac {\\partial \\ln \\Gamma (\\alpha )}{\\partial \\alpha }}-{\\frac {\\partial \\ln \\Gamma (\\alpha +\\beta )}{\\partial \\alpha }}\\\\\[4pt\]&=\\psi (\\alpha )-\\psi (\\alpha +\\beta )\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/cd9db519e08e3c72cd6f9e2f0c90a7c57bdba035)
where *Ļ* is the [digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function").
Therefore, the geometric mean of a beta distribution with shape parameters *α* and *β* is the exponential of a difference of digamma functions:
![{\\displaystyle G\_{X}=e^{\\operatorname {E} \[\\ln X\]}=e^{\\psi (\\alpha )-\\psi (\\alpha +\\beta )}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c93ffa7f0155fa3816fcb151c3eb677700aabca2)
While for a beta distribution with equal shape parameters *α* = *β*, it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: 0 \< *G**X* \< 1/2. The reason for this is that the logarithmic transformation strongly weights the values of *X* close to zero, as ln(*X*) strongly tends towards negative infinity as *X* approaches zero, while ln(*X*) flattens towards zero as *X* ā 1.
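The closed form above is straightforward to evaluate numerically. The following sketch (assuming NumPy and SciPy are available; *α* = *β* = 2 is an arbitrary example) computes *G**X* from the digamma function and checks it against a Monte Carlo estimate:

```python
# Geometric mean of Beta(alpha, beta): G_X = exp(psi(alpha) - psi(alpha + beta)),
# checked against exp(mean(ln X)) over a large sample.
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
a, b = 2.0, 2.0
g_exact = np.exp(digamma(a) - digamma(a + b))
g_mc = np.exp(np.log(rng.beta(a, b, size=1_000_000)).mean())
print(g_exact, g_mc)  # both ~0.4346, below 1/2 even though alpha = beta
```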
Along a line *α* = *β*, the following limits apply:

lim_(*α* = *β* ā 0) *G**X* = 0, lim_(*α* = *β* ā ā) *G**X* = 1/2.
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(*α* ā 0) *G**X* = 0, lim_(*α* ā ā) *G**X* = 1, lim_(*β* ā 0) *G**X* = 1, lim_(*β* ā ā) *G**X* = 0.
The accompanying plot shows the difference between the mean and the geometric mean for shape parameters *α* and *β* from zero to 2. Besides the fact that the difference between them approaches zero as *α* and *β* approach infinity, and that the difference becomes large for values of *α* and *β* approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape parameters *α* and *β*. The difference between the geometric mean and the mean is larger for small values of *α* relative to *β* than for the exchanged magnitudes of *β* and *α*.
[N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz")[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) suggest the logarithmic approximation to the digamma function *Ļ*(*α*) ā ln(*α* ā 1/2), which results in the following approximation to the geometric mean:

*G**X* ā (*α* ā 1/2)/(*α* + *β* ā 1/2) for *α*, *β* > 1.
Numerical values for the [relative error](https://en.wikipedia.org/wiki/Relative_error "Relative error") in this approximation follow: \[(*α* = *β* = 1): 9.39%\]; \[(*α* = *β* = 2): 1.29%\]; \[(*α* = 2, *β* = 3): 1.51%\]; \[(*α* = 3, *β* = 2): 0.44%\]; \[(*α* = *β* = 3): 0.51%\]; \[(*α* = *β* = 4): 0.26%\]; \[(*α* = 3, *β* = 4): 0.55%\]; \[(*α* = 4, *β* = 3): 0.24%\].
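These figures can be recomputed directly; the sketch below (assuming SciPy) does so, with minor discrepancies from the quoted values attributable to rounding conventions in the source:

```python
# Relative error of the Johnson & Kotz approximation
# G_X ~ (alpha - 1/2)/(alpha + beta - 1/2) to the exact geometric mean.
import numpy as np
from scipy.special import digamma

for a, b in [(1, 1), (2, 2), (2, 3), (3, 2), (3, 3), (4, 4), (3, 4), (4, 3)]:
    exact = np.exp(digamma(a) - digamma(a + b))
    approx = (a - 0.5) / (a + b - 0.5)
    print(f"alpha={a}, beta={b}: relative error = {abs(approx - exact)/exact:.2%}")
```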
Similarly, one can calculate the value of the shape parameters required for the geometric mean to equal 1/2. Given the value of the parameter *β*, what would be the value of the other parameter, *α*, required for the geometric mean to equal 1/2? The answer is that (for *β* > 1) the value of *α* required tends towards *β* + 1/2 as *β* ā ā. For example, all these pairs have the same geometric mean of 1/2: \[*β* = 1, *α* = 1.4427\], \[*β* = 2, *α* = 2.46958\], \[*β* = 3, *α* = 3.47943\], \[*β* = 4, *α* = 4.48449\], \[*β* = 5, *α* = 5.48756\], \[*β* = 10, *α* = 10.4938\], \[*β* = 100, *α* = 100.499\].
The fundamental property of the geometric mean, which can be proven to be false for any other mean, is

*G*(*X**i*/*Y**i*) = *G*(*X**i*)/*G*(*Y**i*)
This makes the geometric mean the only correct mean when averaging *normalized* results, that is, results that are presented as ratios to reference values.[\[13\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-13) This is relevant because the beta distribution is a suitable model for the random behavior of percentages, and it is particularly suitable to the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation, see section "Parameter estimation, maximum likelihood." In fact, when performing maximum likelihood estimation, besides the [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") *G**X* based on the random variable *X*, a second geometric mean appears naturally: the [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") based on the linear transformation (1 ā *X*), the mirror image of *X*, denoted by *G*(1ā*X*):
![{\\displaystyle G\_{1-X}=e^{\\operatorname {E} \[\\ln(1-X)\]}=e^{\\psi (\\beta )-\\psi (\\alpha +\\beta )}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/58d36e067302e87f85db3f0fb1e2902201e38d76)
Along a line *α* = *β*, the following limits apply:

lim_(*α* = *β* ā 0) *G*(1ā*X*) = 0, lim_(*α* = *β* ā ā) *G*(1ā*X*) = 1/2.
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

lim_(*α* ā 0) *G*(1ā*X*) = 1, lim_(*α* ā ā) *G*(1ā*X*) = 0, lim_(*β* ā 0) *G*(1ā*X*) = 0, lim_(*β* ā ā) *G*(1ā*X*) = 1.
It has the following approximate value:

*G*(1ā*X*) ā (*β* ā 1/2)/(*α* + *β* ā 1/2) for *α*, *β* > 1.
Although both *G**X* and *G*(1ā*X*) are asymmetric, in the case that both shape parameters are equal, *α* = *β*, the geometric means are equal: *G**X* = *G*(1ā*X*). This equality follows from the following symmetry displayed between both geometric means:

*G**X*(Beta(*α*, *β*)) = *G*(1ā*X*)(Beta(*β*, *α*)).
Harmonic mean for beta distribution for 0 \< *α* \< 5 and 0 \< *β* \< 5
(Mean ā harmonic mean) for beta distribution versus *α* and *β* from 0 to 2
Harmonic means for beta distribution Purple = *H*(*X*), Yellow = *H*(1 ā *X*), smaller values *α* and *β* in front
Harmonic means for beta distribution: purple = *H*(*X*), yellow = *H*(1 ā *X*), larger values *α* and *β* in front
The inverse of the [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean "Harmonic mean") (*HX*) of a distribution with [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* is the arithmetic mean of 1/*X*, or, equivalently, its expected value. Therefore, the [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean "Harmonic mean") (*HX*) of a beta distribution with shape parameters *α* and *β* is:
![{\\displaystyle {\\begin{aligned}H\_{X}&={\\frac {1}{\\operatorname {E} \\left\[{\\frac {1}{X}}\\right\]}}\\\\&={\\frac {1}{\\int \_{0}^{1}{\\frac {f(x;\\alpha ,\\beta )}{x}}\\,dx}}\\\\&={\\frac {1}{\\int \_{0}^{1}{\\frac {x^{\\alpha -1}(1-x)^{\\beta -1}}{x\\mathrm {B} (\\alpha ,\\beta )}}\\,dx}}\\\\&={\\frac {\\alpha -1}{\\alpha +\\beta -1}}{\\text{ if }}\\alpha \>1{\\text{ and }}\\beta \>0\\\\\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ed7d99dd7493b9c085cd5d407861730e2a2abf6c)
The [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean "Harmonic mean") (*HX*) of a beta distribution with *α* \< 1 is undefined, because its defining expression is not bounded in \[0, 1\] for shape parameter *α* less than unity.
Letting *α* = *β* in the above expression one obtains

*H**X* = (*α* ā 1)/(2*α* ā 1),
showing that for *α* = *β* the harmonic mean ranges from 0, for *α* = *β* = 1, to 1/2, for *α* = *β* ā ā.
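A minimal numerical check (assuming SciPy; the parameter values are an arbitrary example) of the closed form *H**X* = (*α* ā 1)/(*α* + *β* ā 1) against 1/E\[1/*X*\] computed by quadrature:

```python
# Harmonic mean of Beta(alpha, beta) for alpha > 1: (alpha - 1)/(alpha + beta - 1),
# compared against 1 / E[1/X] computed by numerical integration.
from scipy import integrate
from scipy.stats import beta as beta_dist

a, b = 3.0, 2.0
closed_form = (a - 1) / (a + b - 1)
e_inv_x, _ = integrate.quad(lambda x: beta_dist.pdf(x, a, b) / x, 0, 1)
print(closed_form, 1 / e_inv_x)  # both 0.5 for these parameters
```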
Following are the limits with one parameter finite (non-zero) and the other approaching these limits (for *α* > 1):

lim_(*α* ā ā) *H**X* = 1, lim_(*β* ā 0) *H**X* = 1, lim_(*β* ā ā) *H**X* = 0.
The harmonic mean plays a role in maximum likelihood estimation for the four-parameter case, in addition to the geometric mean. In fact, when performing maximum likelihood estimation for the four-parameter case, besides the harmonic mean *H**X* based on the random variable *X*, a second harmonic mean appears naturally: the harmonic mean based on the linear transformation (1 ā *X*), the mirror image of *X*, denoted by *H*(1ā*X*):
![{\\displaystyle H\_{1-X}={\\frac {1}{\\operatorname {E} \\left\[{\\frac {1}{1-X}}\\right\]}}={\\frac {\\beta -1}{\\alpha +\\beta -1}}{\\text{ if }}\\beta \>1,{\\text{ and }}\\alpha \>0.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/48f4fd69f20c4259cb8a50e754df8dfed5a1ddca)
The [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean "Harmonic mean") (*H*(1 ā *X*)) of a beta distribution with *β* \< 1 is undefined, because its defining expression is not bounded in \[0, 1\] for shape parameter *β* less than unity.
Letting *α* = *β* in the above expression one obtains

*H*(1ā*X*) = (*β* ā 1)/(2*β* ā 1),
showing that for *α* = *β* the harmonic mean ranges from 0, for *α* = *β* = 1, to 1/2, for *α* = *β* ā ā.
Following are the limits with one parameter finite (non-zero) and the other approaching these limits (for *β* > 1):

lim_(*α* ā 0) *H*(1ā*X*) = 1, lim_(*α* ā ā) *H*(1ā*X*) = 0, lim_(*β* ā ā) *H*(1ā*X*) = 1.
Although both *H**X* and *H*(1ā*X*) are asymmetric, in the case that both shape parameters are equal, *α* = *β*, the harmonic means are equal: *H**X* = *H*(1ā*X*). This equality follows from the following symmetry displayed between both harmonic means:

*H**X*(Beta(*α*, *β*)) = *H*(1ā*X*)(Beta(*β*, *α*)) for *α*, *β* > 1.
### Measures of statistical dispersion
#### Variance
The [variance](https://en.wikipedia.org/wiki/Variance "Variance") (the second moment centered on the mean) of a beta distribution [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* with parameters *α* and *β* is:[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[14\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-14)
![{\\displaystyle \\operatorname {var} (X)=\\operatorname {E} \\left\[(X-\\mu )^{2}\\right\]={\\frac {\\alpha \\beta }{\\left(\\alpha +\\beta \\right)^{2}\\left(\\alpha +\\beta +1\\right)}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/d7d2effe47b57f9a004264ee6aac04029d3954de)
Letting *α* = *β* in the above expression one obtains

var(*X*) = 1/(4(2*β* + 1)),

showing that for *α* = *β* the variance decreases monotonically as *α* = *β* increases. Setting *α* = *β* = 0 in this expression, one finds the maximum variance var(*X*) = 1/4,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) which is attained only when approaching the limit *α* = *β* ā 0.
The beta distribution may also be [parametrized](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") in terms of its mean *μ* (0 < *μ* < 1) and sample size *ν* = *α* + *β* (*ν* > 0) (see subsection [Mean and sample size](https://en.wikipedia.org/wiki/Beta_distribution#Mean_and_sample_size)):

*α* = *μν*, *β* = (1 ā *μ*)*ν*
Using this [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter"), one can express the variance in terms of the mean *μ* and the sample size *ν* as follows:

var(*X*) = *μ*(1 ā *μ*)/(1 + *ν*)
Since *ν* = *α* + *β* \> 0, it follows that var(*X*) \< *μ*(1 ā *μ*).
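A minimal sketch (plain Python; the values of *μ* and *ν* are arbitrary) confirming that the two parametrizations give the same variance under the mapping *α* = *μν*, *β* = (1 ā *μ*)*ν*:

```python
# The variance in (alpha, beta) form and in (mu, nu) form agree under
# alpha = mu*nu, beta = (1 - mu)*nu.
mu, nu = 0.3, 5.0
a, b = mu * nu, (1 - mu) * nu
var_ab = a * b / ((a + b) ** 2 * (a + b + 1))
var_munu = mu * (1 - mu) / (1 + nu)
print(var_ab, var_munu)  # both 0.035, and below mu*(1 - mu) = 0.21
assert abs(var_ab - var_munu) < 1e-12
```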
For a symmetric distribution, the mean is at the middle of the distribution, *μ* = 1/2, and therefore:

var(*X*) = 1/(4(1 + *ν*))
Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

lim_(*α* ā 0) var(*X*) = lim_(*β* ā 0) var(*X*) = lim_(*α* ā ā) var(*X*) = lim_(*β* ā ā) var(*X*) = lim_(*ν* ā ā) var(*X*) = 0, lim_(*ν* ā 0) var(*X*) = *μ*(1 ā *μ*).
Variance for beta distribution for 0 ⤠*α* ⤠5 and 0 ⤠*β* ⤠5
#### Geometric variance and covariance
log geometric variances vs. *α* and *β* (front view)

log geometric variances vs. *α* and *β* (back view)
The logarithm of the geometric variance, ln(var*GX*), of a distribution with [random variable](https://en.wikipedia.org/wiki/Random_variable "Random variable") *X* is the second moment of the logarithm of *X* centered on the geometric mean of *X*, ln(*GX*):
![{\\displaystyle {\\begin{aligned}\\ln \\operatorname {var} \_{GX}&=\\operatorname {E} \\left\[\\left(\\ln X-\\ln G\_{X}\\right)^{2}\\right\]\\\\&=\\operatorname {E} \\left\[\\left(\\ln X-\\operatorname {E} \\left\[\\ln X\\right\]\\right)^{2}\\right\]\\\\&=\\operatorname {E} \\left\[\\left(\\ln X\\right)^{2}\\right\]-\\left(\\operatorname {E} \[\\ln X\]\\right)^{2}\\\\&=\\operatorname {var} \[\\ln X\]\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/03642aabe02f83fe6e01e5a136c245d82b898904)
and therefore, the geometric variance is:
![{\\displaystyle \\operatorname {var} \_{GX}=e^{\\operatorname {var} \[\\ln X\]}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/524cf664ccfd5eb381fd1987926209f1c401a200)
In the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information") matrix, and the curvature of the log [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function"), the logarithm of the geometric variance of the [reflected](https://en.wikipedia.org/wiki/Reflection_formula "Reflection formula") variable 1 ā *X* and the logarithm of the geometric covariance between *X* and 1 ā *X* appear:
![{\\displaystyle {\\begin{aligned}\\ln \\operatorname {var\_{G(1-X)}} &=\\operatorname {E} \\left\[\\left(\\ln(1-X)-\\ln G\_{1-X}\\right)^{2}\\right\]\\\\&=\\operatorname {E} \\left\[\\left(\\ln(1-X)-\\operatorname {E} \[\\ln(1-X)\]\\right)^{2}\\right\]\\\\&=\\operatorname {E} \\left\[(\\ln(1-X))^{2}\\right\]-\\left(\\operatorname {E} \[\\ln(1-X)\]\\right)^{2}\\\\&=\\operatorname {var} \[\\ln(1-X)\]\\\\&\\\\\\operatorname {var\_{G(1-X)}} &=e^{\\operatorname {var} \[\\ln(1-X)\]}\\\\&\\\\\\ln \\operatorname {cov\_{G{X,1-X}}} &=\\operatorname {E} \[(\\ln X-\\ln G\_{X})(\\ln(1-X)-\\ln G\_{1-X})\]\\\\&=\\operatorname {E} \[(\\ln X-\\operatorname {E} \[\\ln X\])(\\ln(1-X)-\\operatorname {E} \[\\ln(1-X)\])\]\\\\&=\\operatorname {E} \\left\[\\ln X\\ln(1-X)\\right\]-\\operatorname {E} \[\\ln X\]\\operatorname {E} \[\\ln(1-X)\]\\\\&=\\operatorname {cov} \[\\ln X,\\ln(1-X)\]\\\\&\\\\\\operatorname {cov} \_{G{X,(1-X)}}&=e^{\\operatorname {cov} \[\\ln X,\\ln(1-X)\]}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/657c11ca41846366cb8d9843536af9e002ea4cdf)
For a beta distribution, higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions. See the section [§ Moments of logarithmically transformed random variables](https://en.wikipedia.org/wiki/Beta_distribution#Moments_of_logarithmically_transformed_random_variables). The [variance](https://en.wikipedia.org/wiki/Variance "Variance") of the logarithmic variables and [covariance](https://en.wikipedia.org/wiki/Covariance "Covariance") of ln *X* and ln(1ā*X*) are:
![{\\displaystyle \\operatorname {var} \[\\ln X\]=\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e396e8700267735eb741f73e8906445579c43bc6) ![{\\displaystyle \\operatorname {var} \[\\ln(1-X)\]=\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/70eefadef46c7d56cc13c8221aa3df1d71596b7f) ![{\\displaystyle \\operatorname {cov} \[\\ln X,\\ln(1-X)\]=-\\psi \_{1}(\\alpha +\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e7a515ada0b9d62c5a3a7b35662b03256d66e3b9)
where the **[trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function")**, denoted *Ļ*1(*α*), is the second of the [polygamma functions](https://en.wikipedia.org/wiki/Polygamma_function "Polygamma function"), and is defined as the derivative of the [digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function"):

*Ļ*1(*α*) = d²ln Γ(*α*)/d*α*² = d*Ļ*(*α*)/d*α*.
Therefore,
![{\\displaystyle \\ln \\operatorname {var} \_{GX}=\\operatorname {var} \[\\ln X\]=\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/194b00552edda5d8d026a24872cdb27b604516c9) ![{\\displaystyle \\ln \\operatorname {var} \_{G(1-X)}=\\operatorname {var} \[\\ln(1-X)\]=\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/96dd82553307c025c84da68a3c373aad7467abd2) ![{\\displaystyle \\ln \\operatorname {cov} \_{GX,1-X}=\\operatorname {cov} \[\\ln X,\\ln(1-X)\]=-\\psi \_{1}(\\alpha +\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/40a793c0271e457f671edb0668edc15bbae8740f)
The accompanying plots show the log geometric variances and log geometric covariance versus the shape parameters *α* and *β*. The plots show that the log geometric variances and log geometric covariance are close to zero for shape parameters *α* and *β* greater than 2, and that the log geometric variances rapidly rise in value for shape parameter values *α* and *β* less than unity. The log geometric variances are positive for all values of the shape parameters. The log geometric covariance is negative for all values of the shape parameters, and it reaches large negative values for *α* and *β* less than unity.
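A sketch of these quantities (assuming SciPy, whose `polygamma(1, x)` evaluates the trigamma function; the parameter pairs are arbitrary examples):

```python
# Log geometric variances and log geometric covariance from the trigamma function.
from scipy.special import polygamma

def log_geom_dispersion(a, b):
    trigamma = lambda x: polygamma(1, x)
    ln_var_gx = trigamma(a) - trigamma(a + b)    # ln var_GX, always positive
    ln_var_g1mx = trigamma(b) - trigamma(a + b)  # ln var_G(1-X), always positive
    ln_cov = -trigamma(a + b)                    # ln cov_G X,(1-X), always negative
    return ln_var_gx, ln_var_g1mx, ln_cov

print(log_geom_dispersion(0.5, 0.5))  # large magnitudes for shape parameters < 1
print(log_geom_dispersion(3.0, 3.0))  # close to zero for larger shape parameters
```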
Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

Limits with two parameters varying:

Although both ln(var*GX*) and ln(var*G*(1 ā *X*)) are asymmetric, when the shape parameters are equal, *α* = *β*, one has: ln(var*GX*) = ln(var*G*(1ā*X*)). This equality follows from the following symmetry displayed between both log geometric variances:

ln var*GX*(Beta(*α*, *β*)) = ln var*G*(1ā*X*)(Beta(*β*, *α*)).
The log geometric covariance is symmetric:

ln cov*G*,*X*,(1ā*X*)(Beta(*α*, *β*)) = ln cov*G*,*X*,(1ā*X*)(Beta(*β*, *α*)).
#### Mean absolute deviation around the mean
Ratio of mean abs. dev. to std. dev. for beta distribution with *α* and *β* ranging from 0 to 5
Ratio of mean abs. dev. to std. dev. for beta distribution with mean 0 ⤠*μ* ⤠1 and sample size 0 < *ν* ⤠10
The [mean absolute deviation](https://en.wikipedia.org/wiki/Mean_absolute_deviation "Mean absolute deviation") around the mean for the beta distribution with shape parameters *α* and *β* is:[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)
![{\\displaystyle \\operatorname {E} \[\|X-E\[X\]\|\]={\\frac {2\\alpha ^{\\alpha }\\beta ^{\\beta }}{\\mathrm {B} (\\alpha ,\\beta )(\\alpha +\\beta )^{\\alpha +\\beta +1}}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/6d1c6330a91df22b40cedc7903dbc70120d66cf9)
The mean absolute deviation around the mean is a more [robust](https://en.wikipedia.org/wiki/Robust_statistics "Robust statistics") [estimator](https://en.wikipedia.org/wiki/Estimator "Estimator") of [statistical dispersion](https://en.wikipedia.org/wiki/Statistical_dispersion "Statistical dispersion") than the standard deviation for beta distributions with tails and inflection points at each side of the mode, Beta(*α*, *β*) distributions with *α*, *β* > 2, as it depends on the linear (absolute) deviations rather than the square deviations from the mean. Therefore, the effect of very large deviations from the mean is not as heavily weighted.
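A sketch of the closed form above (assuming SciPy; the parameter values are arbitrary), evaluated in log space via `betaln` for numerical stability and checked by direct integration of |*x* ā *μ*| *f*(*x*):

```python
# Mean absolute deviation around the mean: 2 a^a b^b / (B(a,b) (a+b)^(a+b+1)),
# evaluated as exp of a sum of logs, and checked by integrating |x - mu| f(x).
import numpy as np
from scipy import integrate
from scipy.special import betaln
from scipy.stats import beta as beta_dist

a, b = 2.0, 5.0
mu = a / (a + b)
mad = 2 * np.exp(a * np.log(a) + b * np.log(b)
                 - betaln(a, b) - (a + b + 1) * np.log(a + b))
mad_numeric, _ = integrate.quad(lambda x: abs(x - mu) * beta_dist.pdf(x, a, b), 0, 1)
print(mad, mad_numeric)  # ~0.1301 both ways
```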
Using [Stirling's approximation](https://en.wikipedia.org/wiki/Stirling%27s_approximation "Stirling's approximation") to the [Gamma function](https://en.wikipedia.org/wiki/Gamma_function "Gamma function"), [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz")[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only ā3.5% for *α* = *β* = 1, and it decreases to zero as *α* ā ā, *β* ā ā):
![{\\displaystyle {\\begin{aligned}{\\frac {\\text{mean abs. dev. from mean}}{\\text{standard deviation}}}&={\\frac {\\operatorname {E} \[\|X-E\[X\]\|\]}{\\sqrt {\\operatorname {var} (X)}}}\\\\&\\approx {\\sqrt {\\frac {2}{\\pi }}}\\left(1+{\\frac {7}{12(\\alpha +\\beta )}}{}-{\\frac {1}{12\\alpha }}-{\\frac {1}{12\\beta }}\\right),{\\text{ if }}\\alpha ,\\beta \>1.\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c196a5a2eb110b71471a3dc019241c6cb8c3f927)
At the limit *α* ā ā, *β* ā ā, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: ā(2/*Ļ*) ā 0.798. For *α* = *β* = 1 this ratio equals ā3/2 ā 0.866, so that from *α* = *β* = 1 to *α*, *β* ā ā the ratio decreases by 8.5%. For *α* = *β* = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from *α* = *β* = 0 to *α* = *β* = 1, and by 25% from *α* = *β* = 0 to *α*, *β* ā ā. However, for skewed beta distributions such that *α* ā 0 or *β* ā 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.
Using the [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") in terms of mean *μ* and sample size *ν* = *α* + *β* \> 0:
*α* = *μν*, *β* = (1 ā *μ*)*ν*
one can express the mean [absolute deviation](https://en.wikipedia.org/wiki/Absolute_deviation "Absolute deviation") around the mean in terms of the mean *μ* and the sample size *ν* as follows:
![{\\displaystyle \\operatorname {E} \[\|X-E\[X\]\|\]={\\frac {2\\mu ^{\\mu \\nu }(1-\\mu )^{(1-\\mu )\\nu }}{\\nu \\mathrm {B} (\\mu \\nu ,(1-\\mu )\\nu )}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/027efecf8aaefea8c805194e47a1374ffcb63cb8)
For a symmetric distribution, the mean is at the middle of the distribution, *μ* = 1/2, and therefore:
![{\\displaystyle {\\begin{aligned}\\operatorname {E} \[\|X-E\[X\]\|\]={\\frac {2^{1-\\nu }}{\\nu \\mathrm {B} ({\\tfrac {\\nu }{2}},{\\tfrac {\\nu }{2}})}}&={\\frac {2^{1-\\nu }\\Gamma (\\nu )}{\\nu (\\Gamma ({\\tfrac {\\nu }{2}}))^{2}}}\\\\\\lim \_{\\nu \\to 0}\\left(\\lim \_{\\mu \\to {\\frac {1}{2}}}\\operatorname {E} \[\|X-E\[X\]\|\]\\right)&={\\frac {1}{2}}\\\\\\lim \_{\\nu \\to \\infty }\\left(\\lim \_{\\mu \\to {\\frac {1}{2}}}\\operatorname {E} \[\|X-E\[X\]\|\]\\right)&=0\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/745a053a1ef3cc7edf07332763b401bd09b40e42)
Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:
![{\\displaystyle {\\begin{aligned}\\lim \_{\\beta \\to 0}\\operatorname {E} \[\|X-E\[X\]\|\]&=\\lim \_{\\alpha \\to 0}\\operatorname {E} \[\|X-E\[X\]\|\]=0\\\\\\lim \_{\\beta \\to \\infty }\\operatorname {E} \[\|X-E\[X\]\|\]&=\\lim \_{\\alpha \\to \\infty }\\operatorname {E} \[\|X-E\[X\]\|\]=0\\\\\\lim \_{\\mu \\to 0}\\operatorname {E} \[\|X-E\[X\]\|\]&=\\lim \_{\\mu \\to 1}\\operatorname {E} \[\|X-E\[X\]\|\]=0\\\\\\lim \_{\\nu \\to 0}\\operatorname {E} \[\|X-E\[X\]\|\]&={\\sqrt {\\mu (1-\\mu )}}\\\\\\lim \_{\\nu \\to \\infty }\\operatorname {E} \[\|X-E\[X\]\|\]&=0\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/87c43b4a05f8ea3acf3f15b0a16f6ee07811ac6b)
#### Mean absolute difference
The [mean absolute difference](https://en.wikipedia.org/wiki/Mean_absolute_difference "Mean absolute difference") for the beta distribution is:
![{\\displaystyle {\\begin{aligned}\\mathrm {MD} &=\\int \_{0}^{1}\\int \_{0}^{1}f(x;\\alpha ,\\beta )\\,f(y;\\alpha ,\\beta )\\left\|x-y\\right\|dx\\,dy\\\\\[1ex\]&={\\frac {4}{\\alpha +\\beta }}{\\frac {B(\\alpha +\\beta ,\\alpha +\\beta )}{B(\\alpha ,\\alpha )B(\\beta ,\\beta )}}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0de290eea66b8a424727bff1b9a02f53f2607361)
The [Gini coefficient](https://en.wikipedia.org/wiki/Gini_coefficient "Gini coefficient") for the beta distribution is half of the relative mean absolute difference:

MD/(2*μ*) = (2/*α*) B(*α* + *β*, *α* + *β*)/(B(*α*, *α*) B(*β*, *β*))
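A sketch evaluating both quantities (assuming SciPy; the parameter values are arbitrary), using `betaln` so the beta-function ratio is computed in log space:

```python
# Mean absolute difference MD = 4/(a+b) * B(a+b, a+b) / (B(a,a) B(b,b)),
# via betaln for stability, and the Gini coefficient MD / (2 * mean).
import numpy as np
from scipy.special import betaln

a, b = 2.0, 3.0
md = np.exp(np.log(4.0) - np.log(a + b)
            + betaln(a + b, a + b) - betaln(a, a) - betaln(b, b))
gini = md / (2 * a / (a + b))  # half the relative mean absolute difference
print(md, gini)  # ~0.2286 and ~0.2857
```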
### Skewness

Skewness for beta distribution as a function of variance and mean
The [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
![{\\displaystyle \\gamma \_{1}={\\frac {\\operatorname {E} \\left\[\\left(X-\\mu \\right)^{3}\\right\]}{\\left(\\operatorname {var} (X)\\right)^{3/2}}}={\\frac {2\\left(\\beta -\\alpha \\right){\\sqrt {\\alpha +\\beta +1}}}{\\left(\\alpha +\\beta +2\\right){\\sqrt {\\alpha \\beta }}}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/1c880f0b8322d91fe382f87eaa4f8730faa164ed)
Letting *α* = *β* in the above expression one obtains *γ*1 = 0, showing once again that for *α* = *β* the distribution is symmetric and hence the skewness is zero. The skew is positive (right-tailed) for *α* < *β*, and negative (left-tailed) for *α* > *β*.
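A minimal check of the closed form against SciPy's built-in moments (the parameter values are an arbitrary right-tailed example):

```python
# Skewness gamma_1 = 2 (b - a) sqrt(a + b + 1) / ((a + b + 2) sqrt(a b)),
# checked against SciPy's beta.stats (moments='mvs' -> mean, variance, skewness).
import numpy as np
from scipy.stats import beta as beta_dist

a, b = 2.0, 5.0
skew_formula = 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))
_, _, skew_scipy = beta_dist.stats(a, b, moments='mvs')
print(skew_formula, float(skew_scipy))  # positive (right-tailed) since alpha < beta
```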
Using the [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") in terms of mean *μ* and sample size *ν* = *α* + *β*:

*α* = *μν*, *β* = (1 ā *μ*)*ν*
one can express the skewness in terms of the mean *μ* and the sample size ν as follows:
![{\\displaystyle \\gamma \_{1}={\\frac {\\operatorname {E} \[(X-\\mu )^{3}\]}{\\left(\\operatorname {var} (X)\\right)^{3/2}}}={\\frac {2(1-2\\mu ){\\sqrt {1+\\nu }}}{(2+\\nu ){\\sqrt {\\mu (1-\\mu )}}}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c88399efb587f7d2443ddb232e4b24f2763050b9)
The skewness can also be expressed just in terms of the variance *var* and the mean *μ* as follows:
![{\\displaystyle \\gamma \_{1}={\\frac {\\operatorname {E} \[(X-\\mu )^{3}\]}{(\\operatorname {var} (X))^{3/2}}}={\\frac {2(1-2\\mu ){\\sqrt {\\operatorname {var} }}}{\\mu (1-\\mu )+\\operatorname {var} }}{\\text{ if }}\\operatorname {var} \<\\mu (1-\\mu )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/6b48373ac7ce8096381e7f74edf9e44bc435ad13)
The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (*μ* = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).
The following expression for the square of the skewness, in terms of the sample size *ν* = *α* + *β* and the variance var, is useful for the method of moments estimation of four parameters:
![{\\displaystyle (\\gamma \_{1})^{2}={\\frac {\\left(\\operatorname {E} \[(X-\\mu )^{3}\]\\right)^{2}}{\\left(\\operatorname {var} (X)\\right)^{3}}}={\\frac {4}{(2+\\nu )^{2}}}\\left({\\frac {1}{\\operatorname {var} }}-4(1+\\nu )\\right)}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b4f7b7d9a6d73e9bbcc43812b8f3fd573bc02ee3)
This expression correctly gives a skewness of zero for *α* = *β*, since in that case (see [§ Variance](https://en.wikipedia.org/wiki/Beta_distribution#Variance)): var(*X*) = 1/(4(1 + *ν*)).
For the symmetric case (*α* = *β*), skewness = 0 over the whole range, and the following limits apply:

For the asymmetric cases (*α* ā *β*) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

lim_(*α* ā 0) *γ*1 = ā, lim_(*β* ā 0) *γ*1 = āā, lim_(*α* ā ā) *γ*1 = ā2/ā*β*, lim_(*β* ā ā) *γ*1 = 2/ā*α*.
Skewness for beta distribution for *α* and *β* from 1 to 5, and from 0.1 to 5

### Kurtosis

Excess kurtosis for beta distribution as a function of variance and mean
The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear.[\[15\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Oguamanam-15) Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it is much more sensitive to the signal generated by human footsteps than to other signals generated by vehicles, winds, noise, etc.[\[16\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Liang-16) Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping[\[17\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kenney_and_Keeping-17) use the symbol γ2 for the [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis"), but [Abramowitz and Stegun](https://en.wikipedia.org/wiki/Abramowitz_and_Stegun "Abramowitz and Stegun")[\[18\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Abramowitz-18) use different terminology. To prevent confusion[\[19\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Weisstein.Kurtosi-19) between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)[\[20\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Panik-20)
![{\\displaystyle {\\begin{aligned}{\\text{excess kurtosis}}&={\\text{kurtosis}}-3\\\\&={\\frac {\\operatorname {E} \[(X-\\mu )^{4}\]}{(\\operatorname {var} (X))^{2}}}-3\\\\&={\\frac {6\[\\alpha ^{3}-\\alpha ^{2}(2\\beta -1)+\\beta ^{2}(\\beta +1)-2\\alpha \\beta (\\beta +2)\]}{\\alpha \\beta (\\alpha +\\beta +2)(\\alpha +\\beta +3)}}\\\\&={\\frac {6\[(\\alpha -\\beta )^{2}(\\alpha +\\beta +1)-\\alpha \\beta (\\alpha +\\beta +2)\]}{\\alpha \\beta (\\alpha +\\beta +2)(\\alpha +\\beta +3)}}.\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ed8320d4f38ba9260f8ad91c30238abc08306dc8)
Letting *α* = *β* in the above expression one obtains

excess kurtosis = ā6/(3 + 2*α*) for *α* = *β*.
Therefore, for symmetric beta distributions, the excess kurtosis is negative, increasing from a minimum value of ā2 at the limit as {*α* = *β*} ā 0, and approaching a maximum value of zero as {*α* = *β*} ā ā. The value of ā2 is the minimum value of excess kurtosis that any distribution (not just beta distributions, but any distribution of any possible kind) can ever achieve. This minimum value is reached when all the probability density is entirely concentrated at the two ends *x* = 0 and *x* = 1, with nothing in between: a 2-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with equal probability 1/2 at each end (a coin toss: see section below "Kurtosis bounded by the square of the skewness" for further discussion). The description of [kurtosis](https://en.wikipedia.org/wiki/Kurtosis "Kurtosis") as a measure of the "potential outliers" (or "potential rare, extreme values") of the probability distribution is correct for all distributions, including the beta distribution. The more that rare, extreme values can occur in the beta distribution, the higher its kurtosis; otherwise, the kurtosis is lower. For *α* ā *β*, skewed beta distributions, the excess kurtosis can reach unlimited positive values (particularly for *α* ā 0 for finite *β*, or for *β* ā 0 for finite *α*) because the side away from the mode will produce occasional extreme values. Minimum kurtosis takes place when the mass density is concentrated equally at each end (and therefore the mean is at the center), and there is no probability mass density in between the ends.
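A minimal check of the closed form against SciPy, which reports excess kurtosis directly (the parameter pairs are arbitrary examples; note the arcsine case Beta(1/2, 1/2) gives ā3/2 and the uniform case gives ā6/5):

```python
# Excess kurtosis from the closed form above, checked against SciPy
# (moments='k' returns excess kurtosis, i.e. kurtosis minus 3).
from scipy.stats import beta as beta_dist

def excess_kurtosis(a, b):
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    return num / (a * b * (a + b + 2) * (a + b + 3))

for a, b in [(0.5, 0.5), (1.0, 1.0), (2.0, 2.0), (2.0, 5.0)]:
    print((a, b), excess_kurtosis(a, b), float(beta_dist.stats(a, b, moments='k')))
```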
Using the [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") in terms of mean *μ* and sample size *ν* = *α* + *β*:

*α* = *μν*, *β* = (1 ā *μ*)*ν*
one can express the excess kurtosis in terms of the mean *μ* and the sample size *ν* as follows:

excess kurtosis = (6/(3 + *ν*))·((1 ā 2*μ*)²(1 + *ν*)/(*μ*(1 ā *μ*)(2 + *ν*)) ā 1)
The excess kurtosis can also be expressed in terms of just the following two parameters: the variance var, and the sample size *ν* as follows:

excess kurtosis = (6/((2 + *ν*)(3 + *ν*)))·((1/var) ā 6 ā 5*ν*) if var < *μ*(1 ā *μ*)
and, in terms of the variance var and the mean *μ* as follows:

excess kurtosis = 6 var (1 ā var ā 5*μ*(1 ā *μ*))/((*μ*(1 ā *μ*) + var)(*μ*(1 ā *μ*) + 2 var)) if var < *μ*(1 ā *μ*)
The plot of excess kurtosis as a function of the variance and the mean shows that the minimum value of the excess kurtosis (ā2, which is the minimum possible value for excess kurtosis for any distribution) is intimately coupled with the maximum value of variance (1/4) and the symmetry condition: the mean occurring at the midpoint (*μ* = 1/2). This occurs for the symmetric case of *α* = *β* = 0, with zero skewness. At the limit, this is the 2 point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with equal probability 1/2 at each [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") end *x* = 0 and *x* = 1 and zero probability everywhere else. (A coin toss: one face of the coin being *x* = 0 and the other face being *x* = 1.) Variance is maximum because the distribution is bimodal with nothing in between the two modes (spikes) at each end. Excess kurtosis is minimum: the probability density "mass" is zero at the mean and it is concentrated at the two peaks at each end. Excess kurtosis reaches the minimum possible value (for any distribution) when the probability density function has two spikes at each end: it is bi-"peaky" with nothing in between them.
On the other hand, the plot shows that for extreme skewed cases, where the mean is located near one or the other end (*μ* = 0 or *μ* = 1), the variance is close to zero, and the excess kurtosis rapidly approaches infinity when the mean of the distribution approaches either end.
Alternatively, the excess kurtosis can also be expressed in terms of just the following two parameters: the square of the skewness, and the sample size *ν* as follows:

excess kurtosis = (6/(3 + *ν*))·(((2 + *ν*)/4)·(skewness)² ā 1)
From this last expression, one can obtain the same limits published over a century ago by [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson")[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21) for the beta distribution (see section below titled "Kurtosis bounded by the square of the skewness"). Setting *α* + *β* = *ν* = 0 in the above expression, one obtains Pearson's lower boundary (values for the skewness and excess kurtosis below the boundary (excess kurtosis + 2 ā skewness² = 0) cannot occur for any distribution, and hence [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") appropriately called the region below this boundary the "impossible region"). The limit of *α* + *β* = *ν* ā ā determines Pearson's upper boundary.

lim_(*ν* ā 0) excess kurtosis = (skewness)² ā 2, lim_(*ν* ā ā) excess kurtosis = (3/2)·(skewness)²,

therefore:

(skewness)² ā 2 < excess kurtosis < (3/2)·(skewness)²
Values of *ν* = *α* + *β* such that *ν* ranges from zero to infinity, 0 \< *ν* \< ā, span the whole region of the beta distribution in the plane of excess kurtosis versus squared skewness.
For the symmetric case (*α* = *β*), the following limits apply:

lim_(*α* = *β* ā 0) excess kurtosis = ā2, lim_(*α* = *β* ā ā) excess kurtosis = 0.
For the unsymmetric cases (*α* ā *β*) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

lim_(*α* ā 0) excess kurtosis = lim_(*β* ā 0) excess kurtosis = ā, lim_(*α* ā ā) excess kurtosis = 6/*β*, lim_(*β* ā ā) excess kurtosis = 6/*α*.
Excess kurtosis for beta distribution for *α* and *β* from 1 to 5, and from 0.1 to 5
### Characteristic function
[Re(characteristic function)](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") symmetric case *α* = *β* ranging from 25 to 0
[Re(characteristic function)](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") symmetric case *α* = *β* ranging from 0 to 25
[Re(characteristic function)](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") *β* = *α* + 1/2; *α* ranging from 25 to 0
[Re(characteristic function)](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") *α* = *β* + 1/2; *β* ranging from 25 to 0
[Re(characteristic function)](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") *α* = *β* + 1/2; *β* ranging from 0 to 25
The [characteristic function](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") is the [Fourier transform](https://en.wikipedia.org/wiki/Fourier_transform "Fourier transform") of the probability density function. The characteristic function of the beta distribution is [Kummer's confluent hypergeometric function](https://en.wikipedia.org/wiki/Confluent_hypergeometric_function "Confluent hypergeometric function") (of the first kind):[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[18\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Abramowitz-18)[\[22\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Zwillinger_2014-22)
![{\\displaystyle {\\begin{aligned}\\varphi \_{X}(\\alpha ;\\beta ;t)&=\\operatorname {E} \\left\[e^{itX}\\right\]\\\\&=\\int \_{0}^{1}e^{itx}f(x;\\alpha ,\\beta )\\,dx\\\\&={}\_{1}F\_{1}(\\alpha ;\\alpha +\\beta ;it)\\!\\\\&=\\sum \_{n=0}^{\\infty }{\\frac {\\alpha ^{\\overline {n}}(it)^{n}}{(\\alpha +\\beta )^{\\overline {n}}n!}}\\\\&=1+\\sum \_{k=1}^{\\infty }\\left(\\prod \_{r=0}^{k-1}{\\frac {\\alpha +r}{\\alpha +\\beta +r}}\\right){\\frac {(it)^{k}}{k!}}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/2e0e8f5d3bf4ec0cbe9b911e26c961e9ebaefdd8)
where

*α*^(*n̄*) = *α*(*α* + 1)(*α* + 2) ⯠(*α* + *n* ā 1) = Γ(*α* + *n*)/Γ(*α*)

is the [rising factorial](https://en.wikipedia.org/wiki/Rising_factorial "Rising factorial"). The value of the characteristic function for *t* = 0 is one:

*Ļ**X*(*α*; *β*; 0) = 1*F*1(*α*; *α* + *β*; 0) = 1.
Also, the real and imaginary parts of the characteristic function enjoy the following symmetries with respect to the origin of variable *t*:
![{\\displaystyle \\operatorname {Re} \\left\[{}\_{1}F\_{1}(\\alpha ;\\alpha +\\beta ;it)\\right\]=\\operatorname {Re} \\left\[{}\_{1}F\_{1}(\\alpha ;\\alpha +\\beta ;-it)\\right\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/468f2135d76bd1b522c84092679209ff6abd5845) ![{\\displaystyle \\operatorname {Im} \\left\[{}\_{1}F\_{1}(\\alpha ;\\alpha +\\beta ;it)\\right\]=-\\operatorname {Im} \\left\[{}\_{1}F\_{1}(\\alpha ;\\alpha +\\beta ;-it)\\right\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0fca8984292cc85cb8a37ecb5a2b7c04b5596282)
The symmetric case *α* = *β* simplifies the characteristic function of the beta distribution to a [Bessel function](https://en.wikipedia.org/wiki/Bessel_function "Bessel function"), since in the special case *α* + *β* = 2*α* the [confluent hypergeometric function](https://en.wikipedia.org/wiki/Confluent_hypergeometric_function "Confluent hypergeometric function") (of the first kind) reduces to a [Bessel function](https://en.wikipedia.org/wiki/Bessel_function "Bessel function") (the modified Bessel function of the first kind *I**α* ā 1/2) using [Kummer's](https://en.wikipedia.org/wiki/Ernst_Kummer "Ernst Kummer") second transformation as follows:

1*F*1(*α*; 2*α*; i*t*) = e^(i*t*/2) 0*F*1(; *α* + 1/2; ā*t*²/16).
In the accompanying plots, the [real part](https://en.wikipedia.org/wiki/Complex_number "Complex number") (Re) of the [characteristic function](https://en.wikipedia.org/wiki/Characteristic_function_\(probability_theory\) "Characteristic function (probability theory)") of the beta distribution is displayed for symmetric (*α* = *β*) and skewed (*α* ā *β*) cases.
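The series form of the characteristic function is simple to evaluate directly. The sketch below (assuming SciPy; the parameter values and the point *t* are arbitrary) truncates the series and compares it with a numerical Fourier integral of the density:

```python
# Characteristic function 1F1(alpha; alpha + beta; it) via its series expansion,
# compared with direct numerical integration of exp(itx) f(x) over [0, 1].
import numpy as np
from scipy import integrate
from scipy.stats import beta as beta_dist

def cf_series(a, b, t, terms=60):
    total, term = 1.0 + 0j, 1.0 + 0j
    for k in range(terms):
        term *= (a + k) / (a + b + k) * (1j * t) / (k + 1)  # next term of the series
        total += term
    return total

a, b, t = 2.0, 3.0, 1.5
re, _ = integrate.quad(lambda x: np.cos(t * x) * beta_dist.pdf(x, a, b), 0, 1)
im, _ = integrate.quad(lambda x: np.sin(t * x) * beta_dist.pdf(x, a, b), 0, 1)
print(cf_series(a, b, t), complex(re, im))
```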
### Other moments

#### Moment generating function
It also follows[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9) that the [moment generating function](https://en.wikipedia.org/wiki/Moment_generating_function "Moment generating function") is
![{\\displaystyle {\\begin{aligned}M\_{X}(\\alpha ;\\beta ;t)&=\\operatorname {E} \\left\[e^{tX}\\right\]\\\\\[4pt\]&=\\int \_{0}^{1}e^{tx}f(x;\\alpha ,\\beta )\\,dx\\\\\[4pt\]&={}\_{1}F\_{1}(\\alpha ;\\alpha +\\beta ;t)\\\\\[4pt\]&=\\sum \_{n=0}^{\\infty }{\\frac {\\alpha ^{\\overline {n}}}{(\\alpha +\\beta )^{\\overline {n}}}}{\\frac {t^{n}}{n!}}\\\\\[4pt\]&=1+\\sum \_{k=1}^{\\infty }\\left(\\prod \_{r=0}^{k-1}{\\frac {\\alpha +r}{\\alpha +\\beta +r}}\\right){\\frac {t^{k}}{k!}}.\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7c5c0eea7bcffadb73fef0c85534a8bdca36ebbb)
In particular *M**X*(*α*; *β*; 0) = 1.
Using the [moment generating function](https://en.wikipedia.org/wiki/Moment_generating_function "Moment generating function"), the *k*\-th [raw moment](https://en.wikipedia.org/wiki/Raw_moment "Raw moment") is given by[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) the factor

*α*^(*k̄*)/(*α* + *β*)^(*k̄*)

multiplying the (exponential series) term *t*^*k*/*k*! in the series of the [moment generating function](https://en.wikipedia.org/wiki/Moment_generating_function "Moment generating function")
![{\\displaystyle \\operatorname {E} \[X^{k}\]={\\frac {\\alpha ^{\\overline {k}}}{(\\alpha +\\beta )^{\\overline {k}}}}=\\prod \_{r=0}^{k-1}{\\frac {\\alpha +r}{\\alpha +\\beta +r}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/af7e0927f7f82eccc61e9fb55883ce96d7f958f1)
where *α*^(*k̄*) is a [Pochhammer symbol](https://en.wikipedia.org/wiki/Pochhammer_symbol "Pochhammer symbol") representing the rising factorial. It can also be written in a recursive form as
![{\\displaystyle \\operatorname {E} \[X^{k}\]={\\frac {\\alpha +k-1}{\\alpha +\\beta +k-1}}\\operatorname {E} \[X^{k-1}\].}](https://wikimedia.org/api/rest_v1/media/math/render/svg/069cb373a905b1e8a5a82a0e3b028e88f63672e2)
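A minimal sketch of this recursion (assuming SciPy; the parameter values are arbitrary), checked against SciPy's own moment computation:

```python
# Raw moments E[X^k] from the recursion E[X^k] = (a+k-1)/(a+b+k-1) * E[X^(k-1)],
# checked against SciPy's beta.moment.
from scipy.stats import beta as beta_dist

def raw_moment(a, b, k):
    m = 1.0  # E[X^0] = 1
    for j in range(1, k + 1):
        m *= (a + j - 1) / (a + b + j - 1)
    return m

a, b = 2.0, 3.0
for k in range(1, 5):
    print(k, raw_moment(a, b, k), beta_dist.moment(k, a, b))
```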
Since the moment generating function  has a positive radius of convergence,\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\] the beta distribution is [determined by its moments](https://en.wikipedia.org/wiki/Moment_problem "Moment problem").[\[23\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-23)
#### Moments of transformed random variables
##### Moments of linearly transformed, product and inverted random variables
One can also show the following expectations for a transformed random variable,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) where the random variable *X* is Beta-distributed with parameters *α* and *β*: *X* ~ Beta(*α*, *β*). The expected value of the variable 1 ā *X* is the mirror-symmetry of the expected value based on *X*:
![{\\displaystyle {\\begin{aligned}\\operatorname {E} \[1-X\]&={\\frac {\\beta }{\\alpha +\\beta }}\\\\\\operatorname {E} \[X(1-X)\]&=\\operatorname {E} \[(1-X)X\]={\\frac {\\alpha \\beta }{(\\alpha +\\beta )(\\alpha +\\beta +1)}}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/43fc49b9eafccd56d39c236b26d222dde51638ce)
Due to the mirror-symmetry of the probability density function of the beta distribution, the variances based on variables *X* and 1 ā *X* are identical, and the covariance on *X*(1 ā *X*) is the negative of the variance:
![{\\displaystyle \\operatorname {var} \[(1-X)\]=\\operatorname {var} \[X\]=-\\operatorname {cov} \[X,(1-X)\]={\\frac {\\alpha \\beta }{(\\alpha +\\beta )^{2}(\\alpha +\\beta +1)}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7273cc84a6c789724b985c34059fa75a62bce631)
The following are the expected values for inverted variables (these are related to the harmonic means, see [§ Harmonic mean](https://en.wikipedia.org/wiki/Beta_distribution#Harmonic_mean)):
![{\\displaystyle {\\begin{aligned}\\operatorname {E} \\left\[{\\frac {1}{X}}\\right\]&={\\frac {\\alpha +\\beta -1}{\\alpha -1}}&&{\\text{ if }}\\alpha \>1\\\\\\operatorname {E} \\left\[{\\frac {1}{1-X}}\\right\]&={\\frac {\\alpha +\\beta -1}{\\beta -1}}&&{\\text{ if }}\\beta \>1\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/972a9c1853e6991aac6666b06c0a633a0caad5b7)
The following transformation, dividing the variable *X* by its mirror image to give *X*/(1 ā *X*), results in the expected value of the "inverted beta distribution" or [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") (also known as the beta distribution of the second kind or [Pearson's Type VI](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution")):[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
![{\\displaystyle {\\begin{aligned}\\operatorname {E} \\left\[{\\frac {X}{1-X}}\\right\]&={\\frac {\\alpha }{\\beta -1}}&&{\\text{ if }}\\beta \>1\\\\\\operatorname {E} \\left\[{\\frac {1-X}{X}}\\right\]&={\\frac {\\beta }{\\alpha -1}}&&{\\text{ if }}\\alpha \>1\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/db2a0928a7f906bcbe5cfd9d0c57713c9ab5cfb7)
Variances of these transformed variables can be obtained by integration, as the expected values of the second moments centered on the corresponding variables:
![{\\displaystyle {\\begin{aligned}\\operatorname {var} \\left\[{\\frac {1}{X}}\\right\]&=\\operatorname {E} \\left\[\\left({\\frac {1}{X}}-\\operatorname {E} \\left\[{\\frac {1}{X}}\\right\]\\right)^{2}\\right\]=\\operatorname {var} \\left\[{\\frac {1-X}{X}}\\right\]\\\\&=\\operatorname {E} \\left\[\\left({\\frac {1-X}{X}}-\\operatorname {E} \\left\[{\\frac {1-X}{X}}\\right\]\\right)^{2}\\right\]={\\frac {\\beta (\\alpha +\\beta -1)}{\\left(\\alpha -2\\right)\\left(\\alpha -1\\right)^{2}}}{\\text{ if }}\\alpha \>2\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ab8c3db5d04798c92c6360624060ba583ebdf3df)
The following variance of the variable *X* divided by its mirror image, *X*/(1 ā *X*), results in the variance of the "inverted beta distribution" or [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") (also known as the beta distribution of the second kind or [Pearson's Type VI](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution")):[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
![{\\displaystyle {\\begin{aligned}\\operatorname {var} \\left\[{\\frac {1}{1-X}}\\right\]&=\\operatorname {E} \\left\[\\left({\\frac {1}{1-X}}-\\operatorname {E} \\left\[{\\frac {1}{1-X}}\\right\]\\right)^{2}\\right\]=\\operatorname {var} \\left\[{\\frac {X}{1-X}}\\right\]\\\\\[1ex\]&=\\operatorname {E} \\left\[\\left({\\frac {X}{1-X}}-\\operatorname {E} \\left\[{\\frac {X}{1-X}}\\right\]\\right)^{2}\\right\]={\\frac {\\alpha (\\alpha +\\beta -1)}{\\left(\\beta -2\\right)\\left(\\beta -1\\right)^{2}}}{\\text{ if }}\\beta \>2\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/76d83000746ff4852e63b063948f9db8110d2d22)
The covariances are:
![{\\displaystyle {\\begin{aligned}\\operatorname {cov} \\left\[{\\frac {1}{X}},{\\frac {1}{1-X}}\\right\]&=\\operatorname {cov} \\left\[{\\frac {1-X}{X}},{\\frac {X}{1-X}}\\right\]=\\operatorname {cov} \\left\[{\\frac {1}{X}},{\\frac {X}{1-X}}\\right\]\\\\\[1ex\]&=\\operatorname {cov} \\left\[{\\frac {1-X}{X}},{\\frac {1}{1-X}}\\right\]={\\frac {\\alpha +\\beta -1}{(\\alpha -1)(\\beta -1)}}{\\text{ if }}\\alpha ,\\beta \>1\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ce6ebb40e9f206c0269353799307092c2edbcfa0) These expectations and variances appear in the four-parameter Fisher information matrix ([§ Fisher information](https://en.wikipedia.org/wiki/Beta_distribution#Fisher_information).)
##### Moments of logarithmically transformed random variables
Plot of logit(*X*) = ln(*X*/(1 ā*X*)) (vertical axis) vs. *X* in the domain of 0 to 1 (horizontal axis). Logit transformations are interesting, as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable
Expected values for [logarithmic transformations](https://en.wikipedia.org/wiki/Logarithm_transformation "Logarithm transformation") (useful for [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimates, see [§ Parameter estimation, Maximum likelihood](https://en.wikipedia.org/wiki/Beta_distribution#Parameter_estimation,_Maximum_likelihood)) are discussed in this section. The following logarithmic linear transformations are related to the geometric means *GX* and *G*1ā*X* (see [§ Geometric Mean](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_Mean)):
![{\\displaystyle {\\begin{aligned}\\operatorname {E} \[\\ln X\]&=\\psi (\\alpha )-\\psi (\\alpha +\\beta )=-\\operatorname {E} \\left\[\\ln {\\frac {1}{X}}\\right\],\\\\\\operatorname {E} \[\\ln(1-X)\]&=\\psi (\\beta )-\\psi (\\alpha +\\beta )=-\\operatorname {E} \\left\[\\ln {\\frac {1}{1-X}}\\right\].\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/af6c3188ac96160054472db2a23bab9a22f1e486)
Where the **[digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function")** *Ļ*(*α*) is defined as the [logarithmic derivative](https://en.wikipedia.org/wiki/Logarithmic_derivative "Logarithmic derivative") of the [gamma function](https://en.wikipedia.org/wiki/Gamma_function "Gamma function"):[\[18\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Abramowitz-18)

[Logit](https://en.wikipedia.org/wiki/Logit "Logit") transformations are interesting,[\[24\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-MacKay-24) as they usually transform various shapes (including J-shapes) into (usually skewed) bell-shaped densities over the logit variable, and they may remove the end singularities over the original variable:
![{\\displaystyle {\\begin{aligned}\\operatorname {E} \\left\[\\ln {\\frac {X}{1-X}}\\right\]&=\\psi (\\alpha )-\\psi (\\beta )=\\operatorname {E} \[\\ln X\]+\\operatorname {E} \\left\[\\ln {\\frac {1}{1-X}}\\right\],\\\\\\operatorname {E} \\left\[\\ln {\\frac {1-X}{X}}\\right\]&=\\psi (\\beta )-\\psi (\\alpha )=-\\operatorname {E} \\left\[\\ln {\\frac {X}{1-X}}\\right\].\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a30b0fbbefff93de41d6775f2ab623670f09a4b2)
Johnson[\[25\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JohnsonLogInv-25) considered the distribution of the [logit](https://en.wikipedia.org/wiki/Logit "Logit") ā transformed variable ln(*X*/1 ā *X*), including its moment generating function and approximations for large values of the shape parameters. This transformation extends the finite support \[0, 1\] based on the original variable *X* to infinite support in both directions of the real line (āā, +ā). The logit of a beta variate has the [logistic-beta distribution](https://en.wikipedia.org/wiki/Logistic-beta_distribution "Logistic-beta distribution").
Higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions as follows:
![{\\displaystyle {\\begin{aligned}\\operatorname {E} \\left\[\\ln ^{2}(X)\\right\]&=(\\psi (\\alpha )-\\psi (\\alpha +\\beta ))^{2}+\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta ),\\\\\\operatorname {E} \\left\[\\ln ^{2}(1-X)\\right\]&=(\\psi (\\beta )-\\psi (\\alpha +\\beta ))^{2}+\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta ),\\\\\\operatorname {E} \\left\[\\ln(X)\\ln(1-X)\\right\]&=(\\psi (\\alpha )-\\psi (\\alpha +\\beta ))(\\psi (\\beta )-\\psi (\\alpha +\\beta ))-\\psi \_{1}(\\alpha +\\beta ).\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b42eb1276e349df39df3051df11e0e16afe88e2e)
therefore the [variance](https://en.wikipedia.org/wiki/Variance "Variance") of the logarithmic variables and [covariance](https://en.wikipedia.org/wiki/Covariance "Covariance") of ln(*X*) and ln(1ā*X*) are:
![{\\displaystyle {\\begin{aligned}\\operatorname {cov} \[\\ln X,\\ln(1-X)\]&=\\operatorname {E} \\left\[\\ln X\\ln(1-X)\\right\]-\\operatorname {E} \[\\ln X\]\\operatorname {E} \[\\ln(1-X)\]\\\\&=-\\psi \_{1}(\\alpha +\\beta )\\\\&\\\\\\operatorname {var} \[\\ln X\]&=\\operatorname {E} \[\\ln ^{2}X\]-(\\operatorname {E} \[\\ln X\])^{2}\\\\&=\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta )\\\\&=\\psi \_{1}(\\alpha )+\\operatorname {cov} \[\\ln X,\\ln(1-X)\]\\\\&\\\\\\operatorname {var} \[\\ln(1-X)\]&=\\operatorname {E} \[\\ln ^{2}(1-X)\]-(\\operatorname {E} \[\\ln(1-X)\])^{2}\\\\&=\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta )\\\\&=\\psi \_{1}(\\beta )+\\operatorname {cov} \[\\ln X,\\ln(1-X)\]\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53f5f8222528ab3aa1e5f610f49d440d348f7d40)
where the **[trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function")**, denoted *Ļ*1(*α*), is the second of the [polygamma functions](https://en.wikipedia.org/wiki/Polygamma_function "Polygamma function"), and is defined as the derivative of the [digamma](https://en.wikipedia.org/wiki/Digamma "Digamma") function:

The variances and covariance of the logarithmically transformed variables *X* and (1 ā *X*) are different, in general, because the logarithmic transformation destroys the mirror-symmetry of the original variables *X* and (1 ā *X*), as the logarithm approaches negative infinity for the variable approaching zero.
These logarithmic variances and covariance are the elements of the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information") matrix for the beta distribution. They are also a measure of the curvature of the log likelihood function (see section on Maximum likelihood estimation).
The variances of the log inverse variables are identical to the variances of the log variables:
![{\\displaystyle {\\begin{aligned}\\operatorname {var} \\left\[\\ln {\\frac {1}{X}}\\right\]&=\\operatorname {var} \[\\ln X\]=\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta ),\\\\\\operatorname {var} \\left\[\\ln {\\frac {1}{1-X}}\\right\]&=\\operatorname {var} \[\\ln(1-X)\]=\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta ),\\\\\\operatorname {cov} \\left\[\\ln {\\frac {1}{X}},\\,\\ln {\\frac {1}{1-X}}\\right\]&=\\operatorname {cov} \[\\ln X,\\ln(1-X)\]=-\\psi \_{1}(\\alpha +\\beta ).\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/18739db82ac6e431571537bb8f09ff006d670b04)
It also follows that the variances of the [logit](https://en.wikipedia.org/wiki/Logit "Logit")\-transformed variables are
![{\\displaystyle {\\begin{aligned}\\operatorname {var} \\left\[\\ln {\\frac {X}{1-X}}\\right\]&=\\operatorname {var} \\left\[\\ln {\\frac {1-X}{X}}\\right\]\\\\&=-\\operatorname {cov} \\left\[\\ln {\\frac {X}{1-X}},\\,\\ln {\\frac {1-X}{X}}\\right\]\\\\\[1ex\]&=\\psi \_{1}(\\alpha )+\\psi \_{1}(\\beta ).\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b0f473b85a33bfd20c16bd2e752af80def37388a)
### Quantities of information (entropy)
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=31 "Edit section: Quantities of information (entropy)")\]
Given a beta distributed random variable, *X* ~ Beta(*α*, *β*), the [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") of *X* is (measured in [nats](https://en.wikipedia.org/wiki/Nat_\(unit\) "Nat (unit)")),[\[26\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-26) the expected value of the negative of the logarithm of the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function"):
![{\\displaystyle {\\begin{aligned}h(X)&=\\operatorname {E} \\left\[-\\ln f(X;\\alpha ,\\beta )\\right\]\\\\\[4pt\]&=\\int \_{0}^{1}-f(x;\\alpha ,\\beta )\\ln f(x;\\alpha ,\\beta )\\,dx\\\\\[4pt\]&=\\ln \\mathrm {B} (\\alpha ,\\beta )-(\\alpha -1)\\psi (\\alpha )-(\\beta -1)\\psi (\\beta )+(\\alpha +\\beta -2)\\psi (\\alpha +\\beta )\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ee7535c40811773d4239dc63ea6c2200c4b7a63c)
where *f*(*x*; *α*, *β*) is the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") of the beta distribution:

The [digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function") *Ļ* appears in the formula for the differential entropy as a consequence of Euler's integral formula for the [harmonic numbers](https://en.wikipedia.org/wiki/Harmonic_number "Harmonic number") which follows from the integral:

The [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") of the beta distribution is negative for all values of *α* and *β* greater than zero, except at *α* = *β* = 1 (for which values the beta distribution is the same as the [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)")), where the [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") reaches its [maximum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value of zero. It is to be expected that the maximum entropy should take place when the beta distribution becomes equal to the uniform distribution, since uncertainty is maximal when all possible events are equiprobable.
For *α* or *β* approaching zero, the [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") approaches its [minimum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value of negative infinity. For (either or both) *α* or *β* approaching zero, there is a maximum amount of order: all the probability density is concentrated at the ends, and there is zero probability density at points located between the ends. Similarly for (either or both) *α* or *β* approaching infinity, the differential entropy approaches its minimum value of negative infinity, and a maximum amount of order. If either *α* or *β* approaches infinity (and the other is finite) all the probability density is concentrated at an end, and the probability density is zero everywhere else. If both shape parameters are equal (the symmetric case), *α* = *β*, and they approach infinity simultaneously, the probability density becomes a spike ([Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function")) concentrated at the middle *x* = 1/2, and hence there is 100% probability at the middle *x* = 1/2 and zero probability everywhere else.
[](https://en.wikipedia.org/wiki/File:Differential_Entropy_Beta_Distribution_for_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg)[](https://en.wikipedia.org/wiki/File:Differential_Entropy_Beta_Distribution_for_alpha_and_beta_from_0.1_to_5_-_J._Rodal.jpg)
The (continuous case) [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") was introduced by Shannon in his original paper (where he named it the "entropy of a continuous distribution"), as the concluding part of the same paper where he defined the [discrete entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy").[\[27\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-27) It is known since then that the differential entropy may differ from the [infinitesimal](https://en.wikipedia.org/wiki/Infinitesimal "Infinitesimal") limit of the discrete entropy by an infinite offset, therefore the differential entropy can be negative (as it is for the beta distribution). What really matters is the relative value of entropy.
Given two beta distributed random variables, *X*1 ~ Beta(*α*, *β*) and *X*2 ~ Beta(*αā²*, *βā²*), the [cross-entropy](https://en.wikipedia.org/wiki/Cross-entropy "Cross-entropy") is (measured in nats)[\[28\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Cover_and_Thomas-28)
![{\\displaystyle {\\begin{aligned}H(X\_{1},X\_{2})&=\\int \_{0}^{1}-f(x;\\alpha ,\\beta )\\ln f(x;\\alpha ',\\beta ')\\,dx\\\\\[4pt\]&=\\ln \\mathrm {B} (\\alpha ',\\beta ')-(\\alpha '-1)\\psi (\\alpha )-(\\beta '-1)\\psi (\\beta )+\\left(\\alpha '+\\beta '-2\\right)\\psi (\\alpha +\\beta ).\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f9e7662d68f802bd6e6b1019f0eb46e6a4bfc0a4)
The [cross entropy](https://en.wikipedia.org/wiki/Cross_entropy "Cross entropy") has been used as an error metric to measure the distance between two hypotheses.[\[29\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Plunkett-29)[\[30\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Nallapati-30) Its absolute value is minimum when the two distributions are identical. It is the information measure most closely related to the log maximum likelihood [\[28\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Cover_and_Thomas-28)(see section on "Parameter estimation. Maximum likelihood estimation")).
The relative entropy, or [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence") *D*KL(*X*1 \|\| *X*2), is a measure of the inefficiency of assuming that the distribution is *X*2 ~ Beta(*αā²*, *βā²*) when the distribution is really *X*1 ~ Beta(*α*, *β*). It is defined as follows (measured in nats).
![{\\displaystyle {\\begin{aligned}D\_{\\mathrm {KL} }(X\_{1}\\parallel X\_{2})&=\\int \_{0}^{1}f(x;\\alpha ,\\beta )\\,\\ln {\\frac {f(x;\\alpha ,\\beta )}{f(x;\\alpha ',\\beta ')}}\\,dx\\\\\[4pt\]&=\\left(\\int \_{0}^{1}f(x;\\alpha ,\\beta )\\ln f(x;\\alpha ,\\beta )\\,dx\\right)-\\left(\\int \_{0}^{1}f(x;\\alpha ,\\beta )\\ln f(x;\\alpha ',\\beta ')\\,dx\\right)\\\\\[4pt\]&=-h(X\_{1})+H(X\_{1},X\_{2})\\\\\[4pt\]&=\\ln {\\frac {\\mathrm {B} (\\alpha ',\\beta ')}{\\mathrm {B} (\\alpha ,\\beta )}}+\\left(\\alpha -\\alpha '\\right)\\psi (\\alpha )+\\left(\\beta -\\beta '\\right)\\psi (\\beta )+\\left(\\alpha '-\\alpha +\\beta '-\\beta \\right)\\psi (\\alpha +\\beta ).\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/533f00a1d061ebd96a170013f1339d34fc8f1322)
The relative entropy, or [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence"), is always non-negative. A few numerical examples follow:
- *X*1 ~ Beta(1, 1) and *X*2 ~ Beta(3, 3); *D*KL(*X*1 \|\| *X*2) = 0.598803; *D*KL(*X*2 \|\| *X*1) = 0.267864; *h*(*X*1) = 0; *h*(*X*2) = ā0.267864
- *X*1 ~ Beta(3, 0.5) and *X*2 ~ Beta(0.5, 3); *D*KL(*X*1 \|\| *X*2) = 7.21574; *D*KL(*X*2 \|\| *X*1) = 7.21574; *h*(*X*1) = ā1.10805; *h*(*X*2) = ā1.10805.
The [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence") is not symmetric *D*KL(*X*1 \|\| *X*2) ā *D*KL(*X*2 \|\| *X*1) for the case in which the individual beta distributions Beta(1, 1) and Beta(3, 3) are symmetric, but have different entropies *h*(*X*1) ā *h*(*X*2). The value of the Kullback divergence depends on the direction traveled: whether going from a higher (differential) entropy to a lower (differential) entropy or the other way around. In the numerical example above, the Kullback divergence measures the inefficiency of assuming that the distribution is (bell-shaped) Beta(3, 3), rather than (uniform) Beta(1, 1). The "h" entropy of Beta(1, 1) is higher than the "h" entropy of Beta(3, 3) because the uniform distribution Beta(1, 1) has a maximum amount of disorder. The Kullback divergence is more than two times higher (0.598803 instead of 0.267864) when measured in the direction of decreasing entropy: the direction that assumes that the (uniform) Beta(1, 1) distribution is (bell-shaped) Beta(3, 3) rather than the other way around. In this restricted sense, the Kullback divergence is consistent with the [second law of thermodynamics](https://en.wikipedia.org/wiki/Second_law_of_thermodynamics "Second law of thermodynamics").
The [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence") is symmetric *D*KL(*X*1 \|\| *X*2) = *D*KL(*X*2 \|\| *X*1) for the skewed cases Beta(3, 0.5) and Beta(0.5, 3) that have equal differential entropy *h*(*X*1) = *h*(*X*2).
The symmetry condition:

follows from the above definitions and the mirror-symmetry *f*(*x*; *α*, *β*) = *f*(1 ā *x*; *α*, *β*) enjoyed by the beta distribution.
### Relationships between statistical measures
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=32 "Edit section: Relationships between statistical measures")\]
#### Mean, mode and median relationship
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=33 "Edit section: Mean, mode and median relationship")\]
If 1 \< *α* \< *β* then mode ⤠median ⤠mean.[\[10\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Kerman2011-10) Expressing the mode (only for *α*, *β* \> 1), and the mean in terms of *α* and *β*:

If 1 \< *β* \< *α* then the order of the inequalities are reversed. For *α*, *β* \> 1 the absolute distance between the mean and the median is less than 5% of the distance between the maximum and minimum values of *x*. On the other hand, the absolute distance between the mean and the mode can reach 50% of the distance between the maximum and minimum values of *x*, for the ([pathological](https://en.wikipedia.org/wiki/Pathological_\(mathematics\) "Pathological (mathematics)")) case of *α* = 1 and *β* = 1, for which values the beta distribution approaches the uniform distribution and the [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") approaches its [maximum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value, and hence maximum "disorder".
For example, for *α* = 1.0001 and *β* = 1.00000001:
- mode = 0.9999; PDF(mode) = 1.00010
- mean = 0.500025; PDF(mean) = 1.00003
- median = 0.500035; PDF(median) = 1.00003
- mean ā mode = ā0.499875
- mean ā median = ā9.65538 Ć 10ā6
where PDF stands for the value of the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function").
[](https://en.wikipedia.org/wiki/File:Mean_Median_Difference_-_Beta_Distribution_for_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg) [](https://en.wikipedia.org/wiki/File:Mean_Mode_Difference_-_Beta_Distribution_for_alpha_and_beta_from_1_to_5_-_J._Rodal.jpg)
#### Mean, geometric mean and harmonic mean relationship
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=34 "Edit section: Mean, geometric mean and harmonic mean relationship")\]
[](https://en.wikipedia.org/wiki/File:Mean,_Median,_Geometric_Mean_and_Harmonic_Mean_for_Beta_distribution_with_alpha_%3D_beta_from_0_to_5_-_J._Rodal.png)
:Mean, median, geometric mean and harmonic mean for beta distribution with 0 \< *α* = *β* \< 5
It is known from the [inequality of arithmetic and geometric means](https://en.wikipedia.org/wiki/Inequality_of_arithmetic_and_geometric_means "Inequality of arithmetic and geometric means") that the geometric mean is lower than the mean. Similarly, the harmonic mean is lower than the geometric mean. The accompanying plot shows that for *α* = *β*, both the mean and the median are exactly equal to 1/2, regardless of the value of *α* = *β*, and the mode is also equal to 1/2 for *α* = *β* \> 1, however the geometric and harmonic means are lower than 1/2 and they only approach this value asymptotically as *α* = *β* ā ā.
#### Kurtosis bounded by the square of the skewness
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=35 "Edit section: Kurtosis bounded by the square of the skewness")\]
[](https://en.wikipedia.org/wiki/File:\(alpha_and_beta\)_Parameter_estimates_vs._excess_Kurtosis_and_\(squared\)_Skewness_Beta_distribution_-_J._Rodal.png)
Beta distribution *α* and *β* parameters vs. excess kurtosis and squared skewness
As remarked by [Feller](https://en.wikipedia.org/wiki/William_Feller "William Feller"),[\[5\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Feller-5) in the [Pearson system](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution") the beta probability density appears as [type I](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution") (any difference between the beta distribution and Pearson's type I distribution is only superficial and it makes no difference for the following discussion regarding the relationship between kurtosis and skewness). [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") showed, in Plate 1 of his paper [\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21) published in 1916, a graph with the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis "Kurtosis") as the vertical axis ([ordinate](https://en.wikipedia.org/wiki/Ordinate "Ordinate")) and the square of the [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") as the horizontal axis ([abscissa](https://en.wikipedia.org/wiki/Abscissa "Abscissa")), in which a number of distributions were displayed.[\[31\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Egon-31) The region occupied by the beta distribution is bounded by the following two [lines](https://en.wikipedia.org/wiki/Line_\(geometry\) "Line (geometry)") in the (skewness2,kurtosis) [plane](https://en.wikipedia.org/wiki/Cartesian_coordinate_system "Cartesian coordinate system"), or the (skewness2,excess kurtosis) [plane](https://en.wikipedia.org/wiki/Cartesian_coordinate_system "Cartesian coordinate system"):

or, equivalently,

At a time when there were no powerful digital computers, [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") accurately computed further boundaries,[\[32\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Hahn_and_Shapiro-32)[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21) for example, separating the "U-shaped" from the "J-shaped" distributions. The lower boundary line (excess kurtosis + 2 ā skewness2 = 0) is produced by skewed "U-shaped" beta distributions with both values of shape parameters *α* and *β* close to zero. The upper boundary line (excess kurtosis ā (3/2) skewness2 = 0) is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") showed[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21) that this upper boundary line (excess kurtosis ā (3/2) skewness2 = 0) is also the intersection with Pearson's distribution III, which has unlimited support in one direction (towards positive infinity), and can be bell-shaped or J-shaped. His son, [Egon Pearson](https://en.wikipedia.org/wiki/Egon_Pearson "Egon Pearson"), showed[\[31\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Egon-31) that the region (in the kurtosis/squared-skewness plane) occupied by the beta distribution (equivalently, Pearson's distribution I) as it approaches this boundary (excess kurtosis ā (3/2) skewness2 = 0) is shared with the [noncentral chi-squared distribution](https://en.wikipedia.org/wiki/Noncentral_chi-squared_distribution "Noncentral chi-squared distribution"). Karl Pearson[\[33\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson1895-33) (Pearson 1895, pp. 357, 360, 373ā376) also showed that the [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution") is a Pearson type III distribution. Hence this boundary line for Pearson's type III distribution is known as the gamma line. (This can be shown from the fact that the excess kurtosis of the gamma distribution is 6/*k* and the square of the skewness is 4/*k*, hence (excess kurtosis ā (3/2) skewness2 = 0) is identically satisfied by the gamma distribution regardless of the value of the parameter "k"). Pearson later noted that the [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution "Chi-squared distribution") is a special case of Pearson's type III and also shares this boundary line (as it is apparent from the fact that for the [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution "Chi-squared distribution") the excess kurtosis is 12/*k* and the square of the skewness is 8/*k*, hence (excess kurtosis ā (3/2) skewness2 = 0) is identically satisfied regardless of the value of the parameter "k"). This is to be expected, since the chi-squared distribution *X* ~ Ļ2(*k*) is a special case of the gamma distribution, with parametrization X ~ Ī(k/2, 1/2) where k is a positive integer that specifies the "number of degrees of freedom" of the chi-squared distribution.
An example of a beta distribution near the upper boundary (excess kurtosis ā (3/2) skewness2 = 0) is given by α = 0.1, β = 1000, for which the ratio (excess kurtosis)/(skewness2) = 1.49835 approaches the upper limit of 1.5 from below. An example of a beta distribution near the lower boundary (excess kurtosis + 2 ā skewness2 = 0) is given by α= 0.0001, β = 0.1, for which values the expression (excess kurtosis + 2)/(skewness2) = 1.01621 approaches the lower limit of 1 from above. In the infinitesimal limit for both *α* and *β* approaching zero symmetrically, the excess kurtosis reaches its minimum value at ā2. This minimum value occurs at the point at which the lower boundary line intersects the vertical axis ([ordinate](https://en.wikipedia.org/wiki/Ordinate "Ordinate")). (However, in Pearson's original chart, the ordinate is kurtosis, instead of excess kurtosis, and it increases downwards rather than upwards).
Values for the skewness and excess kurtosis below the lower boundary (excess kurtosis + 2 ā skewness2 = 0) cannot occur for any distribution, and hence [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") appropriately called the region below this boundary the "impossible region". The boundary for this "impossible region" is determined by (symmetric or skewed) bimodal U-shaped distributions for which the parameters *α* and *β* approach zero and hence all the probability density is concentrated at the ends: *x* = 0, 1 with practically nothing in between them. Since for *α* ā *β* ā 0 the probability density is concentrated at the two ends *x* = 0 and *x* = 1, this "impossible boundary" is determined by a [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), where the two only possible outcomes occur with respective probabilities *p* and *q* = 1 ā *p*. For cases approaching this limit boundary with symmetry *α* = *β*, skewness ā 0, excess kurtosis ā ā2 (this is the lowest excess kurtosis possible for any distribution), and the probabilities are *p* ā *q* ā 1/2. For cases approaching this limit boundary with skewness, excess kurtosis ā ā2 + skewness2, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities  at the left end *x* = 0 and  at the right end *x* = 1.
All statements are conditional on *α*, *β* \> 0:
### Geometry of the probability density function
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=37 "Edit section: Geometry of the probability density function")\]
[](https://en.wikipedia.org/wiki/File:Inflexion_points_Beta_Distribution_alpha_and_beta_ranging_from_0_to_5_large_ptl_view_-_J._Rodal.jpg)
Inflection point location versus α and β showing regions with one inflection point
[](https://en.wikipedia.org/wiki/File:Inflexion_points_Beta_Distribution_alpha_and_beta_ranging_from_0_to_5_large_ptr_view_-_J._Rodal.jpg)
Inflection point location versus α and β showing region with two inflection points
For certain values of the shape parameters α and β, the [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function") has [inflection points](https://en.wikipedia.org/wiki/Inflection_points "Inflection points"), at which the [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature") changes sign. The position of these inflection points can be useful as a measure of the [dispersion](https://en.wikipedia.org/wiki/Statistical_dispersion "Statistical dispersion") or spread of the distribution.
Defining the following quantity:

Points of inflection occur,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[8\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Wadsworth-8)[\[9\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Handbook_of_Beta_Distribution-9)[\[20\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Panik-20) depending on the value of the shape parameters *α* and *β*, as follows:
- (*α* \> 2, *β* \> 2) The distribution is bell-shaped (symmetric for *α* = *β* and skewed otherwise), with **two inflection points**, equidistant from the mode:

- (*α* = 2, *β* \> 2) The distribution is unimodal, positively skewed, right-tailed, with **one inflection point**, located to the right of the mode:

- (*α* \> 2, β = 2) The distribution is unimodal, negatively skewed, left-tailed, with **one inflection point**, located to the left of the mode:

- (1 \< *α* \< 2, β \> 2, *α* + *β* \> 2) The distribution is unimodal, positively skewed, right-tailed, with **one inflection point**, located to the right of the mode:

- (0 \< *α* \< 1, 1 \< *β* \< 2) The distribution has a mode at the left end *x* = 0 and it is positively skewed, right-tailed. There is **one inflection point**, located to the right of the mode:

- (*α* \> 2, 1 \< *β* \< 2) The distribution is unimodal negatively skewed, left-tailed, with **one inflection point**, located to the left of the mode:

- (1 \< *α* \< 2, 0 \< *β* \< 1) The distribution has a mode at the right end *x* = 1 and it is negatively skewed, left-tailed. There is **one inflection point**, located to the left of the mode:

There are no inflection points in the remaining (symmetric and skewed) regions: U-shaped: (*α*, *β* \< 1) upside-down-U-shaped: (1 \< *α* \< 2, 1 \< *β* \< 2), reverse-J-shaped (*α* \< 1, *β* \> 2) or J-shaped: (*α* \> 2, *β* \< 1)
The accompanying plots show the inflection point locations (shown vertically, ranging from 0 to 1) versus *α* and *β* (the horizontal axes ranging from 0 to 5). There are large cuts at surfaces intersecting the lines *α* = 1, *β* = 1, *α* = 2, and *β* = 2 because at these values the beta distribution change from 2 modes, to 1 mode to no mode.
[](https://en.wikipedia.org/wiki/File:PDF_for_symmetric_beta_distribution_vs._x_and_alpha%3Dbeta_from_0_to_30_-_J._Rodal.jpg)
PDF for symmetric beta distribution vs. *x* and *α* = *β* from 0 to 30
[](https://en.wikipedia.org/wiki/File:PDF_for_symmetric_beta_distribution_vs._x_and_alpha%3Dbeta_from_0_to_2_-_J._Rodal.jpg)
PDF for symmetric beta distribution vs. x and *α* = *β* from 0 to 2
[](https://en.wikipedia.org/wiki/File:PDF_for_skewed_beta_distribution_vs._x_and_beta%3D_2.5_alpha_from_0_to_9_-_J._Rodal.jpg)
PDF for skewed beta distribution vs. *x* and *β* = 2.5*α* from 0 to 9
[](https://en.wikipedia.org/wiki/File:PDF_for_skewed_beta_distribution_vs._x_and_beta%3D_5.5_alpha_from_0_to_9_-_J._Rodal.jpg)
PDF for skewed beta distribution vs. x and *β* = 5.5*α* from 0 to 9
[](https://en.wikipedia.org/wiki/File:PDF_for_skewed_beta_distribution_vs._x_and_beta%3D_8_alpha_from_0_to_10_-_J._Rodal.jpg)
PDF for skewed beta distribution vs. x and *β* = 8*α* from 0 to 10
The beta density function can take a wide variety of different shapes depending on the values of the two parameters *α* and *β*. The ability of the beta distribution to take this great diversity of shapes (using only two parameters) is partly responsible for finding wide application for modeling actual measurements:
- the density function is [symmetric](https://en.wikipedia.org/wiki/Symmetry "Symmetry") about 1/2 (blue & teal plots).
- median = mean = 1/2.
- skewness = 0.
- variance = 1/(4(2*α* + 1))
- ***α* = *β* \< 1**
- U-shaped (blue plot).
- bimodal: left mode = 0, right mode =1, anti-mode = 1/2
- 1/12 \< var(*X*) \< 1/4[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
- ā2 \< excess kurtosis(*X*) \< ā6/5
- *α* = *β* = 1/2 is the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution")
- var(*X*) = 1/8
- excess kurtosis(*X*) = ā3/2
- CF = Rinc (t) [\[34\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-34)
- *α* = *β* ā 0 is a 2-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with equal probability 1/2 at each [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") end *x* = 0 and *x* = 1 and zero probability everywhere else. A coin toss: one face of the coin being *x* = 0 and the other face being *x* = 1.
- **α = β = 1**
- the [uniform \[0, 1\] distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)")
- no mode
- var(*X*) = 1/12
- excess kurtosis(*X*) = ā6/5
- The (negative anywhere else) [differential entropy](https://en.wikipedia.org/wiki/Information_entropy "Information entropy") reaches its [maximum](https://en.wikipedia.org/wiki/Maxima_and_minima "Maxima and minima") value of zero
- CF = Sinc (t)
- ***α* = *β* \> 1**
- symmetric [unimodal](https://en.wikipedia.org/wiki/Unimodal "Unimodal")
- mode = 1/2.
- 0 \< var(*X*) \< 1/12[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
- ā6/5 \< excess kurtosis(*X*) \< 0
- *α* = *β* = 3/2 is a semi-elliptic \[0, 1\] distribution, see: [Wigner semicircle distribution](https://en.wikipedia.org/wiki/Wigner_semicircle_distribution "Wigner semicircle distribution")[\[35\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-35)
- var(*X*) = 1/16.
- excess kurtosis(*X*) = ā1
- CF = 2 Jinc (t)
- *α* = *β* = 2 is the parabolic \[0, 1\] distribution
- var(*X*) = 1/20
- excess kurtosis(*X*) = ā6/7
- CF = 3 Tinc (t) [\[36\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-36)
- *α* = *β* \> 2 is bell-shaped, with [inflection points](https://en.wikipedia.org/wiki/Inflection_point "Inflection point") located to either side of the mode
- 0 \< var(*X*) \< 1/20
- ā6/7 \< excess kurtosis(*X*) \< 0
- *α* = *β* ā ā is a 1-point [Degenerate distribution](https://en.wikipedia.org/wiki/Degenerate_distribution "Degenerate distribution") with a [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function "Dirac delta function") spike at the midpoint *x* = 1/2 with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the single point *x* = 1/2.
The density function is [skewed](https://en.wikipedia.org/wiki/Skewness "Skewness"). An interchange of parameter values yields the [mirror image](https://en.wikipedia.org/wiki/Mirror_image "Mirror image") (the reverse) of the initial curve, some more specific cases:
- ***α* \< 1, *β* \< 1**
- U-shaped
- Positive skew for *α* \< *β*, negative skew for *α* \> *β*.
- bimodal: left mode = 0, right mode = 1, anti-mode = 
- 0 \< median \< 1.
- 0 \< var(*X*) \< 1/4
- ***α* \> 1, *β* \> 1**
- [unimodal](https://en.wikipedia.org/wiki/Unimodal "Unimodal") (magenta & cyan plots),
- Positive skew for *α* \< *β*, negative skew for *α* \> *β*.
- 
- 0 \< median \< 1
- 0 \< var(*X*) \< 1/12
- ***α* \< 1, *β* ℠1**
- ***α* ℠1, *β* \< 1**
- ***α* = 1, *β* \> 1**
- **α \> 1, β = 1**
- If *X* ~ Beta(*α*, *β*) then 1 ā *X* ~ Beta(*β*, *α*) [mirror-image](https://en.wikipedia.org/wiki/Mirror_image "Mirror image") symmetry
- If *X* ~ Beta(*α*, *β*) then . The [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution"), also called "beta distribution of the second kind".
- If , then  has a [generalized logistic distribution](https://en.wikipedia.org/wiki/Generalized_logistic_distribution "Generalized logistic distribution"), with density , where  is the [logistic sigmoid](https://en.wikipedia.org/wiki/Logistic_sigmoid "Logistic sigmoid").
- If *X* ~ Beta(*α*, *β*) then .
- If  and  then  has density  for  and  for , where  is the [Hypergeometric function](https://en.wikipedia.org/wiki/Hypergeometric_function "Hypergeometric function").[\[37\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pham-Gia2000-37)
- If *X* ~ Beta(*n*/2, *m*/2) then  (assuming *n* \> 0 and *m* \> 0), the [FisherāSnedecor F distribution](https://en.wikipedia.org/wiki/F-distribution "F-distribution").
- If  then min + *X*(max ā min) ~ PERT(min, max, *m*, *Ī»*) where *PERT* denotes a [PERT distribution](https://en.wikipedia.org/wiki/PERT_distribution "PERT distribution") used in [PERT](https://en.wikipedia.org/wiki/PERT "PERT") analysis, and *m*\=most likely value.[\[38\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-NewPERT-38) Traditionally[\[39\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Malcolm-39) *Ī»* = 4 in PERT analysis.
- If *X* ~ Beta(1, *β*) then *X* ~ [Kumaraswamy distribution](https://en.wikipedia.org/wiki/Kumaraswamy_distribution "Kumaraswamy distribution") with parameters (1, *β*)
- If *X* ~ Beta(*α*, 1) then *X* ~ [Kumaraswamy distribution](https://en.wikipedia.org/wiki/Kumaraswamy_distribution "Kumaraswamy distribution") with parameters (*α*, 1)
- If *X* ~ Beta(*α*, 1) then āln(*X*) ~ Exponential(*α*)
### Special and limiting cases
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=44 "Edit section: Special and limiting cases")\]
[](https://en.wikipedia.org/wiki/File:Random_Walk_example.svg)
Example of eight realizations of a random walk in one dimension starting at 0: the probability for the time of the last visit to the origin is distributed as Beta(1/2, 1/2)
[](https://en.wikipedia.org/wiki/File:Arcsin_density.svg)
Beta(1/2, 1/2): The [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") probability density was proposed by [Harold Jeffreys](https://en.wikipedia.org/wiki/Harold_Jeffreys "Harold Jeffreys") to represent uncertainty for a [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") or a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") in [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference"), and is now commonly referred to as [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior"): *p*ā1/2(1 ā *p*)ā1/2. This distribution also appears in several [random walk](https://en.wikipedia.org/wiki/Random_walk "Random walk") fundamental theorems
- Beta(1, 1) ~ [U(0, 1)](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") with density 1 on that interval.
- Beta(n, 1) ~ Maximum of *n* independent rvs. with [U(0, 1)](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)"), sometimes called a *a standard power function distribution* with density *n* *x**n*ā1 on that interval.
- Beta(1, n) ~ Minimum of *n* independent rvs. with [U(0, 1)](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") with density *n*(1 ā *x*)*n*ā1 on that interval.
- If *X* ~ Beta(3/2, 3/2) and *r* \> 0 then 2*rX* ā *r* ~ [Wigner semicircle distribution](https://en.wikipedia.org/wiki/Wigner_semicircle_distribution "Wigner semicircle distribution").
- Beta(1/2, 1/2) is equivalent to the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution"). This distribution is also [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") probability for the [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") and [binomial distributions](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution").
-  the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution "Exponential distribution").
-  the [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution").
- For large ,  the [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution "Normal distribution"). More precisely, if  then  converges in distribution to a normal distribution with mean 0 and variance  as *n* increases.
### Derived from other distributions
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=45 "Edit section: Derived from other distributions")\]
### Combination with other distributions
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=46 "Edit section: Combination with other distributions")\]
- *X* ~ Beta(*α*, *β*) and *Y* ~ F(2*β*,2*α*) then  for all *x* \> 0.
### Compounding with other distributions
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=47 "Edit section: Compounding with other distributions")\]
- If *p* ~ Beta(α, β) and *X* ~ Bin(*k*, *p*) then *X* ~ [beta-binomial distribution](https://en.wikipedia.org/wiki/Beta-binomial_distribution "Beta-binomial distribution")
- If *p* ~ Beta(α, β) and *X* ~ NB(*r*, *p*) then *X* ~ [beta negative binomial distribution](https://en.wikipedia.org/wiki/Beta_negative_binomial_distribution "Beta negative binomial distribution")
## Statistical inference
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=49 "Edit section: Statistical inference")\]
### Parameter estimation
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=50 "Edit section: Parameter estimation")\]
##### Two unknown parameters
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=52 "Edit section: Two unknown parameters")\]
Two unknown parameters ( of a beta distribution supported in the \[0,1\] interval) can be estimated, using the method of moments, with the first two moments (sample mean and sample variance) as follows. Let:

be the [sample mean](https://en.wikipedia.org/wiki/Sample_mean "Sample mean") estimate and

be the [sample variance](https://en.wikipedia.org/wiki/Sample_variance "Sample variance") estimate. The [method-of-moments](https://en.wikipedia.org/wiki/Method_of_moments_\(statistics\) "Method of moments (statistics)") estimates of the parameters are
 
When the distribution is required over a known interval other than \[0, 1\] with random variable *X*, say \[*a*, *c*\] with random variable *Y*, then replace  with  and  with  in the above couple of equations for the shape parameters (see the "Four unknown parameters" section below),[\[41\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-41) where:
 
##### Four unknown parameters
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=53 "Edit section: Four unknown parameters")\]
[](https://en.wikipedia.org/wiki/File:\(alpha_and_beta\)_Parameter_estimates_vs._excess_Kurtosis_and_\(squared\)_Skewness_Beta_distribution_-_J._Rodal.png)
Solutions for parameter estimates vs. (sample) excess Kurtosis and (sample) squared Skewness Beta distribution
All four parameters ( of a beta distribution supported in the \[*a*, *c*\] interval, see section ["Alternative parametrizations, Four parameters"](https://en.wikipedia.org/wiki/Beta_distribution#Four_parameters)) can be estimated, using the method of moments developed by [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson"), by equating sample and population values of the first four central moments (mean, variance, skewness and excess kurtosis).[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42)[\[43\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton_and_Johnson-43) The excess kurtosis was expressed in terms of the square of the skewness, and the sample size ν = α + β, (see previous section ["Kurtosis"](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis)) as follows:

One can use this equation to solve for the sample size ν= α + β in terms of the square of the skewness and the excess kurtosis as follows:[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42)
 
This is the ratio (multiplied by a factor of 3) between the previously derived limit boundaries for the beta distribution in a space (as originally done by Karl Pearson[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21)) defined with coordinates of the square of the skewness in one axis and the excess kurtosis in the other axis (see [§ Kurtosis bounded by the square of the skewness](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis_bounded_by_the_square_of_the_skewness)):
The case of zero skewness, can be immediately solved because for zero skewness, *α* = *β* and hence *ν* = 2*α* = 2*β*, therefore *α* = *β* = *ν*/2
 
(Excess kurtosis is negative for the beta distribution with zero skewness, ranging from -2 to 0, so that  -and therefore the sample shape parameters- is positive, ranging from zero when the shape parameters approach zero and the excess kurtosis approaches -2, to infinity when the shape parameters approach infinity and the excess kurtosis approaches zero).
For non-zero sample skewness one needs to solve a system of two coupled equations. Since the skewness and the excess kurtosis are independent of the parameters , the parameters  can be uniquely determined from the sample skewness and the sample excess kurtosis, by solving the coupled equations with two known variables (sample skewness and sample excess kurtosis) and two unknowns (the shape parameters):
  
resulting in the following solution:[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42)


Where one should take the solutions as follows:  for (negative) sample skewness \< 0, and  for (positive) sample skewness \> 0.
The accompanying plot shows these two solutions as surfaces in a space with horizontal axes of (sample excess kurtosis) and (sample squared skewness) and the shape parameters as the vertical axis. The surfaces are constrained by the condition that the sample excess kurtosis must be bounded by the sample squared skewness as stipulated in the above equation. The two surfaces meet at the right edge defined by zero skewness. Along this right edge, both parameters are equal and the distribution is symmetric U-shaped for α = β \< 1, uniform for α = β = 1, upside-down-U-shaped for 1 \< α = β \< 2 and bell-shaped for α = β \> 2. The surfaces also meet at the front (lower) edge defined by "the impossible boundary" line (excess kurtosis + 2 - skewness2 = 0). Along this front (lower) boundary both shape parameters approach zero, and the probability density is concentrated more at one end than the other end (with practically nothing in between), with probabilities  at the left end *x* = 0 and  at the right end *x* = 1. The two surfaces become further apart towards the rear edge. At this rear edge the surface parameters are quite different from each other. As remarked, for example, by Bowman and Shenton,[\[44\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BowmanShenton-44) sampling in the neighborhood of the line (sample excess kurtosis - (3/2)(sample skewness)2 = 0) (the just-J-shaped portion of the rear edge where blue meets beige), "is dangerously near to chaos", because at that line the denominator of the expression above for the estimate ν = α + β becomes zero and hence ν approaches infinity as that line is approached. Bowman and Shenton [\[44\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BowmanShenton-44) write that "the higher moment parameters (kurtosis and skewness) are extremely fragile (near that line). However, the mean and standard deviation are fairly reliable." Therefore, the problem is for the case of four parameter estimation for very skewed distributions such that the excess kurtosis approaches (3/2) times the square of the skewness. This boundary line is produced by extremely skewed distributions with very large values of one of the parameters and very small values of the other parameter. See [§ Kurtosis bounded by the square of the skewness](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis_bounded_by_the_square_of_the_skewness) for a numerical example and further comments about this rear edge boundary line (sample excess kurtosis - (3/2)(sample skewness)2 = 0). As remarked by Karl Pearson himself [\[45\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson1936-45) this issue may not be of much practical importance as this trouble arises only for very skewed J-shaped (or mirror-image J-shaped) distributions with very different values of shape parameters that are unlikely to occur much in practice). The usual skewed-bell-shape distributions that occur in practice do not have this parameter estimation problem.
The remaining two parameters  can be determined using the sample mean and the sample variance using a variety of equations.[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42) One alternative is to calculate the support interval range  based on the sample variance and the sample kurtosis. For this purpose one can solve, in terms of the range , the equation expressing the excess kurtosis in terms of the sample variance, and the sample size ν (see [§ Kurtosis](https://en.wikipedia.org/wiki/Beta_distribution#Kurtosis) and [§ Alternative parametrizations, four parameters](https://en.wikipedia.org/wiki/Beta_distribution#Alternative_parametrizations,_four_parameters)):

to obtain:

Another alternative is to calculate the support interval range  based on the sample variance and the sample skewness.[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42) For this purpose one can solve, in terms of the range , the equation expressing the squared skewness in terms of the sample variance, and the sample size ν (see section titled "Skewness" and "Alternative parametrizations, four parameters"):

to obtain:[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42)

The remaining parameter can be determined from the sample mean and the previously obtained parameters: :

and finally, .
In the above formulas one may take, for example, as estimates of the sample moments:

The estimators *G*1 for [sample skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") and *G*2 for [sample kurtosis](https://en.wikipedia.org/wiki/Kurtosis "Kurtosis") are used by [DAP](https://en.wikipedia.org/wiki/DAP_\(software\) "DAP (software)")/[SAS](https://en.wikipedia.org/wiki/SAS_System "SAS System"), [PSPP](https://en.wikipedia.org/wiki/PSPP "PSPP")/[SPSS](https://en.wikipedia.org/wiki/SPSS "SPSS"), and [Excel](https://en.wikipedia.org/wiki/Microsoft_Excel "Microsoft Excel"). However, they are not used by [BMDP](https://en.wikipedia.org/wiki/BMDP "BMDP") and (according to [\[46\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Joanes_and_Gill-46)) they were not used by [MINITAB](https://en.wikipedia.org/wiki/MINITAB "MINITAB") in 1998. Actually, Joanes and Gill in their 1998 study[\[46\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Joanes_and_Gill-46) concluded that the skewness and kurtosis estimators used in [BMDP](https://en.wikipedia.org/wiki/BMDP "BMDP") and in [MINITAB](https://en.wikipedia.org/wiki/MINITAB "MINITAB") (at that time) had smaller variance and mean-squared error in normal samples, but the skewness and kurtosis estimators used in [DAP](https://en.wikipedia.org/wiki/DAP_\(software\) "DAP (software)")/[SAS](https://en.wikipedia.org/wiki/SAS_System "SAS System"), [PSPP](https://en.wikipedia.org/wiki/PSPP "PSPP")/[SPSS](https://en.wikipedia.org/wiki/SPSS "SPSS"), namely *G*1 and *G*2, had smaller mean-squared error in samples from a very skewed distribution. It is for this reason that we have spelled out "sample skewness", etc., in the above formulas, to make it explicit that the user should choose the best estimator according to the problem at hand, as the best estimator for skewness and kurtosis depends on the amount of skewness (as shown by Joanes and Gill[\[46\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Joanes_and_Gill-46)).
##### Two unknown parameters
\[[edit](https://en.wikipedia.org/w/index.php?title=Beta_distribution&action=edit§ion=55 "Edit section: Two unknown parameters")\]
[](https://en.wikipedia.org/wiki/File:Max_\(Joint_Log_Likelihood_per_N\)_for_Beta_distribution_Maxima_at_alpha%3Dbeta%3D2_-_J._Rodal.png)
Max (joint log likelihood/*N*) for beta distribution maxima at *α* = *β* = 2
[](https://en.wikipedia.org/wiki/File:Max_\(Joint_Log_Likelihood_per_N\)_for_Beta_distribution_Maxima_at_alpha%3Dbeta%3D_0.25,0.5,1,2,4,6,8_-_J._Rodal.png)
Max (joint log likelihood/*N*) for Beta distribution maxima at *α* = *β* ā {0.25,0.5,1,2,4,6,8}
As is also the case for [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimates for the [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution "Gamma distribution"), the maximum likelihood estimates for the beta distribution do not have a general closed form solution for arbitrary values of the shape parameters. If *X*1, ..., *XN* are independent random variables each having a beta distribution, the joint log likelihood function for *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is:

Finding the maximum with respect to a shape parameter involves taking the [partial derivative](https://en.wikipedia.org/wiki/Partial_derivative "Partial derivative") with respect to the shape parameter and setting the expression equal to zero yielding the [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimator of the shape parameters:
 
where:
![{\\displaystyle {\\begin{aligned}{\\frac {\\partial \\ln \\mathrm {B} (\\alpha ,\\beta )}{\\partial \\alpha }}&=-{\\frac {\\partial \\ln \\Gamma (\\alpha +\\beta )}{\\partial \\alpha }}+{\\frac {\\partial \\ln \\Gamma (\\alpha )}{\\partial \\alpha }}+{\\frac {\\partial \\ln \\Gamma (\\beta )}{\\partial \\alpha }}\\\\\[1ex\]&=-\\psi (\\alpha +\\beta )+\\psi (\\alpha )+0\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/82bf10edac73617ec99a3adbad3ff020391c4a71) ![{\\displaystyle {\\begin{aligned}{\\frac {\\partial \\ln \\mathrm {B} (\\alpha ,\\beta )}{\\partial \\beta }}&=-{\\frac {\\partial \\ln \\Gamma (\\alpha +\\beta )}{\\partial \\beta }}+{\\frac {\\partial \\ln \\Gamma (\\alpha )}{\\partial \\beta }}+{\\frac {\\partial \\ln \\Gamma (\\beta )}{\\partial \\beta }}\\\\\[1ex\]&=-\\psi (\\alpha +\\beta )+0+\\psi (\\beta )\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/10b097547a81011b4e212824977d66b5350dc780)
since the **[digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function")** denoted Ļ(α) is defined as the [logarithmic derivative](https://en.wikipedia.org/wiki/Logarithmic_derivative "Logarithmic derivative") of the [gamma function](https://en.wikipedia.org/wiki/Gamma_function "Gamma function"):[\[18\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Abramowitz-18)

To ensure that the values with zero tangent slope are indeed a maximum (instead of a saddle-point or a minimum) one has to also satisfy the condition that the curvature is negative. This amounts to satisfying that the second partial derivative with respect to the shape parameters is negative
 
using the previous equations, this is equivalent to:
 
where the **[trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function")**, denoted *Ļ*1(*α*), is the second of the [polygamma functions](https://en.wikipedia.org/wiki/Polygamma_function "Polygamma function"), and is defined as the derivative of the [digamma](https://en.wikipedia.org/wiki/Digamma "Digamma") function:

These conditions are equivalent to stating that the variances of the logarithmically transformed variables are positive, since:
![{\\displaystyle \\operatorname {var} \[\\ln(X)\]=\\operatorname {E} \[\\ln ^{2}(X)\]-(\\operatorname {E} \[\\ln(X)\])^{2}=\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/d7737d681fea7490e27f8760c6bcc8fccb154904) ![{\\displaystyle \\operatorname {var} \[\\ln(1-X)\]=\\operatorname {E} \[\\ln ^{2}(1-X)\]-(\\operatorname {E} \[\\ln(1-X)\])^{2}=\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f84c3747955206cf2190c61bd7875a6cd739ac04)
Therefore, the condition of negative curvature at a maximum is equivalent to the statements:
![{\\displaystyle \\operatorname {var} \[\\ln(X)\]\>0}](https://wikimedia.org/api/rest_v1/media/math/render/svg/5fb5a5d0db057469fb9dad8df2902fe93e3f3b0d) ![{\\displaystyle \\operatorname {var} \[\\ln(1-X)\]\>0}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c66a5aaa8362f578beb9b141b4108138b9d21e89)
Alternatively, the condition of negative curvature at a maximum is also equivalent to stating that the following [logarithmic derivatives](https://en.wikipedia.org/wiki/Logarithmic_derivative "Logarithmic derivative") of the [geometric means](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") *GX* and *G*(1−*X*) are positive, since:

$$\frac{\partial \ln G_X}{\partial \alpha}=\psi_{1}(\alpha)-\psi_{1}(\alpha+\beta)>0,\qquad \frac{\partial \ln G_{(1-X)}}{\partial \beta}=\psi_{1}(\beta)-\psi_{1}(\alpha+\beta)>0$$
While these slopes are indeed positive, the other slopes are negative:
$$\frac{\partial \ln G_X}{\partial \beta}=-\psi_{1}(\alpha+\beta)<0,\qquad \frac{\partial \ln G_{(1-X)}}{\partial \alpha}=-\psi_{1}(\alpha+\beta)<0$$
The slopes of the mean and the median with respect to *α* and *β* display similar sign behavior.
From the condition that at a maximum, the partial derivative with respect to the shape parameter equals zero, we obtain the following system of coupled [maximum likelihood estimate](https://en.wikipedia.org/wiki/Maximum_likelihood_estimate "Maximum likelihood estimate") equations (for the average log-likelihoods) that needs to be inverted to obtain the (unknown) shape parameter estimates $\hat{\alpha},\hat{\beta}$ in terms of the (known) average of logarithms of the samples *X*1, ..., *XN*:[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1)
![{\\displaystyle {\\begin{aligned}{\\hat {\\operatorname {E} }}\[\\ln(X)\]&=\\psi ({\\hat {\\alpha }})-\\psi ({\\hat {\\alpha }}+{\\hat {\\beta }})={\\frac {1}{N}}\\sum \_{i=1}^{N}\\ln X\_{i}=\\ln {\\hat {G}}\_{X}\\\\{\\hat {\\operatorname {E} }}\[\\ln(1-X)\]&=\\psi ({\\hat {\\beta }})-\\psi ({\\hat {\\alpha }}+{\\hat {\\beta }})={\\frac {1}{N}}\\sum \_{i=1}^{N}\\ln(1-X\_{i})=\\ln {\\hat {G}}\_{1-X}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/42f099a4869ce2d200dc80c7675a237caae021e6)
where we recognize $\ln \hat{G}_{X}$ as the logarithm of the sample [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") and $\ln \hat{G}_{1-X}$ as the logarithm of the sample [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") based on (1 − *X*), the mirror-image of *X*. For $0<X_i<1$, it follows that $\ln \hat{G}_{X}<0$ and $\ln \hat{G}_{1-X}<0$.

These coupled equations containing [digamma functions](https://en.wikipedia.org/wiki/Digamma_function "Digamma function") of the shape parameter estimates $\hat{\alpha},\hat{\beta}$ must be solved by numerical methods as done, for example, by Beckman et al.[\[47\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-47) Gnanadesikan et al. give numerical solutions for a few cases.[\[48\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-48) [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz")[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) suggest that for "not too small" shape parameter estimates $\hat{\alpha},\hat{\beta}$, the logarithmic approximation to the digamma function $\psi(x)\approx \ln(x-\tfrac{1}{2})$ may be used to obtain initial values for an iterative solution, since the equations resulting from this approximation can be solved exactly:

$$\ln \frac{\hat{\alpha}-\frac{1}{2}}{\hat{\alpha}+\hat{\beta}-\frac{1}{2}}\approx \ln \hat{G}_{X},\qquad \ln \frac{\hat{\beta}-\frac{1}{2}}{\hat{\alpha}+\hat{\beta}-\frac{1}{2}}\approx \ln \hat{G}_{1-X}$$
which leads to the following solution for the initial values (of the estimated shape parameters in terms of the sample geometric means) for an iterative solution:

$$\hat{\alpha}\approx \frac{1}{2}+\frac{\hat{G}_{X}}{2\left(1-\hat{G}_{X}-\hat{G}_{1-X}\right)},\qquad \hat{\beta}\approx \frac{1}{2}+\frac{\hat{G}_{1-X}}{2\left(1-\hat{G}_{X}-\hat{G}_{1-X}\right)}$$
Alternatively, the estimates provided by the method of moments can instead be used as initial values for an iterative solution of the maximum likelihood coupled equations in terms of the digamma functions.
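As an illustration, the coupled equations can be solved numerically along these lines. The following is a minimal sketch (assuming NumPy and SciPy are available; the function name `beta_mle` is ours), using `scipy.special.psi` for the digamma function and the method-of-moments estimates as starting values:

```python
import numpy as np
from scipy.special import psi          # digamma function
from scipy.optimize import fsolve

def beta_mle(x):
    # Sufficient statistics: logs of the two sample geometric means.
    ln_gx = np.mean(np.log(x))         # ln G_X
    ln_g1x = np.mean(np.log(1 - x))    # ln G_(1-X)

    # Method-of-moments estimates as initial values for the iteration.
    m, v = x.mean(), x.var()
    common = m * (1 - m) / v - 1
    a0, b0 = m * common, (1 - m) * common

    def equations(params):
        a, b = params
        return (psi(a) - psi(a + b) - ln_gx,
                psi(b) - psi(a + b) - ln_g1x)

    return fsolve(equations, (a0, b0))

# Example with simulated data: recovers approximately (2, 5).
x = np.random.default_rng(0).beta(2.0, 5.0, size=10_000)
print(beta_mle(x))
```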
When the distribution is required over a known interval other than \[0, 1\] with random variable *X*, say \[*a*, *c*\] with random variable *Y*, then replace ln(*Xi*) in the first equation with
$$\ln \frac{Y_i-a}{c-a}$$
and replace ln(1 − *Xi*) in the second equation with

$$\ln \frac{c-Y_i}{c-a}$$
(see "Alternative parametrizations, four parameters" section below).
If one of the shape parameters is known, the problem is considerably simplified. The following [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformation can be used to solve for the unknown shape parameter (for skewed cases such that $\hat{\alpha}\neq \hat{\beta}$; otherwise, if symmetric, both equal parameters are known when one is known):
![{\\displaystyle {\\hat {\\operatorname {E} }}\\left\[\\ln {\\frac {X}{1-X}}\\right\]=\\psi ({\\hat {\\alpha }})-\\psi ({\\hat {\\beta }})={\\frac {1}{N}}\\sum \_{i=1}^{N}\\ln {\\frac {X\_{i}}{1-X\_{i}}}=\\ln {\\hat {G}}\_{X}-\\ln {\\hat {G}}\_{1-X}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ca41c85f9e8cd1b427e96fd209fea0522c951d65)
This [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformation is the logarithm of the transformation that divides the variable *X* by its mirror-image (*X*/(1 − *X*)), resulting in the "inverted beta distribution" or [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") (also known as beta distribution of the second kind or [Pearson's Type VI](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution")) with support [0, +∞). As previously discussed in the section "Moments of logarithmically transformed random variables," the [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformation $\ln \frac{X}{1-X}$, studied by Johnson,[\[25\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JohnsonLogInv-25) extends the finite support [0, 1] based on the original variable *X* to infinite support in both directions of the real line (−∞, +∞).
If, for example, $\hat{\beta}$ is known, the unknown parameter $\hat{\alpha}$ can be obtained in terms of the inverse[\[49\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-invpsi.m-49) digamma function of the right hand side of this equation:

$$\hat{\alpha}=\psi^{-1}\left(\ln \hat{G}_{X}-\ln \hat{G}_{1-X}+\psi(\hat{\beta})\right)$$
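SciPy has no built-in inverse digamma function, but it can be inverted with a short Newton iteration, since ψ is monotone increasing on (0, ∞). A sketch (the helper names `inv_psi` and `alpha_given_beta` are ours, and the starting-point heuristic is a common one, not taken from this article):

```python
import numpy as np
from scipy.special import psi, polygamma

def inv_psi(y, iters=30):
    # Common starting point: exp(y) + 1/2 for large y, else -1/(y + gamma),
    # where gamma is the Euler-Mascheroni constant; then Newton's method.
    x = np.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y + 0.5772156649)
    for _ in range(iters):
        x -= (psi(x) - y) / polygamma(1, x)   # polygamma(1, .) is trigamma
    return x

def alpha_given_beta(x, b):
    # hat(alpha) = psi^{-1}( ln G_X - ln G_(1-X) + psi(hat(beta)) )
    x = np.asarray(x)
    y = np.mean(np.log(x)) - np.mean(np.log1p(-x)) + psi(b)
    return inv_psi(y)
```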
In particular, if one of the shape parameters has a value of unity, for example $\beta=1$ (the power function distribution with bounded support [0,1]), using the identity ψ(*x* + 1) = ψ(*x*) + 1/*x* in the equation $\psi(\hat{\alpha})-\psi(\hat{\alpha}+\hat{\beta})=\ln \hat{G}_{X}$, the maximum likelihood estimator for the unknown parameter $\hat{\alpha}$ is,[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) exactly:

$$\hat{\alpha}=\frac{1}{-\ln \hat{G}_{X}}=\frac{N}{-\sum_{i=1}^{N}\ln X_i}$$

The beta has support [0, 1], therefore $\hat{G}_{X}<1$, hence $\ln \hat{G}_{X}<0$, and therefore $\hat{\alpha}>0$.
In conclusion, the maximum likelihood estimates of the shape parameters of a beta distribution are (in general) a complicated function of the sample [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean"), and of the sample [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean "Geometric mean") based on (1 − *X*), the mirror-image of *X*. One may ask, if the variance (in addition to the mean) is necessary to estimate two shape parameters with the method of moments, why is the (logarithmic or geometric) variance not necessary to estimate two shape parameters with the maximum likelihood method, for which only the geometric means suffice? The answer is that the mean does not provide as much information as the geometric mean. For a beta distribution with equal shape parameters *α* = *β*, the mean is exactly 1/2, regardless of the value of the shape parameters, and therefore regardless of the value of the statistical dispersion (the variance). On the other hand, the geometric mean of a beta distribution with equal shape parameters *α* = *β* depends on the value of the shape parameters, and therefore it contains more information. Also, the geometric mean of a beta distribution does not satisfy the symmetry conditions satisfied by the mean; therefore, by employing both the geometric mean based on *X* and the geometric mean based on (1 − *X*), the maximum likelihood method is able to provide best estimates for both parameters *α* = *β*, without need of employing the variance.
One can express the joint log likelihood per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations in terms of the *[sufficient statistics](https://en.wikipedia.org/wiki/Sufficient_statistic "Sufficient statistic")* (the sample geometric means) as follows:
$$\frac{1}{N}\ln \mathcal{L}(\alpha,\beta\mid X)=(\alpha-1)\ln \hat{G}_{X}+(\beta-1)\ln \hat{G}_{1-X}-\ln \mathrm{B}(\alpha,\beta)$$
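For instance, a minimal sketch (the function name is ours) that evaluates this per-observation log likelihood directly from the two sufficient statistics, using `scipy.special.betaln` for ln B(α, β):

```python
import numpy as np
from scipy.special import betaln       # betaln(a, b) = ln B(a, b)

def loglik_per_obs(a, b, ln_gx, ln_g1x):
    # (alpha - 1) ln G_X + (beta - 1) ln G_(1-X) - ln B(alpha, beta)
    return (a - 1) * ln_gx + (b - 1) * ln_g1x - betaln(a, b)

x = np.random.default_rng(1).beta(2.0, 5.0, size=1_000)
print(loglik_per_obs(2.0, 5.0, np.log(x).mean(), np.log1p(-x).mean()))
```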
We can plot the joint log likelihood per *N* observations for fixed values of the sample geometric means to see the behavior of the likelihood function as a function of the shape parameters α and β. In such a plot, the shape parameter estimators $\hat{\alpha},\hat{\beta}$ correspond to the maxima of the likelihood function. See the accompanying graph, which shows that all the likelihood functions intersect at α = β = 1, which corresponds to the values of the shape parameters that give the maximum entropy (the maximum entropy occurs for shape parameters equal to unity: the uniform distribution). It is evident from the plot that the likelihood function gives sharp peaks for values of the shape parameter estimators close to zero, but that for values of the shape parameter estimators greater than one, the likelihood function becomes quite flat, with less defined peaks. The maximum likelihood parameter estimation method for the beta distribution therefore becomes less acceptable for larger values of the shape parameter estimators, as the uncertainty in the peak definition increases with the value of the shape parameter estimators. One can arrive at the same conclusion by noticing that the expression for the curvature of the likelihood function is in terms of the geometric variances:
![{\\displaystyle {\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta \\mid X)}{\\partial \\alpha ^{2}}}=-\\operatorname {var} \[\\ln X\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/517a09a3b13d22689a3e1e400cbcab3af08e130c) ![{\\displaystyle {\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta \\mid X)}{\\partial \\beta ^{2}}}=-\\operatorname {var} \[\\ln(1-X)\]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/1715f50fc408ceba307005a3f9e404520edd5a20)
These variances (and therefore the curvatures) are much larger for small values of the shape parameters α and β. However, for shape parameter values α, β > 1, the variances (and therefore the curvatures) flatten out. Equivalently, this result follows from the [Cramér–Rao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound "Cramér–Rao bound"), since the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information") matrix components for the beta distribution are these logarithmic variances. The [Cramér–Rao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound "Cramér–Rao bound") states that the [variance](https://en.wikipedia.org/wiki/Variance "Variance") of any *unbiased* estimator $\hat{\alpha}$ of α is bounded by the [reciprocal](https://en.wikipedia.org/wiki/Multiplicative_inverse "Multiplicative inverse") of the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information"):
![{\\displaystyle \\mathrm {var} ({\\hat {\\alpha }})\\geq {\\frac {1}{\\operatorname {var} \[\\ln X\]}}\\geq {\\frac {1}{\\psi \_{1}({\\hat {\\alpha }})-\\psi \_{1}({\\hat {\\alpha }}+{\\hat {\\beta }})}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/744f1e8421337ed7a2e6cae00fccec1eaf68e3dc) ![{\\displaystyle \\mathrm {var} ({\\hat {\\beta }})\\geq {\\frac {1}{\\operatorname {var} \[\\ln(1-X)\]}}\\geq {\\frac {1}{\\psi \_{1}({\\hat {\\beta }})-\\psi \_{1}({\\hat {\\alpha }}+{\\hat {\\beta }})}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f01dc5c3eb614cfa71a500fd34f3fa430c183c76)
so the variance of the estimators increases with increasing α and β, as the logarithmic variances decrease.
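A small sketch of these single-parameter bounds in terms of the trigamma function (`scipy.special.polygamma(1, .)`); the helper name is ours, and the bounds are per observation (the Fisher information for *N* iid samples is *N* times larger):

```python
from scipy.special import polygamma

def crlb_per_observation(a, b):
    # var[ln X] = psi_1(a) - psi_1(a + b);  var[ln(1-X)] = psi_1(b) - psi_1(a + b)
    var_ln_x = polygamma(1, a) - polygamma(1, a + b)
    var_ln_1mx = polygamma(1, b) - polygamma(1, a + b)
    # Lower bounds for var(alpha-hat) and var(beta-hat).
    return 1.0 / var_ln_x, 1.0 / var_ln_1mx

print(crlb_per_observation(2.0, 3.0))   # bounds grow as the shape parameters grow
```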
Also one can express the joint log likelihood per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations in terms of the [digamma function](https://en.wikipedia.org/wiki/Digamma_function "Digamma function") expressions for the logarithms of the sample geometric means as follows:
$$\frac{1}{N}\ln \mathcal{L}(\alpha,\beta\mid X)=(\alpha-1)\left(\psi(\hat{\alpha})-\psi(\hat{\alpha}+\hat{\beta})\right)+(\beta-1)\left(\psi(\hat{\beta})-\psi(\hat{\alpha}+\hat{\beta})\right)-\ln \mathrm{B}(\alpha,\beta)$$
this expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the joint log likelihood of the shape parameters, per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations, is identical to finding the minimum of the cross-entropy for the beta distribution, as a function of the shape parameters.

with the cross-entropy defined as follows:
$$H=-\int_{0}^{1}f(X;\hat{\alpha},\hat{\beta})\ln \left(f(X;\alpha,\beta)\right)\,dX$$
##### Four unknown parameters
The procedure is similar to the one followed in the two unknown parameter case. If *Y*1, ..., *YN* are independent random variables each having a beta distribution with four parameters, the joint log likelihood function for *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is:
$$\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)=(\alpha-1)\sum_{i=1}^{N}\ln \frac{Y_i-a}{c-a}+(\beta-1)\sum_{i=1}^{N}\ln \frac{c-Y_i}{c-a}-N\ln \mathrm{B}(\alpha,\beta)-N\ln(c-a)$$
Finding the maximum with respect to a parameter involves taking the partial derivative with respect to that parameter and setting the expression equal to zero, yielding the [maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood "Maximum likelihood") estimators of the parameters:

$$\frac{\partial \ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \alpha}=\sum_{i=1}^{N}\ln \frac{Y_i-a}{c-a}-N\left(\psi(\alpha)-\psi(\alpha+\beta)\right)=0$$

$$\frac{\partial \ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial \beta}=\sum_{i=1}^{N}\ln \frac{c-Y_i}{c-a}-N\left(\psi(\beta)-\psi(\alpha+\beta)\right)=0$$

$$\frac{\partial \ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial a}=-(\alpha-1)\sum_{i=1}^{N}\frac{1}{Y_i-a}+(\alpha+\beta-1)\frac{N}{c-a}=0$$

$$\frac{\partial \ln \mathcal{L}(\alpha,\beta,a,c\mid Y)}{\partial c}=(\beta-1)\sum_{i=1}^{N}\frac{1}{c-Y_i}-(\alpha+\beta-1)\frac{N}{c-a}=0$$
these equations can be re-arranged as the following system of four coupled equations (the first two equations are geometric means and the second two equations are harmonic means) in terms of the maximum likelihood estimates for the four parameters $\hat{\alpha},\hat{\beta},\hat{a},\hat{c}$:

$$\hat{\operatorname{E}}\left[\ln \frac{Y-\hat{a}}{\hat{c}-\hat{a}}\right]=\psi(\hat{\alpha})-\psi(\hat{\alpha}+\hat{\beta})=\ln \hat{G}_{X}$$

$$\hat{\operatorname{E}}\left[\ln \frac{\hat{c}-Y}{\hat{c}-\hat{a}}\right]=\psi(\hat{\beta})-\psi(\hat{\alpha}+\hat{\beta})=\ln \hat{G}_{1-X}$$

$$\hat{H}_{X}=\frac{1}{\frac{1}{N}\sum_{i=1}^{N}\frac{\hat{c}-\hat{a}}{Y_i-\hat{a}}}=\frac{\hat{\alpha}-1}{\hat{\alpha}+\hat{\beta}-1}$$

$$\hat{H}_{1-X}=\frac{1}{\frac{1}{N}\sum_{i=1}^{N}\frac{\hat{c}-\hat{a}}{\hat{c}-Y_i}}=\frac{\hat{\beta}-1}{\hat{\alpha}+\hat{\beta}-1}$$
with sample geometric means:
$$\hat{G}_{X}=\prod_{i=1}^{N}\left(\frac{Y_i-\hat{a}}{\hat{c}-\hat{a}}\right)^{1/N},\qquad \hat{G}_{1-X}=\prod_{i=1}^{N}\left(\frac{\hat{c}-Y_i}{\hat{c}-\hat{a}}\right)^{1/N}$$
The parameters $\hat{a},\hat{c}$ are embedded inside the geometric mean expressions in a nonlinear way (to the power 1/*N*). This precludes, in general, a closed form solution, even for an initial value approximation for iteration purposes. One alternative is to use as initial values for iteration the values obtained from the method of moments solution for the four parameter case. Furthermore, the expressions for the harmonic means are well-defined only for $\hat{\alpha},\hat{\beta}>1$, which precludes a maximum likelihood solution for shape parameters less than unity in the four-parameter case. Fisher's information matrix for the four parameter case is [positive-definite](https://en.wikipedia.org/wiki/Positive-definite_matrix "Positive-definite matrix") only for α, β > 2, that is, for bell-shaped (symmetric or unsymmetric) beta distributions with inflection points located to either side of the mode (for further discussion, see the section on the Fisher information matrix, four parameter case). The following Fisher information components (which represent the expectations of the curvature of the log likelihood function) have [singularities](https://en.wikipedia.org/wiki/Mathematical_singularity "Mathematical singularity") at the following values:
![{\\displaystyle \\alpha =2:\\quad \\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial a^{2}}}\\right\]={\\mathcal {I}}\_{a,a}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/53538160a2404a5d7b74ae2033fbbb2dbc1045eb) ![{\\displaystyle \\beta =2:\\quad \\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial c^{2}}}\\right\]={\\mathcal {I}}\_{c,c}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/ee2c6ecfcafe60e54799ab4a16451cf478d65f7d) ![{\\displaystyle \\alpha =2:\\quad \\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\alpha \\partial a}}\\right\]={\\mathcal {I}}\_{\\alpha ,a}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/8b8af544d7c5e0cc278aa725daba7f3de2f70d31) ![{\\displaystyle \\beta =1:\\quad \\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\beta \\partial c}}\\right\]={\\mathcal {I}}\_{\\beta ,c}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/2f87b32b39997964dcbc95bc64c1364e6832db1d)
(for further discussion see section on Fisher information matrix). Thus, it is not possible to strictly carry on the maximum likelihood estimation for some well known distributions belonging to the four-parameter beta distribution family, like the [uniform distribution](https://en.wikipedia.org/wiki/Continuous_uniform_distribution "Continuous uniform distribution") (Beta(1, 1, *a*, *c*)), and the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") (Beta(1/2, 1/2, *a*, *c*)). [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz")[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) ignore the equations for the harmonic means and instead suggest "If a and c are unknown, and maximum likelihood estimators of *a*, *c*, α and β are required, the above procedure (for the two unknown parameter case, with *X* transformed as *X* = (*Y* − *a*)/(*c* − *a*)) can be repeated using a succession of trial values of *a* and *c*, until the pair (*a*, *c*) for which maximum likelihood (given *a* and *c*) is as great as possible, is attained" (where, for the purpose of clarity, their notation for the parameters has been translated into the present notation).
#### Fisher information matrix
Let a random variable X have a probability density *f*(*x*;*α*). The partial derivative with respect to the (unknown, and to be estimated) parameter α of the log [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function") is called the [score](https://en.wikipedia.org/wiki/Score_\(statistics\) "Score (statistics)"). The second moment of the score is called the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information "Fisher information"):
![{\\displaystyle {\\mathcal {I}}(\\alpha )=\\operatorname {E} \\left\[\\left({\\frac {\\partial }{\\partial \\alpha }}\\ln {\\mathcal {L}}(\\alpha \\mid X)\\right)^{2}\\right\],}](https://wikimedia.org/api/rest_v1/media/math/render/svg/daec13972d17a073bcd447abfde55a6b0e168720)
The [expectation](https://en.wikipedia.org/wiki/Expected_value "Expected value") of the [score](https://en.wikipedia.org/wiki/Score_\(statistics\) "Score (statistics)") is zero, therefore the Fisher information is also the second moment centered on the mean of the score: the [variance](https://en.wikipedia.org/wiki/Variance "Variance") of the score.
If the log [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function") is twice differentiable with respect to the parameter α, and under certain regularity conditions,[\[50\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Silvey-50) then the Fisher information may also be written as follows (which is often a more convenient form for calculation purposes):
![{\\displaystyle {\\mathcal {I}}(\\alpha )=-\\operatorname {E} \\left\[{\\frac {\\partial ^{2}}{\\partial \\alpha ^{2}}}\\ln {\\mathcal {L}}(\\alpha \\mid X)\\right\].}](https://wikimedia.org/api/rest_v1/media/math/render/svg/3fdd5f6730d5ffb0a5f833c89ba784362322cbc8)
Thus, the Fisher information is the negative of the expectation of the second [derivative](https://en.wikipedia.org/wiki/Derivative "Derivative") with respect to the parameter α of the log [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function"). Therefore, Fisher information is a measure of the [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature") of the log likelihood function of α. A flatter log likelihood function curve, with low [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature") (and therefore high [radius of curvature](https://en.wikipedia.org/wiki/Radius_of_curvature_\(mathematics\) "Radius of curvature (mathematics)")), has low Fisher information; while a log likelihood function curve with large [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature") (and therefore low [radius of curvature](https://en.wikipedia.org/wiki/Radius_of_curvature_\(mathematics\) "Radius of curvature (mathematics)")) has high Fisher information. When the Fisher information matrix is computed at the parameter estimates ("the observed Fisher information matrix"), it is equivalent to replacing the true log likelihood surface by a Taylor series approximation, taken as far as the quadratic terms.[\[51\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-EdwardsLikelihood-51) The word information, in the context of Fisher information, refers to information about the parameters: estimation, sufficiency, and properties of variances of estimators. The [Cramér–Rao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound "Cramér–Rao bound") states that the inverse of the Fisher information is a lower bound on the variance of any [estimator](https://en.wikipedia.org/wiki/Estimator "Estimator") of a parameter α:
![{\\displaystyle \\operatorname {var} \[{\\hat {\\alpha }}\]\\geq {\\frac {1}{{\\mathcal {I}}(\\alpha )}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/d93d4983a2717258c52eb47d1562e849a3a66c5c)
The precision to which one can estimate a parameter α is limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two [alternative hypotheses](https://en.wikipedia.org/wiki/Alternative_hypothesis "Alternative hypothesis") about a parameter.[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52)
When there are *N* parameters
$$\theta=(\theta_{1},\theta_{2},\ldots,\theta_{N})^{\mathsf{T}},$$
then the Fisher information takes the form of an *N*Ć*N* [positive semidefinite](https://en.wikipedia.org/wiki/Positive_semidefinite_matrix "Positive semidefinite matrix") [symmetric matrix](https://en.wikipedia.org/wiki/Symmetric_matrix "Symmetric matrix"), the Fisher information matrix, with typical element:
![{\\displaystyle ({\\mathcal {I}}(\\theta ))\_{i,j}=\\operatorname {E} \\left\[{\\frac {\\partial \\ln {\\mathcal {L}}}{\\partial \\theta \_{i}}}\\cdot {\\frac {\\partial \\ln {\\mathcal {L}}}{\\partial \\theta \_{j}}}\\right\].}](https://wikimedia.org/api/rest_v1/media/math/render/svg/94187c0032daae02409e5323c356fae5fdcb73fd)
Under certain regularity conditions,[\[50\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Silvey-50) the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation:
![{\\displaystyle ({\\mathcal {I}}(\\theta ))\_{i,j}=-\\operatorname {E} \\left\[{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}}{\\partial \\theta \_{i}\\,\\partial \\theta \_{j}}}\\right\]\\,.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/86df084593df54dd2ed7220a683b9fbc41f230d7)
With *X*1, ..., *XN* [iid](https://en.wikipedia.org/wiki/Iid "Iid") random variables, an *N*\-dimensional "box" can be constructed with sides *X*1, ..., *XN*. Costa and Cover[\[53\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-CostaCover-53) show that the (Shannon) differential entropy *h*(*X*) is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.
For *X*1, ..., *X**N* independent random variables each having a beta distribution parametrized with shape parameters *α* and *β*, the joint log likelihood function for *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is:
$$\ln \mathcal{L}(\alpha,\beta\mid X)=(\alpha-1)\sum_{i=1}^{N}\ln X_i+(\beta-1)\sum_{i=1}^{N}\ln(1-X_i)-N\ln \mathrm{B}(\alpha,\beta)$$
therefore the joint log likelihood function per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is
$$\frac{1}{N}\ln \mathcal{L}(\alpha,\beta\mid X)=\frac{\alpha-1}{N}\sum_{i=1}^{N}\ln X_i+\frac{\beta-1}{N}\sum_{i=1}^{N}\ln(1-X_i)-\ln \mathrm{B}(\alpha,\beta)$$
For the two parameter case, the Fisher information has 4 components: 2 diagonal and 2 off-diagonal. Since the Fisher information matrix is symmetric, the two off-diagonal components are equal, so only one of them is independent. Therefore, the Fisher information matrix has 3 independent components (2 diagonal and 1 off-diagonal).
Aryal and Nadarajah[\[54\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Aryal-54) calculated Fisher's information matrix for the four-parameter case, from which the two parameter case can be obtained as follows:
![{\\displaystyle -{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta \\mid X)}{N\\partial \\alpha ^{2}}}=\\operatorname {var} \[\\ln(X)\]=\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta )={\\mathcal {I}}\_{\\alpha ,\\alpha }=\\operatorname {E} \\left\[-{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta \\mid X)}{N\\partial \\alpha ^{2}}}\\right\]=\\ln \\operatorname {var} \_{GX}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c90003d5bd2f6d2bcfe2c788585689726b4b7e36) ![{\\displaystyle -{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta \\mid X)}{N\\,\\partial \\beta ^{2}}}=\\operatorname {var} \[\\ln(1-X)\]=\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta )={\\mathcal {I}}\_{\\beta ,\\beta }=\\operatorname {E} \\left\[-{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta \\mid X)}{N\\partial \\beta ^{2}}}\\right\]=\\ln \\operatorname {var} \_{G(1-X)}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b59bcc31f14f1f4b3b07bd92a66426bb6ac126b1) ![{\\displaystyle -{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta \\mid X)}{N\\,\\partial \\alpha \\,\\partial \\beta }}=\\operatorname {cov} \[\\ln X,\\ln(1-X)\]=-\\psi \_{1}(\\alpha +\\beta )={\\mathcal {I}}\_{\\alpha ,\\beta }=\\operatorname {E} \\left\[-{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta \\mid X)}{N\\,\\partial \\alpha \\,\\partial \\beta }}\\right\]=\\ln \\operatorname {cov} \_{G{X,(1-X)}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/a6c3268da3b23c062ce981ab02d4c454a0267365)
Since the Fisher information matrix is symmetric
$${\mathcal {I}}_{\alpha,\beta}={\mathcal {I}}_{\beta,\alpha}=-\psi_{1}(\alpha+\beta)$$
The Fisher information components are equal to the log geometric variances and log geometric covariance. Therefore, they can be expressed as **[trigamma functions](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function")**, denoted ψ1(α), the second of the [polygamma functions](https://en.wikipedia.org/wiki/Polygamma_function "Polygamma function"), defined as the derivative of the [digamma](https://en.wikipedia.org/wiki/Digamma "Digamma") function:

$$\psi_{1}(\alpha)=\frac{d\psi(\alpha)}{d\alpha}$$
These derivatives are also derived in the [§ Two unknown parameters](https://en.wikipedia.org/wiki/Beta_distribution#Two_unknown_parameters) and plots of the log likelihood function are also shown in that section. [§ Geometric variance and covariance](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_variance_and_covariance) contains plots and further discussion of the Fisher information matrix components: the log geometric variances and log geometric covariance as a function of the shape parameters α and β. [§ Moments of logarithmically transformed random variables](https://en.wikipedia.org/wiki/Beta_distribution#Moments_of_logarithmically_transformed_random_variables) contains formulas for moments of logarithmically transformed random variables. Images for the Fisher information components ${\mathcal {I}}_{\alpha,\alpha}$ and ${\mathcal {I}}_{\beta,\beta}$ are shown in [§ Geometric variance](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_variance).
The determinant of Fisher's information matrix is of interest (for example for the calculation of [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") probability). From the expressions for the individual components of the Fisher information matrix, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution is:
![{\\displaystyle {\\begin{aligned}\\det({\\mathcal {I}}(\\alpha ,\\beta ))&={\\mathcal {I}}\_{\\alpha ,\\alpha }{\\mathcal {I}}\_{\\beta ,\\beta }-{\\mathcal {I}}\_{\\alpha ,\\beta }{\\mathcal {I}}\_{\\alpha ,\\beta }\\\\\[4pt\]&=(\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta ))(\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta ))-(-\\psi \_{1}(\\alpha +\\beta ))(-\\psi \_{1}(\\alpha +\\beta ))\\\\\[4pt\]&=\\psi \_{1}(\\alpha )\\psi \_{1}(\\beta )-(\\psi \_{1}(\\alpha )+\\psi \_{1}(\\beta ))\\psi \_{1}(\\alpha +\\beta )\\\\\[4pt\]\\lim \_{\\alpha \\to 0}\\det({\\mathcal {I}}(\\alpha ,\\beta ))&=\\lim \_{\\beta \\to 0}\\det({\\mathcal {I}}(\\alpha ,\\beta ))=\\infty \\\\\[4pt\]\\lim \_{\\alpha \\to \\infty }\\det({\\mathcal {I}}(\\alpha ,\\beta ))&=\\lim \_{\\beta \\to \\infty }\\det({\\mathcal {I}}(\\alpha ,\\beta ))=0\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b2c5ccf59b05ea730fc108360c07e9ac9634e829)
From [Sylvester's criterion](https://en.wikipedia.org/wiki/Sylvester%27s_criterion "Sylvester's criterion") (checking whether the diagonal elements are all positive), it follows that the Fisher information matrix for the two parameter case is [positive-definite](https://en.wikipedia.org/wiki/Positive-definite_matrix "Positive-definite matrix") (under the standard condition that the shape parameters are positive *α* \> 0 and *β* \> 0).
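As a numerical illustration (a sketch; `fisher_info` is our name), the 2Ɨ2 matrix, its determinant, and positive-definiteness can be checked directly from the trigamma expressions above:

```python
import numpy as np
from scipy.special import polygamma

def fisher_info(a, b):
    t = polygamma(1, a + b)                 # psi_1(alpha + beta)
    return np.array([[polygamma(1, a) - t, -t],
                     [-t, polygamma(1, b) - t]])

I = fisher_info(2.0, 3.0)
print(np.all(np.linalg.eigvalsh(I) > 0))    # True: positive-definite
print(np.sqrt(np.linalg.det(I)))            # proportional to the Jeffreys prior density
```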
[](https://en.wikipedia.org/wiki/File:Fisher_Information_I\(a,a\)_for_alpha%3Dbeta_vs_range_\(c-a\)_and_exponent_alpha%3Dbeta_-_J._Rodal.png)
Fisher Information *I*(*a*,*a*) for *α* = *β* vs range (*c* − *a*) and exponent *α* = *β*
[](https://en.wikipedia.org/wiki/File:Fisher_Information_I\(alpha,a\)_for_alpha%3Dbeta,_vs._range_\(c_-_a\)_and_exponent_alpha%3Dbeta_-_J._Rodal.png)
Fisher Information *I*(*α*,*a*) for *α* = *β*, vs. range (*c* − *a*) and exponent *α* = *β*
If *Y*1, ..., *YN* are independent random variables each having a beta distribution with four parameters: the exponents *α* and *β*, and also *a* (the minimum of the distribution range), and *c* (the maximum of the distribution range) (section titled "Alternative parametrizations", "Four parameters"), with [probability density function](https://en.wikipedia.org/wiki/Probability_density_function "Probability density function"):
$$f(y;\alpha,\beta,a,c)=\frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\mathrm{B}(\alpha,\beta)}$$
the joint log likelihood function per *N* [iid](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables "Independent and identically distributed random variables") observations is:
$$\frac{1}{N}\ln \mathcal{L}(\alpha,\beta,a,c\mid Y)=\frac{\alpha-1}{N}\sum_{i=1}^{N}\ln \frac{Y_i-a}{c-a}+\frac{\beta-1}{N}\sum_{i=1}^{N}\ln \frac{c-Y_i}{c-a}-\ln \mathrm{B}(\alpha,\beta)-\ln(c-a)$$
For the four parameter case, the Fisher information has 4 Ɨ 4 = 16 components, of which 12 are off-diagonal (16 total − 4 diagonal). Since the Fisher information matrix is symmetric, half of these off-diagonal components (12/2 = 6) are independent. Therefore, the Fisher information matrix has 6 independent off-diagonal + 4 diagonal = 10 independent components. Aryal and Nadarajah[\[54\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Aryal-54) calculated Fisher's information matrix for the four parameter case as follows:
![{\\displaystyle -{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\alpha ^{2}}}=\\operatorname {var} \[\\ln(X)\]=\\psi \_{1}(\\alpha )-\\psi \_{1}(\\alpha +\\beta )={\\mathcal {I}}\_{\\alpha ,\\alpha }=\\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\alpha ^{2}}}\\right\]=\\ln(\\operatorname {var\_{GX}} )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/31be86f2c53663c6d3975bc2676806ba3e538423) ![{\\displaystyle -{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\beta ^{2}}}=\\operatorname {var} \[\\ln(1-X)\]=\\psi \_{1}(\\beta )-\\psi \_{1}(\\alpha +\\beta )={\\mathcal {I}}\_{\\beta ,\\beta }=\\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\beta ^{2}}}\\right\]=\\ln(\\operatorname {var\_{G(1-X)}} )}](https://wikimedia.org/api/rest_v1/media/math/render/svg/25ab885119f25fae0b9919326db96395d13e3bc3) ![{\\displaystyle -{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\alpha \\,\\partial \\beta }}=\\operatorname {cov} \[\\ln X,(1-X)\]=-\\psi \_{1}(\\alpha +\\beta )={\\mathcal {I}}\_{\\alpha ,\\beta }=\\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\alpha \\,\\partial \\beta }}\\right\]=\\ln(\\operatorname {cov} \_{G{X,(1-X)}})}](https://wikimedia.org/api/rest_v1/media/math/render/svg/02a56af746fb9315340cf382951fe0c3f3640678)
In the above expressions, the use of *X* instead of *Y* in the expressions var\[ln(*X*)\] = ln(var*GX*) is *not an error*. The expressions in terms of the log geometric variances and log geometric covariance occur as functions of the two parameter *X* ~ Beta(*α*, *β*) parametrization because when taking the partial derivatives with respect to the exponents (*α*, *β*) in the four parameter case, one obtains the identical expressions as for the two parameter case: these terms of the four parameter Fisher information matrix are independent of the minimum *a* and maximum *c* of the distribution's range. The only non-zero term upon double differentiation of the log likelihood function with respect to the exponents *α* and *β* is the second derivative of the log of the beta function: ln(B(*α*, *β*)). This term is independent of the minimum *a* and maximum *c* of the distribution's range. Double differentiation of this term results in trigamma functions. The sections titled "Maximum likelihood", "Two unknown parameters" and "Four unknown parameters" also show this fact.
The Fisher information for *N* [i.i.d.](https://en.wikipedia.org/wiki/I.i.d. "I.i.d.") samples is *N* times the individual Fisher information (eq. 11.279, page 394 of Cover and Thomas[\[28\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Cover_and_Thomas-28)). (Aryal and Nadarajah[\[54\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Aryal-54) take a single observation, *N* = 1, to calculate the following components of the Fisher information, which leads to the same result as considering the derivatives of the log likelihood per *N* observations. Moreover, an erroneous expression in Aryal and Nadarajah has been corrected below.)
![{\\displaystyle {\\begin{aligned}\\alpha \>2:\\quad \\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial a^{2}}}\\right\]&={\\mathcal {I}}\_{a,a}={\\frac {\\beta (\\alpha +\\beta -1)}{(\\alpha -2)(c-a)^{2}}}\\\\\\beta \>2:\\quad \\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial c^{2}}}\\right\]&={\\mathcal {I}}\_{c,c}={\\frac {\\alpha (\\alpha +\\beta -1)}{(\\beta -2)(c-a)^{2}}}\\\\\\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial a\\,\\partial c}}\\right\]&={\\mathcal {I}}\_{a,c}={\\frac {(\\alpha +\\beta -1)}{(c-a)^{2}}}\\\\\\alpha \>1:\\quad \\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\alpha \\,\\partial a}}\\right\]&={\\mathcal {I}}\_{\\alpha ,a}={\\frac {\\beta }{(\\alpha -1)(c-a)}}\\\\\\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\alpha \\,\\partial c}}\\right\]&={\\mathcal {I}}\_{\\alpha ,c}={\\frac {1}{(c-a)}}\\\\\\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\beta \\,\\partial a}}\\right\]&={\\mathcal {I}}\_{\\beta ,a}=-{\\frac {1}{(c-a)}}\\\\\\beta \>1:\\quad \\operatorname {E} \\left\[-{\\frac {1}{N}}{\\frac {\\partial ^{2}\\ln {\\mathcal {L}}(\\alpha ,\\beta ,a,c\\mid Y)}{\\partial \\beta \\,\\partial c}}\\right\]&={\\mathcal {I}}\_{\\beta ,c}=-{\\frac {\\alpha }{(\\beta -1)(c-a)}}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/636646f51bdb1a3193b1721483878e98f4f19c3e)
The lower two diagonal entries of the Fisher information matrix, with respect to the parameter *a* (the minimum of the distribution's range), ${\mathcal {I}}_{a,a}$, and with respect to the parameter *c* (the maximum of the distribution's range), ${\mathcal {I}}_{c,c}$, are only defined for exponents *α* > 2 and *β* > 2 respectively. The Fisher information matrix component ${\mathcal {I}}_{a,a}$ for the minimum *a* approaches infinity for exponent α approaching 2 from above, and the Fisher information matrix component ${\mathcal {I}}_{c,c}$ for the maximum *c* approaches infinity for exponent *β* approaching 2 from above.
The Fisher information matrix for the four parameter case does not depend on the individual values of the minimum *a* and the maximum *c*, but only on the total range (*c* − *a*). Moreover, the components of the Fisher information matrix that depend on the range (*c* − *a*) depend only through its inverse (or the square of the inverse), such that the Fisher information decreases for increasing range (*c* − *a*).
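A sketch of two of these components (function names ours), which makes the domain restrictions and the range-only dependence explicit:

```python
def info_aa(a, b, lo, hi):
    # I_{a,a} = beta (alpha + beta - 1) / ((alpha - 2) (c - a)^2), alpha > 2
    assert a > 2, "I_{a,a} is defined only for alpha > 2"
    return b * (a + b - 1) / ((a - 2) * (hi - lo) ** 2)

def info_cc(a, b, lo, hi):
    # I_{c,c} = alpha (alpha + beta - 1) / ((beta - 2) (c - a)^2), beta > 2
    assert b > 2, "I_{c,c} is defined only for beta > 2"
    return a * (a + b - 1) / ((b - 2) * (hi - lo) ** 2)

# Only the total range (c - a) enters, not a and c individually:
print(info_aa(3.0, 4.0, 0.0, 2.0) == info_aa(3.0, 4.0, 5.0, 7.0))   # True
```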
The accompanying images show the Fisher information components ${\mathcal {I}}_{a,a}$ and ${\mathcal {I}}_{\alpha,a}$. Images for the Fisher information components ${\mathcal {I}}_{\alpha,\alpha}$ and ${\mathcal {I}}_{\beta,\beta}$ are shown in [§ Geometric variance](https://en.wikipedia.org/wiki/Beta_distribution#Geometric_variance). All these Fisher information components look like a basin, with the "walls" of the basin being located at low values of the parameters.
The following four-parameter-beta-distribution Fisher information components can be expressed in terms of the two-parameter: *X* ~ Beta(α, β) expectations of the transformed ratio ((1 − *X*)/*X*) and of its mirror image (*X*/(1 − *X*)), scaled by the range (*c* − *a*), which may be helpful for interpretation:
![{\\displaystyle {\\mathcal {I}}\_{\\alpha ,a}={\\frac {\\operatorname {E} \\left\[{\\frac {1-X}{X}}\\right\]}{c-a}}={\\frac {\\beta }{(\\alpha -1)(c-a)}}{\\text{ if }}\\alpha \>1}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e670565bb8d06bace69cf892864520f5c83b5449) ![{\\displaystyle {\\mathcal {I}}\_{\\beta ,c}=-{\\frac {\\operatorname {E} \\left\[{\\frac {X}{1-X}}\\right\]}{c-a}}=-{\\frac {\\alpha }{(\\beta -1)(c-a)}}{\\text{ if }}\\beta \>1}](https://wikimedia.org/api/rest_v1/media/math/render/svg/94f9b7788a4f19e1cbc765ab8fc85a7ad55dec4f)
These are also the expected values of the "inverted beta distribution" or [beta prime distribution](https://en.wikipedia.org/wiki/Beta_prime_distribution "Beta prime distribution") (also known as beta distribution of the second kind or [Pearson's Type VI](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution"))[\[1\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JKB-1) and its mirror image, scaled by the range (*c* − *a*).
Also, the following Fisher information components can be expressed in terms of the harmonic (1/*X*) variances or of variances based on the ratio-transformed variables ((1 − *X*)/*X*) as follows:
![{\\displaystyle {\\begin{aligned}\\alpha \>2:\\quad {\\mathcal {I}}\_{a,a}&=\\operatorname {var} \\left\[{\\frac {1}{X}}\\right\]\\left({\\frac {\\alpha -1}{c-a}}\\right)^{2}=\\operatorname {var} \\left\[{\\frac {1-X}{X}}\\right\]\\left({\\frac {\\alpha -1}{c-a}}\\right)^{2}={\\frac {\\beta (\\alpha +\\beta -1)}{(\\alpha -2)(c-a)^{2}}}\\\\\\beta \>2:\\quad {\\mathcal {I}}\_{c,c}&=\\operatorname {var} \\left\[{\\frac {1}{1-X}}\\right\]\\left({\\frac {\\beta -1}{c-a}}\\right)^{2}=\\operatorname {var} \\left\[{\\frac {X}{1-X}}\\right\]\\left({\\frac {\\beta -1}{c-a}}\\right)^{2}={\\frac {\\alpha (\\alpha +\\beta -1)}{(\\beta -2)(c-a)^{2}}}\\\\{\\mathcal {I}}\_{a,c}&=\\operatorname {cov} \\left\[{\\frac {1}{X}},{\\frac {1}{1-X}}\\right\]{\\frac {(\\alpha -1)(\\beta -1)}{(c-a)^{2}}}=\\operatorname {cov} \\left\[{\\frac {1-X}{X}},{\\frac {X}{1-X}}\\right\]{\\frac {(\\alpha -1)(\\beta -1)}{(c-a)^{2}}}={\\frac {(\\alpha +\\beta -1)}{(c-a)^{2}}}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f1f89730020364bb58791ca0eb47d0de25c896c2)
See section "Moments of linearly transformed, product and inverted random variables" for these expectations.
The determinant of Fisher's information matrix is of interest (for example for the calculation of [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") probability). From the expressions for the individual components, it follows that the determinant of Fisher's (symmetric) information matrix for the beta distribution with four parameters is:

Using [Sylvester's criterion](https://en.wikipedia.org/wiki/Sylvester%27s_criterion "Sylvester's criterion") (checking whether the diagonal elements are all positive), and since diagonal components ${\mathcal {I}}_{a,a}$ and ${\mathcal {I}}_{c,c}$ have [singularities](https://en.wikipedia.org/wiki/Mathematical_singularity "Mathematical singularity") at α = 2 and β = 2, it follows that the Fisher information matrix for the four parameter case is [positive-definite](https://en.wikipedia.org/wiki/Positive-definite_matrix "Positive-definite matrix") for α > 2 and β > 2. Since for α > 2 and β > 2 the beta distribution is (symmetric or unsymmetric) bell shaped, it follows that the Fisher information matrix is positive-definite only for bell-shaped (symmetric or unsymmetric) beta distributions, with inflection points located to either side of the mode. Thus, important well known distributions belonging to the four-parameter beta distribution family, like the parabolic distribution (Beta(2,2,*a*,*c*)) and the [uniform distribution](https://en.wikipedia.org/wiki/Continuous_uniform_distribution "Continuous uniform distribution") (Beta(1,1,*a*,*c*)) have Fisher information components (${\mathcal {I}}_{a,a}$, ${\mathcal {I}}_{c,c}$) that blow up (approach infinity) in the four-parameter case (although their Fisher information components are all defined for the two parameter case). The four-parameter [Wigner semicircle distribution](https://en.wikipedia.org/wiki/Wigner_semicircle_distribution "Wigner semicircle distribution") (Beta(3/2,3/2,*a*,*c*)) and [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") (Beta(1/2,1/2,*a*,*c*)) have negative Fisher information determinants for the four-parameter case.
[](https://en.wikipedia.org/wiki/File:Beta\(1,1\)_Uniform_distribution_-_J._Rodal.png)
The [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") probability density was proposed by [Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes "Thomas Bayes") to represent ignorance of prior probabilities in [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference").
The use of Beta distributions in [Bayesian inference](https://en.wikipedia.org/wiki/Bayesian_inference "Bayesian inference") is due to the fact that they provide a family of [conjugate prior probability distributions](https://en.wikipedia.org/wiki/Conjugate_prior_distribution "Conjugate prior distribution") for [binomial](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") (including [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution")) and [geometric distributions](https://en.wikipedia.org/wiki/Geometric_distribution "Geometric distribution"). The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value *p*:[\[24\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-MacKay-24)
$$P(p;\alpha,\beta)=\frac{p^{\alpha-1}(1-p)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}$$
Examples of beta distributions used as prior probabilities to represent ignorance of prior parameter values in Bayesian inference are Beta(1,1), Beta(0,0) and Beta(1/2,1/2).
A classic application of the beta distribution is the [rule of succession](https://en.wikipedia.org/wiki/Rule_of_succession "Rule of succession"), introduced in the 18th century by [Pierre-Simon Laplace](https://en.wikipedia.org/wiki/Pierre-Simon_Laplace "Pierre-Simon Laplace")[\[55\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Laplace-55) in the course of treating the [sunrise problem](https://en.wikipedia.org/wiki/Sunrise_problem "Sunrise problem"). It states that, given *s* successes in *n* [conditionally independent](https://en.wikipedia.org/wiki/Conditional_independence "Conditional independence") [Bernoulli trials](https://en.wikipedia.org/wiki/Bernoulli_trial "Bernoulli trial") with probability *p*, the estimate of the expected value in the next trial is (*s* + 1)/(*n* + 2). This estimate is the expected value of the posterior distribution over *p*, namely Beta(*s* + 1, *n* − *s* + 1), which is given by [Bayes' rule](https://en.wikipedia.org/wiki/Bayes%27_rule "Bayes' rule") if one assumes a uniform prior probability over *p* (i.e., Beta(1, 1)) and then observes that *p* generated *s* successes in *n* trials. Laplace's rule of succession has been criticized by prominent scientists. R. T. Cox described Laplace's application of the rule of succession to the [sunrise problem](https://en.wikipedia.org/wiki/Sunrise_problem "Sunrise problem") ([\[56\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-CoxRT-56) p. 89) as "a travesty of the proper use of the principle". Keynes remarks ([\[57\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-KeynesTreatise-57) Ch.XXX, p. 382) "indeed this is so foolish a theorem that to entertain it is discreditable". Karl Pearson[\[58\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-PearsonRuleSuccession-58) showed that the probability that the next (*n* + 1) trials will be successes, after *n* successes in *n* trials, is only 50%, which has been considered too low by scientists like Jeffreys and unacceptable as a representation of the scientific process of experimentation to test a proposed scientific law. As pointed out by Jeffreys ([\[59\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jeffreys-59) p. 128) (crediting [C. D. Broad](https://en.wikipedia.org/wiki/C._D._Broad "C. D. Broad")[\[60\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BroadMind-60)) Laplace's rule of succession establishes a high probability of success ((*n* + 1)/(*n* + 2)) in the next trial, but only a moderate probability (50%) that a further sample (*n* + 1) comparable in size will be equally successful. As pointed out by Perks,[\[61\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Perks-61) "The rule of succession itself is hard to accept. It assigns a probability to the next trial which implies the assumption that the actual run observed is an average run and that we are always at the end of an average run. It would, one would think, be more reasonable to assume that we were in the middle of an average run. Clearly a higher value for both probabilities is necessary if they are to accord with reasonable belief." These problems with Laplace's rule of succession motivated Haldane, Perks, Jeffreys and others to search for other forms of prior probability (see the next [§ Bayesian inference](https://en.wikipedia.org/wiki/Beta_distribution#Bayesian_inference)).
According to Jaynes,[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) the main problem with the rule of succession is that it is not valid when s=0 or s=n (see [rule of succession](https://en.wikipedia.org/wiki/Rule_of_succession "Rule of succession"), for an analysis of its validity).
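A minimal sketch of the conjugate update behind the rule of succession (the function name is ours): a Beta(α0, β0) prior combined with *s* successes in *n* trials gives a Beta(α0 + *s*, β0 + *n* − *s*) posterior, whose mean reduces to Laplace's (*s* + 1)/(*n* + 2) for the uniform Beta(1,1) prior:

```python
def posterior_mean(s, n, a0, b0):
    # Mean of the Beta(a0 + s, b0 + n - s) posterior.
    return (a0 + s) / (a0 + b0 + n)

s, n = 7, 10
for name, (a0, b0) in [("Haldane Beta(0,0)", (0.0, 0.0)),
                       ("Jeffreys Beta(1/2,1/2)", (0.5, 0.5)),
                       ("Bayes-Laplace Beta(1,1)", (1.0, 1.0))]:
    print(name, posterior_mean(s, n, a0, b0))
```

Note that the Haldane prior gives the sample proportion *s*/*n* as the posterior mean, while the Bayes-Laplace prior gives the rule of succession.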
#### Bayes–Laplace prior probability (Beta(1,1))
The beta distribution achieves maximum differential entropy for Beta(1,1): the [uniform](https://en.wikipedia.org/wiki/Uniform_density "Uniform density") probability density, for which all values in the domain of the distribution have equal density. This uniform distribution Beta(1,1) was suggested ("with a great deal of doubt") by [Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes "Thomas Bayes")[\[62\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-ThomasBayes-62) as the prior probability distribution to express ignorance about the correct prior distribution. This prior distribution was adopted (apparently, from his writings, with little sign of doubt[\[55\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Laplace-55)) by [Pierre-Simon Laplace](https://en.wikipedia.org/wiki/Pierre-Simon_Laplace "Pierre-Simon Laplace"), and hence it was also known as the "Bayes–Laplace rule" or the "Laplace rule" of "[inverse probability](https://en.wikipedia.org/wiki/Inverse_probability "Inverse probability")" in publications of the first half of the 20th century. In the later part of the 19th century and early part of the 20th century, scientists realized that the assumption of uniform "equal" probability density depended on the actual functions (for example whether a linear or a logarithmic scale was most appropriate) and parametrizations used. In particular, the behavior near the ends of distributions with finite support (for example near *x* = 0, for a distribution with initial support at *x* = 0) required particular attention. Keynes ([\[57\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-KeynesTreatise-57) Ch.XXX, p. 381) criticized the use of Bayes's uniform prior probability (Beta(1,1)), under which all values between zero and one are equiprobable, as follows: "Thus experience, if it shows anything, shows that there is a very marked clustering of statistical ratios in the neighborhoods of zero and unity, of those for positive theories and for correlations between positive qualities in the neighborhood of zero, and of those for negative theories and for correlations between negative qualities in the neighborhood of unity."
#### Haldane's prior probability (Beta(0,0))
[](https://en.wikipedia.org/wiki/File:Beta_distribution_for_alpha_and_beta_approaching_zero_-_J._Rodal.png)
The Haldane prior probability expressing total ignorance about prior information, where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure. As α, β → 0, the beta distribution approaches a two-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with all probability density concentrated at each end, at 0 and 1, and nothing in between. A coin-toss: one face of the coin being at 0 and the other face being at 1.
The Beta(0,0) distribution was proposed by [J.B.S. Haldane](https://en.wikipedia.org/wiki/J.B.S._Haldane "J.B.S. Haldane"),[\[63\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-63) who suggested that the prior probability representing complete uncertainty should be proportional to $p^{-1}(1-p)^{-1}$. The function $p^{-1}(1-p)^{-1}$ can be viewed as the limit of the numerator of the beta distribution as both shape parameters approach zero: α, β → 0. The Beta function (in the denominator of the beta distribution) approaches infinity for both parameters approaching zero, α, β → 0. Therefore, $p^{-1}(1-p)^{-1}$ divided by the Beta function approaches a 2-point [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") with equal probability 1/2 at each end, at 0 and 1, and nothing in between, as α, β → 0. A coin-toss: one face of the coin being at 0 and the other face being at 1. The Haldane prior probability distribution Beta(0,0) is an "[improper prior](https://en.wikipedia.org/wiki/Improper_prior "Improper prior")" because its integration (from 0 to 1) fails to strictly converge to 1 due to the singularities at each end. However, this is not an issue for computing posterior probabilities unless the sample size is very small. Furthermore, Zellner[\[64\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Zellner-64) points out that on the [log-odds](https://en.wikipedia.org/wiki/Log-odds "Log-odds") scale (the [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformation ln(*p*/(1 − *p*))), the Haldane prior is the uniformly flat prior. The fact that a uniform prior probability on the [logit](https://en.wikipedia.org/wiki/Logit "Logit") transformed variable ln(*p*/(1 − *p*)) (with domain (−∞, ∞)) is equivalent to the Haldane prior on the domain [0, 1] was pointed out by [Harold Jeffreys](https://en.wikipedia.org/wiki/Harold_Jeffreys "Harold Jeffreys") in the first edition (1939) of his book Theory of Probability ([\[59\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jeffreys-59) p. 123). Jeffreys writes "Certainly if we take the Bayes–Laplace rule right up to the extremes we are led to results that do not correspond to anybody's way of thinking. The (Haldane) rule d*x*/(*x*(1 − *x*)) goes too far the other way. It would lead to the conclusion that if a sample is of one type with respect to some property there is a probability 1 that the whole population is of that type." The fact that "uniform" depends on the parametrization led Jeffreys to seek a form of prior that would be invariant under different parametrizations.
#### Jeffreys' prior probability (Beta(1/2,1/2) for a Bernoulli or for a binomial distribution)
[](https://en.wikipedia.org/wiki/File:Jeffreys_prior_probability_for_the_beta_distribution_-_J._Rodal.png)
[Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") probability for the beta distribution: the square root of the determinant of [Fisher's information](https://en.wikipedia.org/wiki/Fisher%27s_information "Fisher's information") matrix, $\sqrt{\det({\mathcal {I}}(\alpha,\beta))}$, is a function of the [trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function") ψ1 of shape parameters α, β
[](https://en.wikipedia.org/wiki/File:Beta_distribution_for_3_different_prior_probability_functions_-_J._Rodal.png)
Posterior Beta densities with samples having success = "s", failure = "f" of *s*/(*s* + *f*) = 1/2, and *s* + *f* ∈ {3,10,50}, based on 3 different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with sample size of 50 (with more pronounced peak near *p* = 1/2). Significant differences appear for very small sample sizes (the flatter distribution for sample size of 3)
[](https://en.wikipedia.org/wiki/File:Beta_distribution_for_3_different_prior_probability_functions,_skewed_case_-_J._Rodal.png)
Posterior Beta densities with samples having success = "s", failure = "f" of *s*/(*s* + *f*) = 1/4, and *s* + *f* ∈ {3,10,50}, based on three different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with sample size of 50 (with more pronounced peak near *p* = 1/4). Significant differences appear for very small sample sizes (the very skewed distribution for the degenerate case of sample size = 3; in this degenerate and unlikely case the Haldane prior results in a reverse "J" shape with mode at *p* = 0 instead of *p* = 1/4). If there is sufficient [sampling data](https://en.wikipedia.org/wiki/Sample_\(statistics\) "Sample (statistics)"), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar [*posterior* probability](https://en.wikipedia.org/wiki/Posterior_probability "Posterior probability") densities.
[](https://en.wikipedia.org/wiki/File:Beta_distribution_for_3_different_prior_probability_functions,_skewed_case_sample_size_%3D_\(4,12,40\)_-_J._Rodal.png)
Posterior Beta densities with samples having success = *s*, failure = *f* of *s*/(*s* + *f*) = 1/4, and *s* + *f* ∈ {4,12,40}, based on three different prior probability functions: Haldane (Beta(0,0)), Jeffreys (Beta(1/2,1/2)) and Bayes (Beta(1,1)). The image shows that there is little difference between the priors for the posterior with sample size of 40 (with more pronounced peak near *p* = 1/4). Significant differences appear for very small sample sizes
[Harold Jeffreys](https://en.wikipedia.org/wiki/Harold_Jeffreys "Harold Jeffreys")[\[59\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jeffreys-59)[\[65\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-JeffreysPRIOR-65) proposed to use an [uninformative prior](https://en.wikipedia.org/wiki/Uninformative_prior "Uninformative prior") probability measure that should be [invariant under reparameterization](https://en.wikipedia.org/wiki/Parametrization_invariance "Parametrization invariance"): proportional to the square root of the [determinant](https://en.wikipedia.org/wiki/Determinant "Determinant") of [Fisher's information](https://en.wikipedia.org/wiki/Fisher%27s_information "Fisher's information") matrix. For the [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), this can be shown as follows: for a coin that is "heads" with probability *p* ∈ [0, 1] and is "tails" with probability 1 − *p*, for a given (H,T) ∈ {(0,1), (1,0)} the probability is $p^{H}(1-p)^{T}$. Since *T* = 1 − *H*, the [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution") is $p^{H}(1-p)^{1-H}$. Considering *p* as the only parameter, it follows that the log likelihood for the Bernoulli distribution is
$$\ln {\mathcal {L}}(p\mid H)=H\ln p+(1-H)\ln(1-p)$$
The Fisher information matrix has only one component (it is a scalar, because there is only one parameter: *p*), therefore:
![{\\displaystyle {\\begin{aligned}{\\sqrt {{\\mathcal {I}}(p)}}&={\\sqrt {\\operatorname {E} \\!\\left\[\\left({\\frac {d}{dp}}\\ln {\\mathcal {L}}(p\\mid H)\\right)^{2}\\right\]}}\\\\\[6pt\]&={\\sqrt {\\operatorname {E} \\!\\left\[\\left({\\frac {H}{p}}-{\\frac {1-H}{1-p}}\\right)^{2}\\right\]}}\\\\\[6pt\]&={\\sqrt {p^{1}(1-p)^{0}\\left({\\frac {1}{p}}-{\\frac {0}{1-p}}\\right)^{2}+p^{0}(1-p)^{1}\\left({\\frac {0}{p}}-{\\frac {1}{1-p}}\\right)^{2}}}\\\\&={\\frac {1}{\\sqrt {p(1-p)}}}.\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/0c2541cc4a3017abaab79170bd990ca92a64bc89)
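The expectation above can be cross-checked numerically; a minimal sketch in plain Python (the function name and test values are ours, not from any source), averaging the squared score over the two outcomes *H* ā {0, 1}:

```python
# Minimal numeric check (ours) that the Fisher information of one Bernoulli
# trial is 1/(p(1 - p)): average the squared score over H in {0, 1}.
def bernoulli_fisher_information(p: float) -> float:
    score = lambda h: h / p - (1 - h) / (1 - p)   # d/dp ln L(p | H = h)
    return p * score(1) ** 2 + (1 - p) * score(0) ** 2

for p in (0.1, 0.25, 0.5, 0.9):
    assert abs(bernoulli_fisher_information(p) - 1 / (p * (1 - p))) < 1e-9
```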
Similarly, for the [Binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution") with *n* [Bernoulli trials](https://en.wikipedia.org/wiki/Bernoulli_trials "Bernoulli trials"), it can be shown that
$$\sqrt {{\mathcal {I}}(p)}={\sqrt {\frac {n}{p(1-p)}}}$$
Thus, for the [Bernoulli](https://en.wikipedia.org/wiki/Bernoulli_distribution "Bernoulli distribution"), and [Binomial distributions](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution"), [Jeffreys prior](https://en.wikipedia.org/wiki/Jeffreys_prior "Jeffreys prior") is proportional to ${\frac {1}{\sqrt {p(1-p)}}}$, which happens to be proportional to a beta distribution with domain variable *x* = *p*, and shape parameters α = β = 1/2, the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution"):
$$\operatorname {Beta} ({\tfrac {1}{2}},{\tfrac {1}{2}})={\frac {1}{\pi {\sqrt {p(1-p)}}}}$$
It will be shown in the next section that the normalizing constant for Jeffreys prior is immaterial to the final result because the normalizing constant cancels out in Bayes' theorem for the posterior probability. Hence Beta(1/2,1/2) is used as the Jeffreys prior for both Bernoulli and binomial distributions. As shown in the next section, when using this expression as a prior probability times the likelihood in [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem "Bayes' theorem"), the posterior probability turns out to be a beta distribution. It is important to realize, however, that Jeffreys prior is proportional to $1/{\sqrt {p(1-p)}}$ for the Bernoulli and binomial distributions, but not for the beta distribution. Jeffreys prior for the beta distribution is given by the determinant of Fisher's information for the beta distribution, which, as shown in [§ Fisher information matrix](https://en.wikipedia.org/wiki/Beta_distribution#Fisher_information_matrix), is a function of the [trigamma function](https://en.wikipedia.org/wiki/Trigamma_function "Trigamma function") Ļ1 of shape parameters α and β as follows:
$$\sqrt {\det({\mathcal {I}}(\alpha ,\beta ))}={\sqrt {\psi _{1}(\alpha )\,\psi _{1}(\beta )-(\psi _{1}(\alpha )+\psi _{1}(\beta ))\,\psi _{1}(\alpha +\beta )}}$$
As previously discussed, Jeffreys prior for the Bernoulli and binomial distributions is proportional to the [arcsine distribution](https://en.wikipedia.org/wiki/Arcsine_distribution "Arcsine distribution") Beta(1/2,1/2), a one-dimensional *curve* that looks like a basin as a function of the parameter *p* of the Bernoulli and binomial distributions. The walls of the basin are formed by *p* approaching the singularities at the ends *p* ā 0 and *p* ā 1, where Beta(1/2,1/2) approaches infinity. Jeffreys prior for the beta distribution is a *2-dimensional surface* (embedded in a three-dimensional space) that looks like a basin with only two of its walls meeting at the corner α = β = 0 (and missing the other two walls) as a function of the shape parameters α and β of the beta distribution. The two adjoining walls of this 2-dimensional surface are formed by the shape parameters α and β approaching the singularities (of the trigamma function) at α, β ā 0. It has no walls for α, β ā ā because in this case the determinant of Fisher's information matrix for the beta distribution approaches zero.
It will be shown in the next section that Jeffreys prior probability results in posterior probabilities (when multiplied by the binomial likelihood function) that are intermediate between the posterior probability results of the Haldane and Bayes prior probabilities.
Jeffreys prior may be difficult to obtain analytically, and in some cases it does not exist (even for simple distribution functions like the asymmetric [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution")). Berger, Bernardo and Sun, in a 2009 paper,[\[66\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BergerBernardoSun-66) defined a reference prior probability distribution that (unlike Jeffreys prior) exists for the asymmetric [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution"). They could not obtain a closed-form expression for their reference prior, but numerical calculations show it to be nearly perfectly fitted by the (proper) prior
$$\operatorname {Beta} ({\tfrac {1}{2}},{\tfrac {1}{2}})={\frac {1}{\pi {\sqrt {\theta (1-\theta )}}}}$$
where Īø is the vertex variable for the asymmetric triangular distribution with support \[0, 1\] (corresponding to the following parameter values in Wikipedia's article on the [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution"): vertex *c* = *Īø*, left end *a* = 0, and right end *b* = 1). Berger et al. also give a heuristic argument that Beta(1/2,1/2) could indeed be the exact BergerāBernardoāSun reference prior for the asymmetric triangular distribution. Therefore, Beta(1/2,1/2) not only is Jeffreys prior for the Bernoulli and binomial distributions, but also seems to be the BergerāBernardoāSun reference prior for the asymmetric triangular distribution (for which the Jeffreys prior does not exist), a distribution used in project management and [PERT](https://en.wikipedia.org/wiki/PERT "PERT") analysis to describe the cost and duration of project tasks.
Clarke and Barron[\[67\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-67) prove that, among continuous positive priors, Jeffreys prior (when it exists) asymptotically maximizes Shannon's [mutual information](https://en.wikipedia.org/wiki/Mutual_information "Mutual information") between a sample of size n and the parameter, and therefore *Jeffreys prior is the most uninformative prior* (measuring information as Shannon information). The proof rests on an examination of the [KullbackāLeibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence "KullbackāLeibler divergence") between probability density functions for [iid](https://en.wikipedia.org/wiki/Iid "Iid") random variables.
#### Effect of different prior probability choices on the posterior beta distribution
If samples are drawn from the population of a random variable *X* that result in *s* successes and *f* failures in *n* = *s* + *f* [Bernoulli trials](https://en.wikipedia.org/wiki/Bernoulli_trial "Bernoulli trial"), then the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function "Likelihood function") for parameters *s* and *f* given *x* = *p* (the notation *x* = *p* in the expressions below emphasizes that the domain *x* stands for the value of the parameter *p* in the binomial distribution) is the following [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution "Binomial distribution"):
$${\mathcal {L}}(s,f\mid x=p)={n \choose s}x^{s}(1-x)^{n-s}$$
If beliefs about [prior probability](https://en.wikipedia.org/wiki/Prior_probability "Prior probability") information are reasonably well approximated by a beta distribution with parameters *α*Prior and *β*Prior, then:
$${\text{prior probability density}}(x=p;\alpha _{\text{prior}},\beta _{\text{prior}})={\frac {x^{\alpha _{\text{prior}}-1}(1-x)^{\beta _{\text{prior}}-1}}{\mathrm {B} (\alpha _{\text{prior}},\beta _{\text{prior}})}}$$
According to [Bayes' theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem "Bayes' theorem") for a continuous event space, the [posterior probability](https://en.wikipedia.org/wiki/Posterior_probability "Posterior probability") density is given by the product of the [prior probability](https://en.wikipedia.org/wiki/Prior_probability "Prior probability") and the likelihood function (given the evidence *s* and *f* = *n* ā *s*), normalized so that the area under the curve equals one, as follows:
![{\\displaystyle {\\begin{aligned}&{\\text{posterior probability density}}(x=p\\mid s,n-s)\\\\\[6pt\]={}&{\\frac {\\operatorname {priorprobabilitydensity} (x=p;\\alpha \\operatorname {prior} ,\\beta \\operatorname {prior} ){\\mathcal {L}}(s,f\\mid x=p)}{\\int \_{0}^{1}{\\text{prior probability density}}(x=p;\\alpha \\operatorname {prior} ,\\beta \\operatorname {prior} ){\\mathcal {L}}(s,f\\mid x=p)\\,dx}}\\\\\[6pt\]={}&{\\frac {{n \\choose s}x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}/\\mathrm {B} (\\alpha \\operatorname {prior} ,\\beta \\operatorname {prior} )}{\\int \_{0}^{1}\\left({n \\choose s}x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}/\\mathrm {B} (\\alpha \\operatorname {prior} ,\\beta \\operatorname {prior} )\\right)\\,dx}}\\\\\[6pt\]={}&{\\frac {x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}}{\\int \_{0}^{1}\\left(x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}\\right)\\,dx}}\\\\\[6pt\]={}&{\\frac {x^{s+\\alpha \\operatorname {prior} -1}(1-x)^{n-s+\\beta \\operatorname {prior} -1}}{\\mathrm {B} (s+\\alpha \\operatorname {prior} ,n-s+\\beta \\operatorname {prior} )}}.\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/569ad0317cf545ff98538c8a845f216120e87c08)
The [binomial coefficient](https://en.wikipedia.org/wiki/Binomial_coefficient "Binomial coefficient")
$${n \choose s}={\frac {n!}{s!(n-s)!}}$$
appears in both the numerator and the denominator of the posterior probability, and it does not depend on the integration variable *x*; hence it cancels out and is irrelevant to the final result. Similarly, the normalizing factor for the prior probability, the beta function B(*α*Prior, *β*Prior), cancels out and is immaterial to the final result. The same posterior probability result can be obtained if one uses an un-normalized prior
$$x^{\alpha _{\text{prior}}-1}(1-x)^{\beta _{\text{prior}}-1}$$
because the normalizing factors all cancel out. Several authors (including Jeffreys himself) therefore use an un-normalized prior formula. The numerator of the posterior probability ends up being just the (un-normalized) product of the prior probability and the likelihood function, and the denominator is its integral from zero to one. The beta function in the denominator, B(*s* + *α*Prior, *n* ā *s* + *β*Prior), appears as a normalization constant to ensure that the total posterior probability integrates to unity.
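A small numerical sketch (example values ours; scipy assumed) of the cancellation just described: integrating the un-normalized product of prior and likelihood from zero to one recovers exactly the beta function B(*s* + *α*Prior, *n* ā *s* + *β*Prior), so dividing by it yields a properly normalized beta posterior:

```python
# Numeric sketch (example values ours; scipy assumed) of the cancellation:
# the integral of the un-normalized posterior numerator equals
# B(s + a_prior, n - s + b_prior), the posterior's normalizing constant.
from scipy.integrate import quad
from scipy.special import beta as beta_fn

s, n = 3, 10
a_prior, b_prior = 0.5, 0.5   # Jeffreys prior, as an example

numerator = lambda x: x ** (s + a_prior - 1) * (1 - x) ** (n - s + b_prior - 1)
integral, _ = quad(numerator, 0, 1)
assert abs(integral - beta_fn(s + a_prior, n - s + b_prior)) < 1e-8
```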
The ratio *s*/*n* of the number of successes to the total number of trials is a [sufficient statistic](https://en.wikipedia.org/wiki/Sufficient_statistic "Sufficient statistic") in the binomial case, which is relevant for the following results.
For the **Bayes'** prior probability (Beta(1,1)), the posterior probability is:
$${\text{posterior probability density}}(x=p\mid s,n-s)={\frac {x^{s}(1-x)^{n-s}}{\mathrm {B} (s+1,\,n-s+1)}},\qquad {\text{mean}}={\frac {s+1}{n+2}},\quad {\text{mode}}={\frac {s}{n}}$$
For the **Jeffreys'** prior probability (Beta(1/2,1/2)), the posterior probability is:
$${\text{posterior probability density}}(x=p\mid s,n-s)={\frac {x^{s-1/2}(1-x)^{n-s-1/2}}{\mathrm {B} (s+{\tfrac {1}{2}},\,n-s+{\tfrac {1}{2}})}},\qquad {\text{mean}}={\frac {s+{\tfrac {1}{2}}}{n+1}},\quad {\text{mode}}={\frac {s-{\tfrac {1}{2}}}{n-1}}$$
and for the **Haldane** prior probability (Beta(0,0)), the posterior probability is:
$${\text{posterior probability density}}(x=p\mid s,n-s)={\frac {x^{s-1}(1-x)^{n-s-1}}{\mathrm {B} (s,\,n-s)}},\qquad {\text{mean}}={\frac {s}{n}},\quad {\text{mode}}={\frac {s-1}{n-2}}$$
From the above expressions it follows that for *s*/*n* = 1/2 all three prior probabilities result in the identical location for the posterior probability mean = mode = 1/2. For *s*/*n* \< 1/2, the posterior means under the three priors are ordered as: mean for Bayes prior \> mean for Jeffreys prior \> mean for Haldane prior. For *s*/*n* \> 1/2 the order of these inequalities is reversed, so the Haldane prior probability results in the largest posterior mean. The *Haldane* prior probability Beta(0,0) results in a posterior probability density with *mean* (the expected value for the probability of success in the "next" trial) identical to the ratio *s*/*n* of the number of successes to the total number of trials. Therefore, the Haldane prior results in a posterior probability with expected value in the next trial equal to the maximum likelihood estimate. The *Bayes* prior probability Beta(1,1) results in a posterior probability density with *mode* identical to the ratio *s*/*n* (the maximum likelihood estimate).
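The ordering of the posterior means and the coincidences just noted (Haldane mean = *s*/*n*, Bayes mode = *s*/*n*) can be illustrated with a short scipy sketch (the sample numbers are ours):

```python
# Sketch (example s, n ours; scipy assumed) comparing posteriors from the
# conjugate update Beta(s + a, n - s + b) under the three priors above.
from scipy.stats import beta

s, n = 3, 10                       # s/n < 1/2 here
priors = {"Haldane": (0.0, 0.0), "Jeffreys": (0.5, 0.5), "Bayes": (1.0, 1.0)}
for name, (a, b) in priors.items():
    post = beta(s + a, n - s + b)
    mode = (s + a - 1) / (n + a + b - 2)   # all shape parameters exceed 1 here
    print(f"{name:9s} mean={post.mean():.4f}  mode={mode:.4f}")
# Prints means ordered Bayes > Jeffreys > Haldane (since s/n < 1/2);
# the Haldane mean and the Bayes mode both equal s/n = 0.3.
```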
In the case that 100% of the trials have been successful *s* = *n*, the *Bayes* prior probability Beta(1,1) results in a posterior expected value equal to the rule of succession (*n* + 1)/(*n* + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of 1 (absolute certainty of success in the next trial). Jeffreys prior probability results in a posterior expected value equal to (*n* + 1/2)/(*n* + 1). Perks[\[61\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Perks-61) (p. 303) points out: "This provides a new rule of succession and expresses a 'reasonable' position to take up, namely, that after an unbroken run of n successes we assume a probability for the next trial equivalent to the assumption that we are about half-way through an average run, i.e. that we expect a failure once in (2*n* + 2) trials. The BayesāLaplace rule implies that we are about at the end of an average run or that we expect a failure once in (*n* + 2) trials. The comparison clearly favours the new result (what is now called Jeffreys prior) from the point of view of 'reasonableness'."
Conversely, in the case that 100% of the trials have resulted in failure (*s* = 0), the *Bayes* prior probability Beta(1,1) results in a posterior expected value for success in the next trial equal to 1/(*n* + 2), while the Haldane prior Beta(0,0) results in a posterior expected value of success in the next trial of 0 (absolute certainty of failure in the next trial). Jeffreys prior probability results in a posterior expected value for success in the next trial equal to (1/2)/(*n* + 1), which Perks[\[61\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Perks-61) (p. 303) points out: "is a much more reasonably remote result than the BayesāLaplace result 1/(*n* + 2)".
Jaynes[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) questions (for the Haldane prior Beta(0,0)) the use of these formulas for the cases *s* = 0 or *s* = *n* because the integrals do not converge (Beta(0,0) is an improper prior for *s* = 0 or *s* = *n*). In practice, the condition 0 \< *s* \< *n* necessary for a mode to exist between both ends is usually met, and in that case the Bayes prior results in a posterior mode located between both ends of the domain.
As remarked in the section on the rule of succession, K. Pearson showed that after *n* successes in *n* trials the posterior probability (based on the Bayes Beta(1,1) distribution as the prior probability) that the next (*n* + 1) trials will all be successes is exactly 1/2, whatever the value of *n*. Based on the Haldane Beta(0,0) distribution as the prior probability, this posterior probability is 1 (absolute certainty that after *n* successes in *n* trials the next (*n* + 1) trials will all be successes). Perks[\[61\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Perks-61) (p. 303) shows that, for what is now known as the Jeffreys prior, this probability is ((*n* + 1/2)/(*n* + 1))((*n* + 3/2)/(*n* + 2))...(2*n* + 1/2)/(2*n* + 1), which for *n* = 1, 2, 3 gives 15/24, 315/480, 9009/13440; rapidly approaching a limiting value of $1/{\sqrt {2}}\approx 0.707107$ as *n* tends to infinity. Perks remarks that what is now known as the Jeffreys prior: "is clearly more 'reasonable' than either the BayesāLaplace result or the result on the (Haldane) alternative rule rejected by Jeffreys which gives certainty as the probability. It clearly provides a very much better correspondence with the process of induction. Whether it is 'absolutely' reasonable for the purpose, i.e. whether it is yet large enough, without the absurdity of reaching unity, is a matter for others to decide. But it must be realized that the result depends on the assumption of complete indifference and absence of knowledge prior to the sampling experiment."
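A short script (ours) reproduces Perks's values and the limit 1/ā2:

```python
# Reproducing Perks's values and the limiting value 1/sqrt(2) (script ours).
from math import prod, sqrt

def perks_probability(n: int) -> float:
    # Posterior probability, under the Jeffreys prior, that the next (n + 1)
    # trials all succeed after n successes in n trials.
    return prod((n + k + 0.5) / (n + k + 1) for k in range(n + 1))

print([perks_probability(n) for n in (1, 2, 3)])  # 15/24, 315/480, 9009/13440
print(perks_probability(100_000), 1 / sqrt(2))    # both ~ 0.70710...
```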
Following are the variances of the posterior distribution obtained with these three prior probability distributions:
for the **Bayes'** prior probability (Beta(1,1)), the posterior variance is:
$${\text{variance}}={\frac {(s+1)(n-s+1)}{(n+2)^{2}(n+3)}}$$
for the **Jeffreys'** prior probability (Beta(1/2,1/2)), the posterior variance is:
$${\text{variance}}={\frac {(s+{\tfrac {1}{2}})(n-s+{\tfrac {1}{2}})}{(n+1)^{2}(n+2)}}$$
and for the **Haldane** prior probability (Beta(0,0)), the posterior variance is:
$${\text{variance}}={\frac {s(n-s)}{n^{2}(n+1)}}$$
So, as remarked by Silvey,[\[50\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Silvey-50) for large *n*, the variance is small and hence the posterior distribution is highly concentrated, whereas the assumed prior distribution was very diffuse. This is in accord with what one would hope for, as vague prior knowledge is transformed (through Bayes' theorem) into more precise posterior knowledge by an informative experiment. For small *n* the Haldane Beta(0,0) prior results in the largest posterior variance, while the Bayes Beta(1,1) prior results in the most concentrated posterior; Jeffreys prior Beta(1/2,1/2) results in a posterior variance between the other two. As *n* increases, the variance rapidly decreases, so that the posterior variance for all three priors converges to approximately the same value (approaching zero variance as *n* ā ā). Recalling the previous result that the *Haldane* prior probability Beta(0,0) results in a posterior probability density with *mean* (the expected value for the probability of success in the "next" trial) identical to the ratio *s*/*n*, it follows from the above expression that the *Haldane* prior Beta(0,0) also results in a posterior with *variance* identical to the variance expressed in terms of the maximum-likelihood estimate *s*/*n* and sample size (in [§ Variance](https://en.wikipedia.org/wiki/Beta_distribution#Variance)):
$${\text{variance}}={\frac {\mu (1-\mu )}{1+\nu }}$$
with the mean *μ* = *s*/*n* and the sample size *ν* = *n*.
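A one-line check (example numbers ours; scipy assumed) of this identity between the Haldane posterior variance and the maximum-likelihood variance formula:

```python
# Check (example numbers ours; scipy assumed) that the Haldane posterior
# Beta(s, n - s) has variance mu(1 - mu)/(1 + nu) with mu = s/n and nu = n.
from scipy.stats import beta

s, n = 7, 20
mu, nu = s / n, n
assert abs(beta(s, n - s).var() - mu * (1 - mu) / (1 + nu)) < 1e-12
```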
In Bayesian inference, using a [prior distribution](https://en.wikipedia.org/wiki/Prior_distribution "Prior distribution") Beta(*α*Prior, *β*Prior) with a binomial likelihood is equivalent to adding (*α*Prior ā 1) pseudo-observations of "success" and (*β*Prior ā 1) pseudo-observations of "failure" to the actual number of successes and failures observed, and then estimating the parameter *p* of the binomial distribution by the proportion of successes over both real and pseudo observations. A uniform prior Beta(1,1) does not add (or subtract) any pseudo-observations, since for Beta(1,1) both (*α*Prior ā 1) = 0 and (*β*Prior ā 1) = 0. The Haldane prior Beta(0,0) subtracts one pseudo-observation from each count, and the Jeffreys prior Beta(1/2,1/2) subtracts 1/2 pseudo-observation of success and 1/2 of failure. This subtraction has the effect of [smoothing](https://en.wikipedia.org/wiki/Smoothing "Smoothing") out the posterior distribution. If the proportion of successes is not 50% (*s*/*n* ā 1/2), values of *α*Prior and *β*Prior less than 1 (and therefore negative (*α*Prior ā 1) and (*β*Prior ā 1)) favor sparsity, i.e. distributions where the parameter *p* is closer to either 0 or 1. In effect, values of *α*Prior and *β*Prior between 0 and 1, when operating together, function as a [concentration parameter](https://en.wikipedia.org/wiki/Concentration_parameter "Concentration parameter").
The accompanying plots show the posterior probability density functions for sample sizes *n* ā {3,10,50}, successes *s* ā {*n*/2, *n*/4} and Beta(*α*Prior,*β*Prior) ā {Beta(0,0), Beta(1/2,1/2), Beta(1,1)}. Also shown are the cases for *n* ā {4,12,40}, successes *s* = *n*/4 and Beta(*α*Prior,*β*Prior) ā {Beta(0,0), Beta(1/2,1/2), Beta(1,1)}. The first plot shows the symmetric cases, with successes *s* = *n*/2 and mean = mode = 1/2, and the second plot shows the skewed cases *s* = *n*/4. The images show that there is little difference between the priors for the posterior with sample size of 50 (characterized by a more pronounced peak near the sample proportion *s*/*n*). Significant differences appear for very small sample sizes (in particular, the flatter distribution for the degenerate case of sample size = 3). Therefore, the skewed cases, with successes *s* = *n*/4, show a larger effect from the choice of prior at small sample size than the symmetric cases. For symmetric distributions, the Bayes prior Beta(1,1) results in the most peaked and highest posterior distributions and the Haldane prior Beta(0,0) results in the flattest and lowest-peaked distribution; the Jeffreys prior Beta(1/2,1/2) lies in between them. For nearly symmetric, not too skewed distributions the effect of the priors is similar. For a very small sample size (here, a sample size of 3) and a skewed distribution (here, *s* = *n*/4) the Haldane prior can result in a reverse-J-shaped distribution with a singularity at the left end. However, this happens only in degenerate cases (in this example *n* = 3 and hence *s* = 3/4 \< 1, a degenerate value because *s* should be greater than unity for the posterior of the Haldane prior to have a mode located between the ends, and because *s* = 3/4 is not an integer, violating the initial assumption of a binomial likelihood) and it is not an issue in generic cases of reasonable sample size (such that the condition 1 \< *s* \< *n* ā 1, necessary for a mode to exist between both ends, is fulfilled).
In Chapter 12 (p. 385) of his book, Jaynes[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) asserts that the *Haldane prior* Beta(0,0) describes a *prior state of knowledge of complete ignorance*, where we are not even sure whether it is physically possible for an experiment to yield either a success or a failure, while the *Bayes (uniform) prior Beta(1,1) applies if* one knows that *both binary outcomes are possible*. Jaynes states: "*interpret the BayesāLaplace (Beta(1,1)) prior as describing not a state of complete ignorance*, but the state of knowledge in which we have observed one success and one failure...once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility." Jaynes[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) does not specifically discuss the Jeffreys prior Beta(1/2,1/2). (Jaynes's discussion of "Jeffreys prior" on pp. 181 and 423, and in chapter 12 of his book,[\[52\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jaynes-52) refers instead to the improper, un-normalized prior "1/*p* *dp*" introduced by Jeffreys in the 1939 edition of his book,[\[59\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Jeffreys-59) seven years before he introduced what is now known as Jeffreys' invariant prior: the square root of the determinant of Fisher's information matrix. *"1/p" is Jeffreys' (1946) invariant prior for the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution "Exponential distribution"), not for the Bernoulli or binomial distributions*.) However, it follows from the above discussion that the Jeffreys Beta(1/2,1/2) prior represents a state of knowledge in between the Haldane Beta(0,0) and Bayes Beta(1,1) priors.
Similarly, [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") in his 1892 book [The Grammar of Science](https://en.wikipedia.org/wiki/The_Grammar_of_Science "The Grammar of Science")[\[68\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-PearsonGrammar-68)[\[69\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-PearsnGrammar2009-69) (p. 144 of 1900 edition) maintained that the Bayes (Beta(1,1)) uniform prior was not a complete-ignorance prior, and that it should be used only when prior information justified "distributing our ignorance equally". K. Pearson wrote: "Yet the only supposition that we appear to have made is this: that, knowing nothing of nature, routine and anomy (from the Greek ανομία, namely: a- "without", and nomos "law") are to be considered as equally likely to occur. Now we were not really justified in making even this assumption, for it involves a knowledge that we do not possess regarding nature. We use our *experience* of the constitution and action of coins in general to assert that heads and tails are equally probable, but we have no right to assert before experience that, as we know nothing of nature, routine and breach are equally probable. In our ignorance we ought to consider before experience that nature may consist of all routines, all anomies (normlessness), or a mixture of the two in any proportion whatever, and that all such are equally probable. Which of these constitutions after experience is the most probable must clearly depend on what that experience has been like."
If there is sufficient [sampling data](https://en.wikipedia.org/wiki/Sample_\(statistics\) "Sample (statistics)"), *and the posterior probability mode is not located at one of the extremes of the domain* (*x* = 0 or *x* = 1), the three priors of Bayes (Beta(1,1)), Jeffreys (Beta(1/2,1/2)) and Haldane (Beta(0,0)) should yield similar [*posterior* probability](https://en.wikipedia.org/wiki/Posterior_probability "Posterior probability") densities. Otherwise, as Gelman et al.[\[70\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Gelman-70) (p. 65) point out, "if so few data are available that the choice of noninformative prior distribution makes a difference, one should put relevant information into the prior distribution", or as Berger[\[4\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BergerDecisionTheory-4) (p. 125) points out "when different reasonable priors yield substantially different answers, can it be right to state that there *is* a single answer? Would it not be better to admit that there is scientific uncertainty, with the conclusion depending on prior beliefs?"
## Occurrence and applications
### Order statistics
The beta distribution has an important application in the theory of [order statistics](https://en.wikipedia.org/wiki/Order_statistic "Order statistic"). A basic result is that the distribution of the *k*th smallest of a sample of size *n* from a continuous [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") has a beta distribution.[\[40\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David1-40) This result is summarized as
$$U_{(k)}\sim \operatorname {Beta} (k,\,n+1-k),$$ where $U_{(k)}$ denotes the *k*th smallest value of a sample of size *n* from the uniform distribution.
From this, and application of the theory related to the [probability integral transform](https://en.wikipedia.org/wiki/Probability_integral_transform "Probability integral transform"), the distribution of any individual order statistic from any [continuous distribution](https://en.wikipedia.org/wiki/Continuous_distribution "Continuous distribution") can be derived.[\[40\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David1-40)
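This order-statistic result is easy to test by simulation; a sketch (sample sizes ours; numpy/scipy assumed):

```python
# Simulation sketch (sample sizes ours; numpy/scipy assumed): the k-th
# smallest of n iid Uniform(0, 1) variates should follow Beta(k, n + 1 - k).
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
n, k = 10, 3
kth_smallest = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
print(kstest(kth_smallest, beta(k, n + 1 - k).cdf))  # large p-value expected
```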
### Subjective logic
In standard logic, propositions are considered to be either true or false. In contradistinction, [subjective logic](https://en.wikipedia.org/wiki/Subjective_logic "Subjective logic") assumes that humans cannot determine with absolute certainty whether a proposition about the real world is absolutely true or false. In [subjective logic](https://en.wikipedia.org/wiki/Subjective_logic "Subjective logic") the [*a posteriori*](https://en.wikipedia.org/wiki/A_posteriori "A posteriori") probability estimates of binary events can be represented by beta distributions.[\[71\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-J01-71)
### Wavelets
A [wavelet](https://en.wikipedia.org/wiki/Wavelet "Wavelet") is a wave-like [oscillation](https://en.wikipedia.org/wiki/Oscillation "Oscillation") with an [amplitude](https://en.wikipedia.org/wiki/Amplitude "Amplitude") that starts out at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" that promptly decays. Wavelets can be used to extract information from many different kinds of data, including ā but certainly not limited to ā audio signals and images. Thus, wavelets are purposefully crafted to have specific properties that make them useful for [signal processing](https://en.wikipedia.org/wiki/Signal_processing "Signal processing"). Wavelets are localized in both time and [frequency](https://en.wikipedia.org/wiki/Frequency "Frequency") whereas the standard [Fourier transform](https://en.wikipedia.org/wiki/Fourier_transform "Fourier transform") is only localized in frequency. Therefore, standard Fourier Transforms are only applicable to [stationary processes](https://en.wikipedia.org/wiki/Stationary_process "Stationary process"), while [wavelets](https://en.wikipedia.org/wiki/Wavelet "Wavelet") are applicable to non-[stationary processes](https://en.wikipedia.org/wiki/Stationary_process "Stationary process"). Continuous wavelets can be constructed based on the beta distribution. [Beta wavelets](https://en.wikipedia.org/wiki/Beta_wavelet "Beta wavelet")[\[72\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-wavelet_oliveira-72) can be viewed as a soft variety of [Haar wavelets](https://en.wikipedia.org/wiki/Haar_wavelet "Haar wavelet") whose shape is fine-tuned by two shape parameters α and β.
### Population genetics
The [BaldingāNichols model](https://en.wikipedia.org/wiki/Balding%E2%80%93Nichols_model "BaldingāNichols model") is a two-parameter [parametrization](https://en.wikipedia.org/wiki/Statistical_parameter "Statistical parameter") of the beta distribution used in [population genetics](https://en.wikipedia.org/wiki/Population_genetics "Population genetics").[\[73\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Balding-73) It is a statistical description of the [allele frequencies](https://en.wikipedia.org/wiki/Allele_frequencies "Allele frequencies") in the components of a sub-divided population:
$$x\sim \operatorname {Beta} (\mu \nu ,\,(1-\mu )\nu )$$ where $\nu =(1-F)/F$ and *μ* is the mean allele frequency; here *F* is (Wright's) genetic distance between two populations.
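For illustration, a minimal numpy sketch (the function name and parameter values are ours) of drawing subpopulation allele frequencies under this parametrization:

```python
# Illustrative numpy sketch (function name and values ours) of drawing
# subpopulation allele frequencies in the Balding-Nichols parametrization.
import numpy as np

def balding_nichols(mu: float, F: float, size: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    nu = (1 - F) / F                      # alpha = mu*nu, beta = (1 - mu)*nu
    return rng.beta(mu * nu, (1 - mu) * nu, size=size)

print(balding_nichols(mu=0.3, F=0.05, size=5))
```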
### Project management: task cost and schedule modeling
The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution ā along with the [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution "Triangular distribution") ā is used extensively in [PERT](https://en.wikipedia.org/wiki/PERT "PERT"), [critical path method](https://en.wikipedia.org/wiki/Critical_path_method "Critical path method") (CPM), Joint Cost Schedule Modeling (JCSM) and other [project management](https://en.wikipedia.org/wiki/Project_management "Project management")/control systems to describe the time to completion and the cost of a task. In project management, shorthand computations are widely used to estimate the [mean](https://en.wikipedia.org/wiki/Mean "Mean") and [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation") of the beta distribution:[\[39\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Malcolm-39)
![{\\displaystyle {\\begin{aligned}\\mu (X)&={\\frac {a+4b+c}{6}}\\\\\[8pt\]\\sigma (X)&={\\frac {c-a}{6}}\\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/7a89a68d1250ebe659be15e88edb5a9eb3e0cf87)
where *a* is the minimum, *c* is the maximum, and *b* is the most likely value (the [mode](https://en.wikipedia.org/wiki/Mode_\(statistics\) "Mode (statistics)") for *α* \> 1 and *β* \> 1).
The above estimate for the [mean](https://en.wikipedia.org/wiki/Mean "Mean") *μ*(*X*) = (*a* + 4*b* + *c*)/6 is known as the [PERT](https://en.wikipedia.org/wiki/PERT "PERT") [three-point estimation](https://en.wikipedia.org/wiki/Three-point_estimation "Three-point estimation") and it is exact for either of the following values of *β* (for arbitrary α within these ranges):
*β* = *α* \> 1 (symmetric case) with [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation") $\sigma (X)={\frac {c-a}{2{\sqrt {2\alpha +1}}}}$, [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") = 0, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") $={\frac {-6}{2\alpha +3}}$
or
*β* = 6 ā *α* for 5 \> *α* \> 1 (skewed case) with [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation")
$$\sigma (X)={\frac {(c-a){\sqrt {\alpha (6-\alpha )}}}{6{\sqrt {7}}}},$$
[skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") $={\frac {(3-\alpha ){\sqrt {7}}}{2{\sqrt {\alpha (6-\alpha )}}}}$, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") $={\frac {21}{\alpha (6-\alpha )}}-3$
The above estimate for the [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation") *Ļ*(*X*) = (*c* ā *a*)/6 is exact for either of the following values of *α* and *β*:
*α* = *β* = 4 (symmetric) with [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") = 0, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") = ā6/11.
*β* = 6 ā *α* and *α* = $3-{\sqrt {2}}$ (right-tailed, positive skew) with [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") $={\frac {1}{\sqrt {2}}}$, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") = 0
*β* = 6 ā *α* and *α* = $3+{\sqrt {2}}$ (left-tailed, negative skew) with [skewness](https://en.wikipedia.org/wiki/Skewness "Skewness") $={\frac {-1}{\sqrt {2}}}$, and [excess kurtosis](https://en.wikipedia.org/wiki/Excess_kurtosis "Excess kurtosis") = 0
Otherwise, these can be poor approximations for beta distributions with other values of α and β, exhibiting average errors of 40% in the mean and 549% in the variance.[\[74\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-74)[\[75\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-75)[\[76\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-76)
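The shorthand formulas and the exactness conditions above can be verified with a short scipy sketch (the helper name is ours; the check is done on the standard interval *a* = 0, *c* = 1, where (*c* ā *a*)/6 = 1/6):

```python
# Sketch (helper name ours; scipy assumed) checking the PERT shorthand
# against the exactness conditions above, on the interval a = 0, c = 1.
from math import sqrt
from scipy.stats import beta

def pert_estimates(a: float, b: float, c: float) -> tuple[float, float]:
    """Shorthand PERT mean and standard deviation for min a, mode b, max c."""
    return (a + 4 * b + c) / 6, (c - a) / 6

for a_, b_ in [(4, 4), (3 - sqrt(2), 3 + sqrt(2)), (3 + sqrt(2), 3 - sqrt(2))]:
    mean, var, skew, kurt = beta(a_, b_).stats(moments="mvsk")
    mode = (a_ - 1) / (a_ + b_ - 2)          # valid since a_, b_ > 1 here
    est_mean, est_sd = pert_estimates(0.0, mode, 1.0)
    print(f"alpha={a_:.3f}: exact mean={float(mean):.4f} vs {est_mean:.4f}, "
          f"exact sd={float(var) ** 0.5:.4f} vs {est_sd:.4f}, "
          f"excess kurtosis={float(kurt):+.4f}")
```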
## Random variate generation
If *X* and *Y* are independent, with $X\sim \Gamma (\alpha ,\theta )$ and $Y\sim \Gamma (\beta ,\theta )$, then
$${\frac {X}{X+Y}}\sim \operatorname {Beta} (\alpha ,\beta ).$$
So one algorithm for generating beta variates is to generate ${\frac {X}{X+Y}}$, where *X* is a [gamma variate](https://en.wikipedia.org/wiki/Gamma_distribution#Random_variate_generation "Gamma distribution") with parameters (α, 1) and *Y* is an independent gamma variate with parameters (β, 1).[\[77\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-77) In fact, here ${\frac {X}{X+Y}}$ and $X+Y$ are independent, and $X+Y\sim \Gamma (\alpha +\beta ,1)$. If $Z\sim \Gamma (\gamma ,1)$ and *Z* is independent of *X* and *Y*, then ${\frac {X+Y}{X+Y+Z}}\sim \operatorname {Beta} (\alpha +\beta ,\gamma )$ and ${\frac {X+Y}{X+Y+Z}}$ is independent of ${\frac {X}{X+Y}}$. This shows that the product of independent $\operatorname {Beta} (\alpha ,\beta )$ and $\operatorname {Beta} (\alpha +\beta ,\gamma )$ random variables is a $\operatorname {Beta} (\alpha ,\beta +\gamma )$ random variable.
Also, the *k*th [order statistic](https://en.wikipedia.org/wiki/Order_statistic "Order statistic") of *n* [uniformly distributed](https://en.wikipedia.org/wiki/Uniform_distribution_\(continuous\) "Uniform distribution (continuous)") variates is $\operatorname {Beta} (k,\,n+1-k)$, so an alternative if *α* and *β* are small integers is to generate *α* + *β* ā 1 uniform variates and choose the *α*-th smallest.[\[40\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David1-40)
Another way to generate the Beta distribution is by the [Pólya urn model](https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model "Pólya urn model"). According to this method, one starts with an "urn" containing α "black" balls and β "white" balls and draws uniformly with replacement. On every trial, an additional ball is added to the urn, matching the color of the ball last drawn. Asymptotically, the proportion of black and white balls will be distributed according to the Beta distribution, where each repetition of the experiment produces a different value.
It is also possible to use [inverse transform sampling](https://en.wikipedia.org/wiki/Inverse_transform_sampling "Inverse transform sampling").
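The generation routes described above can be sketched as follows (function names ours; numpy assumed); the gamma-ratio method works for any *α*, *β* \> 0, the order-statistic method only for small integer parameters, and the urn method only yields approximate draws:

```python
# Sketch of three generation routes described above (function names ours;
# numpy assumed).
import numpy as np

rng = np.random.default_rng(0)

def beta_from_gammas(a: float, b: float, size: int) -> np.ndarray:
    x = rng.gamma(a, 1.0, size)           # X ~ Gamma(a, 1)
    y = rng.gamma(b, 1.0, size)           # Y ~ Gamma(b, 1), independent of X
    return x / (x + y)                    # X/(X + Y) ~ Beta(a, b)

def beta_from_order_stat(a: int, b: int, size: int) -> np.ndarray:
    u = rng.uniform(size=(size, a + b - 1))
    return np.sort(u, axis=1)[:, a - 1]   # a-th smallest of a + b - 1 uniforms

def beta_from_polya_urn(a: int, b: int, draws: int = 10_000) -> float:
    black, white = a, b
    for _ in range(draws):                # reinforce the drawn color
        if rng.uniform() < black / (black + white):
            black += 1
        else:
            white += 1
    return black / (black + white)        # ~ one draw from Beta(a, b)

print(beta_from_gammas(2, 5, 100_000).mean())      # ~ 2/7
print(beta_from_order_stat(2, 5, 100_000).mean())  # ~ 2/7
print(beta_from_polya_urn(2, 5))                   # a single approximate draw
```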
## Normal approximation to the Beta distribution
A beta distribution $\operatorname {Beta} (\alpha ,\beta )$ with *α* ā *β* and *α*, *β* ā« 1 is approximately normal with mean $\mu ={\frac {\alpha }{\alpha +\beta }}$ and variance ${\frac {\alpha \beta }{(\alpha +\beta )^{2}(\alpha +\beta +1)}}$. If *α* ⤠*β*, the normal approximation can be improved by taking the cube root of the logarithm of the reciprocal of *X*.[\[78\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-78)[\[79\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-79)
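A numeric sketch (parameter values ours; scipy assumed) comparing the beta CDF with its matching normal approximation for large, comparable shape parameters:

```python
# Numeric sketch (parameter values ours; scipy assumed): for large,
# comparable shape parameters the beta CDF is close to the CDF of a normal
# with the matched mean and variance.
from scipy.stats import beta, norm

a, b = 80, 120
mean = a / (a + b)
var = a * b / ((a + b) ** 2 * (a + b + 1))
approx = norm(loc=mean, scale=var ** 0.5)
for x in (0.35, 0.40, 0.45):
    print(f"x={x}: beta cdf={beta(a, b).cdf(x):.5f}, "
          f"normal cdf={approx.cdf(x):.5f}")
```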
## History
[Thomas Bayes](https://en.wikipedia.org/wiki/Thomas_Bayes "Thomas Bayes"), in a posthumous paper[\[62\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-ThomasBayes-62) published in 1763 by [Richard Price](https://en.wikipedia.org/wiki/Richard_Price "Richard Price"), obtained a beta distribution as the density of the probability of success in Bernoulli trials (see [§ Applications, Bayesian inference](https://en.wikipedia.org/wiki/Beta_distribution#Applications,_Bayesian_inference)), but the paper does not analyze any of the moments of the beta distribution or discuss any of its properties.
[Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") analyzed the beta distribution as Type I of the Pearson distributions
The first systematic modern discussion of the beta distribution is probably due to [Karl Pearson](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson").[\[80\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-80)[\[81\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-rscat-81) In Pearson's papers[\[21\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson-21)[\[33\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson1895-33) the beta distribution is couched as a solution of a differential equation: [Pearson's Type I distribution](https://en.wikipedia.org/wiki/Pearson_distribution "Pearson distribution"), to which it is essentially identical except for arbitrary shifting and re-scaling (the beta and Pearson Type I distributions can always be equalized by proper choice of parameters). In fact, in several English books and journal articles in the few decades prior to World War II, it was common to refer to the beta distribution as Pearson's Type I distribution. [William P. Elderton](https://en.wikipedia.org/wiki/William_Palin_Elderton "William Palin Elderton") in his 1906 monograph "Frequency curves and correlation"[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42) further analyzes the beta distribution as Pearson's Type I distribution, including a full discussion of the method of moments for the four-parameter case and diagrams of (what Elderton describes as) U-shaped, J-shaped, twisted J-shaped, "cocked-hat" shapes, horizontal and angled straight-line cases. Elderton wrote "I am chiefly indebted to Professor Pearson, but the indebtedness is of a kind for which it is impossible to offer formal thanks." The monograph[\[42\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Elderton1906-42) provides an impressive amount of information on the beta distribution, including equations for the origin of the distribution chosen to be the mode, as well as for other Pearson distributions: types I through VII. Elderton also included a number of appendixes, including one appendix ("II") on the beta and gamma functions. In later editions, Elderton added equations for the origin of the distribution chosen to be the mean, and analysis of Pearson distributions VIII through XII.
As remarked by Bowman and Shenton[\[44\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-BowmanShenton-44) "Fisher and Pearson had a difference of opinion in the approach to (parameter) estimation, in particular relating to (Pearson's method of) moments and (Fisher's method of) maximum likelihood in the case of the Beta distribution." Also according to Bowman and Shenton, "the case of a Type I (beta distribution) model being the center of the controversy was pure serendipity. A more difficult model of 4 parameters would have been hard to find." The long-running public conflict of Fisher with Karl Pearson can be followed in a number of articles in prestigious journals. For example, concerning the estimation of the four parameters for the beta distribution, and Fisher's criticism of Pearson's method of moments as being arbitrary, see Pearson's article "Method of moments and method of maximum likelihood"[\[45\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-Pearson1936-45) (published three years after his retirement from University College, London, where his position had been divided between Fisher and Pearson's son Egon), in which Pearson writes "I read (Koshai's paper in the Journal of the Royal Statistical Society, 1933) which as far as I am aware is the only case at present published of the application of Professor Fisher's method. To my astonishment that method depends on first working out the constants of the frequency curve by the (Pearson) Method of Moments and then superposing on it, by what Fisher terms "the Method of Maximum Likelihood" a further approximation to obtain, what he holds, he will thus get, 'more efficient values' of the curve constants".
David and Edwards's treatise on the history of statistics[\[82\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-David_History-82) cites the first modern treatment of the beta distribution, in 1911,[\[83\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-83) using the beta designation that has become standard, due to [Corrado Gini](https://en.wikipedia.org/wiki/Corrado_Gini "Corrado Gini"), an Italian [statistician](https://en.wikipedia.org/wiki/Statistician "Statistician"), [demographer](https://en.wikipedia.org/wiki/Demography "Demography"), and [sociologist](https://en.wikipedia.org/wiki/Sociology "Sociology"), who developed the [Gini coefficient](https://en.wikipedia.org/wiki/Gini_coefficient "Gini coefficient"). [N. L. Johnson](https://en.wikipedia.org/wiki/Norman_Lloyd_Johnson "Norman Lloyd Johnson") and [S. Kotz](https://en.wikipedia.org/wiki/Samuel_Kotz "Samuel Kotz"), in their comprehensive and very informative monograph[\[84\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-84) on leading historical personalities in statistical sciences, credit [Corrado Gini](https://en.wikipedia.org/wiki/Corrado_Gini "Corrado Gini")[\[85\]](https://en.wikipedia.org/wiki/Beta_distribution#cite_note-85) as "an early Bayesian...who dealt with the problem of eliciting the parameters of an initial Beta distribution, by singling out techniques which anticipated the advent of the so-called empirical Bayes approach."
1. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-1) [***c***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-2) [***d***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-3) [***e***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-4) [***f***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-5) [***g***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-6) [***h***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-7) [***i***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-8) [***j***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-9) [***k***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-10) [***l***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-11) [***m***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-12) [***n***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-13) [***o***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-14) [***p***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-15) [***q***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-16) [***r***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-17) [***s***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-18) [***t***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-19) [***u***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-20) [***v***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-21) [***w***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-22) [***x***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-23) [***y***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-JKB_1-24)
Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1995). "Chapter 25: Beta Distributions". *Continuous Univariate Distributions Vol. 2* (2nd ed.). Wiley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0-471-58494-0](https://en.wikipedia.org/wiki/Special:BookSources/978-0-471-58494-0 "Special:BookSources/978-0-471-58494-0")
.
2. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Mathematical_Statistics_with_MATHEMATICA_2-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Mathematical_Statistics_with_MATHEMATICA_2-1)
Rose, Colin; Smith, Murray D. (2002). *Mathematical Statistics with MATHEMATICA*. Springer. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0387952345](https://en.wikipedia.org/wiki/Special:BookSources/978-0387952345 "Special:BookSources/978-0387952345")
.
3. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Kruschke2011_3-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Kruschke2011_3-1) [***c***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Kruschke2011_3-2)
[Kruschke, John K.](https://en.wikipedia.org/wiki/John_K._Kruschke "John K. Kruschke") (2011). *Doing Bayesian data analysis: A tutorial with R and BUGS*. Academic Press / Elsevier. p. 83. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0123814852](https://en.wikipedia.org/wiki/Special:BookSources/978-0123814852 "Special:BookSources/978-0123814852")
.
4. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-BergerDecisionTheory_4-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-BergerDecisionTheory_4-1)
Berger, James O. (2010). *Statistical Decision Theory and Bayesian Analysis* (2nd ed.). Springer. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-1441930743](https://en.wikipedia.org/wiki/Special:BookSources/978-1441930743 "Special:BookSources/978-1441930743")
.
5. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Feller_5-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Feller_5-1) [***c***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Feller_5-2)
Feller, William (1971). [*An Introduction to Probability Theory and Its Applications, Vol. 2*](https://archive.org/details/introductiontopr00fell). Wiley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0471257097](https://en.wikipedia.org/wiki/Special:BookSources/978-0471257097 "Special:BookSources/978-0471257097")
.
6. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-6)**
Wadsworth, G. P. (1960). [*Introduction to Probability and Random Variables*](https://archive.org/details/introductiontopr0000wads). New York: McGraw-Hill. p. [52](https://archive.org/details/introductiontopr0000wads/page/52).
7. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Kruschke2015_7-0)**
[Kruschke, John K.](https://en.wikipedia.org/wiki/John_K._Kruschke "John K. Kruschke") (2015). *Doing Bayesian Data Analysis: A Tutorial with R, JAGS and Stan*. Academic Press / Elsevier. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0-12-405888-0](https://en.wikipedia.org/wiki/Special:BookSources/978-0-12-405888-0 "Special:BookSources/978-0-12-405888-0")
.
8. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Wadsworth_8-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Wadsworth_8-1)
Wadsworth, George P. and Joseph Bryan (1960). [*Introduction to Probability and Random Variables*](https://archive.org/details/introductiontopr0000wads). McGraw-Hill.
9. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Handbook_of_Beta_Distribution_9-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Handbook_of_Beta_Distribution_9-1) [***c***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Handbook_of_Beta_Distribution_9-2) [***d***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Handbook_of_Beta_Distribution_9-3) [***e***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Handbook_of_Beta_Distribution_9-4) [***f***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Handbook_of_Beta_Distribution_9-5) [***g***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Handbook_of_Beta_Distribution_9-6)
Gupta, Arjun K., ed. (2004). *Handbook of Beta Distribution and Its Applications*. CRC Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0824753962](https://en.wikipedia.org/wiki/Special:BookSources/978-0824753962 "Special:BookSources/978-0824753962")
.
10. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Kerman2011_10-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Kerman2011_10-1)
Kerman, Jouni (2011). "A closed-form approximation for the median of the beta distribution". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1111\.0433](https://arxiv.org/abs/1111.0433) \[[math.ST](https://arxiv.org/archive/math.ST)\].
11. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-MostellerTukey_11-0)**
Mosteller, Frederick and John Tukey (1977). [*Data Analysis and Regression: A Second Course in Statistics*](https://archive.org/details/dataanalysisregr0000most). Addison-Wesley Pub. Co. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1977dars.book.....M](https://ui.adsabs.harvard.edu/abs/1977dars.book.....M). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0201048544](https://en.wikipedia.org/wiki/Special:BookSources/978-0201048544 "Special:BookSources/978-0201048544")
.
12. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-WillyFeller1_12-0)**
Feller, William (1968). *An Introduction to Probability Theory and Its Applications*. Vol. 1 (3rd ed.). Wiley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0471257080](https://en.wikipedia.org/wiki/Special:BookSources/978-0471257080 "Special:BookSources/978-0471257080")
.
13. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-13)** Philip J. Fleming and John J. Wallace. *How not to lie with statistics: the correct way to summarize benchmark results*. Communications of the ACM, 29(3):218ā221, March 1986.
14. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-14)**
["NIST/SEMATECH e-Handbook of Statistical Methods 1.3.6.6.17. Beta Distribution"](http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm). *[National Institute of Standards and Technology](https://en.wikipedia.org/wiki/National_Institute_of_Standards_and_Technology "National Institute of Standards and Technology") Information Technology Laboratory*. April 2012. Retrieved May 31, 2016.
15. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Oguamanam_15-0)**
Oguamanam, D.C.D.; Martin, H. R.; Huissoon, J. P. (1995). "On the application of the beta distribution to gear damage analysis". *Applied Acoustics*. **45** (3): 247ā261\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/0003-682X(95)00001-P](https://doi.org/10.1016%2F0003-682X%2895%2900001-P).
16. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Liang_16-0)**
Zhiqiang Liang; Jianming Wei; Junyu Zhao; Haitao Liu; Baoqing Li; Jie Shen; Chunlei Zheng (27 August 2008). ["The Statistical Meaning of Kurtosis and Its New Application to Identification of Persons Based on Seismic Signals"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3705491). *Sensors*. **8** (8): 5106ā5119\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2008Senso...8.5106L](https://ui.adsabs.harvard.edu/abs/2008Senso...8.5106L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3390/s8085106](https://doi.org/10.3390%2Fs8085106). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [3705491](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3705491). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [27873804](https://pubmed.ncbi.nlm.nih.gov/27873804).
17. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Kenney_and_Keeping_17-0)**
Kenney, J. F., and E. S. Keeping (1951). *Mathematics of Statistics Part Two, 2nd edition*. D. Van Nostrand Company Inc.
`{{cite book}}`: CS1 maint: multiple names: authors list ([link](https://en.wikipedia.org/wiki/Category:CS1_maint:_multiple_names:_authors_list "Category:CS1 maint: multiple names: authors list"))
18. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Abramowitz_18-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Abramowitz_18-1) [***c***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Abramowitz_18-2) [***d***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Abramowitz_18-3)
Abramowitz, Milton and Irene A. Stegun (1965). [*Handbook Of Mathematical Functions With Formulas, Graphs, And Mathematical Tables*](https://archive.org/details/handbookofmathe000abra). Dover. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0-486-61272-0](https://en.wikipedia.org/wiki/Special:BookSources/978-0-486-61272-0 "Special:BookSources/978-0-486-61272-0")
.
19. **[^](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Weisstein.Kurtosi_19-0)**
Weisstein., Eric W. ["Kurtosis"](http://mathworld.wolfram.com/Kurtosis.html). MathWorld--A Wolfram Web Resource. Retrieved 13 August 2012.
20. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Panik_20-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Panik_20-1)
Panik, Michael J (2005). *Advanced Statistics from an Elementary Point of View*. Academic Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0120884940](https://en.wikipedia.org/wiki/Special:BookSources/978-0120884940 "Special:BookSources/978-0120884940")
.
21. ^ [***a***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Pearson_21-0) [***b***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Pearson_21-1) [***c***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Pearson_21-2) [***d***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Pearson_21-3) [***e***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Pearson_21-4) [***f***](https://en.wikipedia.org/wiki/Beta_distribution#cite_ref-Pearson_21-5)
[Pearson, Karl](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") (1916). ["Mathematical contributions to the theory of evolution, XIX: Second supplement to a memoir on skew variation"](https://doi.org/10.1098%2Frsta.1916.0009). *Philosophical Transactions of the Royal Society A*. **216** (538ā548\): 429ā457\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1916RSPTA.216..429P](https://ui.adsabs.harvard.edu/abs/1916RSPTA.216..429P). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1098/rsta.1916.0009](https://doi.org/10.1098%2Frsta.1916.0009). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [91092](https://www.jstor.org/stable/91092).
22. [Gradshteyn, Izrail Solomonovich](https://en.wikipedia.org/wiki/Izrail_Solomonovich_Gradshteyn "Izrail Solomonovich Gradshteyn"); [Ryzhik, Iosif Moiseevich](https://en.wikipedia.org/wiki/Iosif_Moiseevich_Ryzhik "Iosif Moiseevich Ryzhik"); [Geronimus, Yuri Veniaminovich](https://en.wikipedia.org/wiki/Yuri_Veniaminovich_Geronimus "Yuri Veniaminovich Geronimus"); [Tseytlin, Michail Yulyevich](https://en.wikipedia.org/wiki/Michail_Yulyevich_Tseytlin "Michail Yulyevich Tseytlin"); Jeffrey, Alan (2015) [October 2014]. Zwillinger, Daniel; [Moll, Victor Hugo](https://en.wikipedia.org/wiki/Victor_Hugo_Moll "Victor Hugo Moll") (eds.). [*Table of Integrals, Series, and Products*](https://en.wikipedia.org/wiki/Gradshteyn_and_Ryzhik "Gradshteyn and Ryzhik"). Translated by Scripta Technica, Inc. (8th ed.). [Academic Press, Inc.](https://en.wikipedia.org/wiki/Academic_Press,_Inc. "Academic Press, Inc.") [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0-12-384933-5](https://en.wikipedia.org/wiki/Special:BookSources/978-0-12-384933-5 "Special:BookSources/978-0-12-384933-5"). [LCCN](https://en.wikipedia.org/wiki/LCCN_\(identifier\) "LCCN (identifier)") [2014010276](https://lccn.loc.gov/2014010276).
23. Billingsley, Patrick (1995). "Section 30: The Method of Moments". *Probability and Measure* (3rd ed.). Wiley-Interscience. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0-471-00710-4](https://en.wikipedia.org/wiki/Special:BookSources/978-0-471-00710-4 "Special:BookSources/978-0-471-00710-4").
24. MacKay, David (2003). *Information Theory, Inference and Learning Algorithms* (1st ed.). Cambridge University Press. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2003itil.book.....M](https://ui.adsabs.harvard.edu/abs/2003itil.book.....M). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0521642989](https://en.wikipedia.org/wiki/Special:BookSources/978-0521642989 "Special:BookSources/978-0521642989").
25. Johnson, N. L. (1949). ["Systems of frequency curves generated by methods of translation"](http://dml.cz/bitstream/handle/10338.dmlcz/135506/Kybernetika_39-2003-1_3.pdf) (PDF). *Biometrika*. **36** (1–2): 149–176. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1093/biomet/36.1-2.149](https://doi.org/10.1093%2Fbiomet%2F36.1-2.149). [hdl](https://en.wikipedia.org/wiki/Hdl_\(identifier\) "Hdl (identifier)"):[10338.dmlcz/135506](https://hdl.handle.net/10338.dmlcz%2F135506). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [18132090](https://pubmed.ncbi.nlm.nih.gov/18132090).
26. Verdugo Lazo, A. C. G.; Rathie, P. N. (1978). "On the entropy of continuous probability distributions". *IEEE Trans. Inf. Theory*. **24** (1): 120–122. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1109/TIT.1978.1055832](https://doi.org/10.1109%2FTIT.1978.1055832).
27. Shannon, Claude E. (1948). "A Mathematical Theory of Communication". *Bell System Technical Journal*. **27** (4): 623–656. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1002/j.1538-7305.1948.tb01338.x](https://doi.org/10.1002%2Fj.1538-7305.1948.tb01338.x).
28. Cover, Thomas M. and Joy A. Thomas (2006). *Elements of Information Theory* (Wiley Series in Telecommunications and Signal Processing) (2nd ed.). Wiley-Interscience. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0471241959](https://en.wikipedia.org/wiki/Special:BookSources/978-0471241959 "Special:BookSources/978-0471241959").
29. Plunkett, Kim; Elman, Jeffrey (1997). [*Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations (Neural Network Modeling and Connectionism)*](https://archive.org/details/exercisesinrethi0000plun). A Bradford Book. p. 166. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0262661058](https://en.wikipedia.org/wiki/Special:BookSources/978-0262661058 "Special:BookSources/978-0262661058").
30. Nallapati, Ramesh (2006). [*The smoothed Dirichlet distribution: understanding cross-entropy ranking in information retrieval*](http://maroo.cs.umass.edu/pub/web/getpdf.php?id=679) (Thesis). Computer Science Dept., University of Massachusetts Amherst.
31. Pearson, Egon S. (July 1969). ["Some historical reflections traced through the development of the use of frequency curves"](http://www.smu.edu/Dedman/Academics/Departments/Statistics/Research/TechnicalReports). *THEMIS Statistical Analysis Research Program, Technical Report 38*. Office of Naval Research, Contract N000014-68-A-0515 (Project NR 042–260).
32. Hahn, Gerald J.; Shapiro, S. (1994). *Statistical Models in Engineering* (Wiley Classics Library). Wiley-Interscience. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0471040651](https://en.wikipedia.org/wiki/Special:BookSources/978-0471040651 "Special:BookSources/978-0471040651").
33. [Pearson, Karl](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson") (1895). ["Contributions to the mathematical theory of evolution, II: Skew variation in homogeneous material"](https://doi.org/10.1098%2Frsta.1895.0010). *Philosophical Transactions of the Royal Society*. **186**: 343–414. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1895RSPTA.186..343P](https://ui.adsabs.harvard.edu/abs/1895RSPTA.186..343P). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1098/rsta.1895.0010](https://doi.org/10.1098%2Frsta.1895.0010). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [90649](https://www.jstor.org/stable/90649).
34. Buchanan, K.; Rockway, J.; Sternberg, O.; Mai, N. N. (May 2016). ["Sum-difference beamforming for radar applications using circularly tapered random arrays"](https://zenodo.org/record/1279364). *2016 IEEE Radar Conference (RadarConf)*. pp. 1–5. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1109/RADAR.2016.7485289](https://doi.org/10.1109%2FRADAR.2016.7485289). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-5090-0863-6](https://en.wikipedia.org/wiki/Special:BookSources/978-1-5090-0863-6 "Special:BookSources/978-1-5090-0863-6"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [32525626](https://api.semanticscholar.org/CorpusID:32525626).
35. Buchanan, K.; Flores, C.; Wheeland, S.; Jensen, J.; Grayson, D.; Huff, G. (May 2017). "Transmit beamforming for radar applications using circularly tapered random arrays". *2017 IEEE Radar Conference (RadarConf)*. pp. 0112–0117. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1109/RADAR.2017.7944181](https://doi.org/10.1109%2FRADAR.2017.7944181). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4673-8823-8](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4673-8823-8 "Special:BookSources/978-1-4673-8823-8"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [38429370](https://api.semanticscholar.org/CorpusID:38429370).
36. Buchanan, Kristopher Ryan (2014-05-29). ["Theory and Applications of Aperiodic (Random) Phased Arrays"](http://oaktrust.library.tamu.edu/handle/1969.1/157918).
37. Pham-Gia, T. (January 2000). ["Distributions of the ratios of independent beta variables and applications"](https://doi.org/10.1080/03610920008832632). *Communications in Statistics – Theory and Methods*. **29** (12): 2693–2715. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1080/03610920008832632](https://doi.org/10.1080%2F03610920008832632). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0361-0926](https://search.worldcat.org/issn/0361-0926). Retrieved 13 November 2024.
38. Herrerías-Velasco, José Manuel; Herrerías-Pleguezuelo, Rafael; van Dorp, Johan René (2011). "Revisiting the PERT mean and variance". *European Journal of Operational Research*. **210**: 448–451.
39. Malcolm, D. G.; Roseboom, J. H.; Clark, C. E.; Fazar, W. (September–October 1959). "Application of a Technique for Research and Development Program Evaluation". *Operations Research*. **7** (5): 646–669. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1287/opre.7.5.646](https://doi.org/10.1287%2Fopre.7.5.646). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0030-364X](https://search.worldcat.org/issn/0030-364X).
40. David, H. A.; Nagaraja, H. N. (2003). *Order Statistics* (3rd ed.). Wiley, New Jersey. p. 458. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [0-471-38926-9](https://en.wikipedia.org/wiki/Special:BookSources/0-471-38926-9 "Special:BookSources/0-471-38926-9").
41. ["1.3.6.6.17. Beta Distribution"](https://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm). *www.itl.nist.gov*.
42. Elderton, William Palin (1906). [*Frequency-Curves and Correlation*](https://archive.org/details/frequencycurvesc00elderich). Charles and Edwin Layton (London).
43. Elderton, William Palin and Norman Lloyd Johnson (2009). *Systems of Frequency Curves*. Cambridge University Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0521093361](https://en.wikipedia.org/wiki/Special:BookSources/978-0521093361 "Special:BookSources/978-0521093361").
44. [Bowman, K. O.](https://en.wikipedia.org/wiki/Kimiko_O._Bowman "Kimiko O. Bowman"); Shenton, L. R. (2007). ["The beta distribution, moment method, Karl Pearson and R.A. Fisher"](http://www.csm.ornl.gov/~bowman/fjts232.pdf) (PDF). *Far East J. Theo. Stat*. **23** (2): 133–164.
45. Pearson, Karl (June 1936). "Method of moments and method of maximum likelihood". *Biometrika*. **28** (1/2): 34–59. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.2307/2334123](https://doi.org/10.2307%2F2334123). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [2334123](https://www.jstor.org/stable/2334123).
46. Joanes, D. N.; Gill, C. A. (1998). "Comparing measures of sample skewness and kurtosis". *The Statistician*. **47** (Part 1): 183–189. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1111/1467-9884.00122](https://doi.org/10.1111%2F1467-9884.00122).
47. Beckman, R. J.; Tietjen, G. L. (1978). "Maximum likelihood estimation for the beta distribution". *Journal of Statistical Computation and Simulation*. **7** (3–4): 253–258. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1080/00949657808810232](https://doi.org/10.1080%2F00949657808810232).
48. Gnanadesikan, R.; Pinkham; Hughes (1967). "Maximum likelihood estimation of the parameters of the beta distribution from smallest order statistics". *Technometrics*. **9** (4): 607–620. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.2307/1266199](https://doi.org/10.2307%2F1266199). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [1266199](https://www.jstor.org/stable/1266199).
49. Fackler, Paul. ["Inverse Digamma Function (Matlab)"](http://hips.seas.harvard.edu/content/inverse-digamma-function-matlab). Harvard University School of Engineering and Applied Sciences. Retrieved 2012-08-18.
50. Silvey, S. D. (1975). *Statistical Inference*. Chapman and Hall. p. 40. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0412138201](https://en.wikipedia.org/wiki/Special:BookSources/978-0412138201 "Special:BookSources/978-0412138201").
51. Edwards, A. W. F. (1992). *Likelihood*. The Johns Hopkins University Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0801844430](https://en.wikipedia.org/wiki/Special:BookSources/978-0801844430 "Special:BookSources/978-0801844430").
52. Jaynes, E. T. (2003). *Probability Theory: The Logic of Science*. Cambridge University Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0521592710](https://en.wikipedia.org/wiki/Special:BookSources/978-0521592710 "Special:BookSources/978-0521592710").
53. Costa, Max; Cover, Thomas (September 1983). [*On the similarity of the entropy power inequality and the Brunn Minkowski inequality*](https://isl.stanford.edu/people/cover/papers/transIT/0837cost.pdf) (PDF). Tech. Report 48, Dept. of Statistics, Stanford University.
54. Aryal, Gokarna; Nadarajah, Saralees (2004). ["Information matrix for beta distributions"](http://www.math.bas.bg/serdica/2004/2004-513-526.pdf) (PDF). *Serdica Mathematical Journal (Bulgarian Academy of Science)*. **30**: 513–526.
55. Laplace, Pierre Simon, marquis de (1902). [*A Philosophical Essay on Probabilities*](https://archive.org/details/philosophicaless00lapliala). New York: J. Wiley; London: Chapman & Hall. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-60206-328-0](https://en.wikipedia.org/wiki/Special:BookSources/978-1-60206-328-0 "Special:BookSources/978-1-60206-328-0").
56. Cox, Richard T. (1961). *Algebra of Probable Inference*. The Johns Hopkins University Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0801869822](https://en.wikipedia.org/wiki/Special:BookSources/978-0801869822 "Special:BookSources/978-0801869822").
57. Keynes, John Maynard (2010) [1921]. *A Treatise on Probability: The Connection Between Philosophy and the History of Science*. Wildside Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1434406965](https://en.wikipedia.org/wiki/Special:BookSources/978-1434406965 "Special:BookSources/978-1434406965").
58. Pearson, Karl (1907). "On the Influence of Past Experience on Future Expectation". *Philosophical Magazine*. **6** (13): 365–378.
59. Jeffreys, Harold (1998). *Theory of Probability* (3rd ed.). Oxford University Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0198503682](https://en.wikipedia.org/wiki/Special:BookSources/978-0198503682 "Special:BookSources/978-0198503682").
60. Broad, C. D. (October 1918). "On the relation between induction and probability". *MIND, A Quarterly Review of Psychology and Philosophy*. 27 (New Series) (108): 389–404. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1093/mind/XXVII.4.389](https://doi.org/10.1093%2Fmind%2FXXVII.4.389). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [2249035](https://www.jstor.org/stable/2249035).
61. Perks, Wilfred (January 1947). ["Some observations on inverse probability including a new indifference rule"](https://web.archive.org/web/20140112111032/http://www.actuaries.org.uk/research-and-resources/documents/some-observations-inverse-probability-including-new-indifference-ru). *Journal of the Institute of Actuaries*. **73** (2): 285–334. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1017/S0020268100012270](https://doi.org/10.1017%2FS0020268100012270). Archived from [the original](http://www.actuaries.org.uk/research-and-resources/documents/some-observations-inverse-probability-including-new-indifference-ru) on 2014-01-12. Retrieved 2012-09-19.
62. Bayes, Thomas; communicated by Richard Price (1763). ["An Essay towards solving a Problem in the Doctrine of Chances"](https://doi.org/10.1098%2Frstl.1763.0053). *Philosophical Transactions of the Royal Society*. **53**: 370–418. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1098/rstl.1763.0053](https://doi.org/10.1098%2Frstl.1763.0053). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [105741](https://www.jstor.org/stable/105741).
63. [Haldane, J. B. S.](https://en.wikipedia.org/wiki/J._B._S._Haldane "J. B. S. Haldane") (1932). "A note on inverse probability". *[Mathematical Proceedings of the Cambridge Philosophical Society](https://en.wikipedia.org/wiki/Mathematical_Proceedings_of_the_Cambridge_Philosophical_Society "Mathematical Proceedings of the Cambridge Philosophical Society")*. **28** (1): 55–61. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1932PCPS...28...55H](https://ui.adsabs.harvard.edu/abs/1932PCPS...28...55H). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1017/s0305004100010495](https://doi.org/10.1017%2Fs0305004100010495). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [122773707](https://api.semanticscholar.org/CorpusID:122773707).
64. Zellner, Arnold (1971). *An Introduction to Bayesian Inference in Econometrics*. Wiley-Interscience. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0471169376](https://en.wikipedia.org/wiki/Special:BookSources/978-0471169376 "Special:BookSources/978-0471169376").
65. Jeffreys, Harold (September 1946). ["An Invariant Form for the Prior Probability in Estimation Problems"](https://doi.org/10.1098%2Frspa.1946.0056). *Proceedings of the Royal Society A*. **186** (1007): 453–461. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1946RSPSA.186..453J](https://ui.adsabs.harvard.edu/abs/1946RSPSA.186..453J). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1098/rspa.1946.0056](https://doi.org/10.1098%2Frspa.1946.0056). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [20998741](https://pubmed.ncbi.nlm.nih.gov/20998741).
66. Berger, James; Bernardo, Jose; Sun, Dongchu (2009). ["The formal definition of reference priors"](http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdfview_1&handle=euclid.aos/1236693154). *The Annals of Statistics*. **37** (2): 905–938. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[0904.0156](https://arxiv.org/abs/0904.0156). [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2009arXiv0904.0156B](https://ui.adsabs.harvard.edu/abs/2009arXiv0904.0156B). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1214/07-AOS587](https://doi.org/10.1214%2F07-AOS587). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [3221355](https://api.semanticscholar.org/CorpusID:3221355).
67. Clarke, Bertrand S.; Barron, Andrew R. (1994). ["Jeffreys' prior is asymptotically least favorable under entropy risk"](http://www.stat.yale.edu/~arb4/publications_files/jeffery's%20prior.pdf) (PDF). *Journal of Statistical Planning and Inference*. **41**: 37–60. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1016/0378-3758(94)90153-8](https://doi.org/10.1016%2F0378-3758%2894%2990153-8).
68. Pearson, Karl (1892). [*The Grammar of Science*](https://books.google.com/books?id=IvdsEcFwcnsC&q=grammar+of+science&pg=PR19). Walter Scott, London.
69. Pearson, Karl (2009). *The Grammar of Science*. BiblioLife. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1110356119](https://en.wikipedia.org/wiki/Special:BookSources/978-1110356119 "Special:BookSources/978-1110356119").
70. Gelman, A.; Carlin, J. B.; Stern, H. S.; Rubin, D. B. (2003). *Bayesian Data Analysis*. Chapman and Hall/CRC. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1584883883](https://en.wikipedia.org/wiki/Special:BookSources/978-1584883883 "Special:BookSources/978-1584883883").
71. Jøsang, Audun (2001). ["A logic for uncertain probabilities"](https://scholar.archive.org/work/nilorkzfvjccjir72m75zk3pgy). *International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems*. **9** (3): 279–311. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1142/S0218488501000831](https://doi.org/10.1142%2FS0218488501000831). [MR](https://en.wikipedia.org/wiki/MR_\(identifier\) "MR (identifier)") [1843261](https://mathscinet.ams.org/mathscinet-getitem?mr=1843261).
72. de Oliveira, H. M.; Araújo, G. A. A. (2005). "Compactly Supported One-cyclic Wavelets Derived from Beta Distributions". *Journal of Communication and Information Systems*. **20** (3): 27–33.
73. [Balding, David J.](https://en.wikipedia.org/wiki/David_Balding "David Balding"); Nichols, Richard A. (1995). "A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity". *Genetica*. **96** (1–2). Springer: 3–12. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1007/BF01441146](https://doi.org/10.1007%2FBF01441146). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [7607457](https://pubmed.ncbi.nlm.nih.gov/7607457). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [30680826](https://api.semanticscholar.org/CorpusID:30680826).
74. Keefer, Donald L.; Verdini, William A. (1993). "Better Estimation of PERT Activity Time Parameters". *Management Science*. **39** (9): 1086–1091.
75. Keefer, Donald L.; Bodily, Samuel E. (1983). "Three-point Approximations for Continuous Random Variables". *Management Science*. **29** (5): 595–609.
76. ["Defense Resource Management Institute - Naval Postgraduate School"](https://www.nps.edu/web/drmi/). *www.nps.edu*.
77. van der Waerden, B. L. *Mathematical Statistics*. Springer. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-540-04507-6](https://en.wikipedia.org/wiki/Special:BookSources/978-3-540-04507-6 "Special:BookSources/978-3-540-04507-6").
78. Wise, M. E. (June 1960). "On normalizing the incomplete beta-function for fitting to dose-response curves". *Biometrika*. **47** (1/2): 173–175.
79. Pratt, John W. (1968). "A Normal Approximation for Binomial, F, Beta, and Other Common, Related Tail Probabilities, II". *Journal of the American Statistical Association*. **63** (324): 1457–1483. JSTOR, <https://doi.org/10.2307/2285896>. Accessed 21 Oct. 2025.
80. [Yule, G. U.](https://en.wikipedia.org/wiki/Udny_Yule "Udny Yule"); Filon, L. N. G. (1936). ["Karl Pearson. 1857–1936"](https://en.wikipedia.org/wiki/Karl_Pearson "Karl Pearson"). *[Obituary Notices of Fellows of the Royal Society](https://en.wikipedia.org/wiki/Obituary_Notices_of_Fellows_of_the_Royal_Society "Obituary Notices of Fellows of the Royal Society")*. **2** (5): 72. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10.1098/rsbm.1936.0007](https://doi.org/10.1098%2Frsbm.1936.0007). [JSTOR](https://en.wikipedia.org/wiki/JSTOR_\(identifier\) "JSTOR (identifier)") [769130](https://www.jstor.org/stable/769130).
81. ["Library and Archive catalogue"](https://web.archive.org/web/20111025030931/http://www2.royalsociety.org/DServe/dserve.exe?dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=Show.tcl&dsqDb=Persons&dsqPos=0&dsqSearch=%28%28text%29%3D%27%20%20Pearson%3A%20Karl%20%281857%20-%201936%29%20%20%27%29\)). *Sackler Digital Archive*. Royal Society. Archived from [the original](http://www2.royalsociety.org/DServe/dserve.exe?dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=Show.tcl&dsqDb=Persons&dsqPos=0&dsqSearch=%28%28text%29%3D%27%20%20Pearson%3A%20Karl%20%281857%20-%201936%29%20%20%27%29%29) on 2011-10-25. Retrieved 2011-07-01.
82. David, H. A. and A. W. F. Edwards (2001). *Annotated Readings in the History of Statistics* (1st ed.). Springer. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0387988443](https://en.wikipedia.org/wiki/Special:BookSources/978-0387988443 "Special:BookSources/978-0387988443").
83. Gini, Corrado (1911). "Considerazioni Sulle Probabilità Posteriori e Applicazioni al Rapporto dei Sessi Nelle Nascite Umane" [Considerations on posterior probabilities and applications to the sex ratio in human births]. *Studi Economico-Giuridici della Università de Cagliari*. Anno III (reproduced in Metron 15, 133, 171, 1949): 5–41.
84. Johnson, Norman L. and Samuel Kotz, ed. (1997). *Leading Personalities in Statistical Sciences: From the Seventeenth Century to the Present* (Wiley Series in Probability and Statistics). Wiley. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0471163817](https://en.wikipedia.org/wiki/Special:BookSources/978-0471163817 "Special:BookSources/978-0471163817").
85. ["Biography of Corrado Gini"](https://web.archive.org/web/20120716202225/http://www.metronjournal.it/storia/ginibio.htm). *Metron Journal*. Archived from [the original](http://www.metronjournal.it/storia/ginibio.htm) on 2012-07-16. Retrieved 2012-08-18.
- ["Beta Distribution"](http://demonstrations.wolfram.com/BetaDistribution/) by Fiona Maclachlan, the [Wolfram Demonstrations Project](https://en.wikipedia.org/wiki/Wolfram_Demonstrations_Project "Wolfram Demonstrations Project"), 2007.
- [Beta Distribution – Overview and Example](http://www.xycoon.com/beta.htm), xycoon.com
- [Beta Distribution](https://web.archive.org/web/20120829140915/http://www.brighton-webs.co.uk/distributions/beta.htm), brighton-webs.co.uk
- [Beta Distribution Video](http://www.exstrom.com/blog/snark/posts/dancingbeta.html), exstrom.com
- ["Beta-distribution"](https://www.encyclopediaofmath.org/index.php?title=Beta-distribution), *[Encyclopedia of Mathematics](https://en.wikipedia.org/wiki/Encyclopedia_of_Mathematics "Encyclopedia of Mathematics")*, [EMS Press](https://en.wikipedia.org/wiki/European_Mathematical_Society "European Mathematical Society"), 2001 \[1994\]
- [Weisstein, Eric W.](https://en.wikipedia.org/wiki/Eric_W._Weisstein "Eric W. Weisstein") ["Beta Distribution"](https://mathworld.wolfram.com/BetaDistribution.html). *[MathWorld](https://en.wikipedia.org/wiki/MathWorld "MathWorld")*.
- [Harvard University Statistics 110 Lecture 23 Beta Distribution, Prof. Joe Blitzstein](https://www.youtube.com/watch?v=UZjlBQbV1KU)
| Shard | 152 (laksa) |
| Root Hash | 17790707453426894952 |
| Unparsed URL | org,wikipedia!en,/wiki/Beta_distribution s443 |