🕷️ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 131 (from laksa154)

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled

đź“„
INDEXABLE
âś…
CRAWLED
16 hours ago
🤖
ROBOTS ALLOWED

Page Info Filters

FilterStatusConditionDetails
HTTP statusPASSdownload_http_code = 200HTTP 200
Age cutoffPASSdownload_stamp > now() - 6 MONTH0 months ago
History dropPASSisNull(history_drop_reason)No drop reason
Spam/banPASSfh_dont_index != 1 AND ml_spam_score = 0ml_spam_score=0
CanonicalPASSmeta_canonical IS NULL OR = '' OR = src_unparsedNot set

Page Details

PropertyValue
URLhttp://pmean.com/posts/zero-inflated-poisson/
Last Crawled2026-04-13 10:23:07 (16 hours ago)
First Indexed2025-09-30 17:04:29 (6 months ago)
HTTP Status Code200
Meta TitleZero-inflated models – Pmean
Meta Descriptionnull
Meta Canonicalnull
Boilerpipe Text
In models of counts, zeros represent a special case. Often there is more than one reason why you might observe a zero count, and you might want to model this probability separately. One approach is known as a zero inflated Poisson model. library (broom) library (dplyr) library (ggplot2) library (foreign) library (magrittr) library (pscl) The zero inflated Poisson distribution In any Poisson distribution, the probabilty of obtaining a zero is defined as \(P[Y=0]=\frac{\lambda^0e^{-\lambda}}{0!}=e^{-\lambda}\) Sometimes this probability is too low because there are multiple reasons why you might observe a zero. The zero-inflated Poisson model is a mixture of the Poisson random variable with a random variable that places all of its probability at zero. \(\pi \ 0 + (1-\pi)\frac{e^{-\lambda}\lambda^y}{y!}\) This changes the probability of zero to \(P[Y=0]=\pi+(1-\pi)e^{-\lambda}\) Since \(\lambda\) has to be a positive value, the probability above is greater than the Poisson probability for any value of \(\pi>0\) . The remaining probabilities are shrunk by a factor of \(1-\pi\) to compensate for the increase in the probability at zero. You can develop a separate regression model for \(\pi\) the increase in the probability of zero in addition to the regression model for \(\lambda\) , the mean of the Poisson distribution. The zero-inflated Poisson probability distribution is not part of the exponential family, so you need separate methods for estimation. You can find a function in the pscl library that will fit a zero-inflated Poisson model. Start with Poisson regression It is easiest to understand the quasiPoisson model by comparing it to the Poisson model. Francis Huang has a nice web page showing two datasets and how to analyze them using the Poisson regression model as well as several more sophisticated models. The first data set is available for anyone to download. The link I originally used no longer works. fn <- "http://faculty.missouri.edu/huangf/data/poisson/articles.csv" articles <- read.csv (fn) head (articles) Here is a different link for the same file, used at German Rodriguez’s website . I’ve placed a copy on my website in case this link breaks. Also, this dataset is available with the pscl library of R under the name, bioChemists. fn <- "http://www.pmean.com/00files/couart2.dta" articles <- read.dta (fn) head (articles) art fem mar kid5 phd ment 1 0 Men Married 0 2.52 7 2 0 Women Single 0 2.05 6 3 0 Women Single 0 3.75 6 4 0 Men Married 1 1.18 3 5 0 Women Single 0 3.75 26 6 0 Women Married 2 3.59 2 The variables are + gender (fem), + marital status (mar), + number of children (kid5), + prestige of graduate program (phd), + the number of articles published by the individual’s mentor (ment) + the number of articles published by the scientist (art: the outcome). To fit a Poisson regression model in R, use the glm function with the argument, family=“poisson”. pois1 <- glm (art ~ fem + ment + phd + mar + kid5, family= poisson, data = articles) summary (pois1) Call: glm(formula = art ~ fem + ment + phd + mar + kid5, family = poisson, data = articles) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.304617 0.102981 2.958 0.0031 ** femWomen -0.224594 0.054613 -4.112 3.92e-05 *** ment 0.025543 0.002006 12.733 < 2e-16 *** phd 0.012823 0.026397 0.486 0.6271 marMarried 0.155243 0.061374 2.529 0.0114 * kid5 -0.184883 0.040127 -4.607 4.08e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1817.4 on 914 degrees of freedom Residual deviance: 1634.4 on 909 degrees of freedom AIC: 3314.1 Number of Fisher Scoring iterations: 5 The interpretation of the coefficients should be done after exponentiating their values. pois1 %>% tidy %>% mutate ( lower_limit= estimate -1.96 * std.error) %>% mutate ( upper_limit= estimate +1.96 * std.error) %>% select (term, estimate, lower_limit, upper_limit) %>% mutate ( estimate= exp (estimate)) %>% mutate ( lower_limit= exp (lower_limit)) %>% mutate ( upper_limit= exp (upper_limit)) -> pois1_coefficients pois1_coefficients # A tibble: 6 × 4 term estimate lower_limit upper_limit <chr> <dbl> <dbl> <dbl> 1 (Intercept) 1.36 1.11 1.66 2 femWomen 0.799 0.718 0.889 3 ment 1.03 1.02 1.03 4 phd 1.01 0.962 1.07 5 marMarried 1.17 1.04 1.32 6 kid5 0.831 0.768 0.899 Fitting a zero-inflated-Poisson model zip1 <- zeroinfl (art ~ fem + ment + phd + mar + kid5 | 1 , family= poisson, data = articles) Warning in optim(fn = countloglikfun, gr = countgradfun, par = c(lmstart, : unknown names in control: family Warning in optim(fn = loglikfun, gr = gradfun, par = c(start$count, start$zero, : unknown names in control: family summary (zip1) Call: zeroinfl(formula = art ~ fem + ment + phd + mar + kid5 | 1, data = articles, family = poisson) Pearson residuals: Min 1Q Median 3Q Max -1.5296 -0.9976 -0.3026 0.5398 7.2884 Count model coefficients (poisson with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 0.553995 0.113836 4.867 1.14e-06 *** femWomen -0.231609 0.058670 -3.948 7.89e-05 *** ment 0.021543 0.002160 9.973 < 2e-16 *** phd 0.002526 0.028511 0.089 0.929 marMarried 0.131971 0.066130 1.996 0.046 * kid5 -0.170474 0.043296 -3.937 8.24e-05 *** Zero-inflation model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -1.6814 0.1558 -10.79 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Number of iterations in BFGS optimization: 14 Log-likelihood: -1621 on 7 Df We cannot use the broom package to extract coefficients. References An earlier version of this page was published on new.pmean.com .
Markdown
[Pmean](http://pmean.com/index.html) - [About the pmean website](http://pmean.com/about.html) # Zero-inflated models \*Blog post Year 2021 Generalized linear model Incomplete page Rmarkdown graphs Author Steve Simon Published September 20, 2021 In models of counts, zeros represent a special case. Often there is more than one reason why you might observe a zero count, and you might want to model this probability separately. One approach is known as a zero inflated Poisson model. ``` ``` ### The zero inflated Poisson distribution In any Poisson distribution, the probabilty of obtaining a zero is defined as \\(P\[Y=0\]=\\frac{\\lambda^0e^{-\\lambda}}{0!}=e^{-\\lambda}\\) Sometimes this probability is too low because there are multiple reasons why you might observe a zero. The zero-inflated Poisson model is a mixture of the Poisson random variable with a random variable that places all of its probability at zero. \\(\\pi \\ 0 + (1-\\pi)\\frac{e^{-\\lambda}\\lambda^y}{y!}\\) This changes the probability of zero to \\(P\[Y=0\]=\\pi+(1-\\pi)e^{-\\lambda}\\) Since \\(\\lambda\\) has to be a positive value, the probability above is greater than the Poisson probability for any value of \\(\\pi\>0\\). The remaining probabilities are shrunk by a factor of \\(1-\\pi\\) to compensate for the increase in the probability at zero. You can develop a separate regression model for \\(\\pi\\) the increase in the probability of zero in addition to the regression model for \\(\\lambda\\), the mean of the Poisson distribution. The zero-inflated Poisson probability distribution is not part of the exponential family, so you need separate methods for estimation. You can find a function in the pscl library that will fit a zero-inflated Poisson model. ### Start with Poisson regression It is easiest to understand the quasiPoisson model by comparing it to the Poisson model. Francis Huang has a [nice web page](https://francish.netlify.app/post/poisson-and-negative-binomial-regression-using-r/) showing two datasets and how to analyze them using the Poisson regression model as well as several more sophisticated models. The first data set is available for anyone to download. The link I originally used no longer works. ``` ``` Here is a [different link](http://www.pmean.com/00files/couart2.dta) for the same file, used at [German Rodriguez’s website](https://data.princeton.edu/wws509/r/overdispersion). I’ve placed a [copy](http://www.pmean.com/00files/couart2.dta) on my website in case this link breaks. Also, this dataset is available with the [pscl library](https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf) of R under the name, bioChemists. ``` ``` ``` art fem mar kid5 phd ment 1 0 Men Married 0 2.52 7 2 0 Women Single 0 2.05 6 3 0 Women Single 0 3.75 6 4 0 Men Married 1 1.18 3 5 0 Women Single 0 3.75 26 6 0 Women Married 2 3.59 2 ``` The variables are + gender (fem), + marital status (mar), + number of children (kid5), + prestige of graduate program (phd), + the number of articles published by the individual’s mentor (ment) + the number of articles published by the scientist (art: the outcome). To fit a Poisson regression model in R, use the glm function with the argument, family=“poisson”. ``` ``` ``` Call: glm(formula = art ~ fem + ment + phd + mar + kid5, family = poisson, data = articles) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.304617 0.102981 2.958 0.0031 ** femWomen -0.224594 0.054613 -4.112 3.92e-05 *** ment 0.025543 0.002006 12.733 < 2e-16 *** phd 0.012823 0.026397 0.486 0.6271 marMarried 0.155243 0.061374 2.529 0.0114 * kid5 -0.184883 0.040127 -4.607 4.08e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1817.4 on 914 degrees of freedom Residual deviance: 1634.4 on 909 degrees of freedom AIC: 3314.1 Number of Fisher Scoring iterations: 5 ``` The interpretation of the coefficients should be done after exponentiating their values. ``` ``` ``` # A tibble: 6 × 4 term estimate lower_limit upper_limit <chr> <dbl> <dbl> <dbl> 1 (Intercept) 1.36 1.11 1.66 2 femWomen 0.799 0.718 0.889 3 ment 1.03 1.02 1.03 4 phd 1.01 0.962 1.07 5 marMarried 1.17 1.04 1.32 6 kid5 0.831 0.768 0.899 ``` ### Fitting a zero-inflated-Poisson model ``` zip1 <- zeroinfl(art ~ fem + ment + phd + mar + kid5 | 1, family=poisson, data = articles) ``` ``` Warning in optim(fn = countloglikfun, gr = countgradfun, par = c(lmstart, : unknown names in control: family ``` ``` Warning in optim(fn = loglikfun, gr = gradfun, par = c(start$count, start$zero, : unknown names in control: family ``` ``` summary(zip1) ``` ``` Call: zeroinfl(formula = art ~ fem + ment + phd + mar + kid5 | 1, data = articles, family = poisson) Pearson residuals: Min 1Q Median 3Q Max -1.5296 -0.9976 -0.3026 0.5398 7.2884 Count model coefficients (poisson with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 0.553995 0.113836 4.867 1.14e-06 *** femWomen -0.231609 0.058670 -3.948 7.89e-05 *** ment 0.021543 0.002160 9.973 < 2e-16 *** phd 0.002526 0.028511 0.089 0.929 marMarried 0.131971 0.066130 1.996 0.046 * kid5 -0.170474 0.043296 -3.937 8.24e-05 *** Zero-inflation model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -1.6814 0.1558 -10.79 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Number of iterations in BFGS optimization: 14 Log-likelihood: -1621 on 7 Df ``` We cannot use the broom package to extract coefficients. ### References An [earlier version](http://new.pmean.com/zero-inflated-poisson/) of this page was published on [new.pmean.com](http://new.pmean.com/).
Readable Markdown
In models of counts, zeros represent a special case. Often there is more than one reason why you might observe a zero count, and you might want to model this probability separately. One approach is known as a zero inflated Poisson model. ``` library(broom) library(dplyr) library(ggplot2) library(foreign) library(magrittr) library(pscl) ``` ### The zero inflated Poisson distribution In any Poisson distribution, the probabilty of obtaining a zero is defined as \\(P\[Y=0\]=\\frac{\\lambda^0e^{-\\lambda}}{0!}=e^{-\\lambda}\\) Sometimes this probability is too low because there are multiple reasons why you might observe a zero. The zero-inflated Poisson model is a mixture of the Poisson random variable with a random variable that places all of its probability at zero. \\(\\pi \\ 0 + (1-\\pi)\\frac{e^{-\\lambda}\\lambda^y}{y!}\\) This changes the probability of zero to \\(P\[Y=0\]=\\pi+(1-\\pi)e^{-\\lambda}\\) Since \\(\\lambda\\) has to be a positive value, the probability above is greater than the Poisson probability for any value of \\(\\pi\>0\\). The remaining probabilities are shrunk by a factor of \\(1-\\pi\\) to compensate for the increase in the probability at zero. You can develop a separate regression model for \\(\\pi\\) the increase in the probability of zero in addition to the regression model for \\(\\lambda\\), the mean of the Poisson distribution. The zero-inflated Poisson probability distribution is not part of the exponential family, so you need separate methods for estimation. You can find a function in the pscl library that will fit a zero-inflated Poisson model. ### Start with Poisson regression It is easiest to understand the quasiPoisson model by comparing it to the Poisson model. Francis Huang has a [nice web page](https://francish.netlify.app/post/poisson-and-negative-binomial-regression-using-r/) showing two datasets and how to analyze them using the Poisson regression model as well as several more sophisticated models. The first data set is available for anyone to download. The link I originally used no longer works. ``` fn <- "http://faculty.missouri.edu/huangf/data/poisson/articles.csv" articles <- read.csv(fn) head(articles) ``` Here is a [different link](http://www.pmean.com/00files/couart2.dta) for the same file, used at [German Rodriguez’s website](https://data.princeton.edu/wws509/r/overdispersion). I’ve placed a [copy](http://www.pmean.com/00files/couart2.dta) on my website in case this link breaks. Also, this dataset is available with the [pscl library](https://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf) of R under the name, bioChemists. ``` fn <- "http://www.pmean.com/00files/couart2.dta" articles <- read.dta(fn) head(articles) ``` ``` art fem mar kid5 phd ment 1 0 Men Married 0 2.52 7 2 0 Women Single 0 2.05 6 3 0 Women Single 0 3.75 6 4 0 Men Married 1 1.18 3 5 0 Women Single 0 3.75 26 6 0 Women Married 2 3.59 2 ``` The variables are + gender (fem), + marital status (mar), + number of children (kid5), + prestige of graduate program (phd), + the number of articles published by the individual’s mentor (ment) + the number of articles published by the scientist (art: the outcome). To fit a Poisson regression model in R, use the glm function with the argument, family=“poisson”. ``` pois1 <- glm(art ~ fem + ment + phd + mar + kid5, family=poisson, data = articles) summary(pois1) ``` ``` Call: glm(formula = art ~ fem + ment + phd + mar + kid5, family = poisson, data = articles) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.304617 0.102981 2.958 0.0031 ** femWomen -0.224594 0.054613 -4.112 3.92e-05 *** ment 0.025543 0.002006 12.733 < 2e-16 *** phd 0.012823 0.026397 0.486 0.6271 marMarried 0.155243 0.061374 2.529 0.0114 * kid5 -0.184883 0.040127 -4.607 4.08e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1817.4 on 914 degrees of freedom Residual deviance: 1634.4 on 909 degrees of freedom AIC: 3314.1 Number of Fisher Scoring iterations: 5 ``` The interpretation of the coefficients should be done after exponentiating their values. ``` pois1 %>% tidy %>% mutate(lower_limit=estimate-1.96*std.error) %>% mutate(upper_limit=estimate+1.96*std.error) %>% select(term, estimate, lower_limit, upper_limit) %>% mutate(estimate=exp(estimate)) %>% mutate(lower_limit=exp(lower_limit)) %>% mutate(upper_limit=exp(upper_limit)) -> pois1_coefficients pois1_coefficients ``` ``` # A tibble: 6 × 4 term estimate lower_limit upper_limit <chr> <dbl> <dbl> <dbl> 1 (Intercept) 1.36 1.11 1.66 2 femWomen 0.799 0.718 0.889 3 ment 1.03 1.02 1.03 4 phd 1.01 0.962 1.07 5 marMarried 1.17 1.04 1.32 6 kid5 0.831 0.768 0.899 ``` ### Fitting a zero-inflated-Poisson model ``` zip1 <- zeroinfl(art ~ fem + ment + phd + mar + kid5 | 1, family=poisson, data = articles) ``` ``` Warning in optim(fn = countloglikfun, gr = countgradfun, par = c(lmstart, : unknown names in control: family ``` ``` Warning in optim(fn = loglikfun, gr = gradfun, par = c(start$count, start$zero, : unknown names in control: family ``` ``` summary(zip1) ``` ``` Call: zeroinfl(formula = art ~ fem + ment + phd + mar + kid5 | 1, data = articles, family = poisson) Pearson residuals: Min 1Q Median 3Q Max -1.5296 -0.9976 -0.3026 0.5398 7.2884 Count model coefficients (poisson with log link): Estimate Std. Error z value Pr(>|z|) (Intercept) 0.553995 0.113836 4.867 1.14e-06 *** femWomen -0.231609 0.058670 -3.948 7.89e-05 *** ment 0.021543 0.002160 9.973 < 2e-16 *** phd 0.002526 0.028511 0.089 0.929 marMarried 0.131971 0.066130 1.996 0.046 * kid5 -0.170474 0.043296 -3.937 8.24e-05 *** Zero-inflation model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -1.6814 0.1558 -10.79 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Number of iterations in BFGS optimization: 14 Log-likelihood: -1621 on 7 Df ``` We cannot use the broom package to extract coefficients. ### References An [earlier version](http://new.pmean.com/zero-inflated-poisson/) of this page was published on [new.pmean.com](http://new.pmean.com/).
Shard131 (laksa)
Root Hash4088472677763846131
Unparsed URLcom,pmean!/posts/zero-inflated-poisson/ h80