šŸ•·ļø Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 106 (from laksa127)

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ā„¹ļø Skipped - page is already crawled

šŸ“„
INDEXABLE
āœ…
CRAWLED
26 days ago
šŸ¤–
ROBOTS ALLOWED

Page Info Filters

FilterStatusConditionDetails
HTTP statusPASSdownload_http_code = 200HTTP 200
Age cutoffPASSdownload_stamp > now() - 6 MONTH0.9 months ago
History dropPASSisNull(history_drop_reason)No drop reason
Spam/banPASSfh_dont_index != 1 AND ml_spam_score = 0ml_spam_score=0
CanonicalPASSmeta_canonical IS NULL OR = '' OR = src_unparsedNot set

Page Details

PropertyValue
URLhttps://cran.r-project.org/web/packages/svyVGAM/vignettes/zeroinfl.html
Last Crawled2026-03-12 03:35:32 (26 days ago)
First Indexednot set
HTTP Status Code200
Meta TitleZero-inflated Poisson model for complex survey data
Meta Descriptionnull
Meta Canonicalnull
Boilerpipe Text
Thomas Lumley 2026-01-16 The Zero-Inflated Poisson model is a model for count data with excess zeros. The response distribution is a mixture of a point mass at zero and a Poisson distribution: if \(Z\) is Bernoulli with probability \(1-p_0\) and \(P\) is Poisson with mean \(\lambda\) , then \(Y=Z+(1-Z)P\) is zero-inflated Poisson. The ZIP is a latent-class model; we can have \(Y=0\) either because \(Z=0\) ('structural' zeroes) or because \(P=0\) . That’s natural in some ecological examples: if you didn’t see any salmon it could be because the area is salmon-free (it’s Eden Park) or because you just randomly didn’t see any. To turn this into a regression model we typically put a logistic regression structure on \(Z\) and a Poisson regression structure on \(P\) . There isn’t (as far as I know) existing software in R for design-based inference in zero-inflated Poisson models, so it’s a good example for the benefits of svyVGAM . The pscl package (Zeileis et al) fits zero-inflated models, and so does VGAM , so we can compare the model fitted with svyVGAM to both of those and to other work-arounds. I’ll do an example with data on number of sexual partners, from NHANES 2003-2004. We will look at questions SXQ200 and SXQ100 : number of male sexual partners. Combining these gives a ā€˜real’ zero-inflated variable: based on sexual orientation the zeroes would divide into 'never' and 'not yet'. Here's how I created the dataset, from two NHANES files. It's data(nhanes_sxq) in the package library(foreign) setwd("~/nhanes") demo = read.xport("demo_c.xpt") sxq = read.xport("sxq_c.xpt") merged = merge(demo, sxq, by='SEQN') merged$total = with(merged, ifelse(RIAGENDR==2, SXQ100+SXQ130, SXQ170+SXQ200)) merged$total[merged$SXQ020==2] = 0 merged$total[merged$total>2000] = NA merged$age = merged$RIDAGEYR/25 merged$malepartners=with(merged, ifelse(RIAGENDR==2,SXQ100,SXQ200)) merged$malepartners[merged$malepartners>200]=NA nhanes_sxq<-merged[, c("SDMVPSU","SDMVSTRA","WTINT2YR","RIDAGEYR","RIDRETH1","DMDEDUC","malepartners")] Start off by loading the packages and setting up a survey design library (svyVGAM) ## Loading required package: VGAM ## Loading required package: stats4 ## Loading required package: splines ## Loading required package: survey ## Loading required package: grid ## Loading required package: Matrix ## Loading required package: survival ## ## Attaching package: 'survey' ## The following object is masked from 'package:VGAM': ## ## calibrate ## The following object is masked from 'package:graphics': ## ## dotchart library (pscl) ## Classes and Methods for R originally developed in the ## Political Science Computational Laboratory ## Department of Political Science ## Stanford University (2002-2015), ## by and under the direction of Simon Jackman. ## hurdle and zeroinfl functions by Achim Zeileis. data (nhanes_sxq) des = svydesign ( id= ~ SDMVPSU, strat= ~ SDMVSTRA, weights= ~ WTINT2YR, nest= TRUE , data= nhanes_sxq) First, we'll fit the model just ignoring the survey design, using both pscl::zeroinfl and VGAM::vglm . These models use the same variables in a logistic regression for \(Z\) and a Poisson regression for \(P\) . In VGAM you would make the models different by constraining coefficients to be zero in one of the models; in pscl you would specify different models before and after the | . unwt = zeroinfl (malepartners ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC | RIDAGEYR + factor (RIDRETH1) + DMDEDUC, data= nhanes_sxq) summary (unwt) ## ## Call: ## zeroinfl(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC | ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, data = nhanes_sxq) ## ## Pearson residuals: ## Min 1Q Median 3Q Max ## -1.0204 -0.9433 -0.7871 0.1503 29.2567 ## ## Count model coefficients (poisson with log link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 1.6666218 0.0506660 32.894 < 2e-16 *** ## RIDAGEYR -0.0055102 0.0008969 -6.143 8.08e-10 *** ## factor(RIDRETH1)2 -0.0394018 0.0779480 -0.505 0.613 ## factor(RIDRETH1)3 0.6508824 0.0345734 18.826 < 2e-16 *** ## factor(RIDRETH1)4 0.6675320 0.0365963 18.240 < 2e-16 *** ## factor(RIDRETH1)5 0.5642960 0.0594928 9.485 < 2e-16 *** ## DMDEDUC 0.0094257 0.0135180 0.697 0.486 ## ## Zero-inflation model coefficients (binomial with logit link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.188125 0.187079 1.006 0.31461 ## RIDAGEYR -0.002938 0.003629 -0.810 0.41822 ## factor(RIDRETH1)2 -0.079637 0.242312 -0.329 0.74242 ## factor(RIDRETH1)3 0.118370 0.116120 1.019 0.30802 ## factor(RIDRETH1)4 0.143301 0.127764 1.122 0.26203 ## factor(RIDRETH1)5 0.259516 0.223032 1.164 0.24459 ## DMDEDUC -0.148881 0.053337 -2.791 0.00525 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Number of iterations in BFGS optimization: 21 ## Log-likelihood: -9518 on 14 Df vglm (malepartners ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC, zipoisson (), data = nhanes_sxq, crit = "coef" ) ## ## Call: ## vglm(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC, ## family = zipoisson(), data = nhanes_sxq, crit = "coef") ## ## ## Coefficients: ## (Intercept):1 (Intercept):2 RIDAGEYR:1 RIDAGEYR:2 ## 0.188125349 1.666622759 -0.002937819 -0.005510160 ## factor(RIDRETH1)2:1 factor(RIDRETH1)2:2 factor(RIDRETH1)3:1 factor(RIDRETH1)3:2 ## -0.079635992 -0.039401949 0.118369301 0.650882145 ## factor(RIDRETH1)4:1 factor(RIDRETH1)4:2 factor(RIDRETH1)5:1 factor(RIDRETH1)5:2 ## 0.143300364 0.667531080 0.259515415 0.564295398 ## DMDEDUC:1 DMDEDUC:2 ## -0.148881313 0.009425589 ## ## Degrees of Freedom: 5050 Total; 5036 Residual ## Log-likelihood: -9517.556 Re-scaling the weights A traditional work-around for regression models is to rescale the weights to sum to the sample size and then pretend they are precision weights or frequency weights. nhanes_sxq $ scaledwt<-nhanes_sxq $ WTINT2YR / mean (nhanes_sxq $ WTINT2YR) wt= zeroinfl (malepartners ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC | RIDAGEYR + factor (RIDRETH1) + DMDEDUC, data= nhanes_sxq, weights= scaledwt) ## Warning in eval(family$initialize): non-integer #successes in a binomial glm! summary (wt) ## ## Call: ## zeroinfl(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC | ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, data = nhanes_sxq, weights = scaledwt) ## ## Pearson residuals: ## Min 1Q Median 3Q Max ## -1.5864 -0.8418 -0.5430 0.1324 31.9106 ## ## Count model coefficients (poisson with log link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 1.8340666 0.0614994 29.823 < 2e-16 *** ## RIDAGEYR -0.0073881 0.0009059 -8.155 3.49e-16 *** ## factor(RIDRETH1)2 -0.1073243 0.0853534 -1.257 0.2086 ## factor(RIDRETH1)3 0.6551385 0.0481679 13.601 < 2e-16 *** ## factor(RIDRETH1)4 0.6358167 0.0529173 12.015 < 2e-16 *** ## factor(RIDRETH1)5 0.4774152 0.0666607 7.162 7.96e-13 *** ## DMDEDUC -0.0237391 0.0143070 -1.659 0.0971 . ## ## Zero-inflation model coefficients (binomial with logit link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.660511 0.214458 3.080 0.002071 ** ## RIDAGEYR -0.007833 0.003673 -2.133 0.032957 * ## factor(RIDRETH1)2 -0.116798 0.252451 -0.463 0.643610 ## factor(RIDRETH1)3 0.101968 0.151531 0.673 0.500999 ## factor(RIDRETH1)4 -0.160809 0.181429 -0.886 0.375430 ## factor(RIDRETH1)5 0.106776 0.230339 0.464 0.642964 ## DMDEDUC -0.202397 0.057624 -3.512 0.000444 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Number of iterations in BFGS optimization: 20 ## Log-likelihood: -9766 on 14 Df wtv= vglm (malepartners ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC, zipoisson (), data = nhanes_sxq, crit = "coef" , weights= scaledwt) summary (wtv) ## ## Call: ## vglm(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC, ## family = zipoisson(), data = nhanes_sxq, weights = scaledwt, ## crit = "coef") ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept):1 0.6605042 0.2144354 3.080 0.002069 ** ## (Intercept):2 1.8340681 0.0614568 29.843 < 2e-16 *** ## RIDAGEYR:1 -0.0078333 0.0036726 -2.133 0.032934 * ## RIDAGEYR:2 -0.0073881 0.0008995 -8.214 < 2e-16 *** ## factor(RIDRETH1)2:1 -0.1167894 0.2527278 -0.462 0.643999 ## factor(RIDRETH1)2:2 -0.1073312 0.0847641 -1.266 0.205430 ## factor(RIDRETH1)3:1 0.1019712 0.1515002 0.673 0.500899 ## factor(RIDRETH1)3:2 0.6551367 0.0481359 13.610 < 2e-16 *** ## factor(RIDRETH1)4:1 -0.1608040 0.1814098 -0.886 0.375395 ## factor(RIDRETH1)4:2 0.6358147 0.0529738 12.002 < 2e-16 *** ## factor(RIDRETH1)5:1 0.1067789 0.2303235 0.464 0.642931 ## factor(RIDRETH1)5:2 0.4774124 0.0663590 7.194 6.27e-13 *** ## DMDEDUC:1 -0.2023967 0.0576221 -3.512 0.000444 *** ## DMDEDUC:2 -0.0237389 0.0146964 -1.615 0.106249 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Names of linear predictors: logitlink(pstr0), loglink(lambda) ## ## Log-likelihood: -9765.52 on 5036 degrees of freedom ## ## Number of Fisher scoring iterations: 8 ## repwts repdes = as.svrepdesign (des, type= "Fay" , fay.rho= 0.2 ) rep1 = withReplicates (repdes, quote ( coef ( zeroinfl (malepartners ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC | RIDAGEYR + factor (RIDRETH1) + DMDEDUC, weights= .weights)) )) rep1 ## theta SE ## count_(Intercept) 1.8340684 0.1350 ## count_RIDAGEYR -0.0073881 0.0028 ## count_factor(RIDRETH1)2 -0.1073314 0.2471 ## count_factor(RIDRETH1)3 0.6551358 0.1857 ## count_factor(RIDRETH1)4 0.6358143 0.1438 ## count_factor(RIDRETH1)5 0.4774122 0.2501 ## count_DMDEDUC -0.0237386 0.0797 ## zero_(Intercept) 0.6605043 0.2582 ## zero_RIDAGEYR -0.0078333 0.0039 ## zero_factor(RIDRETH1)2 -0.1167895 0.2854 ## zero_factor(RIDRETH1)3 0.1019712 0.0983 ## zero_factor(RIDRETH1)4 -0.1608040 0.0859 ## zero_factor(RIDRETH1)5 0.1067790 0.2120 ## zero_DMDEDUC -0.2023967 0.0586 repv = withReplicates (repdes, quote ( coef ( vglm (malepartners ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC, zipoisson (), data = nhanes_sxq, crit = "coef" , weights= .weights)) )) repv ## theta SE ## (Intercept):1 0.6605042 0.2582 ## (Intercept):2 1.8340681 0.1350 ## RIDAGEYR:1 -0.0078333 0.0039 ## RIDAGEYR:2 -0.0073881 0.0028 ## factor(RIDRETH1)2:1 -0.1167894 0.2854 ## factor(RIDRETH1)2:2 -0.1073312 0.2471 ## factor(RIDRETH1)3:1 0.1019712 0.0983 ## factor(RIDRETH1)3:2 0.6551367 0.1857 ## factor(RIDRETH1)4:1 -0.1608040 0.0859 ## factor(RIDRETH1)4:2 0.6358147 0.1438 ## factor(RIDRETH1)5:1 0.1067789 0.2120 ## factor(RIDRETH1)5:2 0.4774124 0.2501 ## DMDEDUC:1 -0.2023967 0.0586 ## DMDEDUC:2 -0.0237389 0.0797 svymle Another way to fit the model using just the survey package is with svymle . This takes the log-likelihood and its derivative as arguments, and adds linear predictors to some or all of those arguments. That is, we specify the log-likelihood in terms of the Bernoulli parameter \(p_0\) and the Poisson mean \(\lambda\) -- actually \(\mathrm{logit} p_0\) and \(\eta=\log\lambda\) , and also give the derivative with respect to these two parameters. The software does the necessary additional work to put linear predictors on the parameters and give us the zero-inflated model. In fact, svymle is very similar in underlying approach to vglm ; the difference is that vglm comes with a large collection of predefined models. In defining the loglikelihood I'm going to take advantage of the Poisson pmf being available in R. Let's call it \(\digamma(y,\lambda)\) . The loglikelihood is \[\ell(y; \mu,p_0)=\log\left(p_0\{y==0\}+(1-p)\digamma(y,\mu)\right)\] only we want it in terms of \(\mathrm{logit} p_0\) and \(\eta=\log \lambda\) loglike = function (y,eta,logitp){ mu = exp (eta) p = exp (logitp) / ( 1 + exp (logitp)) log (p * (y == 0 ) + ( 1 - p) * dpois (y,mu)) } We also need the derivatives with respect to \(\mathrm{logit} p_0\) and \(\eta=\log \lambda\) dlogitp = function (y,eta,logitp){ mu = exp (eta) p = exp (logitp) / ( 1 + exp (logitp)) dexpit = p / ( 1 + p) ^ 2 num = dexpit * (y == 0 ) - dexpit * dpois (y,mu) denom = p * (y == 0 ) + ( 1 - p) * dpois (y,mu) num / denom } deta = function (y,eta,logitp){ mu = exp (eta) p = exp (logitp) / ( 1 + exp (logitp)) dmutoy = 0 * y dmutoy[y > 0 ] = exp ( - mu[y > 0 ]) * mu[y > 0 ] ^ (y[y > 0 ] - 1 ) / factorial (y[y > 0 ] - 1 ) num = ( 1 - p) * ( - dpois (y,mu) + dmutoy) denom = p * (y == 0 ) + ( 1 - p) * dpois (y,mu) num / denom } score = function (y,eta,logitp) cbind ( deta (y,eta,logitp), dlogitp (y,eta,logitp)) And now we call svymle giving the linear predictors for both parameters. One of the formulas needs to include the response variable \(Y\) . nlmfit = svymle ( loglike= loglike, grad= score, design= des, formulas= list ( eta= malepartners ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC, logitp= ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC), start= coef (unwt), na.action= "na.omit" , method= "BFGS" ) summary (nlmfit) ## Survey-sampled mle: ## svymle(loglike = loglike, gradient = score, design = des, formulas = list(eta = malepartners ~ ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, logitp = ~RIDAGEYR + ## factor(RIDRETH1) + DMDEDUC), start = coef(unwt), na.action = "na.omit", ## method = "BFGS") ## Coef SE p.value ## eta.(Intercept) 1.826824375 0.154214263 < 0.001 ## eta.RIDAGEYR -0.007800674 0.003014996 0.00967 ## eta.factor(RIDRETH1)2 -0.119695528 0.235192608 0.61080 ## eta.factor(RIDRETH1)3 0.639831707 0.165176844 < 0.001 ## eta.factor(RIDRETH1)4 0.615167757 0.117750557 < 0.001 ## eta.factor(RIDRETH1)5 0.465556579 0.213462435 0.02919 ## eta.DMDEDUC -0.008130626 0.072679434 0.91093 ## logitp.(Intercept) 0.578310456 0.246782589 0.01911 ## logitp.RIDAGEYR -0.006077530 0.004017016 0.13029 ## logitp.factor(RIDRETH1)2 -0.033442030 0.280701222 0.90517 ## logitp.factor(RIDRETH1)3 0.124434633 0.095140201 0.19090 ## logitp.factor(RIDRETH1)4 -0.151763506 0.086322700 0.07873 ## logitp.factor(RIDRETH1)5 0.119530402 0.209380277 0.56808 ## logitp.DMDEDUC -0.209112732 0.053553190 < 0.001 ## Stratified 1 - level Cluster Sampling design (with replacement) ## With (30) clusters. ## svydesign(id = ~SDMVPSU, strat = ~SDMVSTRA, weights = ~WTINT2YR, ## nest = TRUE, data = nhanes_sxq) svyVGAM Finally, we use svy_vglm , with variances by linearisation library (svyVGAM) svy_vglm (malepartners ~ RIDAGEYR + factor (RIDRETH1) + DMDEDUC, zipoisson (), design= des, crit = "coef" ) ## Stratified 1 - level Cluster Sampling design (with replacement) ## With (30) clusters. ## svydesign(id = ~SDMVPSU, strat = ~SDMVSTRA, weights = ~WTINT2YR, ## nest = TRUE, data = nhanes_sxq) ## ## Call: ## vglm(formula = formula, family = family, data = surveydata, weights = .survey.prob.weights, ## crit = "coef") ## ## ## Coefficients: ## (Intercept):1 (Intercept):2 RIDAGEYR:1 RIDAGEYR:2 ## 0.660504243 1.834068104 -0.007833317 -0.007388071 ## factor(RIDRETH1)2:1 factor(RIDRETH1)2:2 factor(RIDRETH1)3:1 factor(RIDRETH1)3:2 ## -0.116789371 -0.107331190 0.101971159 0.655136725 ## factor(RIDRETH1)4:1 factor(RIDRETH1)4:2 factor(RIDRETH1)5:1 factor(RIDRETH1)5:2 ## -0.160804047 0.635814748 0.106778915 0.477412443 ## DMDEDUC:1 DMDEDUC:2 ## -0.202396659 -0.023738881 ## ## Degrees of Freedom: 5050 Total; 5036 Residual ## Log-likelihood: -493703966 and by replicate weights svy_vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), design=repdes, crit = "coef")
Markdown
# Zero-inflated Poisson model for complex survey data #### Thomas Lumley #### 2026-01-16 The Zero-Inflated Poisson model is a model for count data with excess zeros. The response distribution is a mixture of a point mass at zero and a Poisson distribution: if \\(Z\\) is Bernoulli with probability \\(1-p\_0\\) and \\(P\\) is Poisson with mean \\(\\lambda\\), then \\(Y=Z+(1-Z)P\\) is zero-inflated Poisson. The ZIP is a latent-class model; we can have \\(Y=0\\) either because \\(Z=0\\) ('structural' zeroes) or because \\(P=0\\). That’s natural in some ecological examples: if you didn’t see any salmon it could be because the area is salmon-free (it’s Eden Park) or because you just randomly didn’t see any. To turn this into a regression model we typically put a logistic regression structure on \\(Z\\) and a Poisson regression structure on \\(P\\). There isn’t (as far as I know) existing software in R for design-based inference in zero-inflated Poisson models, so it’s a good example for the benefits of `svyVGAM`. The `pscl` package (Zeileis et al) fits zero-inflated models, and so does `VGAM`, so we can compare the model fitted with `svyVGAM` to both of those and to other work-arounds. I’ll do an example with data on number of sexual partners, from NHANES 2003-2004. We will look at questions `SXQ200` and `SXQ100`: number of male sexual partners. Combining these gives a ā€˜real’ zero-inflated variable: based on sexual orientation the zeroes would divide into 'never' and 'not yet'. Here's how I created the dataset, from two NHANES files. It's `data(nhanes_sxq)` in the package ``` library(foreign) setwd("~/nhanes") demo = read.xport("demo_c.xpt") sxq = read.xport("sxq_c.xpt") merged = merge(demo, sxq, by='SEQN') merged$total = with(merged, ifelse(RIAGENDR==2, SXQ100+SXQ130, SXQ170+SXQ200)) merged$total[merged$SXQ020==2] = 0 merged$total[merged$total>2000] = NA merged$age = merged$RIDAGEYR/25 merged$malepartners=with(merged, ifelse(RIAGENDR==2,SXQ100,SXQ200)) merged$malepartners[merged$malepartners>200]=NA nhanes_sxq<-merged[, c("SDMVPSU","SDMVSTRA","WTINT2YR","RIDAGEYR","RIDRETH1","DMDEDUC","malepartners")] ``` Start off by loading the packages and setting up a survey design ``` library(svyVGAM) ``` ``` ## Loading required package: VGAM ``` ``` ## Loading required package: stats4 ``` ``` ## Loading required package: splines ``` ``` ## Loading required package: survey ``` ``` ## Loading required package: grid ``` ``` ## Loading required package: Matrix ``` ``` ## Loading required package: survival ``` ``` ## ## Attaching package: 'survey' ``` ``` ## The following object is masked from 'package:VGAM': ## ## calibrate ``` ``` ## The following object is masked from 'package:graphics': ## ## dotchart ``` ``` library(pscl) ``` ``` ## Classes and Methods for R originally developed in the ## Political Science Computational Laboratory ## Department of Political Science ## Stanford University (2002-2015), ## by and under the direction of Simon Jackman. ## hurdle and zeroinfl functions by Achim Zeileis. ``` ``` data(nhanes_sxq) des = svydesign(id=~SDMVPSU,strat=~SDMVSTRA,weights=~WTINT2YR, nest=TRUE, data=nhanes_sxq) ``` First, we'll fit the model just ignoring the survey design, using both `pscl::zeroinfl` and `VGAM::vglm`. These models use the same variables in a logistic regression for \\(Z\\) and a Poisson regression for \\(P\\). In `VGAM` you would make the models different by constraining coefficients to be zero in one of the models; in `pscl` you would specify different models before and after the `|`. ``` unwt = zeroinfl(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC|RIDAGEYR+factor(RIDRETH1)+DMDEDUC, data=nhanes_sxq) summary(unwt) ``` ``` ## ## Call: ## zeroinfl(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC | ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, data = nhanes_sxq) ## ## Pearson residuals: ## Min 1Q Median 3Q Max ## -1.0204 -0.9433 -0.7871 0.1503 29.2567 ## ## Count model coefficients (poisson with log link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 1.6666218 0.0506660 32.894 < 2e-16 *** ## RIDAGEYR -0.0055102 0.0008969 -6.143 8.08e-10 *** ## factor(RIDRETH1)2 -0.0394018 0.0779480 -0.505 0.613 ## factor(RIDRETH1)3 0.6508824 0.0345734 18.826 < 2e-16 *** ## factor(RIDRETH1)4 0.6675320 0.0365963 18.240 < 2e-16 *** ## factor(RIDRETH1)5 0.5642960 0.0594928 9.485 < 2e-16 *** ## DMDEDUC 0.0094257 0.0135180 0.697 0.486 ## ## Zero-inflation model coefficients (binomial with logit link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.188125 0.187079 1.006 0.31461 ## RIDAGEYR -0.002938 0.003629 -0.810 0.41822 ## factor(RIDRETH1)2 -0.079637 0.242312 -0.329 0.74242 ## factor(RIDRETH1)3 0.118370 0.116120 1.019 0.30802 ## factor(RIDRETH1)4 0.143301 0.127764 1.122 0.26203 ## factor(RIDRETH1)5 0.259516 0.223032 1.164 0.24459 ## DMDEDUC -0.148881 0.053337 -2.791 0.00525 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Number of iterations in BFGS optimization: 21 ## Log-likelihood: -9518 on 14 Df ``` ``` vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), data = nhanes_sxq, crit = "coef") ``` ``` ## ## Call: ## vglm(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC, ## family = zipoisson(), data = nhanes_sxq, crit = "coef") ## ## ## Coefficients: ## (Intercept):1 (Intercept):2 RIDAGEYR:1 RIDAGEYR:2 ## 0.188125349 1.666622759 -0.002937819 -0.005510160 ## factor(RIDRETH1)2:1 factor(RIDRETH1)2:2 factor(RIDRETH1)3:1 factor(RIDRETH1)3:2 ## -0.079635992 -0.039401949 0.118369301 0.650882145 ## factor(RIDRETH1)4:1 factor(RIDRETH1)4:2 factor(RIDRETH1)5:1 factor(RIDRETH1)5:2 ## 0.143300364 0.667531080 0.259515415 0.564295398 ## DMDEDUC:1 DMDEDUC:2 ## -0.148881313 0.009425589 ## ## Degrees of Freedom: 5050 Total; 5036 Residual ## Log-likelihood: -9517.556 ``` ### Re-scaling the weights A traditional work-around for regression models is to rescale the weights to sum to the sample size and then pretend they are precision weights or frequency weights. ``` nhanes_sxq$scaledwt<-nhanes_sxq$WTINT2YR/mean(nhanes_sxq$WTINT2YR) wt= zeroinfl(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC|RIDAGEYR+factor(RIDRETH1)+DMDEDUC, data=nhanes_sxq, weights=scaledwt) ``` ``` ## Warning in eval(family$initialize): non-integer #successes in a binomial glm! ``` ``` summary(wt) ``` ``` ## ## Call: ## zeroinfl(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC | ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, data = nhanes_sxq, weights = scaledwt) ## ## Pearson residuals: ## Min 1Q Median 3Q Max ## -1.5864 -0.8418 -0.5430 0.1324 31.9106 ## ## Count model coefficients (poisson with log link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 1.8340666 0.0614994 29.823 < 2e-16 *** ## RIDAGEYR -0.0073881 0.0009059 -8.155 3.49e-16 *** ## factor(RIDRETH1)2 -0.1073243 0.0853534 -1.257 0.2086 ## factor(RIDRETH1)3 0.6551385 0.0481679 13.601 < 2e-16 *** ## factor(RIDRETH1)4 0.6358167 0.0529173 12.015 < 2e-16 *** ## factor(RIDRETH1)5 0.4774152 0.0666607 7.162 7.96e-13 *** ## DMDEDUC -0.0237391 0.0143070 -1.659 0.0971 . ## ## Zero-inflation model coefficients (binomial with logit link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.660511 0.214458 3.080 0.002071 ** ## RIDAGEYR -0.007833 0.003673 -2.133 0.032957 * ## factor(RIDRETH1)2 -0.116798 0.252451 -0.463 0.643610 ## factor(RIDRETH1)3 0.101968 0.151531 0.673 0.500999 ## factor(RIDRETH1)4 -0.160809 0.181429 -0.886 0.375430 ## factor(RIDRETH1)5 0.106776 0.230339 0.464 0.642964 ## DMDEDUC -0.202397 0.057624 -3.512 0.000444 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Number of iterations in BFGS optimization: 20 ## Log-likelihood: -9766 on 14 Df ``` ``` wtv= vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), data = nhanes_sxq, crit = "coef",weights=scaledwt) summary(wtv) ``` ``` ## ## Call: ## vglm(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC, ## family = zipoisson(), data = nhanes_sxq, weights = scaledwt, ## crit = "coef") ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept):1 0.6605042 0.2144354 3.080 0.002069 ** ## (Intercept):2 1.8340681 0.0614568 29.843 < 2e-16 *** ## RIDAGEYR:1 -0.0078333 0.0036726 -2.133 0.032934 * ## RIDAGEYR:2 -0.0073881 0.0008995 -8.214 < 2e-16 *** ## factor(RIDRETH1)2:1 -0.1167894 0.2527278 -0.462 0.643999 ## factor(RIDRETH1)2:2 -0.1073312 0.0847641 -1.266 0.205430 ## factor(RIDRETH1)3:1 0.1019712 0.1515002 0.673 0.500899 ## factor(RIDRETH1)3:2 0.6551367 0.0481359 13.610 < 2e-16 *** ## factor(RIDRETH1)4:1 -0.1608040 0.1814098 -0.886 0.375395 ## factor(RIDRETH1)4:2 0.6358147 0.0529738 12.002 < 2e-16 *** ## factor(RIDRETH1)5:1 0.1067789 0.2303235 0.464 0.642931 ## factor(RIDRETH1)5:2 0.4774124 0.0663590 7.194 6.27e-13 *** ## DMDEDUC:1 -0.2023967 0.0576221 -3.512 0.000444 *** ## DMDEDUC:2 -0.0237389 0.0146964 -1.615 0.106249 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Names of linear predictors: logitlink(pstr0), loglink(lambda) ## ## Log-likelihood: -9765.52 on 5036 degrees of freedom ## ## Number of Fisher scoring iterations: 8 ``` ``` ## repwts repdes = as.svrepdesign(des,type="Fay",fay.rho=0.2) rep1 = withReplicates(repdes, quote( coef(zeroinfl(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC|RIDAGEYR+factor(RIDRETH1)+DMDEDUC, weights=.weights)) )) rep1 ``` ``` ## theta SE ## count_(Intercept) 1.8340684 0.1350 ## count_RIDAGEYR -0.0073881 0.0028 ## count_factor(RIDRETH1)2 -0.1073314 0.2471 ## count_factor(RIDRETH1)3 0.6551358 0.1857 ## count_factor(RIDRETH1)4 0.6358143 0.1438 ## count_factor(RIDRETH1)5 0.4774122 0.2501 ## count_DMDEDUC -0.0237386 0.0797 ## zero_(Intercept) 0.6605043 0.2582 ## zero_RIDAGEYR -0.0078333 0.0039 ## zero_factor(RIDRETH1)2 -0.1167895 0.2854 ## zero_factor(RIDRETH1)3 0.1019712 0.0983 ## zero_factor(RIDRETH1)4 -0.1608040 0.0859 ## zero_factor(RIDRETH1)5 0.1067790 0.2120 ## zero_DMDEDUC -0.2023967 0.0586 ``` ``` repv = withReplicates(repdes, quote( coef(vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), data = nhanes_sxq, crit = "coef",weights=.weights)) )) repv ``` ``` ## theta SE ## (Intercept):1 0.6605042 0.2582 ## (Intercept):2 1.8340681 0.1350 ## RIDAGEYR:1 -0.0078333 0.0039 ## RIDAGEYR:2 -0.0073881 0.0028 ## factor(RIDRETH1)2:1 -0.1167894 0.2854 ## factor(RIDRETH1)2:2 -0.1073312 0.2471 ## factor(RIDRETH1)3:1 0.1019712 0.0983 ## factor(RIDRETH1)3:2 0.6551367 0.1857 ## factor(RIDRETH1)4:1 -0.1608040 0.0859 ## factor(RIDRETH1)4:2 0.6358147 0.1438 ## factor(RIDRETH1)5:1 0.1067789 0.2120 ## factor(RIDRETH1)5:2 0.4774124 0.2501 ## DMDEDUC:1 -0.2023967 0.0586 ## DMDEDUC:2 -0.0237389 0.0797 ``` ### svymle Another way to fit the model using just the `survey` package is with `svymle`. This takes the log-likelihood and its derivative as arguments, and adds linear predictors to some or all of those arguments. That is, we specify the log-likelihood in terms of the Bernoulli parameter \\(p\_0\\) and the Poisson mean \\(\\lambda\\) -- actually \\(\\mathrm{logit} p\_0\\) and \\(\\eta=\\log\\lambda\\), and also give the derivative with respect to these two parameters. The software does the necessary additional work to put linear predictors on the parameters and give us the zero-inflated model. In fact, `svymle` is very similar in underlying approach to `vglm`; the difference is that `vglm` comes with a large collection of predefined models. In defining the loglikelihood I'm going to take advantage of the Poisson pmf being available in R. Let's call it \\(\\digamma(y,\\lambda)\\). The loglikelihood is \\\[\\ell(y; \\mu,p\_0)=\\log\\left(p\_0\\{y==0\\}+(1-p)\\digamma(y,\\mu)\\right)\\\] only we want it in terms of \\(\\mathrm{logit} p\_0\\) and \\(\\eta=\\log \\lambda\\) ``` loglike = function(y,eta,logitp){ mu = exp(eta) p = exp(logitp)/(1+exp(logitp)) log(p*(y==0)+(1-p)*dpois(y,mu)) } ``` We also need the derivatives with respect to \\(\\mathrm{logit} p\_0\\) and \\(\\eta=\\log \\lambda\\) ``` dlogitp = function(y,eta,logitp){ mu = exp(eta) p = exp(logitp)/(1+exp(logitp)) dexpit = p/(1+p)^2 num = dexpit*(y==0)-dexpit*dpois(y,mu) denom = p*(y==0)+(1-p)*dpois(y,mu) num/denom } deta = function(y,eta,logitp){ mu = exp(eta) p = exp(logitp)/(1+exp(logitp)) dmutoy = 0*y dmutoy[y>0] = exp(-mu[y>0])*mu[y>0]^(y[y>0]-1)/factorial(y[y>0]-1) num = (1-p)*(-dpois(y,mu)+dmutoy) denom = p*(y==0)+(1-p)*dpois(y,mu) num/denom } score = function(y,eta,logitp) cbind(deta(y,eta,logitp), dlogitp(y,eta,logitp)) ``` And now we call `svymle` giving the linear predictors for both parameters. One of the formulas needs to include the response variable \\(Y\\). ``` nlmfit = svymle(loglike=loglike, grad=score, design=des, formulas=list(eta=malepartners~RIDAGEYR + factor(RIDRETH1) + DMDEDUC, logitp=~RIDAGEYR + factor(RIDRETH1) + DMDEDUC), start=coef(unwt), na.action="na.omit",method="BFGS") summary(nlmfit) ``` ``` ## Survey-sampled mle: ## svymle(loglike = loglike, gradient = score, design = des, formulas = list(eta = malepartners ~ ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, logitp = ~RIDAGEYR + ## factor(RIDRETH1) + DMDEDUC), start = coef(unwt), na.action = "na.omit", ## method = "BFGS") ## Coef SE p.value ## eta.(Intercept) 1.826824375 0.154214263 < 0.001 ## eta.RIDAGEYR -0.007800674 0.003014996 0.00967 ## eta.factor(RIDRETH1)2 -0.119695528 0.235192608 0.61080 ## eta.factor(RIDRETH1)3 0.639831707 0.165176844 < 0.001 ## eta.factor(RIDRETH1)4 0.615167757 0.117750557 < 0.001 ## eta.factor(RIDRETH1)5 0.465556579 0.213462435 0.02919 ## eta.DMDEDUC -0.008130626 0.072679434 0.91093 ## logitp.(Intercept) 0.578310456 0.246782589 0.01911 ## logitp.RIDAGEYR -0.006077530 0.004017016 0.13029 ## logitp.factor(RIDRETH1)2 -0.033442030 0.280701222 0.90517 ## logitp.factor(RIDRETH1)3 0.124434633 0.095140201 0.19090 ## logitp.factor(RIDRETH1)4 -0.151763506 0.086322700 0.07873 ## logitp.factor(RIDRETH1)5 0.119530402 0.209380277 0.56808 ## logitp.DMDEDUC -0.209112732 0.053553190 < 0.001 ## Stratified 1 - level Cluster Sampling design (with replacement) ## With (30) clusters. ## svydesign(id = ~SDMVPSU, strat = ~SDMVSTRA, weights = ~WTINT2YR, ## nest = TRUE, data = nhanes_sxq) ``` ### svyVGAM Finally, we use `svy_vglm`, with variances by linearisation ``` library(svyVGAM) svy_vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), design=des, crit = "coef") ``` ``` ## Stratified 1 - level Cluster Sampling design (with replacement) ## With (30) clusters. ## svydesign(id = ~SDMVPSU, strat = ~SDMVSTRA, weights = ~WTINT2YR, ## nest = TRUE, data = nhanes_sxq) ## ## Call: ## vglm(formula = formula, family = family, data = surveydata, weights = .survey.prob.weights, ## crit = "coef") ## ## ## Coefficients: ## (Intercept):1 (Intercept):2 RIDAGEYR:1 RIDAGEYR:2 ## 0.660504243 1.834068104 -0.007833317 -0.007388071 ## factor(RIDRETH1)2:1 factor(RIDRETH1)2:2 factor(RIDRETH1)3:1 factor(RIDRETH1)3:2 ## -0.116789371 -0.107331190 0.101971159 0.655136725 ## factor(RIDRETH1)4:1 factor(RIDRETH1)4:2 factor(RIDRETH1)5:1 factor(RIDRETH1)5:2 ## -0.160804047 0.635814748 0.106778915 0.477412443 ## DMDEDUC:1 DMDEDUC:2 ## -0.202396659 -0.023738881 ## ## Degrees of Freedom: 5050 Total; 5036 Residual ## Log-likelihood: -493703966 ``` and by replicate weights ``` svy_vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), design=repdes, crit = "coef") ```
Readable Markdown
#### Thomas Lumley #### 2026-01-16 The Zero-Inflated Poisson model is a model for count data with excess zeros. The response distribution is a mixture of a point mass at zero and a Poisson distribution: if \\(Z\\) is Bernoulli with probability \\(1-p\_0\\) and \\(P\\) is Poisson with mean \\(\\lambda\\), then \\(Y=Z+(1-Z)P\\) is zero-inflated Poisson. The ZIP is a latent-class model; we can have \\(Y=0\\) either because \\(Z=0\\) ('structural' zeroes) or because \\(P=0\\). That’s natural in some ecological examples: if you didn’t see any salmon it could be because the area is salmon-free (it’s Eden Park) or because you just randomly didn’t see any. To turn this into a regression model we typically put a logistic regression structure on \\(Z\\) and a Poisson regression structure on \\(P\\). There isn’t (as far as I know) existing software in R for design-based inference in zero-inflated Poisson models, so it’s a good example for the benefits of `svyVGAM`. The `pscl` package (Zeileis et al) fits zero-inflated models, and so does `VGAM`, so we can compare the model fitted with `svyVGAM` to both of those and to other work-arounds. I’ll do an example with data on number of sexual partners, from NHANES 2003-2004. We will look at questions `SXQ200` and `SXQ100`: number of male sexual partners. Combining these gives a ā€˜real’ zero-inflated variable: based on sexual orientation the zeroes would divide into 'never' and 'not yet'. Here's how I created the dataset, from two NHANES files. It's `data(nhanes_sxq)` in the package ``` library(foreign) setwd("~/nhanes") demo = read.xport("demo_c.xpt") sxq = read.xport("sxq_c.xpt") merged = merge(demo, sxq, by='SEQN') merged$total = with(merged, ifelse(RIAGENDR==2, SXQ100+SXQ130, SXQ170+SXQ200)) merged$total[merged$SXQ020==2] = 0 merged$total[merged$total>2000] = NA merged$age = merged$RIDAGEYR/25 merged$malepartners=with(merged, ifelse(RIAGENDR==2,SXQ100,SXQ200)) merged$malepartners[merged$malepartners>200]=NA nhanes_sxq<-merged[, c("SDMVPSU","SDMVSTRA","WTINT2YR","RIDAGEYR","RIDRETH1","DMDEDUC","malepartners")] ``` Start off by loading the packages and setting up a survey design ``` library(svyVGAM) ``` ``` ## Loading required package: VGAM ``` ``` ## Loading required package: stats4 ``` ``` ## Loading required package: splines ``` ``` ## Loading required package: survey ``` ``` ## Loading required package: grid ``` ``` ## Loading required package: Matrix ``` ``` ## Loading required package: survival ``` ``` ## ## Attaching package: 'survey' ``` ``` ## The following object is masked from 'package:VGAM': ## ## calibrate ``` ``` ## The following object is masked from 'package:graphics': ## ## dotchart ``` ``` library(pscl) ``` ``` ## Classes and Methods for R originally developed in the ## Political Science Computational Laboratory ## Department of Political Science ## Stanford University (2002-2015), ## by and under the direction of Simon Jackman. ## hurdle and zeroinfl functions by Achim Zeileis. ``` ``` data(nhanes_sxq) des = svydesign(id=~SDMVPSU,strat=~SDMVSTRA,weights=~WTINT2YR, nest=TRUE, data=nhanes_sxq) ``` First, we'll fit the model just ignoring the survey design, using both `pscl::zeroinfl` and `VGAM::vglm`. These models use the same variables in a logistic regression for \\(Z\\) and a Poisson regression for \\(P\\). In `VGAM` you would make the models different by constraining coefficients to be zero in one of the models; in `pscl` you would specify different models before and after the `|`. ``` unwt = zeroinfl(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC|RIDAGEYR+factor(RIDRETH1)+DMDEDUC, data=nhanes_sxq) summary(unwt) ``` ``` ## ## Call: ## zeroinfl(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC | ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, data = nhanes_sxq) ## ## Pearson residuals: ## Min 1Q Median 3Q Max ## -1.0204 -0.9433 -0.7871 0.1503 29.2567 ## ## Count model coefficients (poisson with log link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 1.6666218 0.0506660 32.894 < 2e-16 *** ## RIDAGEYR -0.0055102 0.0008969 -6.143 8.08e-10 *** ## factor(RIDRETH1)2 -0.0394018 0.0779480 -0.505 0.613 ## factor(RIDRETH1)3 0.6508824 0.0345734 18.826 < 2e-16 *** ## factor(RIDRETH1)4 0.6675320 0.0365963 18.240 < 2e-16 *** ## factor(RIDRETH1)5 0.5642960 0.0594928 9.485 < 2e-16 *** ## DMDEDUC 0.0094257 0.0135180 0.697 0.486 ## ## Zero-inflation model coefficients (binomial with logit link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.188125 0.187079 1.006 0.31461 ## RIDAGEYR -0.002938 0.003629 -0.810 0.41822 ## factor(RIDRETH1)2 -0.079637 0.242312 -0.329 0.74242 ## factor(RIDRETH1)3 0.118370 0.116120 1.019 0.30802 ## factor(RIDRETH1)4 0.143301 0.127764 1.122 0.26203 ## factor(RIDRETH1)5 0.259516 0.223032 1.164 0.24459 ## DMDEDUC -0.148881 0.053337 -2.791 0.00525 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Number of iterations in BFGS optimization: 21 ## Log-likelihood: -9518 on 14 Df ``` ``` vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), data = nhanes_sxq, crit = "coef") ``` ``` ## ## Call: ## vglm(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC, ## family = zipoisson(), data = nhanes_sxq, crit = "coef") ## ## ## Coefficients: ## (Intercept):1 (Intercept):2 RIDAGEYR:1 RIDAGEYR:2 ## 0.188125349 1.666622759 -0.002937819 -0.005510160 ## factor(RIDRETH1)2:1 factor(RIDRETH1)2:2 factor(RIDRETH1)3:1 factor(RIDRETH1)3:2 ## -0.079635992 -0.039401949 0.118369301 0.650882145 ## factor(RIDRETH1)4:1 factor(RIDRETH1)4:2 factor(RIDRETH1)5:1 factor(RIDRETH1)5:2 ## 0.143300364 0.667531080 0.259515415 0.564295398 ## DMDEDUC:1 DMDEDUC:2 ## -0.148881313 0.009425589 ## ## Degrees of Freedom: 5050 Total; 5036 Residual ## Log-likelihood: -9517.556 ``` ### Re-scaling the weights A traditional work-around for regression models is to rescale the weights to sum to the sample size and then pretend they are precision weights or frequency weights. ``` nhanes_sxq$scaledwt<-nhanes_sxq$WTINT2YR/mean(nhanes_sxq$WTINT2YR) wt= zeroinfl(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC|RIDAGEYR+factor(RIDRETH1)+DMDEDUC, data=nhanes_sxq, weights=scaledwt) ``` ``` ## Warning in eval(family$initialize): non-integer #successes in a binomial glm! ``` ``` summary(wt) ``` ``` ## ## Call: ## zeroinfl(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC | ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, data = nhanes_sxq, weights = scaledwt) ## ## Pearson residuals: ## Min 1Q Median 3Q Max ## -1.5864 -0.8418 -0.5430 0.1324 31.9106 ## ## Count model coefficients (poisson with log link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 1.8340666 0.0614994 29.823 < 2e-16 *** ## RIDAGEYR -0.0073881 0.0009059 -8.155 3.49e-16 *** ## factor(RIDRETH1)2 -0.1073243 0.0853534 -1.257 0.2086 ## factor(RIDRETH1)3 0.6551385 0.0481679 13.601 < 2e-16 *** ## factor(RIDRETH1)4 0.6358167 0.0529173 12.015 < 2e-16 *** ## factor(RIDRETH1)5 0.4774152 0.0666607 7.162 7.96e-13 *** ## DMDEDUC -0.0237391 0.0143070 -1.659 0.0971 . ## ## Zero-inflation model coefficients (binomial with logit link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.660511 0.214458 3.080 0.002071 ** ## RIDAGEYR -0.007833 0.003673 -2.133 0.032957 * ## factor(RIDRETH1)2 -0.116798 0.252451 -0.463 0.643610 ## factor(RIDRETH1)3 0.101968 0.151531 0.673 0.500999 ## factor(RIDRETH1)4 -0.160809 0.181429 -0.886 0.375430 ## factor(RIDRETH1)5 0.106776 0.230339 0.464 0.642964 ## DMDEDUC -0.202397 0.057624 -3.512 0.000444 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Number of iterations in BFGS optimization: 20 ## Log-likelihood: -9766 on 14 Df ``` ``` wtv= vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), data = nhanes_sxq, crit = "coef",weights=scaledwt) summary(wtv) ``` ``` ## ## Call: ## vglm(formula = malepartners ~ RIDAGEYR + factor(RIDRETH1) + DMDEDUC, ## family = zipoisson(), data = nhanes_sxq, weights = scaledwt, ## crit = "coef") ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept):1 0.6605042 0.2144354 3.080 0.002069 ** ## (Intercept):2 1.8340681 0.0614568 29.843 < 2e-16 *** ## RIDAGEYR:1 -0.0078333 0.0036726 -2.133 0.032934 * ## RIDAGEYR:2 -0.0073881 0.0008995 -8.214 < 2e-16 *** ## factor(RIDRETH1)2:1 -0.1167894 0.2527278 -0.462 0.643999 ## factor(RIDRETH1)2:2 -0.1073312 0.0847641 -1.266 0.205430 ## factor(RIDRETH1)3:1 0.1019712 0.1515002 0.673 0.500899 ## factor(RIDRETH1)3:2 0.6551367 0.0481359 13.610 < 2e-16 *** ## factor(RIDRETH1)4:1 -0.1608040 0.1814098 -0.886 0.375395 ## factor(RIDRETH1)4:2 0.6358147 0.0529738 12.002 < 2e-16 *** ## factor(RIDRETH1)5:1 0.1067789 0.2303235 0.464 0.642931 ## factor(RIDRETH1)5:2 0.4774124 0.0663590 7.194 6.27e-13 *** ## DMDEDUC:1 -0.2023967 0.0576221 -3.512 0.000444 *** ## DMDEDUC:2 -0.0237389 0.0146964 -1.615 0.106249 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Names of linear predictors: logitlink(pstr0), loglink(lambda) ## ## Log-likelihood: -9765.52 on 5036 degrees of freedom ## ## Number of Fisher scoring iterations: 8 ``` ``` ## repwts repdes = as.svrepdesign(des,type="Fay",fay.rho=0.2) rep1 = withReplicates(repdes, quote( coef(zeroinfl(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC|RIDAGEYR+factor(RIDRETH1)+DMDEDUC, weights=.weights)) )) rep1 ``` ``` ## theta SE ## count_(Intercept) 1.8340684 0.1350 ## count_RIDAGEYR -0.0073881 0.0028 ## count_factor(RIDRETH1)2 -0.1073314 0.2471 ## count_factor(RIDRETH1)3 0.6551358 0.1857 ## count_factor(RIDRETH1)4 0.6358143 0.1438 ## count_factor(RIDRETH1)5 0.4774122 0.2501 ## count_DMDEDUC -0.0237386 0.0797 ## zero_(Intercept) 0.6605043 0.2582 ## zero_RIDAGEYR -0.0078333 0.0039 ## zero_factor(RIDRETH1)2 -0.1167895 0.2854 ## zero_factor(RIDRETH1)3 0.1019712 0.0983 ## zero_factor(RIDRETH1)4 -0.1608040 0.0859 ## zero_factor(RIDRETH1)5 0.1067790 0.2120 ## zero_DMDEDUC -0.2023967 0.0586 ``` ``` repv = withReplicates(repdes, quote( coef(vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), data = nhanes_sxq, crit = "coef",weights=.weights)) )) repv ``` ``` ## theta SE ## (Intercept):1 0.6605042 0.2582 ## (Intercept):2 1.8340681 0.1350 ## RIDAGEYR:1 -0.0078333 0.0039 ## RIDAGEYR:2 -0.0073881 0.0028 ## factor(RIDRETH1)2:1 -0.1167894 0.2854 ## factor(RIDRETH1)2:2 -0.1073312 0.2471 ## factor(RIDRETH1)3:1 0.1019712 0.0983 ## factor(RIDRETH1)3:2 0.6551367 0.1857 ## factor(RIDRETH1)4:1 -0.1608040 0.0859 ## factor(RIDRETH1)4:2 0.6358147 0.1438 ## factor(RIDRETH1)5:1 0.1067789 0.2120 ## factor(RIDRETH1)5:2 0.4774124 0.2501 ## DMDEDUC:1 -0.2023967 0.0586 ## DMDEDUC:2 -0.0237389 0.0797 ``` ### svymle Another way to fit the model using just the `survey` package is with `svymle`. This takes the log-likelihood and its derivative as arguments, and adds linear predictors to some or all of those arguments. That is, we specify the log-likelihood in terms of the Bernoulli parameter \\(p\_0\\) and the Poisson mean \\(\\lambda\\) -- actually \\(\\mathrm{logit} p\_0\\) and \\(\\eta=\\log\\lambda\\), and also give the derivative with respect to these two parameters. The software does the necessary additional work to put linear predictors on the parameters and give us the zero-inflated model. In fact, `svymle` is very similar in underlying approach to `vglm`; the difference is that `vglm` comes with a large collection of predefined models. In defining the loglikelihood I'm going to take advantage of the Poisson pmf being available in R. Let's call it \\(\\digamma(y,\\lambda)\\). The loglikelihood is \\\[\\ell(y; \\mu,p\_0)=\\log\\left(p\_0\\{y==0\\}+(1-p)\\digamma(y,\\mu)\\right)\\\] only we want it in terms of \\(\\mathrm{logit} p\_0\\) and \\(\\eta=\\log \\lambda\\) ``` loglike = function(y,eta,logitp){ mu = exp(eta) p = exp(logitp)/(1+exp(logitp)) log(p*(y==0)+(1-p)*dpois(y,mu)) } ``` We also need the derivatives with respect to \\(\\mathrm{logit} p\_0\\) and \\(\\eta=\\log \\lambda\\) ``` dlogitp = function(y,eta,logitp){ mu = exp(eta) p = exp(logitp)/(1+exp(logitp)) dexpit = p/(1+p)^2 num = dexpit*(y==0)-dexpit*dpois(y,mu) denom = p*(y==0)+(1-p)*dpois(y,mu) num/denom } deta = function(y,eta,logitp){ mu = exp(eta) p = exp(logitp)/(1+exp(logitp)) dmutoy = 0*y dmutoy[y>0] = exp(-mu[y>0])*mu[y>0]^(y[y>0]-1)/factorial(y[y>0]-1) num = (1-p)*(-dpois(y,mu)+dmutoy) denom = p*(y==0)+(1-p)*dpois(y,mu) num/denom } score = function(y,eta,logitp) cbind(deta(y,eta,logitp), dlogitp(y,eta,logitp)) ``` And now we call `svymle` giving the linear predictors for both parameters. One of the formulas needs to include the response variable \\(Y\\). ``` nlmfit = svymle(loglike=loglike, grad=score, design=des, formulas=list(eta=malepartners~RIDAGEYR + factor(RIDRETH1) + DMDEDUC, logitp=~RIDAGEYR + factor(RIDRETH1) + DMDEDUC), start=coef(unwt), na.action="na.omit",method="BFGS") summary(nlmfit) ``` ``` ## Survey-sampled mle: ## svymle(loglike = loglike, gradient = score, design = des, formulas = list(eta = malepartners ~ ## RIDAGEYR + factor(RIDRETH1) + DMDEDUC, logitp = ~RIDAGEYR + ## factor(RIDRETH1) + DMDEDUC), start = coef(unwt), na.action = "na.omit", ## method = "BFGS") ## Coef SE p.value ## eta.(Intercept) 1.826824375 0.154214263 < 0.001 ## eta.RIDAGEYR -0.007800674 0.003014996 0.00967 ## eta.factor(RIDRETH1)2 -0.119695528 0.235192608 0.61080 ## eta.factor(RIDRETH1)3 0.639831707 0.165176844 < 0.001 ## eta.factor(RIDRETH1)4 0.615167757 0.117750557 < 0.001 ## eta.factor(RIDRETH1)5 0.465556579 0.213462435 0.02919 ## eta.DMDEDUC -0.008130626 0.072679434 0.91093 ## logitp.(Intercept) 0.578310456 0.246782589 0.01911 ## logitp.RIDAGEYR -0.006077530 0.004017016 0.13029 ## logitp.factor(RIDRETH1)2 -0.033442030 0.280701222 0.90517 ## logitp.factor(RIDRETH1)3 0.124434633 0.095140201 0.19090 ## logitp.factor(RIDRETH1)4 -0.151763506 0.086322700 0.07873 ## logitp.factor(RIDRETH1)5 0.119530402 0.209380277 0.56808 ## logitp.DMDEDUC -0.209112732 0.053553190 < 0.001 ## Stratified 1 - level Cluster Sampling design (with replacement) ## With (30) clusters. ## svydesign(id = ~SDMVPSU, strat = ~SDMVSTRA, weights = ~WTINT2YR, ## nest = TRUE, data = nhanes_sxq) ``` ### svyVGAM Finally, we use `svy_vglm`, with variances by linearisation ``` library(svyVGAM) svy_vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), design=des, crit = "coef") ``` ``` ## Stratified 1 - level Cluster Sampling design (with replacement) ## With (30) clusters. ## svydesign(id = ~SDMVPSU, strat = ~SDMVSTRA, weights = ~WTINT2YR, ## nest = TRUE, data = nhanes_sxq) ## ## Call: ## vglm(formula = formula, family = family, data = surveydata, weights = .survey.prob.weights, ## crit = "coef") ## ## ## Coefficients: ## (Intercept):1 (Intercept):2 RIDAGEYR:1 RIDAGEYR:2 ## 0.660504243 1.834068104 -0.007833317 -0.007388071 ## factor(RIDRETH1)2:1 factor(RIDRETH1)2:2 factor(RIDRETH1)3:1 factor(RIDRETH1)3:2 ## -0.116789371 -0.107331190 0.101971159 0.655136725 ## factor(RIDRETH1)4:1 factor(RIDRETH1)4:2 factor(RIDRETH1)5:1 factor(RIDRETH1)5:2 ## -0.160804047 0.635814748 0.106778915 0.477412443 ## DMDEDUC:1 DMDEDUC:2 ## -0.202396659 -0.023738881 ## ## Degrees of Freedom: 5050 Total; 5036 Residual ## Log-likelihood: -493703966 ``` and by replicate weights ``` svy_vglm(malepartners~RIDAGEYR+factor(RIDRETH1)+DMDEDUC, zipoisson(), design=repdes, crit = "coef") ```
Shard106 (laksa)
Root Hash8771423811280998506
Unparsed URLorg,r-project!cran,/web/packages/svyVGAM/vignettes/zeroinfl.html s443