ib
is used to correct the bias of a fitted model object
with the iterative bootstrap procedure.
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for glm
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for lm
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for lmerMod
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for nls
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for vglm
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)
object | a fitted model object. |
---|---|
thetastart | an optional starting value for the iterative procedure. If NULL (default), the estimate in object is used as the starting value. |
control | a list of parameters for controlling the iterative procedure (see ibControl). |
extra_param | if TRUE, the extra parameters of the model are also corrected (see 'Details'). |
... | additional optional arguments (see 'Details'). |
A fitted model object of class Ib.
The iterative bootstrap procedure is described in Kuk (1995) and further
studied by Guerrier et al. (2019) and Guerrier et al. (2020).
The kth iteration of this algorithm is
$$\hat{\theta}^{k} = \hat{\theta}^{k-1} + \hat{\pi} -
\frac{1}{H}\sum_{h=1}^H\hat{\pi}_h(\hat{\theta}^{k-1})$$
for \(k=1,2,\ldots\), where the sum is over \(h=1,\ldots,H\).
The estimate \(\hat{\pi}\) is provided by object.
The value \(\hat{\pi}_h(\hat{\theta})\) is a parametric bootstrap
estimate, where the bootstrap sample is generated from \(\hat{\theta}\)
and a fixed seed (see ibControl).
The larger the value of \(H\), the better the bias correction generally is,
but the more computation it requires (see ibControl).
If thetastart=NULL, the initial value of the procedure is
\(\hat{\theta}^{0}=\hat{\pi}\).
The number of iterations is controlled by the maxit parameter of ibControl.
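As a toy illustration of the update above, the following base-R sketch (not part of the package; the names pi_fun, pi_hat, H and the seed value are all illustrative) applies the iterative bootstrap to the downward-biased maximum likelihood estimator of a normal variance, whose fixed point is, up to Monte Carlo error, the usual unbiased estimator:

```r
## Illustrative only: the IB update applied to the biased MLE of a normal
## variance, sum((y - mean(y))^2)/n. Not the package's internal code.
set.seed(1)
y <- rnorm(50, mean = 0, sd = 2)
n <- length(y)
pi_fun <- function(x) mean((x - mean(x))^2)  # biased MLE of the variance
pi_hat <- pi_fun(y)                          # initial estimate

theta <- pi_hat                              # theta^0 = pi_hat
H <- 200                                     # bootstrap replicates per iteration
for (k in 1:25) {
  set.seed(123)                              # fixed seed across iterations
  boot <- replicate(H, pi_fun(rnorm(n, mean = 0, sd = sqrt(theta))))
  theta_new <- theta + pi_hat - mean(boot)   # the IB update
  if (abs(theta_new - theta) < 1e-10) break
  theta <- theta_new
}
theta   # close to the unbiased estimate n/(n-1) * pi_hat
```

Because the bias of this estimator is linear in the parameter, a handful of iterations suffice; the fixed seed makes each iteration's bootstrap average a deterministic function of theta, which is what allows convergence.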
By default, the method corrects the coefficients
only. For extra parameters, it depends on the model. These extra parameters may have
some constraints (e.g. positivity). If constraint=TRUE
(see ibControl), then a transformation from the constrained space to the
real line is used for the update.
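Conceptually, for a positivity-constrained parameter the update is performed on the log scale and mapped back, so the corrected value cannot leave the positive half-line. A minimal sketch (illustrative values, not the package's internal code):

```r
## Illustrative sketch of one constrained IB update for a positive parameter.
g     <- log   # map from the constrained (positive) space to the real line
g_inv <- exp   # map back to the constrained space
theta     <- 2.0   # current value of the constrained parameter
pi_hat    <- 1.5   # initial estimate
boot_mean <- 2.4   # average of the bootstrap estimates at theta
theta_new <- g_inv(g(theta) + g(pi_hat) - g(boot_mean))
theta_new   # 1.25, positive by construction
```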
For glm, if extra_param=TRUE: the shape parameter of the
Gamma family, the variance of the residuals in lm, or
the overdispersion parameter of the negative binomial regression in glm.nb,
are also corrected. Note that the quasi families
are not supported for the moment as they have no simulation method
(see simulate). Bias correction for extra parameters
of the inverse.gaussian family is not yet implemented.
For lm, if extra_param=TRUE: the variance of the residuals is
also corrected. Note that using ib is not useful as the coefficients
are already unbiased, unless one considers a different
data generating mechanism such as censoring, missing values
or outliers (see ibControl).
For lmer, by default, only the fixed effects are corrected.
If extra_param=TRUE: all the random effects
(variances and correlations) and the variance
of the residuals are also corrected.
Note that using ib is certainly not useful with the argument REML=TRUE
in lmer, as the bias of the variance components is
already addressed, unless one considers a different
data generating mechanism such as censoring, missing values
or outliers (see ibControl).
For nls, if extra_param=TRUE: the variance of the residuals is
also corrected.
For vglm, extra_param is currently not used.
Indeed, the philosophy of a vector generalized linear model is to
potentially model all parameters of a distribution with a linear predictor.
Hence, what would be considered an extra parameter in glm,
for instance, may already be captured by the default coefficients.
However, correcting the bias of the coefficients does not imply
that the bias of the parameters of the distribution is corrected
(by Jensen's inequality),
so we may use this feature in a future version of the package.
Note that we currently only support distributions
with a simslot (see simulate.vlm).
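The Jensen's-inequality point can be seen numerically: an unbiased estimate of a log-scale coefficient does not yield an unbiased estimate of the scale itself, since E[exp(Z)] > exp(E[Z]). A quick base-R check (illustrative only, with the true log-scale taken to be 0):

```r
## Illustrative: z plays the role of an unbiased estimate of log(scale),
## centred on the true value 0, so the true scale is exp(0) = 1.
set.seed(42)
z <- rnorm(1e5, mean = 0, sd = 1)
mean(exp(z))   # about exp(1/2), clearly above exp(mean(z)), which is about 1
```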
Guerrier S, Dupuis-Lozeron E, Ma Y, Victoria-Feser M (2019).
“Simulation-Based Bias Correction Methods for Complex Models.”
Journal of the American Statistical Association, 114(525), 146-157.
doi: 10.1080/01621459.2017.1380031.

Guerrier S, Karemera M, Orso S, Victoria-Feser M, Zhang Y (2020).
“A General Approach for Simulation-based Bias Correction in High Dimensional Settings.”
Version 2: 13 Nov 2020, arXiv:2010.13687, https://arxiv.org/pdf/2010.13687.pdf.

Kuk AYC (1995).
“Asymptotically Unbiased Estimation in Generalized Linear Models with Random Effects.”
Journal of the Royal Statistical Society: Series B (Methodological), 57(2), 395-407.
doi: 10.1111/j.2517-6161.1995.tb02035.x.
Samuel Orso
## poisson regression
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
pois_fit <- glm(counts ~ outcome + treatment, family = poisson())
fit_ib <- ib(pois_fit)
summary(fit_ib)
#> 
#> Call:
#> glm(formula = counts ~ outcome + treatment, family = poisson())
#> 
#> Deviance Residuals: 
#>        1         2         3         4         5         6         7         8
#> -0.67125   0.96272  -0.16965  -0.21999  -0.95552   1.04939   0.84715  -0.09167
#>        9
#> -0.96656
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  3.54485    0.17090  20.742  < 2e-16 ***
#> outcome2    -0.90720    0.20217  -4.487 7.21e-06 ***
#> outcome3    -0.51432    0.19274  -2.668  0.00762 ** 
#> treatment2  -0.56370    0.20000  -2.819  0.00482 ** 
#> treatment3   0.04706    0.20000   0.235  0.81397    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 10.5814  on 8  degrees of freedom
#> Residual deviance:  5.1291  on 4  degrees of freedom
#> AIC: 56.761
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 
#> Iterative bootstrap procedure:
#> 
#> * number of iterations: 2
#> * objective function: 0.454
#> 
#> Warning while correcting the bias: objective function does not reduce

## Set H = 1000
if (FALSE) {
  fit_ib <- ib(pois_fit, control = list(H = 1000))
  summary(fit_ib)
}

## gamma regression
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
fit_gamma <- glm(lot2 ~ log(u), data = clotting, family = Gamma(link = "inverse"))
fit_ib <- ib(fit_gamma)
## summary(fit_ib)
## correct for shape parameter and show iterations
if (FALSE) {
  fit_ib <- ib(fit_gamma, control = list(verbose = TRUE), extra_param = TRUE)
  summary(fit_ib)
}

## negative binomial regression
library(MASS)
fit_nb <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)
fit_ib <- ib(fit_nb)
## summary(fit_ib)
## correct for overdispersion with H=100
if (FALSE) {
  fit_ib <- ib(fit_nb, control = list(H = 100), extra_param = TRUE)
  summary(fit_ib)
}

## linear regression
fit_lm <- lm(speed ~ dist, data = cars)
fit_ib <- ib(fit_lm)
summary(fit_ib)
#> 
#> Call:
#> lm(formula = speed ~ dist, data = cars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -7.5785 -2.0591  0.2987  2.5731  6.5733
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  7.99223    0.87546   9.129 4.58e-12 ***
#> dist         0.16983    0.01752   9.696 6.90e-13 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 3.16 on 48 degrees of freedom
#> Multiple R-squared:  0.662,  Adjusted R-squared:  0.6549
#> F-statistic: 94 on 1 and 48 DF,  p-value: 6.901e-13
#> 
#> 
#> Iterative bootstrap procedure:
#> 
#> * number of iterations: 1
#> * objective function: 0.2917
#> 
#> Warning while correcting the bias: objective function does not reduce

## correct for variance of residuals
fit_ib <- ib(fit_lm, extra_param = TRUE)
summary(fit_ib)
#> 
#> Call:
#> lm(formula = speed ~ dist, data = cars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -7.6836 -1.8602  0.4635  2.7690  6.9056
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  7.36820    0.88496   8.326 7.11e-11 ***
#> dist         0.17894    0.01771  10.106 1.79e-13 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 3.194 on 48 degrees of freedom
#> Multiple R-squared:  0.6803, Adjusted R-squared:  0.6736
#> F-statistic: 102.1 on 1 and 48 DF,  p-value: 1.79e-13
#> 
#> 
#> Iterative bootstrap procedure:
#> 
#> * number of iterations: 3
#> * objective function: 0.312
#> 
#> Warning while correcting the bias: objective function does not reduce

## linear mixed-effects regression
library(lme4)
#> Loading required package: Matrix
fit_lmm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
fit_ib <- ib(fit_lmm)
summary(fit_ib)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: y ~ Days + (Days | Subject)
#>    Data: data
#> 
#>      AIC      BIC   logLik deviance df.resid
#>   1763.9   1783.1   -876.0   1751.9      174
#> 
#> Scaled residuals:
#>     Min      1Q  Median      3Q     Max
#> -3.9416 -0.4656  0.0289  0.4636  5.1793
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  Subject  (Intercept) 565.48   23.780
#>           Days         32.68    5.717   0.08
#>  Residual             654.95   25.592
#> Number of obs: 180, groups:  Subject, 18
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  246.834      6.632  37.218
#> Days          10.806      1.502   7.193
#> 
#> Correlation of Fixed Effects:
#>      (Intr)
#> Days -0.138
#> 
#> Iterative bootstrap procedure:
#> 
#> * number of iterations: 2
#> * objective function: 2.061e-12
#> 
#> Warning while correcting the bias: objective function does not reduce

## correct for variances and correlation
if (FALSE) {
  fit_ib <- ib(fit_lmm, extra_param = TRUE)
  summary(fit_ib)
}

## nonlinear regression
DNase1 <- subset(DNase, Run == 1)
fit_nls <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), data = DNase1)
fit_ib <- ib(fit_nls)
#> Error in eval(data_name): object 'DNase1' not found
summary(fit_ib)

## student regression
library(VGAM)
#> Loading required package: stats4
#> Loading required package: splines
tdata <- data.frame(x = runif(nn <- 1000))
tdata <- transform(tdata, y = rt(nn, df = exp(exp(0.5 - x))))
fit_vglm <- vglm(y ~ x, studentt3, data = tdata)
fit_ib <- ib(fit_vglm)
summary(fit_ib)
#> 
#> Call:
#> vglm(formula = y ~ x, family = studentt3, data = tdata)
#> 
#> Coefficients:
#>                Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 -0.066603   0.075732  -0.879    0.379
#> (Intercept):2  0.020276   0.037254   0.544    0.586
#> (Intercept):3 -0.006566   0.099350  -0.066    0.947
#> x              0.219002   0.133217   1.644    0.100
#> 
#> Names of linear predictors: location, loglink(scale), logloglink(df)
#> 
#> Log-likelihood: -1884.355 on 2996 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 8
#> 
#> No Hauck-Donner effect found in any of the estimates
#> 
#> 
#> Iterative bootstrap procedure:
#> 
#> * number of iterations: 1
#> * objective function: 0.08216
#> 
#> Warning while correcting the bias: objective function does not reduce