ib is used to correct the bias of a fitted model object with the iterative bootstrap procedure.

ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for glm
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for lm
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for lmerMod
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for nls
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

# S4 method for vglm
ib(object, thetastart = NULL, control = list(...), extra_param = FALSE, ...)

Arguments

object

an object representing a fitted model (see 'Details').

thetastart

an optional starting value for the iterative procedure. If NULL (default), the procedure starts at the estimates in object.

control

a list of parameters for controlling the iterative procedure (see ibControl).

extra_param

if TRUE, bias correction of extra parameters is also performed (see 'Details').

...

additional optional arguments (see 'Details').
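
As an illustration, the arguments above might be combined as follows (fit stands for any supported fitted model; H and maxit are control parameters documented in ibControl):

## Hypothetical call combining the arguments above.
fit_ib <- ib(fit,
             thetastart = coef(fit),               # start at the estimates in `fit` (the default)
             control = list(H = 100, maxit = 20))  # more bootstrap samples, cap on iterations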

Value

A fitted model object of class Ib.

Details

The iterative bootstrap procedure is described in Kuk (1995) and further studied by Guerrier et al. (2019) and Guerrier et al. (2020). The kth iteration of the algorithm is $$\hat{\theta}^{k} = \hat{\theta}^{k-1} + \hat{\pi} - \frac{1}{H}\sum_{h=1}^{H}\hat{\pi}_h(\hat{\theta}^{k-1})$$ for \(k=1,2,\ldots\). The estimate \(\hat{\pi}\) is provided by object. The value \(\hat{\pi}_h(\hat{\theta})\) is a parametric bootstrap estimate for which the bootstrap sample is generated from \(\hat{\theta}\) and a fixed seed (see ibControl). Larger values of \(H\) generally yield a better bias correction but require more computation (see ibControl). If thetastart=NULL, the procedure starts at \(\hat{\theta}^{0}=\hat{\pi}\). The number of iterations is controlled by the maxit parameter of ibControl.
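
For intuition, here is a minimal sketch of this update in R. It is not the package implementation: simulate_at(theta, h) and refit(sample) are hypothetical helpers standing for, respectively, a parametric bootstrap draw from theta with the seed fixed by h, and re-estimation of the model on that sample.

## Minimal sketch of the IB update (hypothetical helpers, not the
## package implementation).
ib_sketch <- function(pi_hat, H = 1, maxit = 25, tol = 1e-8) {
  theta <- pi_hat                            # theta^0 = pi_hat
  for (k in seq_len(maxit)) {
    ## (1/H) * sum_h pi_h(theta^{k-1}): average of H bootstrap re-estimates
    pi_boot <- Reduce(`+`, lapply(seq_len(H), function(h)
      refit(simulate_at(theta, h)))) / H
    step <- pi_hat - pi_boot                 # estimated bias correction
    theta <- theta + step                    # theta^k
    if (sqrt(sum(step^2)) < tol) break       # stop once the update stalls
  }
  theta
}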

By default, the method corrects the coefficients only. Whether extra parameters can be corrected depends on the model. These extra parameters may be subject to constraints (e.g. positivity). If constraint=TRUE (see ibControl), a transformation from the constrained space to the real line is used for the update.
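
For a positive parameter such as a variance, this typically amounts to taking the update step on the log scale and mapping back, as in the following minimal sketch (update_positive is illustrative, not a package function):

## Constrained update for a positive parameter: the correction step is
## applied on the unconstrained log scale, then mapped back, so the
## updated value remains positive.
update_positive <- function(theta, step) exp(log(theta) + step)
update_positive(2.5, -0.3)  # positive result even with a negative step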

For glm, if extra_param=TRUE, the extra parameter is also corrected: the shape parameter for the Gamma family, the variance of the residuals (as in lm), or the overdispersion parameter of the negative binomial regression in glm.nb. Note that the quasi families are not supported for the moment as they have no simulation method (see simulate). Bias correction for the extra parameter of the inverse.gaussian family is not yet implemented.

For lm, if extra_param=TRUE, the variance of the residuals is also corrected. Note that using ib is not useful here, as the coefficients are already unbiased, unless one considers a different data generating mechanism such as censoring, missing values or outliers (see ibControl).

For lmer, by default, only the fixed effects are corrected. If extra_param=TRUE, all the random effect parameters (variances and correlations) and the variance of the residuals are also corrected. Note that using ib with REML=TRUE in lmer is certainly not useful, as the bias of the variance components is already addressed, unless one considers a different data generating mechanism such as censoring, missing values or outliers (see ibControl).

For nls, if extra_param=TRUE, the variance of the residuals is also corrected.

For vglm, extra_param is currently not used. Indeed, the philosophy of a vector generalized linear model is to potentially model all the parameters of a distribution with linear predictors. Hence, what would be considered an extra parameter in glm, for instance, may already be captured by the default coefficients. However, correcting the bias of the coefficients does not imply that the bias of the distribution parameters is corrected (by Jensen's inequality), so we may use this feature in a future version of the package. Note that we currently only support distributions with a simslot (see simulate.vlm).

References

Guerrier S, Dupuis-Lozeron E, Ma Y, Victoria-Feser M (2019). “Simulation-Based Bias Correction Methods for Complex Models.” Journal of the American Statistical Association, 114(525), 146-157. doi:10.1080/01621459.2017.1380031.

Guerrier S, Karemera M, Orso S, Victoria-Feser M, Zhang Y (2020). “A General Approach for Simulation-based Bias Correction in High Dimensional Settings.” arXiv:2010.13687, version 2, 13 Nov 2020. https://arxiv.org/pdf/2010.13687.pdf.

Kuk AYC (1995). “Asymptotically Unbiased Estimation in Generalized Linear Models with Random Effects.” Journal of the Royal Statistical Society: Series B (Methodological), 57(2), 395-407. doi:10.1111/j.2517-6161.1995.tb02035.x.

See also

ibControl

Author

Samuel Orso

Examples

## poisson regression
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
pois_fit <- glm(counts ~ outcome + treatment, family = poisson())
fit_ib <- ib(pois_fit)
summary(fit_ib)
#> 
#> Call:
#> glm(formula = counts ~ outcome + treatment, family = poisson())
#> 
#> Deviance Residuals: 
#>        1         2         3         4         5         6         7         8  
#> -0.67125   0.96272  -0.16965  -0.21999  -0.95552   1.04939   0.84715  -0.09167  
#>        9  
#> -0.96656  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  3.54485    0.17090  20.742  < 2e-16 ***
#> outcome2    -0.90720    0.20217  -4.487 7.21e-06 ***
#> outcome3    -0.51432    0.19274  -2.668  0.00762 ** 
#> treatment2  -0.56370    0.20000  -2.819  0.00482 ** 
#> treatment3   0.04706    0.20000   0.235  0.81397    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 10.5814  on 8  degrees of freedom
#> Residual deviance:  5.1291  on 4  degrees of freedom
#> AIC: 56.761
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> 
#> Iterative bootstrap procedure:
#> 
#>      * number of iterations: 2
#>      * objective function: 0.454
#> 
#> Warning while correcting the bias: objective function does not reduce
## Set H = 1000
if (FALSE) {
fit_ib <- ib(pois_fit, control=list(H=1000))
summary(fit_ib)
}

## gamma regression
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
fit_gamma <- glm(lot2 ~ log(u), data = clotting, family = Gamma(link = "inverse"))
fit_ib <- ib(fit_gamma)
## summary(fit_ib)
## correct for shape parameter and show iterations
if (FALSE) {
fit_ib <- ib(fit_gamma, control=list(verbose=TRUE), extra_param = TRUE)
summary(fit_ib)
}

## negative binomial regression
library(MASS)
fit_nb <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)
fit_ib <- ib(fit_nb)
## summary(fit_ib)
## correct for overdispersion with H=100
if (FALSE) {
fit_ib <- ib(fit_nb, control=list(H=100), extra_param = TRUE)
summary(fit_ib)
}

## linear regression
fit_lm <- lm(speed ~ dist, data = cars)
fit_ib <- ib(fit_lm)
summary(fit_ib)
#> 
#> Call:
#> lm(formula = speed ~ dist, data = cars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -7.5785 -2.0591  0.2987  2.5731  6.5733 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  7.99223    0.87546   9.129 4.58e-12 ***
#> dist         0.16983    0.01752   9.696 6.90e-13 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 3.16 on 48 degrees of freedom
#> Multiple R-squared:  0.662,	Adjusted R-squared:  0.6549 
#> F-statistic:    94 on 1 and 48 DF,  p-value: 6.901e-13
#> 
#> 
#> Iterative bootstrap procedure:
#> 
#>      * number of iterations: 1
#>      * objective function: 0.2917
#> 
#> Warning while correcting the bias: objective function does not reduce
## correct for variance of residuals
fit_ib <- ib(fit_lm, extra_param = TRUE)
summary(fit_ib)
#> 
#> Call:
#> lm(formula = speed ~ dist, data = cars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -7.6836 -1.8602  0.4635  2.7690  6.9056 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  7.36820    0.88496   8.326 7.11e-11 ***
#> dist         0.17894    0.01771  10.106 1.79e-13 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 3.194 on 48 degrees of freedom
#> Multiple R-squared:  0.6803,	Adjusted R-squared:  0.6736 
#> F-statistic: 102.1 on 1 and 48 DF,  p-value: 1.79e-13
#> 
#> 
#> Iterative bootstrap procedure:
#> 
#>      * number of iterations: 3
#>      * objective function: 0.312
#> 
#> Warning while correcting the bias: objective function does not reduce

## linear mixed-effects regression
library(lme4)
#> Loading required package: Matrix
fit_lmm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)
fit_ib <- ib(fit_lmm)
summary(fit_ib)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: y ~ Days + (Days | Subject)
#>    Data: data
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   1763.9   1783.1   -876.0   1751.9      174 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -3.9416 -0.4656  0.0289  0.4636  5.1793 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. Corr
#>  Subject  (Intercept) 565.48   23.780       
#>           Days         32.68    5.717   0.08
#>  Residual             654.95   25.592       
#> Number of obs: 180, groups:  Subject, 18
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  246.834      6.632  37.218
#> Days          10.806      1.502   7.193
#> 
#> Correlation of Fixed Effects:
#>      (Intr)
#> Days -0.138
#> 
#> Iterative bootstrap procedure:
#> 
#>      * number of iterations: 2
#>      * objective function: 2.061e-12
#> 
#> Warning while correcting the bias: objective function does not reduce
## correct for variances and correlation
if (FALSE) {
fit_ib <- ib(fit_lmm, extra_param = TRUE)
summary(fit_ib)
}

## nonlinear regression
DNase1 <- subset(DNase, Run == 1)
fit_nls <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), data = DNase1)
fit_ib <- ib(fit_nls)
## summary(fit_ib)

## student regression
library(VGAM)
#> Loading required package: stats4
#> Loading required package: splines
tdata <- data.frame(x = runif(nn <- 1000))
tdata <- transform(tdata,
                   y = rt(nn, df = exp(exp(0.5 - x))))
fit_vglm <- vglm(y ~ x, studentt3, data = tdata)
fit_ib <- ib(fit_vglm)
summary(fit_ib)
#> 
#> Call:
#> vglm(formula = y ~ x, family = studentt3, data = tdata)
#> 
#> Coefficients: 
#>                Estimate Std. Error z value Pr(>|z|)
#> (Intercept):1 -0.066603   0.075732  -0.879    0.379
#> (Intercept):2  0.020276   0.037254   0.544    0.586
#> (Intercept):3 -0.006566   0.099350  -0.066    0.947
#> x              0.219002   0.133217   1.644    0.100
#> 
#> Names of linear predictors: location, loglink(scale), logloglink(df)
#> 
#> Log-likelihood: -1884.355 on 2996 degrees of freedom
#> 
#> Number of Fisher scoring iterations: 8 
#> 
#> No Hauck-Donner effect found in any of the estimates
#> 
#> 
#> Iterative bootstrap procedure:
#> 
#>      * number of iterations: 1
#>      * objective function: 0.08216
#> 
#> Warning while correcting the bias: objective function does not reduce