Impute missing values based on a regression model.

regressionImp(
  formula,
  data,
  family = "AUTO",
  robust = FALSE,
  imp_var = TRUE,
  imp_suffix = "imp",
  mod_cat = FALSE
)

Arguments

formula

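Model formula for the imputation. The left-hand side names the variable to impute; several variables can be joined with + (as in the examples), and the right-hand side lists the regressors.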

data

A data.frame containing the data

family

family argument for glm(). "AUTO" (the default) tries to choose a family automatically and is the only option that has been tested to any real extent.

robust

TRUE/FALSE: whether robust regression should be used. See Details.

imp_var

TRUE/FALSE: whether a TRUE/FALSE variable should be created for each imputed variable to show the imputation status.

imp_suffix

Suffix used for the names of the TRUE/FALSE imputation-status variables.

mod_cat

TRUE/FALSE: if TRUE, the level with the highest predicted probability is selected for categorical variables; otherwise a level is sampled according to the predicted probabilities.
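
The two behaviours described for mod_cat can be illustrated with a small base R sketch. It is not the package's internal code: the probability matrix probs and the level names are made up for illustration, and only the selection step is shown.

```r
# Hypothetical predicted class probabilities for three rows with a missing
# categorical value (rows = observations, columns = levels).
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.4, 0.4, 0.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
lev <- colnames(probs)

# Analogue of mod_cat = TRUE: take the most probable level in each row.
# which.max() returns the first maximum, so ties go to the earlier level.
modal <- lev[apply(probs, 1, which.max)]

# Analogue of mod_cat = FALSE: draw one level per row, weighted by that
# row's predicted probabilities.
set.seed(1)  # only to make the draw reproducible
sampled <- apply(probs, 1, function(p) sample(lev, size = 1, prob = p))
```

Selecting the modal level is deterministic but tends to over-represent the most frequent categories, whereas sampling keeps more of the variability implied by the predicted probabilities.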

Value

The imputed data set. With imp_var = TRUE it also contains the TRUE/FALSE imputation-status variables.
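
A short way to inspect the returned object is sketched below. It assumes the VIM package is installed, and it assumes the status variable is named by joining the variable name and imp_suffix with an underscore (Dream_imp here); check names() on your own result rather than relying on that.

```r
library(VIM)  # assumes the VIM package is installed

data(sleep, package = "VIM")

imp <- regressionImp(Dream ~ BodyWgt + BrainWgt, data = sleep,
                     imp_var = TRUE, imp_suffix = "imp")

# Columns present in the result but not in the input.
new_cols <- setdiff(names(imp), names(sleep))
print(new_cols)

# Count of values flagged as imputed; the column name Dream_imp is an
# assumption about the naming scheme (see the note above).
n_flagged <- sum(imp[["Dream_imp"]])
print(n_flagged)
```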

Details

lm() is used for family "normal" and glm() for all other families. With robust = TRUE, lmrob() and glmrob() are used instead.
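
The general idea of regression imputation with lm() can be sketched in base R as follows. This is a simplified illustration on made-up data, not the function's actual implementation; the data frame df and the column name y_imp are hypothetical.

```r
# Toy data: y has missing values, and x is also missing in one of those rows.
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA, 7, 8),
  y = c(2.1, 3.9, NA, 8.2, 9.8, NA, NA, 16.1)
)

# Record which y values are missing before imputing (compare imp_var = TRUE).
df$y_imp <- is.na(df$y)

# Fit on the observed rows; lm() drops incomplete rows by default.
fit <- lm(y ~ x, data = df)

# Predict for the rows with missing y. predict() returns NA where x is
# missing, so such rows stay unimputed.
df$y[df$y_imp] <- predict(fit, newdata = df[df$y_imp, ])

still_missing <- sum(is.na(df$y))
```

In this sketch the row with a missing regressor keeps its NA, which is the kind of situation the "missing values in the regressors" message in the examples refers to. A robust fit would replace lm() with lmrob() in the fitting step.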

References

A. Kowarik, M. Templ (2016) Imputation with R package VIM. Journal of Statistical Software, 74(7), 1-16.

See also

Other imputation methods: hotdeck(), impPCA(), irmi(), kNN(), matchImpute(), medianSamp(), rangerImpute(), sampleCat(), xgboostImpute()

Author

Alexander Kowarik

Examples


library(VIM)
data(sleep)
sleepImp1 <- regressionImp(Dream + NonD ~ BodyWgt + BrainWgt, data = sleep)
sleepImp2 <- regressionImp(Sleep + Gest + Span + Dream + NonD ~ BodyWgt + BrainWgt, data = sleep)

data(testdata)
imp_testdata1 <- regressionImp(b1 + b2 ~ x1 + x2, data = testdata$wna)
#> There still missing values in variable b2 . Probably due to missing values in the regressors.
imp_testdata3 <- regressionImp(x1 ~ x2, data = testdata$wna, robust = TRUE)
#> There still missing values in variable x1 . Probably due to missing values in the regressors.