Impute missing values based on a regression model.
regressionImp(
formula,
data,
family = "AUTO",
robust = FALSE,
imp_var = TRUE,
imp_suffix = "imp",
mod_cat = FALSE
)
formula: model formula to impute one variable
data: A data.frame containing the data
family: family argument for glm(). "AUTO" (the default) tries to choose the family automatically and is the only thoroughly tested option.
robust: TRUE/FALSE, whether robust regression should be used. See details.
imp_var: TRUE/FALSE, whether a TRUE/FALSE variable should be created for each imputed variable to show its imputation status
imp_suffix: suffix used for the TRUE/FALSE imputation-status variables
mod_cat: TRUE/FALSE. If TRUE, for categorical variables the level with the highest prediction probability is selected; otherwise a level is sampled according to the probabilities.
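The two behaviours of mod_cat can be pictured with a small base R sketch. It is an illustration under stated assumptions, not VIM's internal code: the probability matrix probs and the helper pick_level() are hypothetical names introduced only for this example.

```r
# Hypothetical predicted class probabilities for three observations (rows)
# over the levels of a categorical variable (columns).
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.4, 0.5, 0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

# pick_level() is an illustrative helper, not part of VIM.
pick_level <- function(p, mod_cat = FALSE) {
  lev <- colnames(p)
  if (mod_cat) {
    # modal category: the level with the highest predicted probability
    lev[apply(p, 1, which.max)]
  } else {
    # draw one level per row according to that row's probabilities
    apply(p, 1, function(pr) sample(lev, size = 1, prob = pr))
  }
}

modal <- pick_level(probs, mod_cat = TRUE)

set.seed(1)
drawn <- pick_level(probs, mod_cat = FALSE)
```

Selecting the modal level is deterministic, while sampling keeps some of the variability of the categorical variable in the imputed values.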
The imputed data set.
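To give a rough idea of what such a result looks like when imp_var = TRUE, the following base R sketch imputes one variable by hand and adds a status column. It does not call VIM; the toy data frame df is made up, and the column naming pattern (variable name, underscore, imp_suffix) is an assumption about the output layout.

```r
# Toy data: y has two missing values, x is complete.
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, NA, 6, 8, NA, 12)
)

imp_suffix <- "imp"
miss <- is.na(df$y)                 # rows that need imputation

# lm() drops rows with a missing response by default
fit <- lm(y ~ x, data = df)
df$y[miss] <- predict(fit, newdata = df[miss, , drop = FALSE])

# TRUE/FALSE status column, assumed to be named <variable>_<imp_suffix>
df[[paste("y", imp_suffix, sep = "_")]] <- miss
```

After this, df holds the completed y together with a logical column y_imp that marks which values were imputed.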
lm() is used for family "normal" and glm() for all other families. With robust = TRUE, lmrob() and glmrob() are used instead.
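The choice between lm() and glm() described here could be sketched as follows. This is a minimal base R illustration, not the package's implementation: the helper fit_for_imputation() and the simulated data frame toy are hypothetical.

```r
# fit_for_imputation() is an illustrative helper, not VIM's code.
fit_for_imputation <- function(formula, data, family = "normal") {
  if (identical(family, "normal")) {
    lm(formula, data = data)                     # linear model
  } else {
    glm(formula, data = data, family = family)   # generalized linear model
  }
}

set.seed(42)
toy <- data.frame(x = rnorm(50))
toy$num <- 1 + 2 * toy$x + rnorm(50, sd = 0.1)          # continuous response
toy$bin <- rbinom(50, size = 1, prob = plogis(toy$x))   # binary response
toy$num[1:5] <- NA
toy$bin[6:10] <- NA

fit_num <- fit_for_imputation(num ~ x, toy, family = "normal")
fit_bin <- fit_for_imputation(bin ~ x, toy, family = "binomial")

# Predictions for the rows whose response is missing
pred_num <- predict(fit_num, newdata = toy[is.na(toy$num), , drop = FALSE])
pred_bin <- predict(fit_bin, newdata = toy[is.na(toy$bin), , drop = FALSE],
                    type = "response")
```

With robust = TRUE the same dispatch would presumably call lmrob() and glmrob() (both available in the robustbase package) in place of lm() and glm().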
A. Kowarik, M. Templ (2016) Imputation with R package VIM. Journal of Statistical Software, 74(7), 1-16.
Other imputation methods: hotdeck(), impPCA(), irmi(), kNN(), matchImpute(), medianSamp(), rangerImpute(), sampleCat(), xgboostImpute()
library(VIM)

# sleep data: impute Dream and NonD from body and brain weight
data(sleep)
sleepImp1 <- regressionImp(Dream + NonD ~ BodyWgt + BrainWgt, data = sleep)
# impute five variables using the same two regressors
sleepImp2 <- regressionImp(Sleep + Gest + Span + Dream + NonD ~ BodyWgt + BrainWgt, data = sleep)

# testdata: impute b1 and b2 from x1 and x2
data(testdata)
imp_testdata1 <- regressionImp(b1 + b2 ~ x1 + x2, data = testdata$wna)
#> There still missing values in variable b2 . Probably due to missing values in the regressors.
# same data, robust regression
imp_testdata3 <- regressionImp(x1 ~ x2, data = testdata$wna, robust = TRUE)
#> There still missing values in variable x1 . Probably due to missing values in the regressors.
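The message about remaining missing values can be reproduced in principle with plain lm(): a row whose regressor is missing cannot receive a prediction, so its response stays NA. The sketch below uses a made-up data frame toy and base R only; it illustrates the mechanism rather than the code path inside regressionImp().

```r
# Toy data: row 3 is missing both the response y and the regressor x.
toy <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2, 4, NA, NA, 10, 12)
)

miss <- is.na(toy$y)
fit <- lm(y ~ x, data = toy)   # fitted on the complete rows only

# predict() returns NA where a regressor is missing
toy$y[miss] <- predict(fit, newdata = toy[miss, , drop = FALSE])

# Row indices that are still missing after imputation
still_missing <- which(is.na(toy$y))
```

Here row 4 is filled in from its observed x, while row 3 remains missing. Imputing the regressors first, or choosing regressors without missing values, avoids this situation.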