Implementation of the popular sequential, random (within a domain) hot-deck algorithm for imputation.
hotdeck(
  data,
  variable = NULL,
  ord_var = NULL,
  domain_var = NULL,
  makeNA = NULL,
  NAcond = NULL,
  impNA = TRUE,
  donorcond = NULL,
  imp_var = TRUE,
  imp_suffix = "imp"
)
data: data.frame or matrix
variable: variables in which missing values should be imputed (must not overlap with ord_var)
ord_var: variables used to sort the data set before imputation (must not overlap with variable)
domain_var: variables used to build domains; imputation is performed within these domains
makeNA: list of length equal to the number of variables, giving the values that should be converted to NA for each variable
NAcond: list of length equal to the number of variables, giving a condition for imputing an NA
impNA: TRUE/FALSE, whether NA values should be imputed
donorcond: list of length equal to the number of variables, giving a donor condition for each variable as a character string, e.g. ">5" or c(">5", "<10"). If the list element for a variable is NULL, no condition is applied to that variable.
imp_var: TRUE/FALSE, whether a TRUE/FALSE variable showing the imputation status should be created for each imputed variable
imp_suffix: suffix for the TRUE/FALSE variables showing the imputation status
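To make the donorcond strings concrete, here is a minimal sketch of how a condition such as ">5" or c(">5", "<10") could be applied to a vector of candidate donor values. The helper filter_donors is hypothetical and not part of VIM; it assumes each condition is a string that forms a valid R expression when pasted after the donor vector, and it is not a description of how hotdeck() evaluates conditions internally.

```r
# Hypothetical helper (not part of VIM): keep only donors that satisfy
# every condition string in `cond`.
filter_donors <- function(donors, cond) {
  keep <- rep(TRUE, length(donors))
  for (cnd in cond) {
    # Build and evaluate an expression such as `donors >5`
    keep <- keep & eval(parse(text = paste("donors", cnd)))
  }
  donors[keep]
}

# Only values greater than 5 and less than 10 remain eligible as donors
filter_donors(c(2, 6, 8, 12), c(">5", "<10"))
```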
the imputed data set.
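As a short illustration of the returned object, the sketch below counts how many values were imputed per variable. It assumes the default imp_var = TRUE and imp_suffix = "imp", so the status columns are expected to be named like NonD_imp; flag_cols is just an illustrative name.

```r
library(VIM)

data(sleep)
sleepI <- as.data.frame(hotdeck(sleep))

# Status columns are assumed to follow the pattern <variable>_<imp_suffix>
flag_cols <- grep("_imp$", names(sleepI), value = TRUE)

# Number of imputed values per variable
sapply(sleepI[flag_cols], sum)
```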
If the sequential hot-deck does not lead to a suitable donor, a random donor in the group will be used.
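The idea of a sequential donor with a random fallback can be sketched in base R. This is one simple reading of the technique, not VIM's actual implementation: the hypothetical function seq_hotdeck_sketch sorts each domain by an ordering variable, fills a missing value from the most recent observed record, and draws a random donor from the same domain when no earlier record is available.

```r
# Hypothetical helper (not part of VIM): sequential hot-deck for one vector `x`,
# sorted by `ord` and imputed separately within each level of `domain`.
seq_hotdeck_sketch <- function(x, ord, domain) {
  out <- x
  for (idx in split(seq_along(x), domain)) {
    idx <- idx[order(ord[idx])]           # sort the domain by the ordering variable
    donors <- x[idx][!is.na(x[idx])]      # observed values available as donors
    if (length(donors) == 0) next         # no donor at all in this domain
    last <- NA
    for (i in idx) {
      if (!is.na(x[i])) {
        last <- x[i]                      # remember the most recent observed value
      } else if (!is.na(last)) {
        out[i] <- last                    # sequential donor: previous record in sort order
      } else {
        # fallback: random donor from the same domain
        out[i] <- donors[sample.int(length(donors), 1)]
      }
    }
  }
  out
}

x <- c(NA, 2, NA, 4, NA, NA)
ord <- c(1, 2, 3, 4, 5, 6)
domain <- c("a", "a", "a", "b", "b", "b")
seq_hotdeck_sketch(x, ord, domain)
```

In this toy input the first record of domain "a" has no preceding donor, so it is the case that falls back to a random draw from the domain.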
A. Kowarik, M. Templ (2016) Imputation with R package VIM. Journal of Statistical Software, 74(7), 1-16.
Other imputation methods: impPCA(), irmi(), kNN(), matchImpute(), medianSamp(), rangerImpute(), regressionImp(), sampleCat(), xgboostImpute()
data(sleep)
sleepI <- hotdeck(sleep)
sleepI2 <- hotdeck(sleep, ord_var = "BodyWgt", domain_var = "Pred")
# Usage of donorcond in a simple example
sleepI3 <- hotdeck(
  sleep,
  variable = c("NonD", "Dream", "Sleep", "Span", "Gest"),
  ord_var = "BodyWgt", domain_var = "Pred",
  donorcond = list(">4", "<17", ">1.5", "%between%c(8,13)", ">5")
)
set.seed(132)
nRows <- 1e3
# Generate a data set with nRows rows and several variables
x <- data.frame(
  x = rnorm(nRows), y = rnorm(nRows),
  z = sample(LETTERS, nRows, replace = TRUE),
  d1 = sample(LETTERS[1:3], nRows, replace = TRUE),
  d2 = sample(LETTERS[1:2], nRows, replace = TRUE),
  o1 = rnorm(nRows), o2 = rnorm(nRows), o3 = rnorm(nRows)
)
origX <- x
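# Set 10% of the values in each of the first four columns to NA
x[sample(1:nRows, nRows / 10), 1] <- NA
x[sample(1:nRows, nRows / 10), 2] <- NA
x[sample(1:nRows, nRows / 10), 3] <- NA
x[sample(1:nRows, nRows / 10), 4] <- NA
# Impute within the domains of d2 after sorting by o1, o2 and o3
xImp <- hotdeck(x, ord_var = c("o1", "o2", "o3"), domain_var = "d2")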