In each step of the iteration, one variable is used as a response variable and the remaining variables serve as the regressors.

irmi(
  x,
  eps = 5,
  maxit = 100,
  mixed = NULL,
  mixed.constant = NULL,
  count = NULL,
  step = FALSE,
  robust = FALSE,
  takeAll = TRUE,
  noise = TRUE,
  noise.factor = 1,
  force = FALSE,
  robMethod = "lmrob",
  force.mixed = TRUE,
  mi = 1,
  addMixedFactors = FALSE,
  trace = FALSE,
  init.method = "kNN",
  modelFormulas = NULL,
  multinom.method = "multinom",
  imp_var = TRUE,
  imp_suffix = "imp"
)

Arguments

x

data.frame or matrix

eps

threshold for convergence

maxit

maximum number of iterations

mixed

column index of the semi-continuous variables

mixed.constant

vector with length equal to the number of semi-continuous variables specifying the point of the semi-continuous distribution with non-zero probability

count

column index of count variables

step

if TRUE, stepwise model selection is applied

robust

if TRUE, robust regression methods will be applied

takeAll

if TRUE, observations whose response value was missing (and has been initialised) are also used when fitting the regression model for imputation.

noise

if TRUE, a random error term is added to the imputed values, which makes multiple imputation possible. The error term has mean 0 and a variance corresponding to the variance of the regression residuals.

noise.factor

factor controlling the amount of noise added to the imputed values.

force

if TRUE, the algorithm tries to find a solution in any case, possibly by using different robust methods automatically.

robMethod

regression method when the response is continuous. Default is MM-regression with lmrob.

force.mixed

if TRUE, the algorithm tries to find a solution in any case, possibly by using different robust methods automatically.

mi

number of multiple imputations.

addMixedFactors

if TRUE, an additional factor variable is added for each mixed variable and used as a regressor (X variable) in the regression

trace

if TRUE, additional information about the iterations is printed.

init.method

Method for initialization of missing values (kNN or median)

modelFormulas

a named list giving, for each response variable, the names of the variables on the right-hand side (rhs) of its formula. The list must contain an entry for each variable with missing values and should look like `list(y1=c("x1","x2"),y2=c("x1","x3"))`.

multinom.method

Method for estimating the multinomial models (current default and only available method is multinom)

imp_var

TRUE/FALSE, whether a TRUE/FALSE variable showing the imputation status should be created for each imputed variable

imp_suffix

suffix for the TRUE/FALSE variables showing the imputation status

Value

the imputed data set.
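
As a small illustration of the returned data set, the sketch below counts the imputed cells per variable using the status columns created when `imp_var = TRUE`. It assumes the VIM package is installed and that the status columns are named by appending `_` and `imp_suffix` to the variable name, as the message in the traced example on this page suggests (`NonD_imp`, `Dream_imp`, ...). The object names `imp` and `flag_cols` are only illustrative.

```r
library(VIM)
data(sleep)

# Impute with the default status columns (suffix "imp")
imp <- irmi(sleep, imp_var = TRUE, imp_suffix = "imp")

# Status columns are assumed to end in "_imp"
flag_cols <- grep("_imp$", names(imp), value = TRUE)

# Number of imputed cells per variable
n_imputed <- sapply(imp[flag_cols], sum)
n_imputed

# For comparison: number of missing cells in the original data
n_missing <- colSums(is.na(sleep))[sub("_imp$", "", flag_cols)]
n_missing
```

If the assumptions hold, the two counts should agree variable by variable.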

Details

The method works sequentially and iteratively. It can deal with a mixture of continuous, semi-continuous, ordinal and nominal variables, including outliers.

A full description of the method can be found in the mentioned reference.
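
The sequential, iterative idea can be sketched in a few lines of base R. This is a deliberately simplified illustration, not the `irmi()` implementation: it handles numeric variables only, uses ordinary `lm()` fits, adds no noise, and has no robust, mixed or count handling. The function name `chained_lm_impute`, the median initialisation and the stopping rule (sum of squared changes in the imputed values below `eps`) are assumptions made for this sketch.

```r
# Simplified chained-regression imputation for numeric data frames.
# Illustration only; this is NOT how irmi() is implemented internally.
chained_lm_impute <- function(x, eps = 5, maxit = 100) {
  stopifnot(is.data.frame(x), all(vapply(x, is.numeric, logical(1))))
  miss <- is.na(x)                      # remember which cells were missing
  vars <- names(x)[colSums(miss) > 0]   # variables that need imputation

  # Initialise missing cells with the column median (assumed choice)
  for (v in vars) x[[v]][miss[, v]] <- median(x[[v]], na.rm = TRUE)

  for (it in seq_len(maxit)) {
    previous <- x
    for (v in vars) {
      # The current variable is the response, all others are regressors
      fit <- lm(reformulate(setdiff(names(x), v), response = v), data = x)
      x[[v]][miss[, v]] <- predict(fit, newdata = x[miss[, v], , drop = FALSE])
    }
    # Assumed stopping rule: sum of squared changes in the imputed values
    change <- sum((as.matrix(x[vars]) - as.matrix(previous[vars]))^2)
    if (change < eps) break
  }
  x
}

# Toy data with missing values in two of three correlated variables
set.seed(1)
d <- data.frame(a = rnorm(50), b = rnorm(50))
d$c <- d$a + 2 * d$b + rnorm(50, sd = 0.1)
d$a[1:5] <- NA
d$c[6:10] <- NA

imputed <- chained_lm_impute(d)
anyNA(imputed)
```

Only cells that were missing in the input are overwritten; observed values are returned unchanged.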

References

M. Templ, A. Kowarik, P. Filzmoser (2011) Iterative stepwise regression imputation using standard and robust methods. Computational Statistics & Data Analysis, Vol. 55, pp. 2793-2806.

A. Kowarik, M. Templ (2016) Imputation with R package VIM. Journal of Statistical Software, 74(7), 1-16.

See also

Other imputation methods: hotdeck(), impPCA(), kNN(), matchImpute(), medianSamp(), rangerImpute(), regressionImp(), sampleCat(), xgboostImpute()

Author

Matthias Templ, Alexander Kowarik

Examples


data(sleep)
irmi(sleep)
#>  BodyWgt BrainWgt    Dream    Sleep     Span     Gest     Pred      Exp 
#>    0.005    0.140    0.000    2.600    2.000   12.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt    Dream    Sleep     Span     Gest     Pred 
#>    1.000 6654.000 5712.000    6.600   19.900  100.000  645.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>  BodyWgt BrainWgt     NonD    Sleep     Span     Gest     Pred      Exp 
#>    0.005    0.140    2.100    2.600    2.000   12.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt     NonD    Sleep     Span     Gest     Pred 
#>    1.000 6654.000 5712.000   17.900   19.900  100.000  645.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>  BodyWgt BrainWgt     NonD    Dream     Span     Gest     Pred      Exp 
#>    0.005    0.140    2.100    0.000    2.000   12.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt     NonD    Dream     Span     Gest     Pred 
#>    1.000 6654.000 5712.000   17.900    6.600  100.000  645.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>  BodyWgt BrainWgt     NonD    Dream    Sleep     Gest     Pred      Exp 
#>    0.005    0.140    2.100    0.000    2.600   12.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt     NonD    Dream    Sleep     Gest     Pred 
#>    1.000 6654.000 5712.000   17.900    6.600   19.900  645.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>  BodyWgt BrainWgt     NonD    Dream    Sleep     Span     Pred      Exp 
#>    0.005    0.140    2.100    0.000    2.600    2.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt     NonD    Dream    Sleep     Span     Pred 
#>    1.000 6654.000 5712.000   17.900    6.600   19.900  100.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 

data(testdata)
imp_testdata1 <- irmi(testdata$wna, mixed = testdata$mixed)
#>        x2        x2 
#>  4.599866 15.039669 
#>        x1        x1 
#>  4.176996 15.235603 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 

# mixed.constant != 0 (-10)
testdata$wna$m1[testdata$wna$m1 == 0] <- -10
testdata$wna$m2 <- log(testdata$wna$m2 + 0.001)
imp_testdata2 <- irmi(
  testdata$wna,
  mixed = testdata$mixed,
  mixed.constant = c(-10,log(0.001))
)
#>        x2        x2 
#>  4.599866 15.039669 
#>        x1        x1 
#>  4.176996 15.235603 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
#>        x1        x2        x1        x2 
#>  4.176996  4.599866 15.235603 15.039669 
imp_testdata2$m2 <- exp(imp_testdata2$m2) - 0.001

# Example with fixed formulas for the variables with missing values
form <- list(
  NonD  = c("BodyWgt", "BrainWgt"),
  Dream = c("BodyWgt", "BrainWgt"),
  Sleep = c("BrainWgt"           ),
  Span  = c("BodyWgt"            ),
  Gest  = c("BodyWgt", "BrainWgt")
)
irmi(sleep, modelFormulas = form, trace = TRUE)
#> Method for multinomial models:multinom
#>  BodyWgt BrainWgt    Dream    Sleep     Span     Gest     Pred      Exp 
#>    0.005    0.140    0.000    2.600    2.000   12.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt    Dream    Sleep     Span     Gest     Pred 
#>    1.000 6654.000 5712.000    6.600   19.900  100.000  645.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>  BodyWgt BrainWgt     NonD    Sleep     Span     Gest     Pred      Exp 
#>    0.005    0.140    2.100    2.600    2.000   12.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt     NonD    Sleep     Span     Gest     Pred 
#>    1.000 6654.000 5712.000   17.900   19.900  100.000  645.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>  BodyWgt BrainWgt     NonD    Dream     Span     Gest     Pred      Exp 
#>    0.005    0.140    2.100    0.000    2.000   12.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt     NonD    Dream     Span     Gest     Pred 
#>    1.000 6654.000 5712.000   17.900    6.600  100.000  645.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>  BodyWgt BrainWgt     NonD    Dream    Sleep     Gest     Pred      Exp 
#>    0.005    0.140    2.100    0.000    2.600   12.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt     NonD    Dream    Sleep     Gest     Pred 
#>    1.000 6654.000 5712.000   17.900    6.600   19.900  645.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>  BodyWgt BrainWgt     NonD    Dream    Sleep     Span     Pred      Exp 
#>    0.005    0.140    2.100    0.000    2.600    2.000    1.000    1.000 
#>   Danger  BodyWgt BrainWgt     NonD    Dream    Sleep     Span     Pred 
#>    1.000 6654.000 5712.000   17.900    6.600   19.900  100.000    5.000 
#>      Exp   Danger 
#>    5.000    5.000 
#>    BodyWgt BrainWgt NonD Dream Sleep Span Gest Pred Exp Danger
#> 1 6654.000   5712.0  3.2   0.8   3.3 38.6  645    3   5      3
#> 2    1.000      6.6  6.3   2.0   8.3  4.5   42    3   1      3
#> 3    3.385     44.5 12.8   2.4  12.5 14.0   60    1   1      1
#> 4    0.920      5.7 10.4   2.4  16.5  3.2   25    5   2      3
#> 5 2547.000   4603.0  2.1   1.8   3.9 69.0  624    3   5      4
#> 6   10.550    179.5  9.1   0.7   9.8 27.0  180    4   4      4
#> Iteration1
#> [1] "inner loop: 3"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: NonD ~ BodyWgt+BrainWgt"
#> [1] "inner loop: 4"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Dream ~ BodyWgt+BrainWgt"
#> [1] "inner loop: 5"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Sleep ~ BrainWgt"
#> [1] "inner loop: 6"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Span ~ BodyWgt"
#> [1] "inner loop: 7"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Gest ~ BodyWgt+BrainWgt"
#> [1] "it = 1 ,  Wert = 28319.772541325"
#> [1] "eps 5"
#> [1] "test: TRUE"
#> Iteration2
#> [1] "inner loop: 3"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: NonD ~ BodyWgt+BrainWgt"
#> [1] "inner loop: 4"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Dream ~ BodyWgt+BrainWgt"
#> [1] "inner loop: 5"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Sleep ~ BrainWgt"
#> [1] "inner loop: 6"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Span ~ BodyWgt"
#> [1] "inner loop: 7"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Gest ~ BodyWgt+BrainWgt"
#> [1] "it = 2 ,  Wert = 10.6447267589895"
#> [1] "eps 5"
#> [1] "test: TRUE"
#> Iteration3
#> [1] "inner loop: 3"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: NonD ~ BodyWgt+BrainWgt"
#> [1] "inner loop: 4"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Dream ~ BodyWgt+BrainWgt"
#> [1] "inner loop: 5"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Sleep ~ BrainWgt"
#> [1] "inner loop: 6"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Span ~ BodyWgt"
#> [1] "inner loop: 7"
#> [1] "numeric"
#> [1] "numeric"
#> [1] "formula used: Gest ~ BodyWgt+BrainWgt"
#> [1] "it = 3 ,  Wert = 0.329034689483892"
#> [1] "eps 5"
#> [1] "test: FALSE"
#> [1] "0.329034689483892 < 5 = eps"
#> [1] "      --> finished after 3 iterations"
#> Imputation performed on the following data set:
#>          type      #missing
#> BodyWgt  "numeric" "0"     
#> BrainWgt "numeric" "0"     
#> NonD     "numeric" "14"    
#> Dream    "numeric" "12"    
#> Sleep    "numeric" "4"     
#> Span     "numeric" "4"     
#> Gest     "numeric" "4"     
#> Pred     "integer" "0"     
#> Exp      "integer" "0"     
#> Danger   "integer" "0"     
#> The variables NonD_imp,Dream_imp,Sleep_imp,Span_imp,Gest_imp are added to the data set.

# Example with ordered variable
td <- testdata$wna
td$c1 <- as.ordered(td$c1)
irmi(td)
#> Warning: The number of unique values in the ordinal variables in data.x
#>               does not correspond to the values given in levOrders
#> Warning: The number of unique values in the ordinal variables in data.y
#>               does not correspond to the values given in levOrders
#>         x2         m1         m2         x2         m1         m2 
#>   4.599866 -10.000000  -6.907755  15.039669  13.741169   2.625980 
#> Warning: The number of unique values in the ordinal variables in data.x
#>               does not correspond to the values given in levOrders
#> Warning: The number of unique values in the ordinal variables in data.y
#>               does not correspond to the values given in levOrders
#>         x1         m1         m2         x1         m1         m2 
#>   4.176996 -10.000000  -6.907755  15.235603  13.741169   2.625980 
#>         x1         x2         m1         m2         x1         x2         m1 
#>   4.176996   4.599866 -10.000000  -6.907755  15.235603  15.039669  13.741169 
#>         m2 
#>   2.625980 
#>         x1         x2         m1         m2         x1         x2         m1 
#>   4.176996   4.599866 -10.000000  -6.907755  15.235603  15.039669  13.741169 
#>         m2 
#>   2.625980 
#>         x1         x2         m1         m2         x1         x2         m1 
#>   4.176996   4.599866 -10.000000  -6.907755  15.235603  15.039669  13.741169 
#>         m2 
#>   2.625980 
#>         x1         x2         m1         m2         x1         x2         m1 
#>   4.176996   4.599866 -10.000000  -6.907755  15.235603  15.039669  13.741169 
#>         m2 
#>   2.625980 
#>        x1        x2        m2        x1        x2        m2 
#>  4.176996  4.599866 -6.907755 15.235603 15.039669  2.625980 
#>         x1         x2         m1         x1         x2         m1 
#>   4.176996   4.599866 -10.000000  15.235603  15.039669  13.741169
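
The `noise` and `mi` arguments described above can be combined to request several imputed versions of the same data. The sketch below assumes the VIM package is installed and, to the best of my understanding, that `irmi()` returns one imputed data set per requested imputation when `mi > 1`; the exact structure of the result is not documented on this page, so it is inspected rather than assumed. The name `imps` is only illustrative.

```r
library(VIM)
data(sleep)

# Request three imputations; the noise term makes the imputed values differ
set.seed(123)
imps <- irmi(sleep, noise = TRUE, mi = 3)

# Inspect the top-level structure of the result before using it
class(imps)
str(imps, max.level = 1)
```

Setting a seed beforehand keeps the random noise reproducible across runs.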