Extends the cellwise MCD approach (Raymaekers & Rousseeuw 2024) to mixed continuous and categorical data. Uses MCD for robust covariance estimation of the continuous block, computes cellwise weights from conditional residuals, then imputes missing values via conditional expectations (continuous variables) and weighted multinomial regression (categorical variables). Iterates until the convergence tolerance eps is met or maxit iterations are reached.
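The conditional-expectation step for the continuous block can be illustrated with a minimal base R sketch. It assumes a multivariate normal model with a given location mu and covariance Sigma and uses the textbook conditional-mean formula. The helper impute_conditional_mean() is hypothetical and is not part of the package; the actual implementation may differ.

```r
# Hypothetical helper (not a package function): impute the missing entries
# of one numeric row by their conditional mean given the observed entries.
impute_conditional_mean <- function(x, mu, Sigma) {
  miss <- is.na(x)
  if (!any(miss)) return(x)      # nothing to impute
  if (all(miss)) return(mu)      # nothing observed: fall back to the location
  obs <- !miss
  # E[x_m | x_o] = mu_m + Sigma_mo %*% solve(Sigma_oo) %*% (x_o - mu_o)
  coef <- solve(Sigma[obs, obs, drop = FALSE], x[obs] - mu[obs])
  x[miss] <- mu[miss] + drop(Sigma[miss, obs, drop = FALSE] %*% coef)
  x
}

# Toy example: two variables with correlation 0.8, second cell missing.
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
x_imp <- impute_conditional_mean(c(2, NA), mu, Sigma)
print(x_imp)
```

In the function itself, mu and Sigma would come from the robust (MCD-based) estimates rather than being fixed by hand as in this toy example.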
imputeCellMCD(
  data,
  maxit = 50,
  eps = 0.005,
  method = "tukey",
  alpha = NULL,
  mcd_alpha = 0.75,
  hard_threshold = 0.5,
  mcd_observed = "all",
  init_method = "median",
  uncert = "conditional",
  m = 1L,
  boot = FALSE,
  trace = FALSE
)
data: a data.frame with missing values (mixed continuous and categorical variables are supported).
maxit: maximum number of iterations (default: 50).
eps: convergence tolerance (default: 5e-3).
method: weight function for cell weights: "tukey" (default) or "huber".
alpha: tuning constant. NULL (default) uses 4.685 for Tukey and 1.345 for Huber.
mcd_alpha: MCD concentration parameter (default: 0.75).
hard_threshold: numeric in \([0, 1]\). Before iteration, observed cells with initial MCD weight below this threshold are set to missing and re-imputed (detect-once preprocessing). Set to NULL or 0 to disable (default: 0.5).
mcd_observed: strategy for covariance estimation: "all" (default) runs MCD on all data including imputed values; "weighted" uses cellWise::cwLocScat with imputed cells receiving weight 0 (requires the cellWise package); "pairwise" uses pairwise robust (Gnanadesikan–Kettenring) covariances on observed cells only.
init_method: initialisation for missing values before iteration: "median" (default), "knn", or "irmi".
uncert: imputation uncertainty: "conditional" (default) adds noise from the conditional normal distribution.
m: number of multiple imputations (default: 1). If m > 1, a list of imputed datasets is returned.
boot: logical; if TRUE, bootstrap resampling propagates parameter uncertainty across the m imputations.
trace: logical; if TRUE, print progress information.
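To make the weight function, tuning constant, and hard threshold more concrete, here is a minimal sketch of the standard Tukey bisquare and Huber weights evaluated at the documented default constants. It assumes the weights are computed from standardized residuals; cell_weight() is a hypothetical helper, and the package's internal computation may differ.

```r
# Hypothetical helper (not a package function): standard Tukey bisquare and
# Huber weights for standardized residuals r.
cell_weight <- function(r, method = c("tukey", "huber"), alpha = NULL) {
  method <- match.arg(method)
  # Documented defaults: 4.685 for Tukey, 1.345 for Huber
  if (is.null(alpha)) alpha <- if (method == "tukey") 4.685 else 1.345
  a <- abs(r)
  if (method == "tukey") {
    # Smoothly decreasing weight that is exactly 0 beyond alpha
    ifelse(a <= alpha, (1 - (a / alpha)^2)^2, 0)
  } else {
    # Weight 1 up to alpha, then decaying as alpha / |r|
    # (a == 0 gives alpha / 0 = Inf, so pmin() returns 1)
    pmin(1, alpha / a)
  }
}

r       <- c(0, 1, 3, 6)
w_tukey <- cell_weight(r, "tukey")
w_huber <- cell_weight(r, "huber")

# Detect-once preprocessing in the spirit of hard_threshold = 0.5:
# cells whose weight falls below the threshold would be set to missing.
flagged <- w_tukey < 0.5
print(data.frame(r, w_tukey, w_huber, flagged))
```

The Tukey weight reaches exactly zero for large residuals, whereas the Huber weight only decays, so the choice of method affects how strongly outlying cells are discounted.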
A list with components:
the imputed data.frame.
\(n \times p\) matrix of cell weights. Continuous observed cells have values in \([0, 1]\); categorical columns always have weight 1 (cellwise detection is only applied to continuous variables).
robust location estimate (continuous variables).
robust covariance estimate (continuous variables).
logical indicating convergence.
number of iterations performed.
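A call might look like the following sketch, which builds a small mixed-type toy data set and runs the function only if it is available in the current session. The toy data are invented for illustration. Because the component names of the returned list are not shown here, the result is inspected with str() rather than by name.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
toy <- data.frame(
  x1 = x1,
  x2 = 0.7 * x1 + rnorm(n, sd = 0.5),
  g  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
toy$x2[1:10] <- NA   # missing continuous cells
toy$g[11:15] <- NA   # missing categorical cells
toy$x1[20]   <- 15   # one outlying continuous cell

# Run only if imputeCellMCD() is available (assumes the package providing
# it has been attached beforehand).
if (exists("imputeCellMCD", mode = "function")) {
  res <- imputeCellMCD(toy, method = "tukey", hard_threshold = 0.5)
  str(res, max.level = 1)   # list the returned components
}
```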
Raymaekers, J. and Rousseeuw, P.J. (2024). The cellwise minimum covariance determinant estimator. Journal of the American Statistical Association, 119(545), 576–588.
imputeCellIRMI, imputeCellM,
imputeCellEM
Other imputation methods:
hotdeck(),
impPCA(),
imputeCellEM(),
imputeCellIRMI(),
imputeCellM(),
imputeCellwise(),
imputeRobust(),
imputeRobustChain(),
irmi(),
kNN(),
matchImpute(),
medianSamp(),
rangerImpute(),
regressionImp(),
sampleCat(),
vimmi,
vimpute(),
xgboostImpute()