Calibrate weights for bootstrap replicates by using iterative proportional updating to match population totals on various household and personal levels.
Arguments
- dat
either data.frame or data.table containing the sample survey for various periods.
- hid
character specifying the name of the column in dat containing the household ID.
- weights
character specifying the name of the column in dat containing the sample weights.
- b.rep
character specifying the names of the columns in dat containing the bootstrap weights which should be recalibrated.
- period
character specifying the name of the column in dat containing the sample period.
- conP.var
character vector containing person-specific variables to which weights should be calibrated, or a list of such character vectors. Contingency tables for the population are calculated per period using weights. If a vector is supplied, one contingency table is calculated for each vector entry. If a list is supplied, one contingency table is calculated for each list entry. See examples for more details.
- conH.var
character vector containing household-specific variables to which weights should be calibrated, or a list of such character vectors. Contingency tables for the population are calculated per period using weights. If a vector is supplied, one contingency table is calculated for each vector entry. If a list is supplied, one contingency table is calculated for each list entry. See examples for more details.
- conP
list or (partly) named list defining the constraints on person level. The list elements are contingency tables in array representation with dimnames corresponding to the names of the relevant calibration variables in dat. If a numerical variable is to be calibrated, the respective list element has to be named with the name of that numerical variable. Otherwise the list element should NOT be named.
- conH
list or (partly) named list defining the constraints on household level. The list elements are contingency tables in array representation with dimnames corresponding to the names of the relevant calibration variables in dat. If a numerical variable is to be calibrated, the respective list element has to be named with the name of that numerical variable. Otherwise the list element should NOT be named.
- epsP
numeric value specifying the convergence limit for conP.var or conP, see ipf().
- epsH
numeric value specifying the convergence limit for conH.var or conH, see ipf().
- ...
additional arguments passed on to function ipf() from this package.
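The difference between the vector and list forms of conP.var, and the naming rule for conP, can be sketched in base R. The toy data, its column names (including income) and the object names below are made up for illustration. The assumption that a named element holds weighted totals of the numeric variable, i.e. sum(weight * variable) per cell, is an interpretation that should be checked against ipf().

```r
# Toy person-level data; all names and values are illustrative only.
toy <- data.frame(
  year    = c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011),
  gender  = c("male", "female", "female", "male",
              "male", "female", "male", "female"),
  age     = c("young", "old", "young", "old",
              "young", "young", "old", "old"),
  income  = c(10, 20, 30, 40, 15, 25, 35, 45),
  pWeight = c(2, 3, 1, 4, 2, 2, 3, 3)
)

# Vector form, e.g. conP.var = c("gender", "age"):
# one table per variable, each crossed with the period.
tab_gender <- xtabs(pWeight ~ gender + year, data = toy)
tab_age    <- xtabs(pWeight ~ age + year, data = toy)

# List form, e.g. conP.var = list(c("gender", "age")):
# one joint table of gender x age per period.
tab_joint <- xtabs(pWeight ~ gender + age + year, data = toy)

# Constraints supplied directly: unnamed elements are weighted counts,
# the named element refers to the numeric variable "income"
# (assumption: the target is the weighted income total per cell).
tab_income <- xtabs(income * pWeight ~ gender + year, data = toy)
conP <- list(tab_gender, tab_age, income = tab_income)
```

Summing the joint table over age reproduces the gender table, which is why calibrating on the joint table is the stricter constraint.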
Value
Returns a data.table containing the survey data as well as the calibrated weights for the bootstrap replicates. The original bootstrap replicates are overwritten by the calibrated weights. If calibration of a bootstrap replicate does not converge, that bootstrap weight is not returned and the numbering of the returned bootstrap weights is reduced by one.
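The effect of a non-converging replicate on the returned columns can be pictured with a small base R sketch. The replicate column names w1 to w3 and the converged flags are hypothetical; the code only illustrates the documented behaviour and is not the package's internal logic.

```r
# Hypothetical replicate columns w1..w3; suppose calibration of w2 failed.
dat <- data.frame(
  hid = 1:3,
  w1  = c(1, 2, 3),
  w2  = c(4, 5, 6),
  w3  = c(7, 8, 9)
)
converged <- c(w1 = TRUE, w2 = FALSE, w3 = TRUE)

# Keep only the replicates that converged ...
kept <- names(converged)[converged]
out <- dat[, c("hid", kept)]

# ... and number them consecutively, so the former w3 becomes w2.
names(out)[-1] <- paste0("w", seq_along(kept))
```

In practice this means the number of replicate columns in the result can be smaller than in the input, so downstream code should count the returned columns rather than assume the original number.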
Details
recalib takes survey data (dat) containing the bootstrap replicates generated by draw.bootstrap and calibrates weights for each bootstrap replication according to population totals for person- or household-specific variables.
dat must be household data where household members correspond to multiple rows with the same household identifier. The data should at least contain the following columns:
Column indicating the sample period;
Column indicating the household ID;
Column containing the household sample weights;
Columns which contain the bootstrap replicates (see output of draw.bootstrap);
Columns indicating person- or household-specific variables for which the sample weights should be adjusted.
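A minimal sketch of the expected layout, in base R, may help. All column names and values are invented for illustration; w1 stands in for a single bootstrap replicate column. The check at the end reflects the requirement that the sample weight is a household weight, i.e. constant within a household and period.

```r
# Illustrative layout: one row per person, households identified by hid.
dat <- data.frame(
  year   = c(2010, 2010, 2010, 2011, 2011, 2011),  # sample period
  hid    = c(1, 1, 2, 1, 1, 2),                    # household ID
  weight = c(100, 100, 250, 110, 110, 240),        # household sample weight
  w1     = c(2, 2, 0, 1, 1, 1),                    # hypothetical replicate
  gender = c("male", "female", "female", "male", "female", "female"),
  region = c("A", "A", "B", "A", "A", "B")
)

# Number of persons per household and period.
rows_per_household <- table(dat$hid, dat$year)

# Number of distinct weights per household and period; 1 everywhere
# means the weight is constant within each household.
weights_per_household <- tapply(
  dat$weight, list(dat$hid, dat$year),
  function(x) length(unique(x))
)
```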
For each period and each variable in conP.var and/or conH.var, contingency tables are estimated to get margin totals on person- and/or household-specific variables in the population.
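The estimation of these margin totals can be sketched with xtabs() in base R, mirroring the approach used in the examples of this page. The toy data and object names are made up. Person-level tables use every row, while household-level tables use one row per household and period so that household weights are not counted once per member.

```r
# Toy data: household 1 has two members, households 2 and 3 one each.
toy <- data.frame(
  year   = rep(c(2010, 2011), each = 4),
  hid    = c(1, 1, 2, 3, 1, 1, 2, 3),
  gender = c("male", "female", "female", "male",
             "male", "female", "female", "male"),
  region = c("A", "A", "B", "B", "A", "A", "B", "B"),
  weight = c(10, 10, 20, 30, 12, 12, 18, 30)
)

# Person level: weighted totals by gender for each period.
person_totals <- xtabs(weight ~ gender + year, data = toy)

# Household level: keep the first row of each household and period,
# then compute weighted totals by region for each period.
households <- toy[!duplicated(toy[, c("hid", "year")]), ]
household_totals <- xtabs(weight ~ region + year, data = households)
```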
Afterwards the bootstrap replicates are multiplied with the original sample weight and the resulting product is then adjusted using ipf() to match the previously calculated contingency tables. In this process the columns of the bootstrap replicates are overwritten by the calibrated weights.
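The multiply-then-adjust step can be illustrated with a generic raking loop in base R. This is a simplified sketch of iterative proportional fitting for one period and two margins, not the ipf() implementation from this package. The function rake(), the replicate factors b_rep and all data values are hypothetical.

```r
# Generic raking: rescale weights margin by margin until all
# adjustment factors are within eps of 1.
rake <- function(w, groups, targets, eps = 1e-8, max_iter = 1000) {
  for (iter in seq_len(max_iter)) {
    max_dev <- 0
    for (v in names(groups)) {
      g <- as.character(groups[[v]])
      current <- vapply(split(w, g), sum, numeric(1))  # current margin
      adj <- targets[[v]][names(current)] / current    # factor per category
      max_dev <- max(max_dev, abs(adj - 1))
      w <- w * unname(adj[g])                          # rescale each row
    }
    if (max_dev < eps) break
  }
  w
}

gender <- c("male", "female", "female", "male")
region <- c("A", "A", "B", "B")
weight <- c(10, 10, 20, 30)   # original sample weights
b_rep  <- c(1.5, 0.5, 1, 1)   # hypothetical bootstrap replicate factors

# Population margins estimated from the original weights.
targets <- list(
  gender = vapply(split(weight, gender), sum, numeric(1)),
  region = vapply(split(weight, region), sum, numeric(1))
)

# Starting values are replicate factor times sample weight.
calibrated <- rake(
  weight * b_rep,
  list(gender = gender, region = region),
  targets
)
```

After the loop, the gender and region sums of calibrated agree with targets up to the tolerance, while the individual weights still reflect the bootstrap draw.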
See also
ipf() for more information on iterative proportional fitting.
Examples
library(surveysd)
library(data.table)
setDTthreads(1)
set.seed(1234)
eusilc <- demo.eusilc(n = 3, prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 1, hid = "hid",
                           weights = "pWeight",
                           strata = "region", period = "year")
# calibrate weight for bootstrap replicates
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          verbose = TRUE)
#> Iteration stopped after 2 steps
#> Convergence reached
# calibrate on other variables
dat_boot_calib <- recalib(dat_boot, conP.var = c("gender", "age"),
                          conH.var = c("region", "hsize"), verbose = TRUE)
#> Iteration stopped after 3 steps
#> Convergence reached
# supply contingency tables directly
conP1 <- xtabs(pWeight ~ age + year, data = eusilc)
conP2 <- xtabs(pWeight ~ gender + year, data = eusilc)
conH1 <- xtabs(pWeight ~ region + year,
               data = eusilc[!duplicated(paste(hid, year))])
conH2 <- xtabs(pWeight ~ hsize + year,
               data = eusilc[!duplicated(paste(hid, year))])
conP <- list(conP1, conP2)
conH <- list(conH1, conH2)
dat_boot_calib <- recalib(dat_boot, conP.var = NULL,
                          conH.var = NULL, conP = conP,
                          conH = conH, verbose = TRUE)
#> Iteration stopped after 3 steps
#> Convergence reached
# calibrate on gender x age
dat_boot_calib <- recalib(dat_boot, conP.var = list(c("gender", "age")),
                          conH.var = NULL, verbose = TRUE)
#> Iteration stopped after 6 steps
#> Convergence reached
# identical calibration, supplying the gender x age table directly
conP1 <- xtabs(pWeight ~ age + gender + year, data = eusilc)
conP <- list(conP1)
dat_boot_calib <- recalib(dat_boot, conP.var = NULL,
                          conH.var = NULL, conP = conP,
                          conH = NULL, verbose = TRUE)
#> Iteration stopped after 6 steps
#> Convergence reached