Adjust sampling weights to given totals based on household-level and/or individual-level constraints.

ipf(
  dat,
  hid = NULL,
  conP = NULL,
  conH = NULL,
  epsP = 1e-06,
  epsH = 0.01,
  verbose = FALSE,
  w = NULL,
  bound = 4,
  maxIter = 200,
  meanHH = TRUE,
  allPthenH = TRUE,
  returnNA = TRUE,
  looseH = FALSE,
  numericalWeighting = computeLinear,
  check_hh_vars = TRUE,
  conversion_messages = FALSE,
  nameCalibWeight = "calibWeight",
  minMaxTrim = NULL
)

dat | a |
---|---|

hid | name of the column containing the household-ids within |

conP | list or (partly) named list defining the constraints on person
level. The list elements are contingency tables in array representation
with dimnames corresponding to the names of the relevant calibration
variables in |

conH | list or (partly) named list defining the constraints on
household level. The list elements are contingency tables in array
representation with dimnames corresponding to the names of the relevant
calibration variables in |

epsP | numeric value or list (of numeric values and/or arrays)
specifying the convergence limit(s) for |

epsH | numeric value or list (of numeric values and/or arrays)
specifying the convergence limit(s) for |

verbose | if TRUE, some progress information will be printed. |

w | name of the column containing the base weights within |

bound | numeric value specifying the multiplier for determining the
weight trimming boundary if the change of the base weights should be
restricted, i.e. if the weights should stay between 1/ |

maxIter | numeric value specifying the maximum number of iterations that should be performed. |

meanHH | if TRUE, every person in a household is assigned the mean of
the person weights corresponding to the household. If |

allPthenH | if TRUE, all the person level calibration steps are
performed before the household level calibration steps (and |

returnNA | if TRUE, the calibrated weight will be set to NA in case of no convergence. |

looseH | if FALSE, the actual constraints |

numericalWeighting | |

check_hh_vars | If |

conversion_messages | show a message if inputs need to be reformatted. This can be useful for speed optimizations if ipf is called several times with similar inputs (for example, when bootstrapping). |

nameCalibWeight | character defining the name of the variable for the newly generated calibrated weight. |

minMaxTrim | numeric vector of length 2: the first element is the minimum value that weights are trimmed to, the second element is the maximum value that weights are trimmed to. |
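The `meanHH` and `minMaxTrim` arguments describe two simple weight transformations. The base R sketch below illustrates both on made-up data. It is only a reading of the argument descriptions, not the package's internal code; the vectors `w` and `hid` and the trimming limits are hypothetical.

```r
# Toy data: person weights and hypothetical household ids (not from the package).
w   <- c(1.0, 3.0, 2.0, 10.0, 0.2)
hid <- c(1, 1, 2, 3, 3)

# meanHH-style step: every person is assigned the mean of the person
# weights in their household.
w_hh <- ave(w, hid, FUN = mean)

# minMaxTrim-style step: clamp weights to [minimum, maximum].
# The limits c(1, 5) are an arbitrary choice for illustration.
minMaxTrim <- c(1, 5)
w_trim <- pmin(pmax(w_hh, minMaxTrim[1]), minMaxTrim[2])

print(data.frame(hid, w, w_hh, w_trim))
```

After the averaging step all members of a household share one weight, which is why the trimming in this sketch is applied afterwards.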

The function will return the input data `dat` with the calibrated weights `calibWeight` as an additional column as well as attributes. If no convergence has been reached in `maxIter` steps, and `returnNA` is `TRUE` (the default), the column `calibWeight` will only consist of `NA`s. The attributes of the table are attributes derived from the `data.table` class as well as the following.

`converged` | Did the algorithm converge in `maxIter` steps? |

`iterations` | The number of iterations performed. |

`conP` , `conH` , `epsP` , `epsH` | See Arguments. |

`conP_adj` , `conH_adj` | Adjusted versions of `conP` and `conH` |

`formP` , `formH` | Formulas that were used to calculate `conP_adj` and
`conH_adj` based on the output table. |

This function implements the weighting procedure described here. Usage examples can be found in the corresponding vignette (`vignette("ipf")`).
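For readers unfamiliar with iterative proportional fitting, the following base R sketch shows the general textbook idea on person-level margins only: weights are rescaled repeatedly so that weighted totals match each target margin in turn. It is not the implementation used by `ipf()`; the data, the targets and the helper `rake_once()` are hypothetical.

```r
# Toy sample with two categorical variables (all values are made up).
dat <- data.frame(
  age    = factor(c("young", "young", "old", "old", "old")),
  gender = factor(c("m", "f", "m", "f", "f"))
)

# Hypothetical population totals per category (both sum to 100).
target_age    <- c(old = 60, young = 40)
target_gender <- c(f = 55, m = 45)

# One raking step: scale weights so that the weighted totals per
# category of `group` match `target`.
rake_once <- function(w, group, target) {
  current <- tapply(w, group, sum)
  ratio <- target[names(current)] / as.vector(current)
  unname(w * ratio[as.character(group)])
}

w <- rep(1, nrow(dat))  # base weights
for (iter in seq_len(200)) {
  w_old <- w
  w <- rake_once(w, dat$age, target_age)
  w <- rake_once(w, dat$gender, target_gender)
  # stop once no weight changes by more than a relative tolerance
  if (all(abs(w / w_old - 1) < 1e-6)) break
}

print(tapply(w, dat$age, sum))
print(tapply(w, dat$gender, sum))
```

The sketch omits everything that is specific to household surveys, such as household-level constraints, weight trimming and averaging of weights within households.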

`conP` and `conH` are contingency tables, which can be created with `xtabs`. The `dimnames` of those tables should match the names and levels of the corresponding columns in `dat`.
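As a small illustration of this requirement, the sketch below builds two constraint tables with `xtabs` from a made-up data frame and checks that their `dimnames` line up with the column names and factor levels. The data frame and the object names `conP_age` and `conP_age_region` are hypothetical.

```r
# Toy person-level data (values are made up).
dat <- data.frame(
  age     = factor(c("young", "old", "old", "young")),
  region  = factor(c("A", "A", "B", "B")),
  pWeight = c(10, 20, 30, 40)
)

# Weighted totals by age, and by age and region.
conP_age        <- xtabs(pWeight ~ age, data = dat)
conP_age_region <- xtabs(pWeight ~ age + region, data = dat)

# The names of the dimnames are the calibration variables ...
vars <- names(dimnames(conP_age_region))
print(vars)
print(all(vars %in% names(dat)))

# ... and the dimnames themselves are the factor levels in dat.
print(identical(dimnames(conP_age)$age, levels(dat$age)))
```

In practice the targets would usually come from an external source such as population statistics rather than from the sample itself; building them from `dat` here only keeps the example self-contained.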

`maxIter`, `epsP` and `epsH` are the stopping criteria. `epsP` and `epsH` describe relative tolerances in the sense that
$$1-epsP < \frac{w_{i+1}}{w_i} < 1+epsP$$
will be used as the convergence criterion. Here $i$ is the iteration step and $w_i$ is the weight of a specific person at step $i$.
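Read literally, the criterion compares each weight before and after an iteration. The base R sketch below is one such literal reading, not the check performed inside `ipf()`; the helper `has_converged()` and the weight vectors are hypothetical.

```r
# TRUE if every weight changed by less than the relative tolerance eps,
# i.e. 1 - eps < w_new / w_old < 1 + eps for all persons.
has_converged <- function(w_old, w_new, eps) {
  ratio <- w_new / w_old
  all(ratio > 1 - eps & ratio < 1 + eps)
}

w_i <- c(100, 250, 80)

# Tiny relative changes (about 1e-7) pass a tolerance of 1e-6.
small_change <- has_converged(w_i, c(100.00001, 250.00002, 80), eps = 1e-6)

# A one percent change in a single weight does not.
large_change <- has_converged(w_i, c(101, 250, 80), eps = 1e-6)

print(c(small_change = small_change, large_change = large_change))
```

Because the tolerance is relative, the same `eps` applies regardless of the absolute size of the weights.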

The algorithm performs best if all variables occurring in the constraints (`conP` and `conH`) as well as the household variable are coded as `factor` columns in `dat`. Otherwise, conversions will be necessary, which can be monitored with the `conversion_messages` argument. Setting `check_hh_vars` to `FALSE` can also increase the performance of the scheme.
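One way to follow this advice is to convert the relevant columns once, before any calibration call. The base R sketch below does this for a made-up data frame; the column names and the vector `calib_vars` are hypothetical and would need to be replaced by the variables used in the actual constraints.

```r
# Toy data where the household id and calibration variables are not factors.
dat <- data.frame(
  hid    = c(1, 1, 2),
  age    = c("young", "old", "old"),
  region = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical list of the household variable and all constraint variables.
calib_vars <- c("hid", "age", "region")

# Convert those columns to factors in one step.
dat[calib_vars] <- lapply(dat[calib_vars], factor)

print(sapply(dat[calib_vars], is.factor))
```

This is especially relevant when the same data are calibrated many times, for example in a bootstrap loop, since the conversion is then done only once.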

Alexander Kowarik, Gregor de Cillia

if (FALSE) {

# load data
eusilc <- demo.eusilc(n = 1, prettyNames = TRUE)

# personal constraints
conP1 <- xtabs(pWeight ~ age, data = eusilc)
conP2 <- xtabs(pWeight ~ gender + region, data = eusilc)
conP3 <- xtabs(pWeight*eqIncome ~ gender, data = eusilc)

# household constraints
conH1 <- xtabs(pWeight ~ hsize + region, data = eusilc)

# simple usage ------------------------------------------

calibweights1 <- ipf(
  eusilc,
  conP = list(conP1, conP2, eqIncome = conP3),
  bound = NULL,
  verbose = TRUE
)

# compare the personal weight with the calibrated weight
calibweights1[, .(hid, pWeight, calibWeight)]

# advanced usage ----------------------------------------

# use an array of tolerances
epsH1 <- conH1
epsH1[1:4, ] <- 0.005
epsH1[5, ] <- 0.2

# create an initial weight for the calibration
eusilc[, regSamp := .N, by = region]
eusilc[, regPop := sum(pWeight), by = region]
eusilc[, baseWeight := regPop/regSamp]

calibweights2 <- ipf(
  eusilc,
  conP = list(conP1, conP2),
  conH = list(conH1),
  epsP = 1e-6,
  epsH = list(epsH1),
  bound = 4,
  w = "baseWeight",
  verbose = TRUE
)

# show an adjusted version of conP and the original
attr(calibweights2, "conP_adj")
attr(calibweights2, "conP")

}