The goal of surveysd is to combine all necessary steps to use calibrated bootstrapping with custom estimation functions. This vignette covers the usage of the most important functions. For insight into the theory behind the package, refer to vignette("methodology").
A test data set based on data(eusilc, package = "laeken") can be created with demo.eusilc().
library(surveysd)
set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
eusilc[1:5, .(year, povertyRisk, gender, pWeight)]
## year povertyRisk gender pWeight
## 1: 2010 FALSE female 504.5696
## 2: 2010 FALSE male 504.5696
## 3: 2010 FALSE male 504.5696
## 4: 2010 FALSE female 493.3824
## 5: 2010 FALSE male 493.3824
Use stratified resampling without replacement to generate 10 samples. Those samples are consistent with respect to the reference periods.
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
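The resampling step above can be pictured with a small base R sketch. It is a simplified illustration, not the scheme surveysd implements (see vignette("methodology") for that). The toy data, the "half of the households per stratum" rule, and the factor of 2 are all assumptions made for the example. The point it shows is that the draw happens once per household, so every period of a household receives the same replicate factor.

```r
# Toy panel: 6 households in 2 strata, each observed in two periods.
# All names and values are illustrative, not surveysd internals.
set.seed(1)
toy <- expand.grid(hid = 1:6, year = c(2010, 2011))
toy$region <- ifelse(toy$hid <= 4, "A", "B")
toy$pWeight <- 100

# Draw once per household (not per row): within each stratum, select half
# of the households without replacement.
hh <- unique(toy[, c("hid", "region")])
selected <- unlist(lapply(split(hh$hid, hh$region), function(ids) {
  ids[sample.int(length(ids), size = floor(length(ids) / 2))]
}))

# Simplified replicate weight (assumption): selected households count
# double, the others drop out. The factor is shared by all periods of a
# household, which keeps the replicate consistent across periods.
toy$w1 <- toy$pWeight * ifelse(toy$hid %in% selected, 2, 0)
```

Because the stratum sizes in the toy data are even, the replicate weights add up to the same total as the base weights in each year.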
Calibrate each sample according to the distribution of gender (on a personal level) and region (on a household level).
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[1:5, .(year, povertyRisk, gender, pWeight, w1, w2, w3, w4)]
## year povertyRisk gender pWeight w1 w2 w3 w4
## 1: 2010 FALSE female 504.5696 1025.360 0.4581938 0.4456302 0.4520549
## 2: 2010 FALSE male 504.5696 1025.360 0.4581938 0.4456302 0.4520549
## 3: 2010 FALSE male 504.5696 1025.360 0.4581938 0.4456302 0.4520549
## 4: 2011 FALSE female 504.5696 1024.862 0.4721126 0.4582807 0.4608312
## 5: 2011 FALSE male 504.5696 1024.862 0.4721126 0.4582807 0.4608312
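To make the idea of calibration concrete, here is a minimal base R sketch for a single margin. It is not the algorithm used by recalib(), and the toy data and column names are assumptions for the example. It rescales one replicate weight so that its weighted totals per gender match the totals given by the base weights.

```r
# Toy data: base weights and one replicate weight (values are made up).
toy <- data.frame(
  gender  = c("female", "female", "male", "male", "male"),
  pWeight = c(100, 120, 90, 110, 100),
  w1      = c(200,   0, 180,   0, 200)
)

# Target margins: weighted totals per gender from the base weights.
target <- tapply(toy$pWeight, toy$gender, sum)

# Current margins of the replicate weight.
current <- tapply(toy$w1, toy$gender, sum)

# Scale the replicate weight within each gender so the margins match.
adj <- target / current
toy$w1_calib <- toy$w1 * as.numeric(adj[as.character(toy$gender)])

# Margins of the calibrated replicate weight, for comparison with target.
calibrated <- tapply(toy$w1_calib, toy$gender, sum)
```

With a single margin one rescaling step is enough. With several margins, such as gender and region together, the adjustments generally have to be repeated until all margins agree within a tolerance, which is presumably what epsP and epsH control in the call above.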
Estimate the share of persons at risk of poverty per period and gender.
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = "gender")
err.est$Estimates
## year n N gender val_povertyRisk stE_povertyRisk
## 1: 2010 7267 3979572 male 12.02660 0.5882841
## 2: 2010 7560 4202650 female 16.73351 0.7473909
## 3: 2010 14827 8182222 <NA> 14.44422 0.6626295
## 4: 2011 7267 3979572 male 12.81921 0.6059190
## 5: 2011 7560 4202650 female 16.62488 0.7355060
## 6: 2011 14827 8182222 <NA> 14.77393 0.6631967
The output contains estimates (val_povertyRisk) as well as standard errors (stE_povertyRisk), both measured in percent. The rows with gender = NA denote the aggregate over all genders for the corresponding year.
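The relation between the two output columns can be sketched in base R. This is an illustration under assumptions, not the package's code. The helper weighted_share() is a hypothetical stand-in for a weighted ratio in percent, the toy weights are made up, and taking the standard deviation over the replicate estimates is the usual bootstrap approach rather than a statement about what calc.stError() does internally.

```r
# Toy data: indicator, base weight and three replicate weights (made up).
toy <- data.frame(
  povertyRisk = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  pWeight     = c(100, 120,  90, 110, 100),
  w1          = c(210,   0, 170,   0, 190),
  w2          = c(  0, 230,   0, 220, 200),
  w3          = c(190, 250, 180,   0,   0)
)

# Hypothetical helper: weighted share of TRUE values, in percent.
weighted_share <- function(x, w) 100 * sum(w[x]) / sum(w)

# Point estimate from the base weights.
point_est <- weighted_share(toy$povertyRisk, toy$pWeight)

# One estimate per replicate weight.
rep_est <- sapply(toy[c("w1", "w2", "w3")],
                  function(w) weighted_share(toy$povertyRisk, w))

# Spread of the replicate estimates as a bootstrap standard error.
std_err <- sd(rep_est)
```

A function with the same (x, w) shape is the kind of custom estimator the introduction refers to. Whether it can be passed to the fun argument unchanged should be checked against the calc.stError() documentation.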
Estimate the share of persons at risk of poverty per period for each region, each gender, and each combination of both.
group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group)
head(err.est$Estimates)
## year n N gender region val_povertyRisk stE_povertyRisk
## 1: 2010 261 122741.8 male Burgenland 17.414524 3.831697
## 2: 2010 288 137822.2 female Burgenland 21.432598 3.243412
## 3: 2010 359 182732.9 male Vorarlberg 12.973259 1.869263
## 4: 2010 374 194622.1 female Vorarlberg 19.883637 3.112974
## 5: 2010 440 253143.7 male Salzburg 9.156964 1.809600
## 6: 2010 484 282307.3 female Salzburg 17.939382 2.587059
## skipping 54 more rows
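How a list of groupings expands into separate sets of estimates can be illustrated in base R. The sketch below uses made-up toy data and a hypothetical helper weighted_share(). It computes point estimates only, for a single period, and does not reproduce the standard errors or the aggregate rows of calc.stError().

```r
# Toy data for one period (values are made up).
toy <- data.frame(
  gender      = c("female", "male", "female", "male", "female", "male"),
  region      = c("Burgenland", "Burgenland", "Salzburg",
                  "Salzburg", "Salzburg", "Burgenland"),
  povertyRisk = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  pWeight     = c(100, 120, 90, 110, 100, 80)
)

# Hypothetical helper: weighted share of TRUE values, in percent.
weighted_share <- function(x, w) 100 * sum(w[x]) / sum(w)

# Same structure as the group argument used above.
group <- list("gender", "region", c("gender", "region"))

# One table per list element: split the data by the grouping variables
# and apply the estimator to each part.
estimates <- lapply(group, function(g) {
  parts <- split(toy, toy[g], drop = TRUE)
  data.frame(
    group = names(parts),
    val   = sapply(parts, function(d) weighted_share(d$povertyRisk, d$pWeight)),
    row.names = NULL
  )
})
names(estimates) <- sapply(group, paste, collapse = " x ")
```

Each element of the list yields its own table: one by gender, one by region, and one by the cross-classification of both.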