The goal of surveysd is to combine all necessary steps to use
calibrated bootstrapping with custom estimation functions. This vignette
covers the usage of the most important functions. For insights into
the theory used in this package, refer to vignette("methodology").
Load dummy data
A test data set based on data(eusilc, package = "laeken") can be created with demo.eusilc().
library(surveysd)
set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
eusilc[1:5, .(year, povertyRisk, gender, pWeight)]
## year povertyRisk gender pWeight
## <num> <lgcl> <fctr> <num>
## 1: 2010 FALSE female 504.5696
## 2: 2010 FALSE male 504.5696
## 3: 2010 FALSE male 504.5696
## 4: 2010 FALSE female 493.3824
## 5: 2010 FALSE male 493.3824
Draw bootstrap replicates
Use stratified resampling without replacement to generate 10 samples. Those samples are consistent with respect to the reference periods.
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
strata = "region", period = "year")
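To make the idea of stratified resampling without replacement concrete, here is a minimal base R sketch on invented data. It is not what draw.bootstrap does internally (see vignette("methodology") for the actual procedure). The toy household frame, the helper draw_half_sample and the choice of selecting half of each stratum are assumptions made only for illustration.

```r
# Toy household frame: 3 strata with 6, 4 and 8 households (made-up data).
set.seed(1)
households <- data.frame(
  hid    = 1:18,
  strata = rep(c("A", "B", "C"), times = c(6, 4, 8))
)

# Hypothetical helper: within each stratum, select floor(n / 2) households
# without replacement and flag them in the full frame.
draw_half_sample <- function(frame) {
  picked <- lapply(split(frame$hid, frame$strata), function(ids) {
    ids[sample.int(length(ids), size = floor(length(ids) / 2))]
  })
  frame$hid %in% unlist(picked)
}

# One logical column per replicate: TRUE if the household was selected.
replicates <- replicate(10, draw_half_sample(households))
dim(replicates)
```

Every column selects 3 + 2 + 4 = 9 households, so the stratum structure is preserved in each replicate.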
Calibrate bootstrap replicates
Calibrate each sample according to the distribution of gender (on a personal level) and region (on a household level).
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[1:5, .(year, povertyRisk, gender, pWeight, w1, w2, w3, w4)]
## year povertyRisk gender pWeight w1 w2 w3
## <num> <lgcl> <fctr> <num> <num> <num> <num>
## 1: 2010 FALSE female 504.5696 0.4554265 999.4060 0.4530650
## 2: 2010 FALSE male 504.5696 0.4554265 999.4060 0.4530650
## 3: 2010 FALSE male 504.5696 0.4554265 999.4060 0.4530650
## 4: 2010 FALSE female 493.3824 1003.1461494 976.0975 0.4429306
## 5: 2010 FALSE male 493.3824 1003.1461494 976.0975 0.4429306
## w4
## <num>
## 1: 0.4465668
## 2: 0.4465668
## 3: 0.4465668
## 4: 980.0016943
## 5: 980.0016943
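The basic idea behind calibration can be sketched in base R for a single margin: weights are scaled so that their sum per group matches a known total. This is a simplified illustration on invented numbers, not the procedure recalib runs, which balances several margins at once up to the tolerances epsP and epsH. The data frame persons and the totals in target are assumptions.

```r
# Made-up persons with bootstrap weights, and assumed population totals.
persons <- data.frame(
  gender = c("female", "female", "male", "male", "male"),
  w      = c(0.5, 900, 950, 0.4, 1000)
)
target <- c(female = 1000, male = 2000)

# Current weight sum of each person's gender group, repeated per row.
current <- ave(persons$w, persons$gender, FUN = sum)

# Scale every weight by target total / current total of its group.
persons$w_calib <- persons$w * unname(target[persons$gender]) / current

# The calibrated weights now reproduce the target totals.
tapply(persons$w_calib, persons$gender, sum)
```

With more than one margin, such scaling steps would have to be repeated in turn until all margins are matched closely enough.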
Estimate with respect to a grouping variable
Estimate the share of persons at risk of poverty per period and gender.
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = "gender")
err.est$Estimates
## Key: <year, n, N, gender, estimate_type>
## year n N gender estimate_type val_povertyRisk stE_povertyRisk
## <num> <int> <num> <fctr> <char> <num> <num>
## 1: 2010 7267 3979572 male direct 12.02660 0.4814898
## 2: 2010 7560 4202650 female direct 16.73351 0.3965652
## 3: 2010 14827 8182222 <NA> direct 14.44422 0.4052157
## 4: 2011 7267 3979572 male direct 12.81921 0.4710175
## 5: 2011 7560 4202650 female direct 16.62488 0.4542767
## 6: 2011 14827 8182222 <NA> direct 14.77393 0.4322991
The output contains estimates (val_povertyRisk) as well as standard errors (stE_povertyRisk) measured in percent.
The rows with gender = NA denote the aggregate over all genders for the corresponding year.
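As a rough illustration of how an estimate and its bootstrap standard error relate, the following base R sketch computes a weighted share with the original weight and then takes the standard deviation of the same statistic across replicate weights. All numbers are invented, and weighted_share is a hypothetical stand-in rather than the package's weightedRatio; calc.stError may handle details such as missing values differently.

```r
# Made-up data: poverty indicator, original weight, three replicate weights.
x  <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
w  <- c(500, 500, 490, 490, 510)
wb <- cbind(w1 = c(0.5, 1000, 980, 0.4, 1020),
            w2 = c(1000, 0.5, 0.4, 980, 1020),
            w3 = c(1000, 1000, 0.4, 980, 0.5))

# Hypothetical stand-in for a weighted ratio: weighted share in percent.
weighted_share <- function(x, w) 100 * sum(w[x]) / sum(w)

# Point estimate from the original weight.
estimate <- weighted_share(x, w)

# The same statistic for every replicate weight.
replicates <- apply(wb, 2, function(wi) weighted_share(x, wi))

# Standard error taken as the spread of the replicate estimates.
std_error <- sd(replicates)
```

More replicates than the three used here (or the 10 drawn above) would generally give a more stable standard error.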
Estimate with respect to several variables
Estimate the share of persons at risk of poverty per period for each gender, each region, and each combination of both.
group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group)
head(err.est$Estimates)
## Key: <year, n, N, gender, region, estimate_type>
## year n N gender region estimate_type val_povertyRisk
## <num> <int> <num> <fctr> <fctr> <char> <num>
## 1: 2010 261 122741.8 male Burgenland direct 17.414524
## 2: 2010 288 137822.2 female Burgenland direct 21.432598
## 3: 2010 359 182732.9 male Vorarlberg direct 12.973259
## 4: 2010 374 194622.1 female Vorarlberg direct 19.883637
## 5: 2010 440 253143.7 male Salzburg direct 9.156964
## 6: 2010 484 282307.3 female Salzburg direct 17.939382
## stE_povertyRisk
## <num>
## 1: 3.543608
## 2: 4.409559
## 3: 1.862601
## 4: 3.161239
## 5: 1.915739
## 6: 2.538307
## skipping 54 more rows
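The size of this result follows from the grouping list: each period gets one row per gender, one per region, one per gender and region combination, plus the overall aggregate. A small arithmetic check, assuming the 9 regions present in the eusilc data:

```r
n_periods <- 2    # 2010 and 2011
n_gender  <- 2    # male, female
n_region  <- 9    # regions in the eusilc data (assumed here)

# Rows per period: gender + region + gender x region + overall aggregate.
rows_per_period <- n_gender + n_region + n_gender * n_region + 1

# Total number of rows in err.est$Estimates.
n_rows <- n_periods * rows_per_period
n_rows
```

The result is 60, which matches the 6 rows printed by head() plus the 54 skipped rows.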