Draw bootstrap replicates from survey data using the rescaled bootstrap for stratified multistage sampling, presented by Preston, J. (2009).
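For orientation, the single-stage case of the rescaled bootstrap can be sketched in base R. This is only an illustration, not the code used by rescaled.bootstrap(). The helper name `rescaled_factors` and the toy numbers are assumptions. The sketch draws floor(n/2) of the n sampled PSUs in a stratum without replacement and rescales the weights so that the adjustment factors average to one.

```r
# Illustrative sketch only: single-stage rescaled bootstrap factors for ONE stratum.
# `rescaled_factors` is a hypothetical helper, not part of the package.
rescaled_factors <- function(n, N) {
  stopifnot(n >= 2, n <= N)
  m <- floor(n / 2)                       # PSUs drawn without replacement
  f <- n / N                              # sampling fraction of the stratum
  lambda <- sqrt(m * (1 - f) / (n - m))   # rescaling constant
  delta <- as.integer(seq_len(n) %in% sample.int(n, m))  # 1 if PSU was drawn
  1 - lambda + lambda * (n / m) * delta   # factor applied to each PSU weight
}

set.seed(1)
fac <- rescaled_factors(n = 10, N = 100)
sum(fac)  # equals n, so the weight total of the stratum is preserved
```

Multiplying the original PSU weights by `fac` would give one bootstrap replicate for that stratum; repeating the draw REP times would give REP replicates.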
rescaled.bootstrap(
  dat,
  REP = 1000,
  strata = "DB050>1",
  cluster = "DB060>DB030",
  fpc = "N.cluster>N.households",
  single.PSU = c("merge", "mean"),
  return.value = c("data", "replicates"),
  check.input = TRUE,
  new.method = FALSE
)
dat | either data frame or data table containing the survey sample |
---|---|
REP | integer indicating the number of bootstraps to be drawn |
strata | string specifying the column name in |
cluster | string specifying the column name in |
fpc | string specifying the column name in |
single.PSU | either "merge" or "mean" defining how single PSUs need to be dealt with. For |
return.value | either "data" or "replicates" specifying the return value of the function. For "data" the survey data is returned as class |
check.input | logical, if TRUE the input will be checked before applying the bootstrap procedure |
new.method | logical, if TRUE bootstrap replicates will never be negative even if in some strata the whole population is in the sample. WARNING: This is still experimental and resulting standard errors might be underestimated! Use this if for some strata the whole population is in the sample! |
Returns the complete data set including the bootstrap replicates, or just the bootstrap replicates, depending on whether return.value="data" or return.value="replicates" is set.
For specifying multistage sampling designs, the column names in strata, cluster and fpc need to be separated by ">".
For multistage sampling the strings are read from left to right, meaning that
the column name before the first ">" is taken as the column for
stratification/clustering/number of PSUs at the first stage, and the column
after the last ">" is taken as the column for stratification/clustering/number
of PSUs at the last stage.
If for some stages the sample was not stratified or clustered, one must
specify this by "1" or "I", e.g. strata=c("strata1>I>strata3") if there was
no stratification at the second stage, or cluster=c("cluster1>cluster2>I")
if there were no clusters at the last stage.
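The left-to-right reading of such a design string can be illustrated with a small base R sketch. The helper `parse_design` is hypothetical and is not how the package parses its arguments internally. It only shows how a string splits into one entry per stage, with "1" or "I" marking a stage without stratification or clustering.

```r
# Hypothetical helper, for illustration only: split a design string into stages.
parse_design <- function(x) {
  x <- gsub(" ", "", x, fixed = TRUE)            # spaces are dropped
  stages <- strsplit(x, ">", fixed = TRUE)[[1]]  # one entry per sampling stage
  stages[stages %in% c("1", "I")] <- NA          # "1"/"I": nothing at this stage
  stages
}

parse_design("strata1>I>strata3")        # stage 2 has no stratification
parse_design("cluster1 > cluster2 > I")  # last stage has no clusters
```

In this sketch the position in the returned vector is the sampling stage, so strata, cluster and fpc strings for the same design would each yield vectors of the same length.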
The number of PSUs at each stage is not calculated internally and must be
specified for any sampling design.
For single stage sampling using stratification this can usually be done by
summing the sample weights of the PSUs within each stratum, counting each PSU once.
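A minimal base R sketch of this computation on made-up data is given here. The data frame `toy` and its column names are assumptions for illustration. Each household (the PSU) appears once per person, so its weight is counted only once before summing within the stratum.

```r
# Toy person-level data: households are the PSUs, regions are the strata.
toy <- data.frame(
  region = c("A", "A", "A", "B", "B"),
  hh_id  = c(1, 1, 2, 3, 4),
  weight = c(10, 10, 20, 5, 15),   # household weight, repeated per person
  stringsAsFactors = FALSE
)

# Keep one row per household, then sum the household weights per region.
first_row   <- !duplicated(toy$hh_id)
n_by_region <- tapply(toy$weight[first_row], toy$region[first_row], sum)

# Attach the estimated number of households per stratum to every row.
toy$N.households <- as.numeric(n_by_region[toy$region])
toy
```

The resulting column plays the role of the fpc column: it is constant within each stratum and estimates the number of PSUs in the population of that stratum.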
Spaces in each of the strings will be removed, so if column names contain
spaces they should be renamed before calling this procedure!
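One way to rename such columns beforehand is sketched here in base R. The data frame `dat_spaces` and the choice of an underscore as the replacement character are assumptions for illustration.

```r
# Toy data with spaces in the column names (check.names = FALSE keeps them).
dat_spaces <- data.frame(
  `household id` = 1:3,
  `region code`  = c("A", "A", "B"),
  check.names = FALSE
)

# Replace every space in the column names before building the design strings.
names(dat_spaces) <- gsub(" ", "_", names(dat_spaces), fixed = TRUE)
names(dat_spaces)
```

After renaming, the design strings would refer to the new names, for example cluster = "household_id".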
Preston, J. (2009). Rescaled bootstrap for stratified multistage sampling. Survey Methodology. 35. 227-234.
Johannes Gussenbauer, Statistics Austria
data(eusilc, package = "laeken")
data.table::setDT(eusilc)

# estimated number of households per region (db040), used as fpc
eusilc[, N.households := sum(db090[!duplicated(db030)]), by = db040]
eusilc.bootstrap <- rescaled.bootstrap(eusilc, REP = 100, strata = "db040",
                                       cluster = "db030", fpc = "N.households")

# finer stratification: region crossed with household size
eusilc[, new_strata := paste(db040, hsize, sep = "_")]
# recompute the household totals so that they match the new strata
eusilc[, N.households := sum(db090[!duplicated(db030)]), by = new_strata]
eusilc.bootstrap <- rescaled.bootstrap(eusilc, REP = 100, strata = c("new_strata"),
                                       cluster = "db030", fpc = "N.households")