Error estimation

This document presents the main features of surveysd::calc.stError(), which generates point estimates and standard errors for user-supplied estimation functions.

Prerequisites

In order to use a dataset with calc.stError(), several weight columns have to be present. Each weight column corresponds to a bootstrap sample. In the following examples, we will use the data from demo.eusilc() and attach the bootstrap weights using draw.bootstrap() and recalib(). Please refer to the documentation of those functions for more detail.

library(surveysd)

set.seed(1234)
eusilc <- demo.eusilc(prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[, onePerson := nrow(.SD) == 1, by = .(year, hid)]

## print part of the dataset
dat_boot_calib[1:5, .(year, povertyRisk, eqIncome, onePerson, pWeight, w1, w2, w3, w4, w5)]

##     year povertyRisk eqIncome onePerson  pWeight           w1           w2
##    <num>      <lgcl>    <num>    <lgcl>    <num>        <num>        <num>
## 1:  2010       FALSE 16090.69     FALSE 504.5696 1008.6599936 1012.0421735
## 2:  2010       FALSE 16090.69     FALSE 504.5696 1008.6599936 1012.0421735
## 3:  2010       FALSE 16090.69     FALSE 504.5696 1008.6599936 1012.0421735
## 4:  2010       FALSE 27076.24     FALSE 493.3824    0.4390257    0.4410354
## 5:  2010       FALSE 27076.24     FALSE 493.3824    0.4390257    0.4410354
##           w3           w4          w5
##        <num>        <num>       <num>
## 1: 0.4503017 1039.1858437   0.4516434
## 2: 0.4503017 1039.1858437   0.4516434
## 3: 0.4503017 1039.1858437   0.4516434
## 4: 0.4405081    0.4525917 993.7730645
## 5: 0.4405081    0.4525917 993.7730645

Estimator functions

The parameters fun and var in calc.stError() define the estimator to be used in the error analysis. There are two built-in estimator functions weightedSum() and weightedRatio() which can be used as follows.

povertyRate <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio)
totalIncome <- calc.stError(dat_boot_calib, var = "eqIncome", fun = weightedSum)

These calls calculate the share of persons at risk of poverty (in percent) and the total income. By default, the results are calculated separately for each reference period.

povertyRate$Estimates

## Key: <year, n, N, estimate_type>
##     year     n       N estimate_type val_povertyRisk stE_povertyRisk
##    <num> <int>   <num>        <char>           <num>           <num>
## 1:  2010 14827 8182222        direct        14.44422       0.6456405
## 2:  2011 14827 8182222        direct        14.77393       0.5173877
## 3:  2012 14827 8182222        direct        15.04515       0.4795197
## 4:  2013 14827 8182222        direct        14.89013       0.4722443
## 5:  2014 14827 8182222        direct        15.14556       0.5126322
## 6:  2015 14827 8182222        direct        15.53640       0.4889217
## 7:  2016 14827 8182222        direct        15.08315       0.5773682
## 8:  2017 14827 8182222        direct        15.42019       0.4083678

totalIncome$Estimates

## Key: <year, n, N, estimate_type>
##     year     n       N estimate_type val_eqIncome stE_eqIncome
##    <num> <int>   <num>        <char>        <num>        <num>
## 1:  2010 14827 8182222        direct 162750998071   1224932244
## 2:  2011 14827 8182222        direct 161926931417   1355681349
## 3:  2012 14827 8182222        direct 162576509628   1101849682
## 4:  2013 14827 8182222        direct 163199507862   1245617252
## 5:  2014 14827 8182222        direct 163986275009   1333429301
## 6:  2015 14827 8182222        direct 163416275447   1612218047
## 7:  2016 14827 8182222        direct 162706205137   1529356073
## 8:  2017 14827 8182222        direct 164314959107   1291009642

Columns that use the val_ prefix denote the point estimate belonging to the “main weight” of the dataset, which is pWeight in case of the dataset used here.

Columns with the stE_ prefix denote standard errors calculated from the bootstrap replicates. For each replicate, the estimator is applied with one of w1, w2, …, w10 in place of pWeight.

n denotes the number of observations for the year and N denotes the total weight of those persons.
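The relation between the val_ and stE_ columns can be sketched in base R. This is only an illustration on made-up data: the replicate weights below are hypothetical, and it assumes the standard error is taken as the standard deviation of the replicate estimates, which may differ in detail from what calc.stError() does internally.

```r
# toy data: four persons, one main weight and three hypothetical replicate weights
x  <- c(TRUE, FALSE, FALSE, TRUE)   # e.g. an at-risk-of-poverty flag
w  <- c(10, 20, 30, 40)             # main weight (plays the role of pWeight)
wb <- cbind(w1 = c(20, 0, 30, 50),
            w2 = c(0, 40, 30, 30),
            w3 = c(10, 20, 60, 10))

# weighted share in percent (illustrative stand-in for a ratio estimator)
ratio <- function(x, w) sum(x * w) / sum(w) * 100

val  <- ratio(x, w)                               # point estimate from the main weight
reps <- apply(wb, 2, function(wi) ratio(x, wi))   # one estimate per replicate weight
stE  <- sd(reps)                                  # assumed: spread of the replicate estimates
```

Here val uses only the main weight, while stE depends only on the replicate weights.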

Custom estimators

In order to define a custom estimator function to be used in fun, the function needs to have at least two arguments, as in the example below.

## define custom estimator
myWeightedSum <- function(x, w) {
  sum(x*w)
}

## check if results are equal to those from `surveysd::weightedSum()`
totalIncome2 <- calc.stError(dat_boot_calib, var = "eqIncome", fun = myWeightedSum)
all.equal(totalIncome$Estimates, totalIncome2$Estimates)

## [1] TRUE

The parameters x and w can be assumed to be vectors of equal length, with w being a numeric weight vector and x being the column defined in the var argument. The function will be called once for each period (in this case year) and for each weight column (in this case pWeight, w1, w2, …, w10).
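The calling pattern described above can be mimicked on a small, made-up data frame. The sketch below is not the package's implementation. It only shows which (x, w) pairs an estimator would receive when there are two periods and two weight columns.

```r
# toy data with two periods, a main weight and one hypothetical replicate weight
toy <- data.frame(
  year     = c(2010, 2010, 2011, 2011),
  eqIncome = c(100, 200, 300, 400),
  pWeight  = c(1, 2, 1, 2),
  w1       = c(2, 1, 0, 3)
)

myWeightedSum <- function(x, w) sum(x * w)

# one call per weight column and per period
calls <- sapply(c("pWeight", "w1"), function(wcol) {
  sapply(split(toy, toy$year), function(d) myWeightedSum(d$eqIncome, d[[wcol]]))
})
calls   # rows: periods, columns: weight columns
```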

Custom estimators with additional parameters can also be supplied; the parameter add.arg sets these additional arguments.

## use add.arg-argument
fun <- function(x, w, b) {
  sum(x*w*b)
}
add.arg = list(b="onePerson")

err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = fun,
                        period.mean = 0, add.arg=add.arg)
err.est$Estimates

## Key: <year, n, N, estimate_type>
##     year     n       N estimate_type val_povertyRisk stE_povertyRisk
##    <num> <int>   <num>        <char>           <num>           <num>
## 1:  2010 14827 8182222        direct        273683.9       15831.000
## 2:  2011 14827 8182222        direct        261883.6       13695.875
## 3:  2012 14827 8182222        direct        243083.9        5970.467
## 4:  2013 14827 8182222        direct        238004.4       11504.737
## 5:  2014 14827 8182222        direct        218572.1        5377.214
## 6:  2015 14827 8182222        direct        219984.1        9194.169
## 7:  2016 14827 8182222        direct        201753.9        8109.214
## 8:  2017 14827 8182222        direct        196881.2        5987.490

# compare with direct computation
compare.value <- dat_boot_calib[,fun(povertyRisk,pWeight,b=onePerson),
                                 by=c("year")]
all((compare.value$V1-err.est$Estimates$val_povertyRisk)==0)

## [1] TRUE

The above chunk computes the weighted number of persons at risk of poverty who live in single-person households.
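How a list such as add.arg = list(b = "onePerson") can be turned into an extra argument is sketched below in base R. The lookup via do.call() is an assumption made for illustration, based on the example above where the name b is matched to the column onePerson. The actual mechanism inside calc.stError() may differ.

```r
# made-up data for one period
toy <- data.frame(
  povertyRisk = c(TRUE, FALSE, TRUE),
  pWeight     = c(10, 20, 30),
  onePerson   = c(TRUE, TRUE, FALSE)
)

fun <- function(x, w, b) sum(x * w * b)
add.arg <- list(b = "onePerson")

# replace each column name in add.arg by the column itself
extra <- lapply(add.arg, function(col) toy[[col]])

# call the estimator with x, w and the additional arguments
res <- do.call(fun, c(list(x = toy$povertyRisk, w = toy$pWeight), extra))
res
```

Only the first person is both at risk of poverty and living alone, so the result equals that person's weight.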

Adjust variable depending on bootstrap weights

In our example the variable povertyRisk is a boolean and is TRUE if the income is less than 60% of the weighted median income. Thus it directly depends on the original weight vector pWeight. To further reduce the estimated error one should calculate for each bootstrap replicate weight $w$ the weighted median income $medIncome_{w}$ and then define $povertyRisk_w$ as

$$ povertyRisk_w = \begin{cases} 1 & \text{if } Income < 0.6\cdot medIncome_{w}\\ 0 & \text{else} \end{cases} $$

The estimator can then be applied to the new variable $povertyRisk_w$. This can be realized using a custom estimator function.

# custom estimator to first derive poverty threshold 
# and then estimate a weighted ratio
povmd <- function(x, w) {
 md <- laeken::weightedMedian(x, w)*0.6
 pmd60 <- x < md
 # weighted ratio is directly estimated inside the function
 return(sum(w[pmd60])/sum(w)*100)
}

err.est <- calc.stError(
  dat_boot_calib, var = "povertyRisk", fun = weightedRatio,
  fun.adjust.var = povmd, adjust.var = "eqIncome")
err.est$Estimates

## Key: <year, n, N, estimate_type>
##     year     n       N estimate_type val_povertyRisk stE_povertyRisk
##    <num> <int>   <num>        <char>           <num>           <num>
## 1:  2010 14827 8182222        direct        14.44422               0
## 2:  2011 14827 8182222        direct        14.77393               0
## 3:  2012 14827 8182222        direct        15.04515               0
## 4:  2013 14827 8182222        direct        14.89013               0
## 5:  2014 14827 8182222        direct        15.14556               0
## 6:  2015 14827 8182222        direct        15.53640               0
## 7:  2016 14827 8182222        direct        15.08315               0
## 8:  2017 14827 8182222        direct        15.42019               0

The approach shown above is only valid if no grouping variables are supplied (parameter group = NULL). If grouping variables are supplied, one should use the parameters fun.adjust.var and adjust.var so that $povertyRisk_w$ is first calculated for each period and then used for each grouping in group.

# using fun.adjust.var and adjust.var to estimate povmd60 indicator
# for each period and bootstrap weight before applying the weightedRatio
povmd2 <- function(x, w) {
 md <- laeken::weightedMedian(x, w)*0.6
 pmd60 <- x < md
 return(as.integer(pmd60))
}

# set adjust.var="eqIncome" so the income vector is used to estimate
# the povmd60 indicator for each bootstrap weight
# and the resulting indicators are passed to function weightedRatio
group <- "gender"
err.est <- calc.stError(
  dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group,
  fun.adjust.var = povmd2, adjust.var = "eqIncome")
err.est$Estimates

## Key: <year, n, N, gender, estimate_type>
##      year     n       N gender estimate_type val_povertyRisk stE_povertyRisk
##     <num> <int>   <num> <fctr>        <char>           <num>           <num>
##  1:  2010  7267 3979572   male        direct        12.02660       0.5340028
##  2:  2010  7560 4202650 female        direct        16.73351       0.4478495
##  3:  2010 14827 8182222   <NA>        direct        14.44422       0.4202035
##  4:  2011  7267 3979572   male        direct        12.81921       0.4726545
##  5:  2011  7560 4202650 female        direct        16.62488       0.5052391
##  6:  2011 14827 8182222   <NA>        direct        14.77393       0.4285908
##  7:  2012  7267 3979572   male        direct        13.76065       0.4920958
##  8:  2012  7560 4202650 female        direct        16.26147       0.2953726
##  9:  2012 14827 8182222   <NA>        direct        15.04515       0.3658453
## 10:  2013  7267 3979572   male        direct        13.88962       0.4660053
## 11:  2013  7560 4202650 female        direct        15.83754       0.4618770
## 12:  2013 14827 8182222   <NA>        direct        14.89013       0.4462450
## 13:  2014  7267 3979572   male        direct        14.50351       0.4831362
## 14:  2014  7560 4202650 female        direct        15.75353       0.5056109
## 15:  2014 14827 8182222   <NA>        direct        15.14556       0.4513984
## 16:  2015  7267 3979572   male        direct        15.12289       0.4648867
## 17:  2015  7560 4202650 female        direct        15.92796       0.3423830
## 18:  2015 14827 8182222   <NA>        direct        15.53640       0.3127915
## 19:  2016  7267 3979572   male        direct        14.57968       0.5532361
## 20:  2016  7560 4202650 female        direct        15.55989       0.4666695
## 21:  2016 14827 8182222   <NA>        direct        15.08315       0.4772422
## 22:  2017  7267 3979572   male        direct        14.94816       0.5151348
## 23:  2017  7560 4202650 female        direct        15.86717       0.5462449
## 24:  2017 14827 8182222   <NA>        direct        15.42019       0.5171668
##      year     n       N gender estimate_type val_povertyRisk stE_povertyRisk
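Why the indicator should be recomputed for each weight vector can be seen on a tiny made-up example. The helper wmedian() below is a simplified, hypothetical stand-in for laeken::weightedMedian() (no interpolation), and the replicate weight is invented for illustration.

```r
# simplified weighted median: smallest x whose cumulative weight share reaches 50%
wmedian <- function(x, w) {
  o  <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= 0.5)[1]]
}

income <- c(1000, 1400, 2000, 3000, 5000)
w_main <- c(1, 1, 1, 1, 1)   # plays the role of pWeight
w_rep  <- c(1, 1, 0, 2, 1)   # hypothetical bootstrap replicate weight

# indicator derived from the weight-specific poverty threshold
at_risk <- function(x, w) x < 0.6 * wmedian(x, w)
ratio   <- function(flag, w) sum(w[flag]) / sum(w) * 100

risk_main <- at_risk(income, w_main)   # threshold based on the main weight
risk_rep  <- at_risk(income, w_rep)    # threshold based on the replicate weight

fixed    <- ratio(risk_main, w_rep)    # replicate estimate with the fixed indicator
adjusted <- ratio(risk_rep, w_rep)     # replicate estimate with the adjusted indicator
c(fixed = fixed, adjusted = adjusted)
```

In this toy case the replicate weight shifts the median upwards, so a second person falls below the threshold and the two replicate estimates differ.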

Multiple estimators

In case an estimator should be applied to several columns of the dataset, var can be set to a vector containing all necessary columns.

multipleRates <- calc.stError(dat_boot_calib, var = c("povertyRisk", "onePerson"), fun = weightedRatio)
multipleRates$Estimates

## Key: <year, n, N, estimate_type>
##     year     n       N estimate_type val_povertyRisk stE_povertyRisk
##    <num> <int>   <num>        <char>           <num>           <num>
## 1:  2010 14827 8182222        direct        14.44422       0.7003471
## 2:  2011 14827 8182222        direct        14.77393       0.4938724
## 3:  2012 14827 8182222        direct        15.04515       0.4294296
## 4:  2013 14827 8182222        direct        14.89013       0.4150980
## 5:  2014 14827 8182222        direct        15.14556       0.4092522
## 6:  2015 14827 8182222        direct        15.53640       0.5463951
## 7:  2016 14827 8182222        direct        15.08315       0.5246169
## 8:  2017 14827 8182222        direct        15.42019       0.4284391
##    val_onePerson stE_onePerson
##            <num>         <num>
## 1:      14.85737     0.7003471
## 2:      14.85737     0.4938724
## 3:      14.85737     0.4294296
## 4:      14.85737     0.4150980
## 5:      14.85737     0.4092522
## 6:      14.85737     0.5463951
## 7:      14.85737     0.5246169
## 8:      14.85737     0.4284391

Here we see the share of persons at risk of poverty and the share of persons living in one-person households (both in percent).
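Conceptually, passing several columns in var amounts to applying the same estimator to each column in turn. A minimal base-R sketch on made-up data, where ratio() is an illustrative stand-in rather than the package's weightedRatio():

```r
toy <- data.frame(
  povertyRisk = c(TRUE, FALSE, FALSE, TRUE),
  onePerson   = c(TRUE, TRUE, FALSE, FALSE),
  pWeight     = c(10, 20, 30, 40)
)

# weighted share in percent
ratio <- function(x, w) sum(x * w) / sum(w) * 100

vars <- c("povertyRisk", "onePerson")
est  <- sapply(vars, function(v) ratio(toy[[v]], toy$pWeight))
names(est) <- paste0("val_", vars)   # mimic the val_ naming of the output columns
est
```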

Grouping

The group argument can be used to calculate estimators for different subsets of the data. This argument can take the grouping variable as a string that refers to a column name (usually a factor) in dat. If set, all estimators are split not only by the reference period but also by the grouping variable. For simplicity, only one reference period of the above data is used.

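## keep only the reference period 2010
dat2 <- subset(dat_boot_calib, year == 2010)
## copy over the attributes that identify period, weight and replicate columns
for (att in c("period", "weights", "b.rep"))
  attr(dat2, att) <- attr(dat_boot_calib, att)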

To calculate the ratio of persons at risk of poverty for each federal state of Austria, group = "region" can be used.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, group = "region")
povertyRates$Estimates

## Key: <year, n, N, region, estimate_type>
##      year     n       N        region estimate_type val_povertyRisk
##     <num> <int>   <num>        <fctr>        <char>           <num>
##  1:  2010   549  260564    Burgenland        direct        19.53984
##  2:  2010   733  377355    Vorarlberg        direct        16.53731
##  3:  2010   924  535451      Salzburg        direct        13.78734
##  4:  2010  1078  563648     Carinthia        direct        13.08627
##  5:  2010  1317  701899         Tyrol        direct        15.30819
##  6:  2010  2295 1167045        Styria        direct        14.37464
##  7:  2010  2322 1598931        Vienna        direct        17.23468
##  8:  2010  2804 1555709 Lower Austria        direct        13.84362
##  9:  2010  2805 1421620 Upper Austria        direct        10.88977
## 10:  2010 14827 8182222          <NA>        direct        14.44422
##     stE_povertyRisk
##               <num>
##  1:       3.0801817
##  2:       2.3092452
##  3:       2.3830009
##  4:       1.8283667
##  5:       1.5234799
##  6:       1.3010259
##  7:       1.1829950
##  8:       1.1390482
##  9:       1.3639079
## 10:       0.6456405

The last row with region = NA denotes the aggregate over all regions. Note that the columns N and n now show the weighted and unweighted number of persons in each region.
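The meaning of n, N and the aggregate row can be reproduced by hand on a small made-up dataset. The helper summarise_group() is hypothetical and only mirrors the layout of the output, not the package internals.

```r
toy <- data.frame(
  region      = c("A", "A", "B", "B", "B"),
  povertyRisk = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  pWeight     = c(100, 300, 200, 100, 300)
)

# unweighted count, weighted count and weighted share (in percent) for one subset
summarise_group <- function(d, label) {
  data.frame(
    region = label,
    n      = nrow(d),
    N      = sum(d$pWeight),
    val    = sum(d$povertyRisk * d$pWeight) / sum(d$pWeight) * 100
  )
}

parts     <- split(toy, toy$region)
by_region <- do.call(rbind, Map(summarise_group, parts, names(parts)))

# the aggregate over all regions is appended with region = NA
res <- rbind(by_region, summarise_group(toy, NA_character_))
res
```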

Several grouping variables

If more than one grouping variable is used, there are several ways of calling calc.stError(), depending on whether combinations of grouping levels should be considered or not. We will use the variables gender and region as grouping variables and show three options for calling calc.stError().

Option 1: All regions and all genders

Calculate the point estimate and standard error for each region and each gender. The number of rows in the output is therefore

$n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12.$

The last row is again the estimate for the whole period.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = c("gender", "region"))
povertyRates$Estimates

## Key: <year, n, N, gender, region, estimate_type>
##      year     n       N gender        region estimate_type val_povertyRisk
##     <num> <int>   <num> <fctr>        <fctr>        <char>           <num>
##  1:  2010   549  260564   <NA>    Burgenland        direct        19.53984
##  2:  2010   733  377355   <NA>    Vorarlberg        direct        16.53731
##  3:  2010   924  535451   <NA>      Salzburg        direct        13.78734
##  4:  2010  1078  563648   <NA>     Carinthia        direct        13.08627
##  5:  2010  1317  701899   <NA>         Tyrol        direct        15.30819
##  6:  2010  2295 1167045   <NA>        Styria        direct        14.37464
##  7:  2010  2322 1598931   <NA>        Vienna        direct        17.23468
##  8:  2010  2804 1555709   <NA> Lower Austria        direct        13.84362
##  9:  2010  2805 1421620   <NA> Upper Austria        direct        10.88977
## 10:  2010  7267 3979572   male          <NA>        direct        12.02660
## 11:  2010  7560 4202650 female          <NA>        direct        16.73351
## 12:  2010 14827 8182222   <NA>          <NA>        direct        14.44422
##     stE_povertyRisk
##               <num>
##  1:       3.0801817
##  2:       2.3092452
##  3:       2.3830009
##  4:       1.8283667
##  5:       1.5234799
##  6:       1.3010259
##  7:       1.1829950
##  8:       1.1390482
##  9:       1.3639079
## 10:       0.7062000
## 11:       0.6653989
## 12:       0.6456405

Option 2: All combinations of region and gender

Split the data by all combinations of the two grouping variables. This results in a larger output table with

$n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + 1) = 1\cdot(9\cdot2 + 1)= 19.$

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list(c("gender", "region")))
povertyRates$Estimates

## Key: <year, n, N, gender, region, estimate_type>
##      year     n         N gender        region estimate_type val_povertyRisk
##     <num> <int>     <num> <fctr>        <fctr>        <char>           <num>
##  1:  2010   261  122741.8   male    Burgenland        direct       17.414524
##  2:  2010   288  137822.2 female    Burgenland        direct       21.432598
##  3:  2010   359  182732.9   male    Vorarlberg        direct       12.973259
##  4:  2010   374  194622.1 female    Vorarlberg        direct       19.883637
##  5:  2010   440  253143.7   male      Salzburg        direct        9.156964
##  6:  2010   484  282307.3 female      Salzburg        direct       17.939382
##  7:  2010   517  268581.4   male     Carinthia        direct       10.552149
##  8:  2010   561  295066.6 female     Carinthia        direct       15.392924
##  9:  2010   650  339566.5   male         Tyrol        direct       12.857542
## 10:  2010   667  362332.5 female         Tyrol        direct       17.604861
## 11:  2010  1128  571011.7   male        Styria        direct       11.671247
## 12:  2010  1132  774405.4   male        Vienna        direct       15.590616
## 13:  2010  1167  596033.3 female        Styria        direct       16.964539
## 14:  2010  1190  824525.6 female        Vienna        direct       18.778813
## 15:  2010  1363  684272.5   male Upper Austria        direct        9.074690
## 16:  2010  1387  772593.2 female Lower Austria        direct       16.372949
## 17:  2010  1417  783115.8   male Lower Austria        direct       11.348283
## 18:  2010  1442  737347.5 female Upper Austria        direct       12.574206
## 19:  2010 14827 8182222.0   <NA>          <NA>        direct       14.444218
##     stE_povertyRisk
##               <num>
##  1:       2.9558393
##  2:       3.5819672
##  3:       2.5260206
##  4:       2.6804099
##  5:       1.9731209
##  6:       2.8596320
##  7:       1.4904660
##  8:       2.4199358
##  9:       1.6992510
## 10:       1.4994375
## 11:       1.4184052
## 12:       1.0860581
## 13:       1.5555397
## 14:       1.5269705
## 15:       1.4812521
## 16:       1.1879052
## 17:       1.3830320
## 18:       1.3266616
## 19:       0.6456405

Option 3: Combination of Option 1 and Option 2

In this case, the estimates and standard errors are calculated for

every gender,
every region and
every combination of region and gender.

The number of rows in the output is therefore

$n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9\cdot2 + 9 + 2 + 1) = 30.$

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list("gender", "region", c("gender", "region")))
povertyRates$Estimates

## Key: <year, n, N, gender, region, estimate_type>
##      year     n         N gender        region estimate_type val_povertyRisk
##     <num> <int>     <num> <fctr>        <fctr>        <char>           <num>
##  1:  2010   261  122741.8   male    Burgenland        direct       17.414524
##  2:  2010   288  137822.2 female    Burgenland        direct       21.432598
##  3:  2010   359  182732.9   male    Vorarlberg        direct       12.973259
##  4:  2010   374  194622.1 female    Vorarlberg        direct       19.883637
##  5:  2010   440  253143.7   male      Salzburg        direct        9.156964
##  6:  2010   484  282307.3 female      Salzburg        direct       17.939382
##  7:  2010   517  268581.4   male     Carinthia        direct       10.552149
##  8:  2010   549  260564.0   <NA>    Burgenland        direct       19.539837
##  9:  2010   561  295066.6 female     Carinthia        direct       15.392924
## 10:  2010   650  339566.5   male         Tyrol        direct       12.857542
## 11:  2010   667  362332.5 female         Tyrol        direct       17.604861
## 12:  2010   733  377355.0   <NA>    Vorarlberg        direct       16.537310
## 13:  2010   924  535451.0   <NA>      Salzburg        direct       13.787343
## 14:  2010  1078  563648.0   <NA>     Carinthia        direct       13.086268
## 15:  2010  1128  571011.7   male        Styria        direct       11.671247
## 16:  2010  1132  774405.4   male        Vienna        direct       15.590616
## 17:  2010  1167  596033.3 female        Styria        direct       16.964539
## 18:  2010  1190  824525.6 female        Vienna        direct       18.778813
## 19:  2010  1317  701899.0   <NA>         Tyrol        direct       15.308190
## 20:  2010  1363  684272.5   male Upper Austria        direct        9.074690
## 21:  2010  1387  772593.2 female Lower Austria        direct       16.372949
## 22:  2010  1417  783115.8   male Lower Austria        direct       11.348283
## 23:  2010  1442  737347.5 female Upper Austria        direct       12.574206
## 24:  2010  2295 1167045.0   <NA>        Styria        direct       14.374637
## 25:  2010  2322 1598931.0   <NA>        Vienna        direct       17.234683
## 26:  2010  2804 1555709.0   <NA> Lower Austria        direct       13.843623
## 27:  2010  2805 1421620.0   <NA> Upper Austria        direct       10.889773
## 28:  2010  7267 3979571.7   male          <NA>        direct       12.026600
## 29:  2010  7560 4202650.3 female          <NA>        direct       16.733508
## 30:  2010 14827 8182222.0   <NA>          <NA>        direct       14.444218
##      year     n         N gender        region estimate_type val_povertyRisk
##     stE_povertyRisk
##               <num>
##  1:       2.9558393
##  2:       3.5819672
##  3:       2.5260206
##  4:       2.6804099
##  5:       1.9731209
##  6:       2.8596320
##  7:       1.4904660
##  8:       3.0801817
##  9:       2.4199358
## 10:       1.6992510
## 11:       1.4994375
## 12:       2.3092452
## 13:       2.3830009
## 14:       1.8283667
## 15:       1.4184052
## 16:       1.0860581
## 17:       1.5555397
## 18:       1.5269705
## 19:       1.5234799
## 20:       1.4812521
## 21:       1.1879052
## 22:       1.3830320
## 23:       1.3266616
## 24:       1.3010259
## 25:       1.1829950
## 26:       1.1390482
## 27:       1.3639079
## 28:       0.7062000
## 29:       0.6653989
## 30:       0.6456405
##     stE_povertyRisk
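The row counts stated for the three options follow from simple arithmetic, which can be checked in R. The variable names are chosen here for illustration.

```r
n_periods <- 1   # only the year 2010 is used
n_regions <- 9
n_genders <- 2

# Option 1: each region, each gender, plus the overall estimate
rows_opt1 <- n_periods * (n_regions + n_genders + 1)

# Option 2: each region-gender combination, plus the overall estimate
rows_opt2 <- n_periods * (n_regions * n_genders + 1)

# Option 3: combinations, regions, genders, plus the overall estimate
rows_opt3 <- n_periods * (n_regions * n_genders + n_regions + n_genders + 1)

c(rows_opt1, rows_opt2, rows_opt3)
```

These values match the 12, 19 and 30 rows of the three output tables.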

Group differences

If differences between groups need to be calculated, e.g. the difference of poverty rates between gender = "male" and gender = "female", the parameter group.diff can be used. With group.diff = TRUE, the differences and the standard errors of these differences are calculated for all variables defined in group.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = c("gender", "region"),
                             group.diff = TRUE)
povertyRates$Estimates

## Key: <year, n, N, gender, region, estimate_type>
##      year       n         N        gender                        region
##     <num>   <num>     <num>        <fctr>                        <fctr>
##  1:  2010   549.0  260564.0          <NA>                    Burgenland
##  2:  2010   641.0  318959.5          <NA>       Burgenland - Vorarlberg
##  3:  2010   733.0  377355.0          <NA>                    Vorarlberg
##  4:  2010   736.5  398007.5          <NA>         Burgenland - Salzburg
##  5:  2010   813.5  412106.0          <NA>        Burgenland - Carinthia
##  6:  2010   828.5  456403.0          <NA>         Salzburg - Vorarlberg
##  7:  2010   905.5  470501.5          <NA>        Carinthia - Vorarlberg
##  8:  2010   924.0  535451.0          <NA>                      Salzburg
##  9:  2010   933.0  481231.5          <NA>            Burgenland - Tyrol
## 10:  2010  1001.0  549549.5          <NA>          Carinthia - Salzburg
## 11:  2010  1025.0  539627.0          <NA>            Tyrol - Vorarlberg
## 12:  2010  1078.0  563648.0          <NA>                     Carinthia
## 13:  2010  1120.5  618675.0          <NA>              Salzburg - Tyrol
## 14:  2010  1197.5  632773.5          <NA>             Carinthia - Tyrol
## 15:  2010  1317.0  701899.0          <NA>                         Tyrol
## 16:  2010  1422.0  713804.5          <NA>           Burgenland - Styria
## 17:  2010  1435.5  929747.5          <NA>           Burgenland - Vienna
## 18:  2010  1514.0  772200.0          <NA>           Styria - Vorarlberg
## 19:  2010  1527.5  988143.0          <NA>           Vienna - Vorarlberg
## 20:  2010  1609.5  851248.0          <NA>             Salzburg - Styria
## 21:  2010  1623.0 1067191.0          <NA>             Salzburg - Vienna
## 22:  2010  1676.5  908136.5          <NA>    Burgenland - Lower Austria
## 23:  2010  1677.0  841092.0          <NA>    Burgenland - Upper Austria
## 24:  2010  1686.5  865346.5          <NA>            Carinthia - Styria
## 25:  2010  1700.0 1081289.5          <NA>            Carinthia - Vienna
## 26:  2010  1768.5  966532.0          <NA>    Lower Austria - Vorarlberg
## 27:  2010  1769.0  899487.5          <NA>    Upper Austria - Vorarlberg
## 28:  2010  1806.0  934472.0          <NA>                Styria - Tyrol
## 29:  2010  1819.5 1150415.0          <NA>                Tyrol - Vienna
## 30:  2010  1864.0 1045580.0          <NA>      Lower Austria - Salzburg
## 31:  2010  1864.5  978535.5          <NA>      Salzburg - Upper Austria
## 32:  2010  1941.0 1059678.5          <NA>     Carinthia - Lower Austria
## 33:  2010  1941.5  992634.0          <NA>     Carinthia - Upper Austria
## 34:  2010  2060.5 1128804.0          <NA>         Lower Austria - Tyrol
## 35:  2010  2061.0 1061759.5          <NA>         Tyrol - Upper Austria
## 36:  2010  2295.0 1167045.0          <NA>                        Styria
## 37:  2010  2308.5 1382988.0          <NA>               Styria - Vienna
## 38:  2010  2322.0 1598931.0          <NA>                        Vienna
## 39:  2010  2549.5 1361377.0          <NA>        Lower Austria - Styria
## 40:  2010  2550.0 1294332.5          <NA>        Styria - Upper Austria
## 41:  2010  2563.0 1577320.0          <NA>        Lower Austria - Vienna
## 42:  2010  2563.5 1510275.5          <NA>        Upper Austria - Vienna
## 43:  2010  2804.0 1555709.0          <NA>                 Lower Austria
## 44:  2010  2804.5 1488664.5          <NA> Lower Austria - Upper Austria
## 45:  2010  2805.0 1421620.0          <NA>                 Upper Austria
## 46:  2010  7267.0 3979571.7          male                          <NA>
## 47:  2010  7413.5 4091111.0 male - female                          <NA>
## 48:  2010  7560.0 4202650.3        female                          <NA>
## 49:  2010 14827.0 8182222.0          <NA>                          <NA>
##      year       n         N        gender                        region
##        estimate_type val_povertyRisk stE_povertyRisk
##               <char>           <num>           <num>
##  1:           direct     19.53983651       3.0801817
##  2: group difference      3.00252634       3.3440799
##  3:           direct     16.53731017       2.3092452
##  4: group difference      5.75249330       3.5657838
##  5: group difference      6.45356876       3.9849262
##  6: group difference     -2.74996696       2.9858980
##  7: group difference     -3.45104242       3.2549609
##  8:           direct     13.78734321       2.3830009
##  9: group difference      4.23164602       3.4914678
## 10: group difference     -0.70107546       2.6010839
## 11: group difference     -1.22911968       2.7299152
## 12:           direct     13.08626775       1.8283667
## 13: group difference     -1.52084728       1.6908612
## 14: group difference     -2.22192274       2.6477620
## 15:           direct     15.30819049       1.5234799
## 16: group difference      5.16519923       3.2532858
## 17: group difference      2.30515330       3.3224435
## 18: group difference     -2.16267289       3.1952079
## 19: group difference      0.69737304       1.5035744
## 20: group difference     -0.58729407       2.2538369
## 21: group difference     -3.44734000       2.6436214
## 22: group difference      5.69621369       2.8540687
## 23: group difference      8.65006312       3.3806543
## 24: group difference     -1.28836953       2.3703127
## 25: group difference     -4.14841546       2.5772093
## 26: group difference     -2.69368735       1.8955769
## 27: group difference     -5.64753678       2.8355308
## 28: group difference     -0.93355321       1.2952525
## 29: group difference     -1.92649272       1.9246499
## 30: group difference      0.05627961       2.0082683
## 31: group difference      2.89756982       2.8351611
## 32: group difference     -0.75735506       2.3045370
## 33: group difference      2.19649436       2.0993320
## 34: group difference     -1.46456768       1.7378340
## 35: group difference      4.41841710       2.3174786
## 36:           direct     14.37463728       1.3010259
## 37: group difference     -2.86004593       2.2348640
## 38:           direct     17.23468321       1.1829950
## 39: group difference     -0.53101447       1.7555890
## 40: group difference      3.48486389       1.5978948
## 41: group difference     -3.39106040       1.1153839
## 42: group difference     -6.34490982       1.9960346
## 43:           direct     13.84362281       1.1390482
## 44: group difference      2.95384943       1.6636062
## 45:           direct     10.88977339       1.3639079
## 46:           direct     12.02659998       0.7062000
## 47: group difference     -4.70690810       0.4725442
## 48:           direct     16.73350808       0.6653989
## 49:           direct     14.44421817       0.6456405
##        estimate_type val_povertyRisk stE_povertyRisk

The resulting output table contains 49 rows: 12 rows for the direct estimators,

$n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12,$

and another 37 for the differences within the variables "gender" and "region" separately. The variable "gender" has 2 unique values (unique(dat2$gender)), resulting in 1 difference, gender = "male" - gender = "female", and the variable "region" has 9 unique values (unique(dat2$region)), resulting in

$8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = \sum\limits_{i=1}^{9-1}i = 36$

estimates. Thus the output contains 1 + 36 = 37 estimates with respect to group differences.
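
The row counts above can be checked with a few lines of base R. This is only an arithmetic sketch: the numbers of periods, regions and genders are taken from the text, and the variable names are chosen for illustration.

```r
# Counts taken from the text above (illustrative variable names)
n_periods <- 1
n_regions <- 9
n_genders <- 2

# Direct estimators: one per region, one per gender, plus the overall estimate
n_direct <- n_periods * (n_regions + n_genders + 1)

# Pairwise differences within each grouping variable
n_diff <- choose(n_genders, 2) + choose(n_regions, 2)

c(direct = n_direct, difference = n_diff, total = n_direct + n_diff)
```

Here choose(9, 2) is the closed form of the sum 8 + 7 + ... + 1.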

If a combination of grouping variables is used in group and group.diff = TRUE, then differences between combinations will only be calculated if exactly one of the grouping variables differs. For example, the differences between the following groups would be calculated:

gender = "female" & region = "Vienna" - gender = "male" & region = "Vienna"
gender = "female" & region = "Vienna" - gender = "female" & region = "Salzburg"
gender = "male" & region = "Salzburg" - gender = "female" & region = "Salzburg"

The difference between gender = "female" & region = "Vienna" and gender = "male" & region = "Salzburg" however would not be calculated.

This leads to

$2\cdot(\sum\limits_{i=1}^{9-1}i) + 9\cdot1 = 81$

results with respect to the differences. The output contains an additional column estimate_type, which distinguishes direct estimates from group differences.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list(c("gender", "region")),
                             group.diff = TRUE)
povertyRates$Estimates[,.N,by=.(estimate_type)]

##       estimate_type     N
##              <char> <int>
## 1:           direct    19
## 2: group difference    81
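
The count of 81 group differences can be reproduced without surveysd by enumerating all pairs of gender and region combinations and keeping only those pairs in which exactly one grouping variable differs. The sketch below uses base R only; the region labels are placeholders, since only the number of levels matters for the count.

```r
# Placeholder labels; only the number of levels matters for the count
combos <- expand.grid(gender = c("male", "female"),
                      region = paste0("region", 1:9),
                      stringsAsFactors = FALSE)

# All unordered pairs of the 18 combinations (one pair per column)
pairs <- combn(nrow(combos), 2)

# Number of grouping variables that differ within each pair
n_differs <- (combos$gender[pairs[1, ]] != combos$gender[pairs[2, ]]) +
  (combos$region[pairs[1, ]] != combos$region[pairs[2, ]])

# Keep only pairs where exactly one grouping variable differs
n_group_diff <- sum(n_differs == 1)
n_group_diff

# Direct estimates: one per combination plus the overall estimate
n_direct <- nrow(combos) + 1
n_direct
```

The remaining pairs, in which both gender and region differ, are the ones that are skipped.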

Differences between survey periods

Differences of estimates between periods can be calculated using the parameter period.diff. If not NULL, period.diff expects a character vector specifying the periods for which differences should be calculated. Each entry should be specified in the form "period2 - period1", for example "2017 - 2016".

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.diff = c("2017 - 2016", "2016 - 2015", "2015 - 2014"))
povertyRates$Estimates

## Key: <year, n, N, estimate_type>
##         year     n       N     estimate_type val_povertyRisk stE_povertyRisk
##       <char> <num>   <num>            <char>           <num>           <num>
## 1:      2014 14827 8182222            direct      15.1455601       0.5126322
## 2:      2015 14827 8182222            direct      15.5364014       0.4889217
## 3: 2015-2014 14827 8182222 period difference       0.3908413       0.4704026
## 4:      2016 14827 8182222            direct      15.0831502       0.5773682
## 5: 2016-2015 14827 8182222 period difference      -0.4532512       0.5194934
## 6:      2017 14827 8182222            direct      15.4201916       0.4083678
## 7: 2017-2016 14827 8182222 period difference       0.3370414       0.4919362
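
For longer time series, the strings passed to period.diff can be built programmatically instead of being typed by hand. The following base R sketch constructs all consecutive differences for an illustrative vector years and checks that the point estimates of the period differences in the table above equal the differences of the direct estimates. The values are copied from the table; the standard errors of the differences cannot be obtained from the table alone.

```r
years <- 2014:2017

# Consecutive differences in the form "period2 - period1"
period_diff <- paste(years[-1], "-", years[-length(years)])
period_diff

# Point estimates copied from the table above
est <- c("2014" = 15.1455601, "2015" = 15.5364014,
         "2016" = 15.0831502, "2017" = 15.4201916)

# Differences between consecutive periods
point_diff <- diff(est)
round(point_diff, 7)
```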

If additional grouping variables are supplied to calc.stError(), the differences across periods are also calculated for all variables in group.

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             group = "gender",
                             period.diff = c("2017 - 2016", "2016 - 2015", "2015 - 2014"))
povertyRates$Estimates

## Key: <year, n, N, gender, estimate_type>
##          year     n       N gender     estimate_type val_povertyRisk
##        <char> <num>   <num> <fctr>            <char>           <num>
##  1:      2014  7267 3979572   male            direct      14.5035068
##  2:      2014  7560 4202650 female            direct      15.7535328
##  3:      2014 14827 8182222   <NA>            direct      15.1455601
##  4:      2015  7267 3979572   male            direct      15.1228904
##  5:      2015  7560 4202650 female            direct      15.9279630
##  6:      2015 14827 8182222   <NA>            direct      15.5364014
##  7: 2015-2014  7267 3979572   male period difference       0.6193836
##  8: 2015-2014  7560 4202650 female period difference       0.1744301
##  9: 2015-2014 14827 8182222   <NA> period difference       0.3908413
## 10:      2016  7267 3979572   male            direct      14.5796824
## 11:      2016  7560 4202650 female            direct      15.5598937
## 12:      2016 14827 8182222   <NA>            direct      15.0831502
## 13: 2016-2015  7267 3979572   male period difference      -0.5432080
## 14: 2016-2015  7560 4202650 female period difference      -0.3680693
## 15: 2016-2015 14827 8182222   <NA> period difference      -0.4532512
## 16:      2017  7267 3979572   male            direct      14.9481591
## 17:      2017  7560 4202650 female            direct      15.8671684
## 18:      2017 14827 8182222   <NA>            direct      15.4201916
## 19: 2017-2016  7267 3979572   male period difference       0.3684767
## 20: 2017-2016  7560 4202650 female period difference       0.3072748
## 21: 2017-2016 14827 8182222   <NA> period difference       0.3370414
##          year     n       N gender     estimate_type val_povertyRisk
##     stE_povertyRisk
##               <num>
##  1:       0.5780030
##  2:       0.5222374
##  3:       0.5126322
##  4:       0.6723961
##  5:       0.3891856
##  6:       0.4889217
##  7:       0.5155048
##  8:       0.4741321
##  9:       0.4704026
## 10:       0.6775332
## 11:       0.5174284
## 12:       0.5773682
## 13:       0.5447479
## 14:       0.5414803
## 15:       0.5194934
## 16:       0.4374739
## 17:       0.4107798
## 18:       0.4083678
## 19:       0.5387920
## 20:       0.4810226
## 21:       0.4919362
##     stE_povertyRisk

Averages across periods

With the parameter period.mean, averages across periods are calculated in addition to the direct estimates. The parameter accepts only odd integer values. The resulting table will contain the direct estimates as well as rolling averages of length period.mean.

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.mean = 3)
povertyRates$Estimates

## Key: <year, n, N, estimate_type>
##              year     n       N  estimate_type val_povertyRisk stE_povertyRisk
##            <char> <num>   <num>         <char>           <num>           <num>
## 1:           2014 14827 8182222         direct        15.14556       0.5126322
## 2: 2014_2015_2016 14827 8182222 period average        15.25504       0.4496982
## 3:           2015 14827 8182222         direct        15.53640       0.4889217
## 4: 2015_2016_2017 14827 8182222 period average        15.34658       0.4028768
## 5:           2016 14827 8182222         direct        15.08315       0.5773682
## 6:           2017 14827 8182222         direct        15.42019       0.4083678
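
To illustrate what a rolling average of length 3 means here, the sketch below recomputes the point estimates of the two period averages from the direct estimates in the table above, using base R only. In this output the period averages coincide with the simple mean of the direct estimates in each window; this is an observation about the printed values, not a statement about how calc.stError() computes them internally. The standard errors cannot be reproduced this way.

```r
# Direct point estimates copied from the table above
est <- c("2014" = 15.14556, "2015" = 15.53640,
         "2016" = 15.08315, "2017" = 15.42019)
k <- 3  # window length, corresponds to period.mean = 3

# Each row of 'windows' holds k consecutive values (in reverse order)
windows <- embed(unname(est), k)
roll <- rowMeans(windows)

# Label each average with the periods it covers
yrs <- names(est)
names(roll) <- sapply(seq_along(roll),
                      function(i) paste(yrs[i:(i + k - 1)], collapse = "_"))
round(roll, 5)
```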

If, in addition, the parameters group and/or period.diff are specified, then differences and groupings of averages will also be calculated.

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.mean = 3, period.diff = "2016 - 2015",
                             group = "gender")
povertyRates$Estimates

## Key: <year, n, N, gender, estimate_type>
##               year     n       N gender                      estimate_type
##             <char> <num>   <num> <fctr>                             <char>
##  1:           2014  7267 3979572   male                             direct
##  2:           2014  7560 4202650 female                             direct
##  3:           2014 14827 8182222   <NA>                             direct
##  4: 2014_2015_2016  7267 3979572   male                     period average
##  5: 2014_2015_2016  7560 4202650 female                     period average
##  6: 2014_2015_2016 14827 8182222   <NA>                     period average
##  7:           2015  7267 3979572   male                             direct
##  8:           2015  7560 4202650 female                             direct
##  9:           2015 14827 8182222   <NA>                             direct
## 10: 2015_2016_2017  7267 3979572   male                     period average
## 11: 2015_2016_2017  7560 4202650 female                     period average
## 12: 2015_2016_2017 14827 8182222   <NA>                     period average
## 13:           2016  7267 3979572   male                             direct
## 14:           2016  7560 4202650 female                             direct
## 15:           2016 14827 8182222   <NA>                             direct
## 16:      2016-2015  7267 3979572   male                  period difference
## 17:      2016-2015  7560 4202650 female                  period difference
## 18:      2016-2015 14827 8182222   <NA>                  period difference
## 19: 2016-2015_mean  7267 3979572   male difference between period averages
## 20: 2016-2015_mean  7560 4202650 female difference between period averages
## 21: 2016-2015_mean 14827 8182222   <NA> difference between period averages
## 22:           2017  7267 3979572   male                             direct
## 23:           2017  7560 4202650 female                             direct
## 24:           2017 14827 8182222   <NA>                             direct
##               year     n       N gender                      estimate_type
##     val_povertyRisk stE_povertyRisk
##               <num>           <num>
##  1:     14.50350682      0.57800303
##  2:     15.75353283      0.52223739
##  3:     15.14556006      0.51263215
##  4:     14.73535987      0.57119384
##  5:     15.74712982      0.39010598
##  6:     15.25503720      0.44969824
##  7:     15.12289042      0.67239613
##  8:     15.92796296      0.38918560
##  9:     15.53640136      0.48892172
## 10:     14.88357729      0.51877021
## 11:     15.78500836      0.32734186
## 12:     15.34658105      0.40287678
## 13:     14.57968239      0.67753324
## 14:     15.55989368      0.51742844
## 15:     15.08315018      0.57736817
## 16:     -0.54320803      0.54474787
## 17:     -0.36806928      0.54148025
## 18:     -0.45325118      0.51949345
## 19:      0.14821741      0.09132724
## 20:      0.03787854      0.11858738
## 21:      0.09154385      0.09465179
## 22:     14.94815906      0.43747385
## 23:     15.86716845      0.41077980
## 24:     15.42019160      0.40836780
##     val_povertyRisk stE_povertyRisk
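
Judging from the values in the table, the rows labelled 2016-2015_mean hold the difference between the period average centred on 2016 (2015_2016_2017) and the one centred on 2015 (2014_2015_2016). The short check below recomputes that point estimate for the total population from the printed values; it is a sanity check on the output, not a description of the internal computation.

```r
# Period averages for the total population (gender = NA), copied from the table above
avg_2014_2016 <- 15.25503720
avg_2015_2017 <- 15.34658105

# Difference between the two rolling averages
diff_of_means <- avg_2015_2017 - avg_2014_2016
round(diff_of_means, 8)
```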

2025-08-08