Turn sc_data objects into tidy data frames

sc_tabulate() extracts the data in the table and turns it into a tidy data.frame. It applies labeling of the data and transforms time variables into a Date format if they satisfy certain STATcube Standards.

sc_tabulate(table, ...) is just an alias for table$tabulate(...) and was added so this rather complicated method can have a separate documentation page. It is recommended to use the table$tabulate() syntax

the ... argument decides which measures and/or fields should be included in the output. If no measures are given, all measures are included. The same is true for fields.

Usage

sc_tabulate(
  table,
  ...,
  .list = NULL,
  raw = FALSE,
  parse_time = TRUE,
  recode_zeros = inherits(table, "sc_table"),
  language = NULL,
  sort = FALSE
)

Arguments

table: An object of class sc_data
...: Names of measures and/or fields
.list: allows to define the arguments for ... as a character vector.
raw: If FALSE (the default), apply labeling to the dataset. Otherwise, return codes.
parse_time: Should time variables be converted into a Date format? Ignored if raw is set to TRUE.
recode_zeros: turn zero values into NAs
language: The language to be used for labelling. By default, the dataset language (table$language) is used.
sort: If TRUE, the resulting data will be sorted by all provided field values

Details

Aggregation is done as follows

First, all columns that priovide a total code via table$total_codes() will be used to filter for column == total_code or column != total_code
Then, the remaining data is aggregated using rowsum()

The ellipsis (...) supports partial matching of codes and labels. See Examples

For objects of class sc_table two additional operations are performed.

zeros are recoded to NAs
rounding is done according to the precision of each measure. Ronding happens after the recoding to NA values

Examples

############################ OGD Data #######################################

table <- od_table("OGD_veste309_Veste309_1")

# no arguments -> same output as `table$data`
table$tabulate()
#> # A STATcubeR tibble: 72 x 9
#>    Sex       Citizen…¹ Regio…² Form …³ Arith…⁴ 1st q…⁵ 2nd q…⁶ 3rd q…⁷
#>  * <fct>     <fct>     <fct>   <fct>     <dbl>   <dbl>   <dbl>   <dbl>
#>  1 Sum total Total     Total   "Total"    17.6    11.6    15.1    20.1
#>  2 Sum total Total     Total   "Stand…    19.0    12.8    16.2    21.7
#>  3 Sum total Total     Total   "Non-s…    15.2    10.2    13.0    17.2
#>  4 Sum total Total     Total   "Non-s…    15.9    10.7    13.8    18.3
#>  5 Sum total Total     Total   "Non-s…    16.8    10.2    13.4    18.6
#>  6 Sum total Total     Total   "Non-s…    12       9      10.1    12.3
#>  7 Sum total Total     Total   "Non-s…    14.0    11.5    13.5    15.7
#>  8 Sum total Total     AT11 B… "Total"    15.7    11.0    13.7    17.5
#>  9 Sum total Total     AT12 L… "Total"    16.4    11.4    14.3    18.5
#> 10 Sum total Total     AT13 V… "Total"    19.2    11.4    15.9    22.6
#> # … with 62 more rows, 1 more variable: `Number of employees` <dbl>,
#> #   and abbreviated variable names ¹Citizenship, ²`Region (NUTS2)`,
#> #   ³`Form of employment`, ⁴`Arithmetic mean`, ⁵`1st quartile`,
#> #   ⁶`2nd quartile (median)`, ⁷`3rd quartile`

# provide some fields -> aggregate to keep only these fields
table$tabulate("Sex", "Citizenship")
#> # A STATcubeR tibble: 27 x 7
#>    Sex       Citizenship       Arith…¹ 1st q…² 2nd q…³ 3rd q…⁴ Numbe…⁵
#>  * <fct>     <fct>               <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 Sum total "Total"             266.   181.     230.    301.  8917965
#>  2 Sum total "Austrian"           18.5   12.4     15.9    21.2 2079655
#>  3 Sum total "Non-Austrian "      14.4    9.72    12.3    15.9  571282
#>  4 Sum total "EU-28/EWR/Switz…    15.2   10.0     12.8    16.6  380327
#>  5 Sum total "EU-15/EWR befor…    19.6   11.9     15.9    21.9  122572
#>  6 Sum total "EU-13 EU Member…    13.0    9.59    11.9    14.6  257755
#>  7 Sum total "Former Yugoslav…    12.8    9.58    12.0    14.7   96065
#>  8 Sum total "Turkey"             12.2    9.35    11.3    14.4   34759
#>  9 Sum total "Other"              13.8    9.07    10.9    14.6   60131
#> 10 Male      "Total"             290.   195.     246.    327.  4952512
#> # … with 17 more rows, and abbreviated variable names
#> #   ¹`Arithmetic mean`, ²`1st quartile`, ³`2nd quartile (median)`,
#> #   ⁴`3rd quartile`, ⁵`Number of employees`

# provide some measures -> drop all other measures from the output
table$tabulate("Arithmetic mean")
#> # A STATcubeR tibble: 72 x 5
#>    Sex       Citizenship `Region (NUTS2)`   Form of employme…¹ Arith…²
#>  * <fct>     <fct>       <fct>              <fct>                <dbl>
#>  1 Sum total Total       Total              "Total"               17.6
#>  2 Sum total Total       Total              "Standard employm…    19.0
#>  3 Sum total Total       Total              "Non-standard emp…    15.2
#>  4 Sum total Total       Total              "Non-standard emp…    15.9
#>  5 Sum total Total       Total              "Non-standard emp…    16.8
#>  6 Sum total Total       Total              "Non-standard emp…    12  
#>  7 Sum total Total       Total              "Non-standard emp…    14.0
#>  8 Sum total Total       AT11 Burgenland    "Total"               15.7
#>  9 Sum total Total       AT12 Lower Austria "Total"               16.4
#> 10 Sum total Total       AT13 Vienna        "Total"               19.2
#> # … with 62 more rows, and abbreviated variable names
#> #   ¹`Form of employment`, ²`Arithmetic mean`

# mixture of measures and fields  -> keep exactly those columns
table$tabulate("Sex", "Arithmetic mean")
#> # A STATcubeR tibble: 3 x 2
#>   Sex       `Arithmetic mean`
#> * <fct>                 <dbl>
#> 1 Sum total              385.
#> 2 Male                   418.
#> 3 Female                 342.

## define total codes
table$total_codes(
  `C-A11-0` = "A11-1",
  `C-STAATS-0` = "STAATS-9",
  `C-VEBDL-0` = "VEBDL-10",
  `C-BESCHV-0` = "BESCHV-1"
)

## alternatively, use partial matching to define totals
table$total_codes(
  Sex = "Sum total",
  Citizenship = "Total",
  Region = "Total",
  `Form of employment` = "Total"
)

# filter for totals in `Region (NUTS2)` and `Form of employment`. Drop totals
# in `Sex` and `Citizenship`.
table$tabulate("Sex", "Citizenship")
#> # A STATcubeR tibble: 16 x 7
#>    Sex    Citizenship          Arith…¹ 1st q…² 2nd q…³ 3rd q…⁴ Numbe…⁵
#>  * <fct>  <fct>                  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 Male   "Austrian"              20.4   13.8    17.4     23.5 1190627
#>  2 Male   "Non-Austrian "         15.3   10.6    13.2     16.6  353961
#>  3 Male   "EU-28/EWR/Switzerl…    16.0   10.9    13.5     17.1  236607
#>  4 Male   "EU-15/EWR before 2…    21.2   12.6    16.7     24.0   75300
#>  5 Male   "EU-13 EU Member St…    13.6   10.3    12.7     15.1  161308
#>  6 Male   "Former Yugoslavia …    14.1   11.2    13.6     16.1   56474
#>  7 Male   "Turkey"                13.3   10.4    12.8     15.4   22516
#>  8 Male   "Other"                 14.0    9.07   11.2     14.6   38364
#>  9 Female "Austrian"              15.9   11.1    14.0     18.3  889028
#> 10 Female "Non-Austrian "         13.0    9.19   10.6     14.3  217322
#> 11 Female "EU-28/EWR/Switzerl…    13.7    9.38   11.3     15.5  143719
#> 12 Female "EU-15/EWR before 2…    17.1   11.0    14.7     19.7   47272
#> 13 Female "EU-13 EU Member St…    12.1    9.06   10.3     13.2   96447
#> 14 Female "Former Yugoslavia …    10.9    9       9.78    11.8   39591
#> 15 Female "Turkey"                10.3    8.68    9.51    10.9   12244
#> 16 Female "Other"                 13.3    9.07   10.6     14.6   21767
#> # … with abbreviated variable names ¹`Arithmetic mean`,
#> #   ²`1st quartile`, ³`2nd quartile (median)`, ⁴`3rd quartile`,
#> #   ⁵`Number of employees`

## switch language
table$language <- "de"

## `...` matches for codes and labels
table$tabulate("C-A11-0", "Staats", "2. Quartil (Median)")
#> # A STATcubeR tibble: 16 x 3
#>    Geschlecht Staatsangehörigkeit                         2. Quartil…¹
#>  * <fct>      <fct>                                              <dbl>
#>  1 männlich   "Österreich"                                       17.4 
#>  2 männlich   "Nicht-Österreich"                                 13.2 
#>  3 männlich   "EU-28/EWR/Schweiz (ohne Österreich)"              13.5 
#>  4 männlich   "EU-15/EWR vor 2004/Schweiz "                      16.7 
#>  5 männlich   "EU-13 Beitrittsstaaten ab 2004"                   12.7 
#>  6 männlich   "ehemaliges Jugoslawien (außerhalb der EU)"        13.6 
#>  7 männlich   "Türkei"                                           12.8 
#>  8 männlich   "Andere"                                           11.2 
#>  9 weiblich   "Österreich"                                       14.0 
#> 10 weiblich   "Nicht-Österreich"                                 10.6 
#> 11 weiblich   "EU-28/EWR/Schweiz (ohne Österreich)"              11.3 
#> 12 weiblich   "EU-15/EWR vor 2004/Schweiz "                      14.7 
#> 13 weiblich   "EU-13 Beitrittsstaaten ab 2004"                   10.3 
#> 14 weiblich   "ehemaliges Jugoslawien (außerhalb der EU)"         9.78
#> 15 weiblich   "Türkei"                                            9.51
#> 16 weiblich   "Andere"                                           10.6 
#> # … with abbreviated variable name ¹`2. Quartil (Median)`

## Keep totals in the output by removing total codes
table$tabulate("C-A11-0")      # -> 2 rows: "male" "female"
#> # A STATcubeR tibble: 2 x 6
#>   Geschlecht `Arithmetisches Mittel` 1. Quar…¹ 2. Qu…² 3. Qu…³ Zahl …⁴
#> * <fct>                        <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1 männlich                      19.2      12.8    16.3    22.0 1544588
#> 2 weiblich                      15.3      10.4    13.4    17.6 1106350
#> # … with abbreviated variable names ¹`1. Quartil`,
#> #   ²`2. Quartil (Median)`, ³`3. Quartil`,
#> #   ⁴`Zahl d unselbst Beschäftigten`
table$total_codes(`C-A11-0` = NA)
table$tabulate("C-A11-0")      # -> 3 rows: "total", "male", "female"
#> # A STATcubeR tibble: 3 x 6
#>   Geschlecht `Arithmetisches Mittel` 1. Quar…¹ 2. Qu…² 3. Qu…³ Zahl …⁴
#> * <fct>                        <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1 insgesamt                     17.6      11.6    15.1    20.1 2650938
#> 2 männlich                      19.2      12.8    16.3    22.0 1544588
#> 3 weiblich                      15.3      10.4    13.4    17.6 1106350
#> # … with abbreviated variable names ¹`1. Quartil`,
#> #   ²`2. Quartil (Median)`, ³`3. Quartil`,
#> #   ⁴`Zahl d unselbst Beschäftigten`

## table$tabulate(...) is an alias for sc_tabulate(table, ...)
sc_tabulate(table, "C-A11-0")
#> # A STATcubeR tibble: 3 x 6
#>   Geschlecht `Arithmetisches Mittel` 1. Quar…¹ 2. Qu…² 3. Qu…³ Zahl …⁴
#> * <fct>                        <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1 insgesamt                     17.6      11.6    15.1    20.1 2650938
#> 2 männlich                      19.2      12.8    16.3    22.0 1544588
#> 3 weiblich                      15.3      10.4    13.4    17.6 1106350
#> # … with abbreviated variable names ¹`1. Quartil`,
#> #   ²`2. Quartil (Median)`, ³`3. Quartil`,
#> #   ⁴`Zahl d unselbst Beschäftigten`

######################### STATcube REST API #################################

table_tourism <- sc_example("accomodation.json") %>% sc_table("de")

table_tourism$tabulate()
#> # A STATcubeR tibble: 2,448 x 5
#>    `Saison/Tourismusmonat` Herkunftsland Beherbergun…¹ Übern…² Ankün…³
#>  * <date>                  <fct>         <fct>           <dbl>   <dbl>
#>  1 1999-11-01              Österreich    Hotels gesamt  858602  371215
#>  2 1999-11-01              Österreich    Ferienwohnun…   22398    5365
#>  3 1999-11-01              Österreich    Sonstige       386865   67272
#>  4 1999-11-01              Deutschland   Hotels gesamt  620973  223856
#>  5 1999-11-01              Deutschland   Ferienwohnun…   56012   13095
#>  6 1999-11-01              Deutschland   Sonstige        85583   24262
#>  7 1999-11-01              andere Länder Hotels gesamt  544074  235343
#>  8 1999-11-01              andere Länder Ferienwohnun…   30265    6493
#>  9 1999-11-01              andere Länder Sonstige        41945   14433
#> 10 1999-12-01              Österreich    Hotels gesamt  928834  346040
#> # … with 2,438 more rows, and abbreviated variable names
#> #   ¹Beherbergungsbetrieb, ²Übernachtungen, ³Ankünfte
table_tourism$tabulate("Saison/Tourismusmonat")
#> # A STATcubeR tibble: 272 x 3
#>    `Saison/Tourismusmonat` Übernachtungen Ankünfte
#>  * <date>                           <dbl>    <dbl>
#>  1 1999-11-01                     2646717   961334
#>  2 1999-12-01                     7373034  1771530
#>  3 2000-01-01                    12437812  2462258
#>  4 2000-02-01                    13480000  2532404
#>  5 2000-03-01                    11340174  2273436
#>  6 2000-04-01                     6138146  1666777
#>  7 2000-05-01                     5150347  1673232
#>  8 2000-06-01                     8908931  2438518
#>  9 2000-07-01                    14618147  3118576
#> 10 2000-08-01                    16498662  3248084
#> # … with 262 more rows
table_tourism$tabulate("Saison/Tourismusmonat", "Ankünfte")
#> # A STATcubeR tibble: 272 x 2
#>    `Saison/Tourismusmonat` Ankünfte
#>  * <date>                     <dbl>
#>  1 1999-11-01                961334
#>  2 1999-12-01               1771530
#>  3 2000-01-01               2462258
#>  4 2000-02-01               2532404
#>  5 2000-03-01               2273436
#>  6 2000-04-01               1666777
#>  7 2000-05-01               1673232
#>  8 2000-06-01               2438518
#>  9 2000-07-01               3118576
#> 10 2000-08-01               3248084
#> # … with 262 more rows
table_tourism$tabulate("Ankünfte")
#> # A STATcubeR tibble: 2,448 x 4
#>    `Saison/Tourismusmonat` Herkunftsland Beherbergungsbetrieb  Ankün…¹
#>  * <date>                  <fct>         <fct>                   <dbl>
#>  1 1999-11-01              Österreich    Hotels gesamt          371215
#>  2 1999-11-01              Österreich    Ferienwohnungen insg…    5365
#>  3 1999-11-01              Österreich    Sonstige                67272
#>  4 1999-11-01              Deutschland   Hotels gesamt          223856
#>  5 1999-11-01              Deutschland   Ferienwohnungen insg…   13095
#>  6 1999-11-01              Deutschland   Sonstige                24262
#>  7 1999-11-01              andere Länder Hotels gesamt          235343
#>  8 1999-11-01              andere Länder Ferienwohnungen insg…    6493
#>  9 1999-11-01              andere Länder Sonstige                14433
#> 10 1999-12-01              Österreich    Hotels gesamt          346040
#> # … with 2,438 more rows, and abbreviated variable name ¹Ankünfte

## TODO: param annotations does not work currently
if (FALSE) {
  table_trade <- sc_example("foreign_trade.json") %>% sc_table("de")
  tt <- sc_tabulate(table_trade, "Berichtsjahr", "Import, Wert in Euro",
                    annotations = TRUE)
  tt
  tt[['Import, Wert in Euro_a']] %>% str()
}

Usage

Arguments

Details

See also

Examples