The STATcubeR Data Class

The class sc_data defines a common interface for open data (OGD) datasets and responses from the /table endpoint of the STATcube REST API. It defines methods that apply to both data sources, such as acquiring metadata, labeling the data, and aggregating results.

Constructing sc_data objects

The sc_data class itself is not exported from STATcubeR. Therefore, objects of this class should be created with one of the following functions:

- od_table() obtains data from the OGD portal. See the OGD article.
- sc_table() creates a request against the /table endpoint of the STATcube REST API. See the JSON requests article.
- sc_table_saved() and sc_table_custom() also use the /table endpoint. However, the request is specified via IDs rather than a JSON file.

To illustrate, we will use one of the OGD datasets to showcase the functionality of this class. Note, however, that objects created with sc_table() can be used interchangeably.

x <- od_table("OGD_krebs_ext_KREBS_1")

Data

The data from the table can be extracted using the active binding $data. Note that the data for OGD_krebs_ext_KREBS_1 only includes codes and possibly some totals. The data is always provided in long format, with one column for each field and one column for each measure.

x$data

# A STATcubeR tibble: 46,479 x 5
  `C-TUM_ICD10_3ST-0` `C-BERJ-0` `C-BUNDESLAND-0` `C-KRE_GESCHLECHT-0` `F-KRE`
* <fct>               <fct>      <fct>            <fct>                  <int>
1 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-1     GESCHLECHT-1               2
2 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-2     GESCHLECHT-1               8
3 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-2     GESCHLECHT-2               2
4 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-3     GESCHLECHT-1               6
5 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-3     GESCHLECHT-2               2
# … with 46,474 more rows

The Tabulation section explains how labeled data can be obtained.
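Because every field and every measure is its own column, the long format can be passed directly to base R aggregation functions. The following is a minimal sketch rather than STATcubeR code: raw is a hand-built stand-in for the first five rows of x$data shown above, and the column names bundesland, geschlecht and kre are shortened, hypothetical replacements for the real codes.

```r
# Hand-copied stand-in for the first five rows of x$data (nothing is fetched).
# The column names are shortened for this sketch; the real columns are
# `C-BUNDESLAND-0`, `C-KRE_GESCHLECHT-0` and `F-KRE`.
raw <- data.frame(
  bundesland = factor(c("BUNDESLAND-1", "BUNDESLAND-2", "BUNDESLAND-2",
                        "BUNDESLAND-3", "BUNDESLAND-3")),
  geschlecht = factor(c("GESCHLECHT-1", "GESCHLECHT-1", "GESCHLECHT-2",
                        "GESCHLECHT-1", "GESCHLECHT-2")),
  kre = c(2L, 8L, 2L, 6L, 2L)
)

# Sum the measure over all other fields, keeping one row per province code.
by_province <- aggregate(kre ~ bundesland, data = raw, FUN = sum)
by_province
```

The same call should work on the full dataset once the shortened names are replaced by the real column names, which need backticks in a formula because they contain hyphens.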

Metadata

Metadata for a sc_data object includes labels and other information that is relevant to correctly parse the raw data. The active binding $meta contains at least the entries $source, $measures and $fields.

Source

The source field contains information about the data source. The most important entries are code and label.

x$meta$source

# STATcubeR metadata: 1 x 7
  code                  label                                              lang 
  <chr>                 <chr>                                              <chr>
1 OGD_krebs_ext_KREBS_1 Cancer statistics by reporting year, province of … en   
# … with 4 more columns: 'label_de', 'label_en', 'requested', 'scr_version'

At the bottom, we see that additional information about the source is available, namely label_en, label_de, etc. These additional metadata entries might not be available for objects created with sc_table().

Measures

This part of the metadata is a data.frame with one row for each measure. It contains codes and labels as well as the number of NAs found in $data for that particular column.

x$meta$measures

# STATcubeR metadata: 1 x 7
  code  label                     NAs
  <chr> <chr>                   <int>
1 F-KRE Number of records F-KRE     0
# … with 4 more columns: 'label_de', 'label_en', 'de_desc', 'en_desc'
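The NAs column can be cross-checked by counting missing values per measure column. The sketch below shows one way to do that in base R; it is not necessarily how STATcubeR computes the entry. The data frame measures is hand-built, and its second column other is a hypothetical measure that exists only to show a nonzero count.

```r
# Hand-built stand-in for the measure columns of a dataset.
measures <- data.frame(
  kre   = c(2L, 8L, 2L, 6L, 2L),   # values copied from the F-KRE column above
  other = c(1L, NA, 3L, NA, 5L)    # hypothetical measure with two missing values
)

# is.na() gives a logical matrix; colSums() counts the TRUE entries per column.
na_counts <- colSums(is.na(measures))
na_counts
```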

Fields

The fields entry summarizes all classification fields, i.e., categorical variables. It includes the codes and labels, the total code registered for each field (if any), and the number of classification elements.

x$meta$fields

# STATcubeR metadata: 4 x 9
  code               label                   total_code nitems type       
  <chr>              <chr>                   <chr>       <int> <chr>      
1 C-TUM_ICD10_3ST-0  Tumore ICD/10 3-Steller NA             95 Category   
2 C-BERJ-0           Reporting year          NA             37 Time (year)
3 C-BUNDESLAND-0     Province of residence   NA              9 Category   
4 C-KRE_GESCHLECHT-0 Sex                     NA              2 Category   
# … with 4 more columns: 'label_de', 'label_en', 'de_desc', 'en_desc'

Field information

To get more info about specific fields, use the $field() method. This will return all classification elements as a data.frame.

x$field("Tumore")

# STATcubeR metadata: 95 x 10
  code              label                                                       
  <chr>             <chr>                                                       
1 TUM_ICD10_3ST-C00 <C00> Bösartige Neubildung der Lippe                        
2 TUM_ICD10_3ST-C01 <C01> Bösartige Neubildung des Zungengrundes                
3 TUM_ICD10_3ST-C02 <C02> Bösartige Neubildung sonstiger und nicht näher bezeic…
4 TUM_ICD10_3ST-C03 <C03> Bösartige Neubildung des Zahnfleisches                
5 TUM_ICD10_3ST-C04 <C04> Bösartige Neubildung des Mundbodens                   
# … with 90 more rows, and 1 more variable: parsed <chr>
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'

x$field("Reporting year")

# STATcubeR metadata: 37 x 10
  code      label parsed    
  <chr>     <chr> <date>    
1 BERJ-1983 1983  1983-01-01
2 BERJ-1984 1984  1984-01-01
3 BERJ-1985 1985  1985-01-01
4 BERJ-1986 1986  1986-01-01
5 BERJ-1987 1987  1987-01-01
# … with 32 more rows
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'

x$field("Province")

# STATcubeR metadata: 9 x 10
  code         label           parsed         
  <chr>        <chr>           <chr>          
1 BUNDESLAND-1 "Burgenland "   "Burgenland "  
2 BUNDESLAND-2 "Carinthia"     "Carinthia"    
3 BUNDESLAND-3 "Lower Austria" "Lower Austria"
4 BUNDESLAND-4 "Upper Austria" "Upper Austria"
5 BUNDESLAND-5 "Salzburg"      "Salzburg"     
6 BUNDESLAND-6 "Styria"        "Styria"       
7 BUNDESLAND-7 "Tyrol"         "Tyrol"        
8 BUNDESLAND-8 "Vorarlberg"    "Vorarlberg"   
9 BUNDESLAND-9 "Vienna"        "Vienna"       
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'

x$field("Sex")

# STATcubeR metadata: 2 x 10
  code         label  parsed
  <chr>        <chr>  <chr> 
1 GESCHLECHT-1 male   male  
2 GESCHLECHT-2 female female
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'
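The code, label and parsed columns returned by $field() are enough to translate raw codes by hand. The following sketch illustrates the lookup in base R with match(). It is an illustration only and does not claim to reproduce the package's internal logic. The two field tables are hand-copied excerpts of the output above, and sex_codes and year_codes are hypothetical vectors standing in for columns of x$data.

```r
# Hand-copied excerpts of x$field("Sex") and x$field("Reporting year").
sex_field <- data.frame(
  code  = c("GESCHLECHT-1", "GESCHLECHT-2"),
  label = c("male", "female"),
  stringsAsFactors = FALSE
)
year_field <- data.frame(
  code   = c("BERJ-1983", "BERJ-1984"),
  parsed = as.Date(c("1983-01-01", "1984-01-01")),
  stringsAsFactors = FALSE
)

# Hypothetical code vectors, as they might appear in the raw data columns.
sex_codes  <- c("GESCHLECHT-1", "GESCHLECHT-1", "GESCHLECHT-2")
year_codes <- c("BERJ-1983", "BERJ-1984", "BERJ-1983")

# match() returns the row of each code in the field table;
# indexing label/parsed with that position performs the lookup.
sex_labels  <- sex_field$label[match(sex_codes, sex_field$code)]
year_parsed <- year_field$parsed[match(year_codes, year_field$code)]
```

Codes that are missing from the field table would yield NA here, which makes gaps in the metadata easy to spot.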

Tabulation

The method $tabulate() can be used to turn sc_data objects into tidy data.frames. See the tabulation article for more details.

x$tabulate()

# A STATcubeR tibble: 46,479 x 5
  `Tumore ICD/10 3-Steller`            Reportin…¹ Province of r…² Sex    Numbe…³
* <fct>                                <date>     <fct>           <fct>    <int>
1 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Burgenland "   male         2
2 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Carinthia"     male         8
3 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Carinthia"     female       2
4 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Lower Austria" male         6
5 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Lower Austria" female       2
# … with 46,474 more rows, and abbreviated variable names ¹`Reporting year`,
#   ²`Province of residence`, ³`Number of records F-KRE`
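Once the data is labeled, it can be reshaped with standard R tools. The sketch below builds a small province-by-sex cross table with xtabs(). The data frame tab is a hand-copied stand-in for the first five rows of the output above, and the column names province, sex and n are shortened, hypothetical names. The label for Burgenland carries a trailing space in the output, so the sketch trims the factor levels first.

```r
# Hand-copied stand-in for the first five rows of x$tabulate().
tab <- data.frame(
  province = factor(c("Burgenland ", "Carinthia", "Carinthia",
                      "Lower Austria", "Lower Austria")),
  sex = factor(c("male", "male", "female", "male", "female")),
  n   = c(2L, 8L, 2L, 6L, 2L)
)

# Remove the trailing whitespace seen in the "Burgenland " label.
levels(tab$province) <- trimws(levels(tab$province))

# Cross-tabulate the measure: one row per province, one column per sex.
wide <- xtabs(n ~ province + sex, data = tab)
wide
```

Combinations that do not occur in the data, such as Burgenland and female in this excerpt, appear as zero in the cross table.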