Open Government Data

od_table() makes it easy to import datasets from data.statistik.gv.at into your R sessions. This function downloads csv sources from the fileserver. This means that no API key is required to use STATcubeR with datasets from the OGD portal.

In this example, we will use a data set about cancer statistics. The dataset id "OGD_krebs_ext_KREBS_1" can be extracted from the url and will be used in the data import.

Import and overview

To import a dataset, provide the dataset id as an argument.

table <- od_table("OGD_krebs_ext_KREBS_1")

This returns an object of class od_table, which bundles all the data from the OGD portal that correspons to this dataset. Printing the object will show a summary of the contents.

table

#> Cancer statistics by reporting year, province of residence and
#> localisation of cancer
#> 
#> Dataset: OGD_krebs_ext_KREBS_1 (data.statistik.gv.at)
#> Measures: Number of records F-KRE
#> Fields: Tumore ICD/10 3-Steller <95>, Reporting year <37>, Province of
#>   residence <9>, Sex <2>
#> 
#> Request: [2022-12-20 11:36:01]
#> STATcubeR: 0.5.0.1

The dataset contains the number of cancer patients by several classification fields

tumor type differentiates <95> types of cancers
The reporting period spans <37> years (1983 to 2019).
The regional variable contains the <9> NUTS-2 regions of austria.
The demographic variable “Sex” is reported with <2> levels

Convert to a data frame

The method $tabulate() can be used to turn the object into a data.frame in long format, which contains labled data.

table$tabulate()

# A STATcubeR tibble: 46,479 x 5
   `Tumore ICD/10 3-Steller`           Reportin…¹ Province of r…² Sex    Numbe…³
 * <fct>                               <date>     <fct>           <fct>    <int>
 1 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Burgenland "   male         2
 2 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Carinthia"     male         8
 3 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Carinthia"     female       2
 4 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Lower Austria" male         6
 5 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Lower Austria" female       2
 6 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Upper Austria" male        12
 7 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Upper Austria" female       2
 8 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Salzburg"      male         4
 9 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Styria"        male         5
10 <C00> Bösartige Neubildung der Lip… 1983-01-01 "Styria"        female       2
# … with 46,469 more rows, and abbreviated variable names ¹`Reporting year`,
#   ²`Province of residence`, ³`Number of records F-KRE`

The dataset contains 46479 rows. If every combination of tumor type, year, region and sex would contain a seperate row the number of rows would be the following.

\[ 95\times37\times9\times2 = 63270 \]

This means that the table is fairly dense. But this might not be the case for other OGD datasets.

Metadata

This section will show the different metadata components contained in the table object and how they relate to the resources on the OGD server.

table$resources$name
#> [1] "OGD_krebs_ext_KREBS_1.json"                  
#> [2] "OGD_krebs_ext_KREBS_1.csv"                   
#> [3] "OGD_krebs_ext_KREBS_1_HEADER.csv"            
#> [4] "OGD_krebs_ext_KREBS_1_C-TUM_ICD10_3ST-0.csv" 
#> [5] "OGD_krebs_ext_KREBS_1_C-BERJ-0.csv"          
#> [6] "OGD_krebs_ext_KREBS_1_C-BUNDESLAND-0.csv"    
#> [7] "OGD_krebs_ext_KREBS_1_C-KRE_GESCHLECHT-0.csv"
#> attr(,"class")
#> [1] "ogd_file"  "character"

The labels for the columns of the data.frame representation are generated from OGD_krebs_ext_KREBS_1_HEADER.csv and can be extracted from the table object via $header.

table$header

# STATcubeR metadata: 5 x 6
  code               label label_de                    label_en               
  <chr>              <chr> <chr>                       <chr>                  
1 F-KRE              NA    Anzahl der Datensätze F-KRE Number of records F-KRE
2 C-TUM_ICD10_3ST-0  NA    Tumore ICD/10 3-Steller     NA                     
3 C-BERJ-0           NA    Berichtsjahr                Reporting year         
4 C-BUNDESLAND-0     NA    Bundesland                  Province of residence  
5 C-KRE_GESCHLECHT-0 NA    Geschlecht                  Sex                    
# … with 2 more columns: 'de_desc', 'en_desc'

Additional metadata for the columns can be obtained via $meta. See the STATcubeR data article for more details.

Field infos

The method table$field() can be used to get information about specific classification fields. Thise contain data from {dataset_id}_{field_code}.csv. Unlike the metadata in sc_table, the od_table class always contains german and english labels. Both can be used to label the dataset.

The following call gives access to the german and english labels for the 95 differen tumor types in the "cancer type" classification. Click "Year" above to see information about the years.

table$field("C-TUM_ICD10_3ST-0")

# STATcubeR metadata: 95 x 10
  code              label                                                       
  <chr>             <chr>                                                       
1 TUM_ICD10_3ST-C00 <C00> Bösartige Neubildung der Lippe                        
2 TUM_ICD10_3ST-C01 <C01> Bösartige Neubildung des Zungengrundes                
3 TUM_ICD10_3ST-C02 <C02> Bösartige Neubildung sonstiger und nicht näher bezeic…
4 TUM_ICD10_3ST-C03 <C03> Bösartige Neubildung des Zahnfleisches                
5 TUM_ICD10_3ST-C04 <C04> Bösartige Neubildung des Mundbodens                   
# … with 90 more rows, and 1 more variable: parsed <chr>
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'

OGD_krebs_ext_KREBS_1_C-TUM_ICD10_3ST-0.csv

The reporting period spans 37 years (1983 to 2019). The classification elements are parsed into a <date> format for the <data.frame> representation.

table$field("C-BERJ-0")

# STATcubeR metadata: 37 x 10
  code      label parsed    
  <chr>     <chr> <date>    
1 BERJ-1983 1983  1983-01-01
2 BERJ-1984 1984  1984-01-01
3 BERJ-1985 1985  1985-01-01
4 BERJ-1986 1986  1986-01-01
5 BERJ-1987 1987  1987-01-01
# … with 32 more rows
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'

OGD_krebs_ext_KREBS_1_C-BERJ-0.csv

The regional classification contains 9 elements which correspond to the NUTS2 regions (“Bundesländer”) of austria.

table$field("C-BUNDESLAND-0")

# STATcubeR metadata: 9 x 10
  code         label           parsed         
  <chr>        <chr>           <chr>          
1 BUNDESLAND-1 "Burgenland "   "Burgenland "  
2 BUNDESLAND-2 "Carinthia"     "Carinthia"    
3 BUNDESLAND-3 "Lower Austria" "Lower Austria"
4 BUNDESLAND-4 "Upper Austria" "Upper Austria"
5 BUNDESLAND-5 "Salzburg"      "Salzburg"     
# … with 4 more rows
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'

OGD_krebs_ext_KREBS_1_C-BUNDESLAND-0.csv

Sex is coded as a ditochome variable with the classification elements "male" and "female".

table$field("C-KRE_GESCHLECHT-0")

# STATcubeR metadata: 2 x 10
  code         label  parsed
  <chr>        <chr>  <chr> 
1 GESCHLECHT-1 male   male  
2 GESCHLECHT-2 female female
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'

OGD_krebs_ext_KREBS_1_C-KRE_GESCHLECHT-0.csv

json Metadata

The json metadatafile OGD_krebs_ext_KREBS_1.json is available via the $json binding.

table$json

#> Krebsstatistik
#> 
#> Krebsstatistik nach Krebslokalisation (ICD10), Geschlecht und
#> Wohnbundesland
#> 
#> Measures: Anzahl der Datensätze F-KRE
#> Fields: Tumore ICD/10 3-Steller, Berichtsjahr, Bundesland, Geschlecht
#> Updated: 2022-09-06 14:51:39
#> Tags: Krebsstatistik, Krebslokalisation-ICD10, Geschlecht, Wohnbundesland
#> Categories: Gesundheit

od_json("OGD_veste309_Veste309_1")

#> Verdienststrukturerhebung 2018 Bruttostundenverdienste in EUR nach
#> Staatsangehörigkeit, Bundesland und Beschäftigungsverhältnis
#> 
#> Verdienststruktur nach Geschlecht, Staatsangehörigkeit, Bundesland
#> und Beschäftigungsverhältnis
#> 
#> Measures: Arithmetisches Mittel, 1. Quartil, 2. Quartil (Median), 3.
#>   Quartil, Zahl d unselbst Beschäftigten
#> Fields: Geschlecht, Staatsangehörigkeit, Bundesland (NUTS 2), Form des
#>   Beschäftigungsverhältnisses
#> Updated: 2022-03-24 11:29:48
#> Tags: Staatsangehörigkeit, Bundesland, Beschäftigungsverhältnis
#> Categories: Arbeit, Bevölkerung

od_json("OGD_konjunkturmonitor_KonMon_1")

#> Konjunkturmonitor
#> 
#> Measures: Produktionsindex Industrie (at; 2015=100), Technische
#>   Gesamtproduktion Industrie in Tsd. € (KJE), Umsatzindex Industrie
#>   (2015=100), Umsatz Industrie inTsd.€ (KJE), Auftragseingangsindex
#>   Industrie (2015=100), Beschäftigtenindex Industrie (2015=100),
#>   Beschäftigte Industrie gesamt (KJE), Produktivitätsindex Industrie je
#>   unselbständig Beschäftigtem (2015=100), Produktivitätsindex Industrie je
#>   geleisteter Arbeitsstunde (2015=100), Erzeugerpreisindex für den
#>   Produzierenden Bereich (2021=100; NACE B-E), … (76 more)
#> Fields: Berichtszeitraum, Wertangabe
#> Updated: 2022-12-20 09:25:23
#> Tags: Konjunkturdaten
#> Categories: Wirtschaft und Tourismus, Arbeit, Bevölkerung, Finanzen und
#>   Rechnungswesen

This print method only shows part of the metadata. More information can be extracted by using the keys of the json object.

table$json$extras$publisher
#> [1] "Statistik Austria, Guglgasse 13, 1110 Wien, Austria"
table$json$extras$update_frequency
#> [1] "jährlich"
table$json$resources[[1]]$url
#> [1] "https://data.statistik.gv.at/data/OGD_krebs_ext_KREBS_1.csv"

Table Contents

To get the raw microdata from OGD_krebs_ext_KREBS_1.csv , use table$data. The output is similar to what is returned from read.csv2("OGD_krebs_ext_KREBS_1.csv").

table$data

# A STATcubeR tibble: 46,479 x 5
  `C-TUM_ICD10_3ST-0` `C-BERJ-0` `C-BUNDESLAND-0` `C-KRE_GESCHLECHT-0` `F-KRE`
* <fct>               <fct>      <fct>            <fct>                  <int>
1 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-1     GESCHLECHT-1               2
2 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-2     GESCHLECHT-1               8
3 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-2     GESCHLECHT-2               2
4 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-3     GESCHLECHT-1               6
5 TUM_ICD10_3ST-C00   BERJ-1983  BUNDESLAND-3     GESCHLECHT-2               2
# … with 46,474 more rows

od_table() makes sure that the levels of all factor columns are in the same order as in the metadata.

levels(table$data$`C-BUNDESLAND-0`) == table$field("C-BUNDESLAND-0")$code

#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

As mentioned above, a labelled version of the data can be obtained via table$tabulate(). The labelling is done by taking the raw dataset and then joining the labes from $header and $field().

table$tabulate()

# A STATcubeR tibble: 46,479 x 5
  `Tumore ICD/10 3-Steller`            Reportin…¹ Province of r…² Sex    Numbe…³
* <fct>                                <date>     <fct>           <fct>    <int>
1 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Burgenland "   male         2
2 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Carinthia"     male         8
3 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Carinthia"     female       2
4 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Lower Austria" male         6
5 <C00> Bösartige Neubildung der Lippe 1983-01-01 "Lower Austria" female       2
# … with 46,474 more rows, and abbreviated variable names ¹`Reporting year`,
#   ²`Province of residence`, ³`Number of records F-KRE`

Time variables are converted into a <date> format if they satisfy certain STATcube standards. You can read more about $tabulate() in the tabulation article.

A Trip to Germany

It is possible to switch the language used for labelling the dataset using the $language field. This field can be used to get and set the language. Allowed options are "en" for english and "de" for german.

table$language
#> [1] "en"
table$language <- "de"
table$language
#> [1] "de"

This option affects the print() method as well as the output of $tabulate(). If no english labels are available, the german labels are used as a fallback mechanism.

table

#> Krebsstatistik
#> 
#> Dataset: OGD_krebs_ext_KREBS_1 (data.statistik.gv.at)
#> Measures: Anzahl der Datensätze F-KRE
#> Fields: Tumore ICD/10 3-Steller <95>, Berichtsjahr <37>, Bundesland <9>,
#>   Geschlecht <2>
#> 
#> Request: [2022-12-20 11:36:01]
#> STATcubeR: 0.5.0.1

table$tabulate()

# A STATcubeR tibble: 46,479 x 5
  `Tumore ICD/10 3-Steller`         Berichts…¹ Bundesland       Geschl…² Anzah…³
* <fct>                             <date>     <fct>            <fct>      <int>
1 <C00> Bösartige Neubildung der L… 1983-01-01 Burgenland       männlich       2
2 <C00> Bösartige Neubildung der L… 1983-01-01 Kärnten          männlich       8
3 <C00> Bösartige Neubildung der L… 1983-01-01 Kärnten          weiblich       2
4 <C00> Bösartige Neubildung der L… 1983-01-01 Niederösterreich männlich       6
5 <C00> Bösartige Neubildung der L… 1983-01-01 Niederösterreich weiblich       2
# … with 46,474 more rows, and abbreviated variable names ¹Berichtsjahr,
#   ²Geschlecht, ³`Anzahl der Datensätze F-KRE`