## ✔ Key could be verified via a test request
## ℹ The provided key will be available for this R session
## ℹ Add `STATCUBE_KEY_EXT = XXXX` to "~/.Renviron" to set the key
## persistently. Replace `XXXX` with your key
od_table()
makes it easy to import datasets from
data.statistik.gv.at
into your R sessions. This function downloads csv sources from the
fileserver. This means that no API key is required to use STATcubeR with
datasets from the OGD portal.
In this example, we will use a
data set about cancer statistics.
The dataset id "OGD_krebs_ext_KREBS_1"
can be extracted
from the url and will be used in the data import.
Import and overview
To import a dataset, provide the dataset id as an argument.
table <- od_table("OGD_krebs_ext_KREBS_1")
This returns an object of class od_table
, which
bundles all the data from the OGD portal that corresponds to this
dataset. Printing the object will show a summary of the contents.
table
#> Cancer statistics by reporting year, province of residence and
#> localisation of cancer
#>
#> Dataset: OGD_krebs_ext_KREBS_1 (data.statistik.gv.at)
#> Measures: Number of records F-KRE
#> Fields: Tumore ICD/10 3-Steller <98>, Reporting year <40>, Province of
#> residence <9>, Sex <2>
#>
#> Request: [2024-11-29 09:55:39.598241]
#> STATcubeR: 1.0.0
The dataset contains the number of cancer patients by several classification fields
- tumor type differentiates <95> types of cancers
- The reporting period spans <37> years (1983 to 2019).
- The regional variable contains the <9> NUTS-2 regions of Austria.
- The demographic variable “Sex” is reported with <2> levels
Convert to a data frame
The method $tabulate()
can be used to turn the object
into a data.frame
in long format, which contains labeled
data.
table$tabulate()
# A STATcubeR tibble: 49,190 x 5
`Tumore ICD/10 3-Steller` `Reporting year` Province of residenc…¹ Sex
* <fct> <date> <fct> <fct>
1 <C00> Bösartige Neubildung de… 1983-01-01 "Burgenland " male
2 <C00> Bösartige Neubildung de… 1983-01-01 "Carinthia" male
3 <C00> Bösartige Neubildung de… 1983-01-01 "Carinthia" female
4 <C00> Bösartige Neubildung de… 1983-01-01 "Lower Austria" male
5 <C00> Bösartige Neubildung de… 1983-01-01 "Lower Austria" female
6 <C00> Bösartige Neubildung de… 1983-01-01 "Upper Austria" male
7 <C00> Bösartige Neubildung de… 1983-01-01 "Upper Austria" female
8 <C00> Bösartige Neubildung de… 1983-01-01 "Salzburg" male
9 <C00> Bösartige Neubildung de… 1983-01-01 "Styria" male
10 <C00> Bösartige Neubildung de… 1983-01-01 "Styria" female
# ℹ 49,180 more rows
# ℹ abbreviated name: ¹`Province of residence`
# ℹ 1 more variable: `Number of records F-KRE` <int>
The dataset contains 49190 rows. If every combination of tumor type, year, region and sex would contain a separate row the number of rows would be the following.
This means that the table is fairly dense. But this might not be the case for other OGD datasets.
Metadata
This section will show the different metadata components contained in the table object and how they relate to the resources on the OGD server.
table$resources$name
#> [1] "OGD_krebs_ext_KREBS_1.json"
#> [2] "OGD_krebs_ext_KREBS_1.csv"
#> [3] "OGD_krebs_ext_KREBS_1_HEADER.csv"
#> [4] "OGD_krebs_ext_KREBS_1_C-TUM_ICD10_3ST-0.csv"
#> [5] "OGD_krebs_ext_KREBS_1_C-BERJ-0.csv"
#> [6] "OGD_krebs_ext_KREBS_1_C-BUNDESLAND-0.csv"
#> [7] "OGD_krebs_ext_KREBS_1_C-KRE_GESCHLECHT-0.csv"
#> attr(,"class")
#> [1] "ogd_file" "character"
Header
The labels for the columns of the data.frame representation are
generated from
OGD_krebs_ext_KREBS_1_HEADER.csv
and can be extracted from the table object via $header
.
table$header
# STATcubeR metadata: 5 x 6
code label label_de label_en
<chr> <chr> <chr> <chr>
1 F-KRE NA Anzahl der Datensätze F-KRE Number of records F-KRE
2 C-TUM_ICD10_3ST-0 NA Tumore ICD/10 3-Steller NA
3 C-BERJ-0 NA Berichtsjahr Reporting year
4 C-BUNDESLAND-0 NA Bundesland Province of residence
5 C-KRE_GESCHLECHT-0 NA Geschlecht Sex
# … with 2 more columns: 'de_desc', 'en_desc'
Additional metadata for the columns can be obtained via
$meta
. See the
STATcubeR data article
for more details.
Field infos
The method table$field()
can be used to get information
about specific classification fields. These contain data from
{dataset_id}_{field_code}.csv
. Unlike the metadata in
sc_table
, the od_table
class always contains
German and English labels. Both can be used to label the dataset.
The following call gives access to the German and English labels for
the 95 different tumor types in the "cancer type"
classification. Click "Year"
above to see information about
the years.
table$field("C-TUM_ICD10_3ST-0")
# STATcubeR metadata: 98 x 10
code label
<chr> <chr>
1 TUM_ICD10_3ST-C00 <C00> Bösartige Neubildung der Lippe
2 TUM_ICD10_3ST-C01 <C01> Bösartige Neubildung des Zungengrundes
3 TUM_ICD10_3ST-C02 <C02> Bösartige Neubildung sonstiger und nicht näher bezeic…
4 TUM_ICD10_3ST-C03 <C03> Bösartige Neubildung des Zahnfleisches
5 TUM_ICD10_3ST-C04 <C04> Bösartige Neubildung des Mundbodens
# ℹ 93 more rows
# ℹ 1 more variable: parsed <chr>
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'
The reporting period spans 37 years (1983 to 2019). The
classification elements are parsed into a <date>
format for the <data.frame>
representation.
table$field("C-BERJ-0")
# STATcubeR metadata: 40 x 10
code label parsed
<chr> <chr> <date>
1 BERJ-1983 1983 1983-01-01
2 BERJ-1984 1984 1984-01-01
3 BERJ-1985 1985 1985-01-01
4 BERJ-1986 1986 1986-01-01
5 BERJ-1987 1987 1987-01-01
# ℹ 35 more rows
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'
The regional classification contains 9 elements which correspond to the NUTS2 regions (“Bundesländer”) of Austria.
table$field("C-BUNDESLAND-0")
# STATcubeR metadata: 9 x 10
code label parsed
<chr> <chr> <chr>
1 BUNDESLAND-1 "Burgenland " "Burgenland "
2 BUNDESLAND-2 "Carinthia" "Carinthia"
3 BUNDESLAND-3 "Lower Austria" "Lower Austria"
4 BUNDESLAND-4 "Upper Austria" "Upper Austria"
5 BUNDESLAND-5 "Salzburg" "Salzburg"
# ℹ 4 more rows
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'
Sex is coded as a dichotomous variable with the classification
elements "male"
and "female"
.
table$field("C-KRE_GESCHLECHT-0")
# STATcubeR metadata: 2 x 10
code label parsed
<chr> <chr> <chr>
1 GESCHLECHT-1 male male
2 GESCHLECHT-2 female female
# … with 7 more columns: 'label_de', 'label_en', 'parent', 'de_desc', 'en_desc', 'visible', 'order'
json Metadata
The json metadata file
OGD_krebs_ext_KREBS_1.json
is available via the $json
binding.
table$json
#> Krebsstatistik
#>
#> Krebsstatistik nach Krebslokalisation (ICD10), Geschlecht und
#> Wohnbundesland
#>
#> Measures: Anzahl der Datensätze F-KRE
#> Fields: Tumore ICD/10 3-Steller, Berichtsjahr, Bundesland, Geschlecht
#> Updated: 2024-01-25 16:03:34
#> Tags: Krebsstatistik, Krebslokalisation-ICD10, Geschlecht, Wohnbundesland
#> Categories: Gesundheit
od_json("OGD_veste309_Veste309_1")
#> Verdienststrukturerhebung 2018 Bruttostundenverdienste in EUR nach
#> Staatsangehörigkeit, Bundesland und Beschäftigungsverhältnis
#>
#> Verdienststruktur nach Geschlecht, Staatsangehörigkeit, Bundesland
#> und Beschäftigungsverhältnis
#>
#> Measures: Arithmetisches Mittel, 1. Quartil, 2. Quartil (Median), 3.
#> Quartil, Zahl d unselbst Beschäftigten
#> Fields: Geschlecht, Staatsangehörigkeit, Bundesland (NUTS 2), Form des
#> Beschäftigungsverhältnisses
#> Updated: 2022-03-24 11:29:48
#> Tags: Staatsangehörigkeit, Bundesland, Beschäftigungsverhältnis
#> Categories: Arbeit, Bevölkerung
od_json("OGD_konjunkturmonitor_KonMon_1")
#> Konjunkturmonitor
#>
#> Measures: Produktionsindex Industrie (at; 2021=100), Technische
#> Gesamtproduktion Industrie in Tsd. € (KJE), Umsatzindex Industrie
#> (2021=100), Umsatz Industrie inTsd.€ (KJE), Auftragseingangsindex
#> Industrie (2021=100), Beschäftigtenindex Industrie (2021=100),
#> Beschäftigte Industrie gesamt (KJE), Produktivitätsindex Industrie je
#> unselbständig Beschäftigtem (2021=100), Produktivitätsindex Industrie je
#> geleisteter Arbeitsstunde (2021=100), Erzeugerpreisindex für den
#> Produzierenden Bereich (2021=100; NACE B-E), … (78 more)
#> Fields: Berichtszeitraum, Wertangabe
#> Updated: 2024-11-26 10:00:46
#> Tags: Konjunkturdaten
#> Categories: Wirtschaft und Tourismus, Arbeit, Bevölkerung, Finanzen und
#> Rechnungswesen
This print method only shows part of the metadata. More information can be extracted by using the keys of the json object.
table$json$extras$publisher
#> [1] "Statistik Austria, Guglgasse 13, 1110 Wien, Austria"
table$json$extras$update_frequency
#> [1] "jährlich"
table$json$resources[[1]]$url
#> [1] "https://data.statistik.gv.at/data/OGD_krebs_ext_KREBS_1.csv"
Table Contents
To get the raw microdata from
OGD_krebs_ext_KREBS_1.csv
,
use table$data
. The output is similar to what is returned
from read.csv2("OGD_krebs_ext_KREBS_1.csv")
.
table$data
# A STATcubeR tibble: 49,190 x 5
`C-TUM_ICD10_3ST-0` `C-BERJ-0` `C-BUNDESLAND-0` `C-KRE_GESCHLECHT-0` `F-KRE`
* <fct> <fct> <fct> <fct> <int>
1 TUM_ICD10_3ST-C00 BERJ-1983 BUNDESLAND-1 GESCHLECHT-1 2
2 TUM_ICD10_3ST-C00 BERJ-1983 BUNDESLAND-2 GESCHLECHT-1 8
3 TUM_ICD10_3ST-C00 BERJ-1983 BUNDESLAND-2 GESCHLECHT-2 2
4 TUM_ICD10_3ST-C00 BERJ-1983 BUNDESLAND-3 GESCHLECHT-1 6
5 TUM_ICD10_3ST-C00 BERJ-1983 BUNDESLAND-3 GESCHLECHT-2 2
# ℹ 49,185 more rows
od_table()
makes sure that the levels of all factor
columns are in the same order as in the metadata.
levels(table$data$`C-BUNDESLAND-0`) == table$field("C-BUNDESLAND-0")$code
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
As mentioned above, a labeled version of the data can be obtained via
table$tabulate()
. The labeling is done by taking the raw
dataset and then joining the labels from $header
and
$field()
.
table$tabulate()
# A STATcubeR tibble: 49,190 x 5
`Tumore ICD/10 3-Steller` `Reporting year` Province of residenc…¹ Sex
* <fct> <date> <fct> <fct>
1 <C00> Bösartige Neubildung der… 1983-01-01 "Burgenland " male
2 <C00> Bösartige Neubildung der… 1983-01-01 "Carinthia" male
3 <C00> Bösartige Neubildung der… 1983-01-01 "Carinthia" female
4 <C00> Bösartige Neubildung der… 1983-01-01 "Lower Austria" male
5 <C00> Bösartige Neubildung der… 1983-01-01 "Lower Austria" female
# ℹ 49,185 more rows
# ℹ abbreviated name: ¹`Province of residence`
# ℹ 1 more variable: `Number of records F-KRE` <int>
Time variables are converted into a <date>
format
if they satisfy certain STATcube standards. You can read more about
$tabulate()
in the
tabulation article.
A Trip to Germany
It is possible to switch the language used for labeling the dataset
using the $language
field. This field can be used to get
and set the language. Allowed options are "en"
for English
and "de"
for German.
table$language
#> [1] "en"
table$language <- "de"
table$language
#> [1] "de"
This option affects the print()
method as well as the
output of $tabulate()
. If no English labels are available,
the German labels are used as a fallback mechanism.
table
#> Krebsstatistik
#>
#> Dataset: OGD_krebs_ext_KREBS_1 (data.statistik.gv.at)
#> Measures: Anzahl der Datensätze F-KRE
#> Fields: Tumore ICD/10 3-Steller <98>, Berichtsjahr <40>, Bundesland <9>,
#> Geschlecht <2>
#>
#> Request: [2024-11-29 09:55:39.598241]
#> STATcubeR: 1.0.0
table$tabulate()
# A STATcubeR tibble: 49,190 x 5
`Tumore ICD/10 3-Steller` Berichtsjahr Bundesland Geschlecht
* <fct> <date> <fct> <fct>
1 <C00> Bösartige Neubildung der Lippe 1983-01-01 Burgenland männlich
2 <C00> Bösartige Neubildung der Lippe 1983-01-01 Kärnten männlich
3 <C00> Bösartige Neubildung der Lippe 1983-01-01 Kärnten weiblich
4 <C00> Bösartige Neubildung der Lippe 1983-01-01 Niederösterreich männlich
5 <C00> Bösartige Neubildung der Lippe 1983-01-01 Niederösterreich weiblich
# ℹ 49,185 more rows
# ℹ 1 more variable: `Anzahl der Datensätze F-KRE` <int>
Further reading
- See the
available datasets article
to list all datasets that are compatible with
od_table()
. - Open data datasets often contain a large amount of rows. Check out the tabulation article to see how they can be summarized into a more compact form.
- STATcubeR caches all files requested from the server under the hood. The caching article explains where and how those caches are stored.