EXPERIMENTAL
This function parses several JSON metadata files at once and combines them into a data.frame, so that the datasets can easily be filtered by categorization, tags, number of classification fields, etc.
Arguments
- server
The OGD server to be used: "ext" (the default) for the external server or "prod" for the production server.
- local
If TRUE (the default), the catalogue is created from cached JSON metadata. Otherwise, the cache is updated before the catalogue is created, using a "bulk download" of the metadata files.
Value
A data.frame with the following structure:
Column | Type | Description |
title | chr | Title of the dataset |
measures | int | Number of measure variables |
fields | int | Number of classification fields |
modified | datetime | Timestamp when the dataset was last modified |
created | datetime | Timestamp when the dataset was created |
id | ogd_id | ID of the dataset |
database | chr | ID of the corresponding STATcube database |
title_en | chr | English title |
notes | chr | Description for the dataset |
update_frequency | chr | How often is the dataset updated? |
categorization | chr | Category of the dataset |
tags | list<chr> | Tags assigned to the dataset |
json | list<od_json> | Full json metadata |
The type datetime refers to the POSIXct format as returned by Sys.time().
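Because the timestamp columns are POSIXct, they can be compared directly against another POSIXct value. The following is a minimal sketch that uses a hand-built stand-in rather than a real call to od_catalogue(). The database IDs and timestamps are copied from the example output in the Examples section; the time zone "UTC" is an assumption made only to keep the sketch reproducible.

```r
# Stand-in for two rows of the catalogue; NOT produced by od_catalogue().
# tz = "UTC" is an assumption, the real catalogue may use another time zone.
toy <- data.frame(
  database = c("dekrebs_ext", "deveste309"),
  modified = as.POSIXct(
    c("2024-01-25 16:03:34", "2022-03-24 11:29:48"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Keep only datasets modified on or after 1 January 2023
cutoff <- as.POSIXct("2023-01-01", tz = "UTC")
recent <- toy[toy$modified >= cutoff, ]
recent$database
```

The same comparison should work on the modified and created columns of a real catalogue, provided the cutoff is also a POSIXct value.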
The last column, "json", contains the full JSON metadata as returned by od_json().
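List columns such as "tags" cannot be filtered with a plain == comparison, because each cell holds a whole character vector. A minimal sketch of one way to filter by tag, again on a hand-built stand-in: the tags of the first row are copied from the example output in the Examples section, the tags of the second row are made up for illustration, and has_tag() is a hypothetical helper that is not part of the package.

```r
# Stand-in for the catalogue; NOT produced by od_catalogue().
toy <- data.frame(
  database = c("dekrebs_ext", "deveste309"),
  stringsAsFactors = FALSE
)
toy$tags <- I(list(
  c("Krebsstatistik", "Krebslokalisation-ICD10", "Geschlecht", "Wohnbundesland"),
  c("Verdienst", "Geschlecht")  # hypothetical tags, for illustration only
))

# Hypothetical helper: TRUE for every row whose tag vector contains `tag`
has_tag <- function(tags, tag) {
  vapply(tags, function(x) tag %in% x, logical(1))
}

toy[has_tag(toy$tags, "Wohnbundesland"), "database"]
toy[has_tag(toy$tags, "Geschlecht"), "database"]
```

Using vapply() with logical(1) guarantees one TRUE or FALSE per row, so the result can be used directly as a row index.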
Examples
catalogue <- od_catalogue()
catalogue
#> # A data frame: 2 × 13
#>   title        measures fields modified            created
#>   <chr>           <int>  <int> <dttm>              <dttm>
#> 1 Krebsstatis…        1      4 2024-01-25 16:03:34 2019-08-08 11:09:49
#> 2 Verdienstst…        5      4 2022-03-24 11:29:48 2017-08-02 20:00:00
#> # ℹ 8 more variables: id <ogd_id>, database <chr>, title_en <chr>,
#> # notes <chr>, update_frequency <chr>, tags <I<list>>,
#> # categorization <chr>, json <I<list>>
table(catalogue$update_frequency)
#>
#>      jährlich nicht geplant
#>             1             1
table(catalogue$categorization)
#>
#>     Arbeit Gesundheit
#>          1          1
catalogue[catalogue$categorization == "Gesundheit", 1:4]
#> # A data frame: 1 × 4
#>   title          measures fields modified
#> * <chr>             <int>  <int> <dttm>
#> 1 Krebsstatistik        1      4 2024-01-25 16:03:34
catalogue[catalogue$measures >= 70, 1:3]
#> # A data frame: 0 × 3
#> # ℹ 3 variables: title <chr>, measures <int>, fields <int>
catalogue$json[[1]]
#> Krebsstatistik
#>
#> Krebsstatistik nach Krebslokalisation (ICD10), Geschlecht und
#> Wohnbundesland
#>
#> Measures: Anzahl der Datensätze F-KRE
#> Fields: Tumore ICD/10 3-Steller, Berichtsjahr, Bundesland, Geschlecht
#> Updated: 2024-01-25 16:03:34
#> Tags: Krebsstatistik, Krebslokalisation-ICD10, Geschlecht,
#> Wohnbundesland
#> Categories: Gesundheit
head(catalogue$database)
#> [1] "dekrebs_ext" "deveste309"