
EXPERIMENTAL: This function parses several JSON metadata files at once and combines them into a data.frame, so that datasets can easily be filtered by categorization, tags, number of classification fields, etc.
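The combining step can be pictured with a small base-R sketch. It assumes that each dataset's metadata has already been parsed into a named list. The field names and values are made up for illustration and do not reflect the actual OGD JSON schema or how od_catalogue() is implemented. The records are stacked into one data.frame with one row per dataset, which can then be filtered like any other data.frame.

```r
# Hypothetical, already-parsed metadata records (not the real OGD JSON schema)
records <- list(
  list(title = "Dataset A", measures = 1L, fields = 4L, categorization = "Gesundheit"),
  list(title = "Dataset B", measures = 5L, fields = 4L, categorization = "Arbeit")
)

# Turn each record into a one-row data.frame and stack them row-wise
catalogue <- do.call(rbind, lapply(records, as.data.frame))

# Filter rows, just as with the real catalogue
catalogue[catalogue$categorization == "Gesundheit", ]
```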

Usage

od_catalogue(server = "ext", local = TRUE)

Arguments

server

The OGD server to be used: "ext" (the default) for the external server or "prod" for the production server.

local

If TRUE (the default), the catalogue is built from the cached JSON metadata. Otherwise, the cache is first updated with a bulk download of the metadata files, and the catalogue is then built from the updated cache.

Details

The naming, ordering and selection of the columns are likely to change. Currently, the following columns are provided.

Column            Type            Description
title             chr             Title of the dataset
measures          int             Number of measure variables
fields            int             Number of classification fields
modified          datetime        Timestamp when the dataset was last modified
created           datetime        Timestamp when the dataset was created
id                chr             ID of the dataset
database          chr             ID of the corresponding STATcube database
title_en          chr             English title
notes             chr             Description of the dataset
update_frequency  chr             How often the dataset is updated
tags              list<chr>       Tags assigned to the dataset
categorization    chr             Category of the dataset
json              list<od_json>   Full JSON metadata

The type datetime refers to the POSIXct class, as returned by Sys.time(). The last column, "json", contains the full JSON metadata as returned by od_json().
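Since "modified" and "created" are POSIXct columns, they can be compared with timestamps built by as.POSIXct() and sorted with order(). The following is a minimal sketch on a hand-built stand-in data.frame rather than a real od_catalogue() result. The titles are hypothetical, and the time zone "UTC" is an assumption, since the time zone of the real columns is not documented here.

```r
# Stand-in for a catalogue with a POSIXct "modified" column (illustrative values,
# time zone assumed to be UTC)
catalogue <- data.frame(
  title    = c("Dataset A", "Dataset B"),
  modified = as.POSIXct(c("2024-01-25 16:03:34", "2022-03-24 11:29:48"), tz = "UTC")
)

# Datasets modified since the start of 2023
cutoff <- as.POSIXct("2023-01-01", tz = "UTC")
recent <- catalogue[catalogue$modified >= cutoff, ]

# All datasets, most recently modified first
newest_first <- catalogue[order(catalogue$modified, decreasing = TRUE), ]

# Days elapsed since each dataset was last modified
difftime(Sys.time(), catalogue$modified, units = "days")
```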

Examples

catalogue <- od_catalogue()
catalogue
#> # A data frame: 2 × 13
#>   title  measures fields modified            created             id   
#>   <chr>     <int>  <int> <dttm>              <dttm>              <chr>
#> 1 Krebs…        1      4 2024-01-25 16:03:34 2019-08-08 11:09:49 OGD_…
#> 2 Verdi…        5      4 2022-03-24 11:29:48 2017-08-02 20:00:00 OGD_…
#> # ℹ 7 more variables: database <chr>, title_en <chr>, notes <chr>,
#> #   update_frequency <chr>, tags <I<list>>, categorization <chr>,
#> #   json <I<list>>
catalogue$update_frequency %>% table()
#> .
#>      jährlich nicht geplant 
#>             1             1 
catalogue$categorization %>% table()
#> .
#>     Arbeit Gesundheit 
#>          1          1 
catalogue[catalogue$categorization == "Gesundheit", 1:4]
#> # A data frame: 1 × 4
#>   title          measures fields modified           
#> * <chr>             <int>  <int> <dttm>             
#> 1 Krebsstatistik        1      4 2024-01-25 16:03:34
catalogue[catalogue$measures >= 70, 1:3]
#> # A data frame: 0 × 3
#> # ℹ 3 variables: title <chr>, measures <int>, fields <int>
catalogue$json[[1]]
#> Krebsstatistik
#> 
#> Krebsstatistik nach Krebslokalisation (ICD10), Geschlecht und
#> Wohnbundesland
#> 
#> Measures: Anzahl der Datensätze F-KRE
#> Fields: Tumore ICD/10 3-Steller, Berichtsjahr, Bundesland, Geschlecht
#> Updated: 2024-01-25 16:03:34
#> Tags: Krebsstatistik, Krebslokalisation-ICD10, Geschlecht,
#>   Wohnbundesland
#> Categories: Gesundheit
catalogue$database %>% head()
#> [1] "dekrebs_ext" "deveste309"
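The examples above filter on atomic columns. The "tags" column is a list with one character vector per dataset, so filtering on it needs a per-row test, for instance with vapply(). The following is a minimal sketch on a hand-built stand-in: the first row reuses the title and tags of the Krebsstatistik dataset shown in the output above, while the second row and its tags are hypothetical. The helper has_tag() is illustrative and not part of the package.

```r
# Stand-in catalogue; "tags" is stored as a list column
catalogue <- data.frame(title = c("Krebsstatistik", "Dataset B"))
catalogue$tags <- I(list(
  c("Krebsstatistik", "Krebslokalisation-ICD10", "Geschlecht", "Wohnbundesland"),
  c("Arbeit", "Geschlecht")  # hypothetical tags
))

# Hypothetical helper: TRUE for every dataset whose tags contain `tag`
has_tag <- function(tags, tag) vapply(tags, function(x) tag %in% x, logical(1))

with_gender    <- catalogue[has_tag(catalogue$tags, "Geschlecht"), "title"]
with_bundesland <- catalogue[has_tag(catalogue$tags, "Wohnbundesland"), "title"]

# How often each tag occurs across all datasets
table(unlist(catalogue$tags))
```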