
EXPERIMENTAL. This function parses several JSON metadata files at once and combines them into a data.frame, so that the datasets can easily be filtered by categorization, tags, number of classification fields, etc.
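The combining step can be pictured with a minimal base R sketch. It is not the package's implementation: `metadata` is a hypothetical list standing in for JSON files that have already been parsed, the titles and database IDs are made up, and only four of the documented columns are built.

```r
# Hypothetical stand-in for several parsed JSON metadata files:
# one named list per dataset (titles and database IDs are made up).
metadata <- list(
  list(title = "Dataset A", measures = 1L, fields = 4L, database = "db_a"),
  list(title = "Dataset B", measures = 5L, fields = 4L, database = "db_b")
)

# Turn each record into a one-row data.frame ...
rows <- lapply(metadata, function(m) {
  data.frame(title = m$title, measures = m$measures, fields = m$fields,
             database = m$database, stringsAsFactors = FALSE)
})

# ... and stack the rows into a single catalogue with one row per dataset
catalogue <- do.call(rbind, rows)
catalogue
```

Once the rows are stacked, ordinary data.frame subsetting (as in the examples further down) can be used to filter the datasets.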

Usage

od_catalogue(server = "ext", local = TRUE)

Arguments

server

The OGD server to use: "ext" (the default) for the external server or "prod" for the production server.

local

If TRUE (the default), the catalogue is built from the cached JSON metadata. Otherwise, the cache is first updated with a bulk download of the metadata files, and the catalogue is then built from the refreshed cache.

Value

A data.frame with the following structure:

Column     Type            Description
title      chr             Title of the dataset
measures   int             Number of measure variables
fields     int             Number of classification fields
modified   datetime        Timestamp when the dataset was last modified
created    datetime        Timestamp when the dataset was created
database   chr             ID of the corresponding STATcube database
title_en   chr             English title
notes      chr             Description of the dataset
frequency  chr             How often the dataset is updated
category   chr             Category of the dataset
tags       list<chr>       Tags assigned to the dataset
json       list<od_json>   Full JSON metadata

The type datetime refers to the POSIXct class, as returned by Sys.time(). The last column, "json", contains the full JSON metadata as returned by od_json().
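Because the datetime columns are POSIXct, they can be compared and subtracted directly. A minimal base R sketch: the mini catalogue is hypothetical (the timestamps are copied from the example output, the titles are made up), and the UTC time zone is an assumption.

```r
# Hypothetical mini catalogue; only the two datetime columns are mocked.
# The time zone is assumed to be UTC for this sketch.
catalogue <- data.frame(
  title    = c("Dataset A", "Dataset B"),
  modified = as.POSIXct(c("2024-01-25 16:03:34", "2022-03-24 11:29:48"), tz = "UTC"),
  created  = as.POSIXct(c("2019-08-08 11:09:49", "2017-08-02 20:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Datasets modified since the start of 2023
cutoff <- as.POSIXct("2023-01-01", tz = "UTC")
recent <- catalogue[catalogue$modified >= cutoff, ]

# Time between creation and last modification, in days
age_days <- as.numeric(difftime(catalogue$modified, catalogue$created, units = "days"))
```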

Details

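The naming, ordering and choice of the columns are likely to change. For instance, the example output below already uses the column names update_frequency and categorization instead of frequency and category, and includes an additional id column.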
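Since column names may change, scripts that use the catalogue can look up a column under several candidate names. The helper `pick_column()` below is hypothetical and not part of the package; the mock catalogue uses the column name update_frequency and the values shown in the example output.

```r
# Hypothetical helper: return the first of several candidate columns
# that exists in `df`, so a script survives a column being renamed.
pick_column <- function(df, candidates) {
  hit <- intersect(candidates, names(df))  # keeps the order of `candidates`
  if (length(hit) == 0) {
    stop("none of the columns found: ", paste(candidates, collapse = ", "))
  }
  df[[hit[1]]]
}

# Mock catalogue using the column name seen in the example output
catalogue <- data.frame(update_frequency = c("jährlich", "nicht geplant"),
                        stringsAsFactors = FALSE)

# Works with either the documented name or the one in the example output
freq <- pick_column(catalogue, c("frequency", "update_frequency"))
```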

Examples

catalogue <- od_catalogue()
catalogue
#> # A data frame: 2 × 13
#>   title        measures fields modified            created            
#>   <chr>           <int>  <int> <dttm>              <dttm>             
#> 1 Krebsstatis…        1      4 2024-01-25 16:03:34 2019-08-08 11:09:49
#> 2 Verdienstst…        5      4 2022-03-24 11:29:48 2017-08-02 20:00:00
#> # ℹ 8 more variables: id <ogd_id>, database <chr>, title_en <chr>,
#> #   notes <chr>, update_frequency <chr>, tags <I<list>>,
#> #   categorization <chr>, json <I<list>>
table(catalogue$update_frequency)
#> 
#>      jährlich nicht geplant 
#>             1             1 
table(catalogue$categorization)
#> 
#>     Arbeit Gesundheit 
#>          1          1 
catalogue[catalogue$categorization == "Gesundheit", 1:4]
#> # A data frame: 1 × 4
#>   title          measures fields modified           
#> * <chr>             <int>  <int> <dttm>             
#> 1 Krebsstatistik        1      4 2024-01-25 16:03:34
catalogue[catalogue$measures >= 70, 1:3]
#> # A data frame: 0 × 3
#> # ℹ 3 variables: title <chr>, measures <int>, fields <int>
catalogue$json[[1]]
#> Krebsstatistik
#> 
#> Krebsstatistik nach Krebslokalisation (ICD10), Geschlecht und
#> Wohnbundesland
#> 
#> Measures: Anzahl der Datensätze F-KRE
#> Fields: Tumore ICD/10 3-Steller, Berichtsjahr, Bundesland, Geschlecht
#> Updated: 2024-01-25 16:03:34
#> Tags: Krebsstatistik, Krebslokalisation-ICD10, Geschlecht,
#>   Wohnbundesland
#> Categories: Gesundheit
head(catalogue$database)
#> [1] "dekrebs_ext" "deveste309"
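The tags column is a list of character vectors, so it cannot be compared with == the way categorization is in the examples above. A minimal base R sketch, assuming a plain data.frame: the helper `has_tag()` is hypothetical, the titles are made up, the first row's tags are copied from the example output, and the second row's tags are made up.

```r
# Hypothetical mini catalogue with a list column of tags
catalogue <- data.frame(title = c("Dataset A", "Dataset B"),
                        stringsAsFactors = FALSE)
# I() keeps the list as one column, mirroring the <I<list>> type shown above
catalogue$tags <- I(list(
  c("Krebsstatistik", "Krebslokalisation-ICD10", "Geschlecht", "Wohnbundesland"),
  c("Geschlecht", "Einkommen")  # made-up tags
))

# Hypothetical helper: TRUE for every row whose tag vector contains `tag`
has_tag <- function(tags, tag) {
  vapply(tags, function(t) tag %in% t, logical(1))
}

matches <- catalogue[has_tag(catalogue$tags, "Wohnbundesland"), "title"]
matches
```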