Resource management for open.data — od

Helper functions for caching and parsing open.data resources.

Usage

od_cache_dir(dir = NULL)

od_cache_clear(id, server = "ext")

od_cache_file(id, suffix = NULL, timestamp = NULL, ..., server = "ext")

od_resource(id, suffix = NULL, timestamp = NULL, server = "ext")

od_json(id, timestamp = Sys.time() - 3600, server = "ext")

od_resource_all(id, json = od_json(id), server = "ext")

Arguments

dir: If NULL, the cache directory is returned. Otherwise, the cache directory will be updated to dir.
id: A database id
server: The OGD server used to load or update resources in case they are outdated. "ext" for the external server (the default) or "red" for the editing server.
suffix: A suffix for the resource: "HEADER" or a field code.
timestamp: A timestamp in POSIXct format. If provided, the cached resource will be updated if it is older than that value. Otherwise it will be downloaded only if it does not exist in the cache.
...: For internal use
json: The JSON file belonging to the dataset

Value

For od_cache_file() and od_resource(), the returned objects contain a hidden attribute attr(., "od") with the time used for downloading and parsing the resource. od_resource_all() converts these hidden attributes into columns.
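
As a rough illustration of how such a hidden attribute can be read, the sketch below uses a stand-in data frame rather than a real call to od_resource(). The field names download and parsed and their values are assumptions made for this example; the actual contents of attr(., "od") may differ.

```r
# Stand-in for an object returned by od_resource(); the real function
# is not called here, so no network access or cache is needed.
res <- data.frame(
  code = c("A11-1", "A11-2"),
  label_en = c("Sum total", "Male")
)

# ASSUMPTION: these field names and values are invented for illustration.
attr(res, "od") <- list(download = NA_real_, parsed = 0.4)

# The attribute is not shown when the data frame is printed,
# but it can be read explicitly.
timing <- attr(res, "od")

# Turn the attribute into a one-row data frame, similar in spirit to
# how od_resource_all() exposes timing information as columns.
timing_cols <- as.data.frame(timing)
timing_cols
```

With a real resource, the same attr(x, "od") call would be applied to the value returned by od_resource() or od_cache_file().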

Details

od_cache_clear(id) removes all files belonging to the specified id.
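
The following is a minimal sketch of what removing all files for one id could look like, assuming that cached files are named after the id followed by either a file extension or an underscore and a suffix, as in the paths shown in the examples. clear_id() is a hypothetical helper and not the implementation used by od_cache_clear(); it works on a temporary directory and does not touch the real cache.

```r
# Hypothetical helper: delete every file in `dir` that belongs to `id`.
clear_id <- function(dir, id) {
  files <- list.files(dir)
  # Match "<id>.<ext>" and "<id>_<suffix>.<ext>", but not longer ids
  # that merely start with the same characters.
  hit <- startsWith(files, paste0(id, ".")) |
    startsWith(files, paste0(id, "_"))
  unlink(file.path(dir, files[hit]))
  invisible(files[hit])
}

# Build a throwaway cache directory with files for several ids.
cache <- tempfile("od_cache_")
dir.create(cache)
file.create(file.path(cache, c(
  "OGD_demo_1.csv", "OGD_demo_1.json", "OGD_demo_1_C-A11-0.csv",
  "OGD_demo_10.csv", "OGD_other_1.csv"
)))

removed <- clear_id(cache, "OGD_demo_1")
remaining <- list.files(cache)
remaining
```

The explicit "." and "_" delimiters are a design choice for this sketch: a plain prefix match on the id would also delete files of an id such as OGD_demo_10.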

By default, downloaded JSON files "expire" after one hour (3600 seconds). That is, if a JSON file is requested, it will be reused from the cache unless its file.mtime() is more than one hour behind Sys.time().
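
The expiry rule can be sketched in base R as follows. is_outdated() is a hypothetical helper written for this illustration, not a function of the package; it assumes that a NULL timestamp means "download only if the file is missing", as described for the timestamp argument.

```r
# Hypothetical helper: should the file at `path` be downloaded again?
is_outdated <- function(path, timestamp = Sys.time() - 3600) {
  # A file that is not in the cache always needs a download.
  if (!file.exists(path)) return(TRUE)
  # Without a timestamp, any existing file is reused.
  if (is.null(timestamp)) return(FALSE)
  # Otherwise the file is outdated if it was modified before the cutoff.
  file.mtime(path) < timestamp
}

# A freshly written file is newer than the one-hour cutoff.
cached <- tempfile(fileext = ".json")
writeLines("{}", cached)
fresh <- is_outdated(cached)

# Pretend the file was written two hours ago.
Sys.setFileTime(cached, Sys.time() - 7200)
stale <- is_outdated(cached)

# With timestamp = NULL the old file is still reused.
reused <- is_outdated(cached, timestamp = NULL)
```

Here fresh and reused are FALSE, while stale is TRUE because the modification time lies more than 3600 seconds in the past.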

Examples

# get the current cache directory
od_cache_dir()
#> [1] "~/.cache/STATcubeR/open_data/"

# Get paths to cached files
od_cache_file("OGD_veste309_Veste309_1")
#> [1] ~/.cache/STATcubeR/open_data/OGD_veste309_Veste309_1.csv
od_cache_file("OGD_veste309_Veste309_1", "C-A11-0")
#> [1] ~/.cache/STATcubeR/open_data/OGD_veste309_Veste309_1_C-A11-0.csv

# get a parsed version of the resource
od_resource("OGD_veste309_Veste309_1", "C-A11-0")
#> # A data frame: 3 × 7
#>   code  label label_de  label_en  parent de_desc en_desc
#> * <chr> <chr> <chr>     <chr>     <fct>  <lgl>   <lgl>  
#> 1 A11-1 NA    insgesamt Sum total NA     NA      NA     
#> 2 A11-2 NA    männlich  Male      NA     NA      NA     
#> 3 A11-3 NA    weiblich  Female    NA     NA      NA     

# get json metadata about a dataset
od_json('OGD_veste309_Veste309_1')
#> Verdienststrukturerhebung 2018 Bruttostundenverdienste in EUR
#> nach Staatsangehörigkeit, Bundesland und
#> Beschäftigungsverhältnis
#> 
#> Verdienststruktur nach Geschlecht, Staatsangehörigkeit,
#> Bundesland und Beschäftigungsverhältnis
#> 
#> Measures: Arithmetisches Mittel, 1. Quartil, 2. Quartil (Median), 3.
#>   Quartil, Zahl d unselbst Beschäftigten
#> Fields: Geschlecht, Staatsangehörigkeit, Bundesland (NUTS 2), Form
#>   des Beschäftigungsverhältnisses
#> Updated: 2022-03-24 11:29:48
#> Tags: Staatsangehörigkeit, Bundesland, Beschäftigungsverhältnis
#> Categories: Arbeit, Bevölkerung

# Bundle all resources
od_resource_all("OGD_veste309_Veste309_1")
#> # A data frame: 6 × 7
#>   name                     last_modi…¹ cached      size downl…² parsed
#>   <chr>                    <dttm>      <dttm>     <dbl>   <dbl>  <dbl>
#> 1 meta.json                2022-03-24  2022-03-24  4931      NA 13.9  
#> 2 data.csv                 2022-03-24  2022-03-24   516      NA  0.464
#> 3 OGD_veste309_Veste309_1… 2022-03-24  2022-03-24   159      NA  0.400
#> 4 OGD_veste309_Veste309_1… 2022-03-24  2022-03-24   697      NA  0.409
#> 5 OGD_veste309_Veste309_1… 2022-03-24  2022-03-24   518      NA  0.413
#> 6 OGD_veste309_Veste309_1… 2022-03-24  2022-03-24   641      NA  0.615
#> # … with 1 more variable: data <I<list>>, and abbreviated variable
#> #   names ¹last_modified, ²download