This article describes the state of development as of October 2020. It covers several aspects of the THL open data API and the openthl package.
For a description of the API, see the THL open data API documentation.
The API can return the following types of data
The main API interaction is coded in the file api.R. The function getFromAPI() is a general-purpose data retrieval function. It takes as parameters the URL and the ‘type’ of the data, which is either ‘meta’ or ‘data’. Depending on the type, the response is parsed by either the jsonlite (meta) or the rjstat (data) package. String manipulation is used to handle the JSONP case, after which jsonlite is used.
The general idea is that this main retrieval function is called by more specialised functions, which further parse the content into useful formats (e.g. data frames or specialised S3 classes).
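The JSONP handling described above can be sketched in base R. This is a minimal illustration under assumptions, not the code in api.R. The names strip_jsonp() and parse_response_text() are hypothetical, and I assume a JSONP response has the form callback(...) with an optional trailing semicolon. jsonlite::fromJSON() and rjstat::fromJSONstat() are the parsers from the two packages mentioned above.

```r
# Hypothetical helper, not part of openthl: remove a JSONP wrapper of the
# assumed form `callback(<json>);` so that the remaining text is plain JSON.
strip_jsonp <- function(txt) {
  txt <- trimws(txt)
  # Plain JSON already starts with an object or an array: nothing to strip
  if (startsWith(txt, "{") || startsWith(txt, "[")) {
    return(txt)
  }
  first_open <- regexpr("(", txt, fixed = TRUE)
  last_close <- max(gregexpr(")", txt, fixed = TRUE)[[1]])
  if (first_open < 0 || last_close < first_open) {
    stop("response is neither JSON nor JSONP")
  }
  # Keep everything between the first "(" and the last ")"
  substr(txt, first_open + 1, last_close - 1)
}

# Hypothetical dispatch on the 'type' argument, mirroring the description of
# getFromAPI(): 'meta' is parsed with jsonlite, 'data' (JSON-stat) with rjstat.
parse_response_text <- function(txt, type = c("meta", "data")) {
  type <- match.arg(type)
  if (type == "meta") {
    jsonlite::fromJSON(strip_jsonp(txt))
  } else {
    rjstat::fromJSONstat(txt)
  }
}

strip_jsonp('cb({"a": 1});')  # '{"a": 1}'
```

Cutting at the first opening and the last closing parenthesis is deliberately naive. It is enough for a single wrapped JSON payload but would not cope with arbitrary JavaScript.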
url <- openthl:::api_data_url(path = "epirapo")
reslist <- openthl:::getFromAPI(url, type = "meta")
reslist
#> $link
#> $link$item
#> href
#> 1 https://sampo.thl.fi/pivot/prod/en/epirapo/covid19care/fact_epirapo_covid19care.json
#> 2 https://sampo.thl.fi/pivot/prod/fi/epirapo/covid19care/fact_epirapo_covid19care.json
#> 3 https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19care/fact_epirapo_covid19care.json
#> 4 https://sampo.thl.fi/pivot/prod/en/epirapo/covid19case/fact_epirapo_covid19case.json
#> 5 https://sampo.thl.fi/pivot/prod/fi/epirapo/covid19case/fact_epirapo_covid19case.json
#> 6 https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19case/fact_epirapo_covid19case.json
#> 7 https://sampo.thl.fi/pivot/prod/fi/epirapo/covid19inci/fact_epirapo_covid19inci.json
#> 8 https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19inci/fact_epirapo_covid19inci.json
#> 9 https://sampo.thl.fi/pivot/prod/en/epirapo/covid19inci/fact_epirapo_covid19inci.json
#> 10 https://sampo.thl.fi/pivot/prod/en/epirapo/omaolosymp/fact_epirapo_omaolosymp.json
#> 11 https://sampo.thl.fi/pivot/prod/fi/epirapo/omaolosymp/fact_epirapo_omaolosymp.json
#> 12 https://sampo.thl.fi/pivot/prod/sv/epirapo/omaolosymp/fact_epirapo_omaolosymp.json
#> 13 https://sampo.thl.fi/pivot/prod/en/epirapo/respinfcare/fact_epirapo_respinfcare.json
#> 14 https://sampo.thl.fi/pivot/prod/sv/epirapo/respinfcare/fact_epirapo_respinfcare.json
#> 15 https://sampo.thl.fi/pivot/prod/fi/epirapo/respinfcare/fact_epirapo_respinfcare.json
#> label class
#> 1 COVID-19 patients in hospital dataset
#> 2 Sairaalahoidossa olevat COVID-19-potilaat dataset
#> 3 COVID-19-patienter som vårdas på sjukhus dataset
#> 4 COVID-19 cases in the infectious diseases registry dataset
#> 5 Tartuntatautirekisterin COVID-19-tapaukset dataset
#> 6 Antal fall av COVID-19 i registret över smittsamma sjukdomar dataset
#> 7 Koronarokotusten vaikuttavuus Suomessa dataset
#> 8 Coronavaccinationernas effektivitet i Finland dataset
#> 9 Effectiveness of coronavirus vaccinations in Finland dataset
#> 10 Self-reported COVID-19 symptoms dataset
#> 11 Itseilmoitetut koronaoireet dataset
#> 12 Självrapporterade COVID-19 symtom dataset
#> 13 Inpatient care of acute respiratory tract infections in Finland dataset
#> 14 Sjukhusvård av akuta luftvägsinfektioner i Finland dataset
#> 15 Akuuttien hengitystieinfektioiden sairaalahoito Suomessa dataset
#>
#>
#> $label
#> [1] "epirapo"
#>
#> $version
#> [1] "2.0"
#>
#> $class
#> [1] "collection"
#>
#> $updated
#> [1] "2025-01-30"
#>
#> attr(,"class")
#> [1] "api_response"
#> attr(,"url")
#> [1] "https://sampo.thl.fi/pivot/prod/api/epirapo.json"
#> attr(,"api-url")
#> [1] "https://sampo.thl.fi/pivot/prod/api"
#> attr(,"class")
#> [1] "character" "api-data-url"
#> attr(,"status")
#> [1] 200
There are two public base URLs for the API: beta and prod.
openthl:::url_base(type = "prod") # default
#> [1] "https://sampo.thl.fi/pivot/prod/"
openthl:::url_base("beta")
#> [1] "https://sampo.thl.fi/pivot/beta/"
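The two base URLs differ only in their last path component. The behaviour shown above can therefore be reproduced by a very small function. The sketch below is a hypothetical stand-in, url_base_sketch(), that only matches the two outputs shown. The package's own url_base() may be implemented differently.

```r
# Hypothetical re-implementation of openthl:::url_base(), for illustration only
url_base_sketch <- function(type = c("prod", "beta")) {
  type <- match.arg(type)  # "prod" is the default, as with url_base()
  paste0("https://sampo.thl.fi/pivot/", type, "/")
}

url_base_sketch()        # "https://sampo.thl.fi/pivot/prod/"
url_base_sketch("beta")  # "https://sampo.thl.fi/pivot/beta/"
```

Using match.arg() means that an unknown type such as "dev" fails immediately, instead of producing a URL that does not exist.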
The URL to the API is given by
The function api_data_url() builds URLs which can be queried by getFromAPI(). The result is a character vector which carries the main API URL as an attribute ("api-url") and has the S3 class “api-data-url”.
The API can return CSV or JSON data, but I think it makes sense for the R package to interact only with JSON.
openthl:::api_data_url("epirapo", format = "json")
#> [1] "https://sampo.thl.fi/pivot/prod/api/epirapo.json"
#> attr(,"api-url")
#> [1] "https://sampo.thl.fi/pivot/prod/api"
#> attr(,"class")
#> [1] "character" "api-data-url"
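The object printed above can be rebuilt with structure(). The function below is a minimal sketch, assuming the URL is simply the API root, the path and the format extension joined together. The name api_data_url_sketch() and its base argument are hypothetical and not the package's api_data_url().

```r
# Hypothetical sketch, not the package's api_data_url(): build a URL under
# the API root and attach the root URL plus the "api-data-url" class.
api_data_url_sketch <- function(path, format = "json",
                                base = "https://sampo.thl.fi/pivot/prod/") {
  api_url <- paste0(base, "api")
  url <- paste0(api_url, "/", path, ".", format)
  structure(url, "api-url" = api_url, class = c("character", "api-data-url"))
}

u <- api_data_url_sketch("epirapo")
attr(u, "api-url")  # "https://sampo.thl.fi/pivot/prod/api"
```

Keeping the API root as an attribute means later calls can derive related URLs from any api-data-url object without re-deriving the base.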
Relevant source code:
Development thoughts:
The API terminology includes the following hierarchy
The subject is a bit like a schema. A single subject can include multiple hydras, and a single hydra can include multiple cubes. A cube is a dataset labelled in a single language. I believe a hydra always contains as many cubes as there are translations of the dataset.
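The hierarchy can be pictured as nested R lists. The sketch below groups the cube languages of the epirapo subject by hydra. The values are copied from the thlDatasets() output shown later in this article, and the variable names are only illustrative. It lets me check the belief above for this one subject.

```r
# Hydra and language of each epirapo cube, in the order of the
# thlDatasets() output shown later in this article
hydra <- rep(c("covid19care", "covid19case", "covid19inci",
               "omaolosymp", "respinfcare"), each = 3)
lang <- c("en", "fi", "sv", "en", "fi", "sv", "fi", "sv", "en",
          "en", "fi", "sv", "en", "sv", "fi")

# subject -> hydra -> cubes, one cube per language
epirapo <- split(lang, hydra)

# number of cubes (translations) in each hydra
lengths(epirapo)
```

For epirapo, every hydra has three cubes, one each for fi, en and sv. This is consistent with the one-cube-per-translation assumption, but it does not prove it for other subjects.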
thlSubject() lists all cubes belonging to a subject.
subject <- "epirapo"
x <- thlSubject(subject)
x
#> # A tibble: 15 × 3
#> href label class
#> * <chr> <chr> <chr>
#> 1 https://sampo.thl.fi/pivot/prod/en/epirapo/covid19care/fact_epir… COVI… data…
#> 2 https://sampo.thl.fi/pivot/prod/fi/epirapo/covid19care/fact_epir… Sair… data…
#> 3 https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19care/fact_epir… COVI… data…
#> 4 https://sampo.thl.fi/pivot/prod/en/epirapo/covid19case/fact_epir… COVI… data…
#> 5 https://sampo.thl.fi/pivot/prod/fi/epirapo/covid19case/fact_epir… Tart… data…
#> 6 https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19case/fact_epir… Anta… data…
#> 7 https://sampo.thl.fi/pivot/prod/fi/epirapo/covid19inci/fact_epir… Koro… data…
#> 8 https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19inci/fact_epir… Coro… data…
#> 9 https://sampo.thl.fi/pivot/prod/en/epirapo/covid19inci/fact_epir… Effe… data…
#> 10 https://sampo.thl.fi/pivot/prod/en/epirapo/omaolosymp/fact_epira… Self… data…
#> 11 https://sampo.thl.fi/pivot/prod/fi/epirapo/omaolosymp/fact_epira… Itse… data…
#> 12 https://sampo.thl.fi/pivot/prod/sv/epirapo/omaolosymp/fact_epira… Själ… data…
#> 13 https://sampo.thl.fi/pivot/prod/en/epirapo/respinfcare/fact_epir… Inpa… data…
#> 14 https://sampo.thl.fi/pivot/prod/sv/epirapo/respinfcare/fact_epir… Sjuk… data…
#> 15 https://sampo.thl.fi/pivot/prod/fi/epirapo/respinfcare/fact_epir… Akuu… data…
thlDatasets() takes the output of thlSubject() and presents the same information, but parses the hrefs into columns. There is probably no good reason why thlDatasets() could not simply accept the subject name as a character string, but currently it only accepts the object returned by thlSubject().
thlDatasets(x)
#> base_url lang subject hydra
#> 1 https://sampo.thl.fi/pivot/prod/ en epirapo covid19care
#> 2 https://sampo.thl.fi/pivot/prod/ fi epirapo covid19care
#> 3 https://sampo.thl.fi/pivot/prod/ sv epirapo covid19care
#> 4 https://sampo.thl.fi/pivot/prod/ en epirapo covid19case
#> 5 https://sampo.thl.fi/pivot/prod/ fi epirapo covid19case
#> 6 https://sampo.thl.fi/pivot/prod/ sv epirapo covid19case
#> 7 https://sampo.thl.fi/pivot/prod/ fi epirapo covid19inci
#> 8 https://sampo.thl.fi/pivot/prod/ sv epirapo covid19inci
#> 9 https://sampo.thl.fi/pivot/prod/ en epirapo covid19inci
#> 10 https://sampo.thl.fi/pivot/prod/ en epirapo omaolosymp
#> 11 https://sampo.thl.fi/pivot/prod/ fi epirapo omaolosymp
#> 12 https://sampo.thl.fi/pivot/prod/ sv epirapo omaolosymp
#> 13 https://sampo.thl.fi/pivot/prod/ en epirapo respinfcare
#> 14 https://sampo.thl.fi/pivot/prod/ sv epirapo respinfcare
#> 15 https://sampo.thl.fi/pivot/prod/ fi epirapo respinfcare
#> cube
#> 1 epirapo_covid19care
#> 2 epirapo_covid19care
#> 3 epirapo_covid19care
#> 4 epirapo_covid19case
#> 5 epirapo_covid19case
#> 6 epirapo_covid19case
#> 7 epirapo_covid19inci
#> 8 epirapo_covid19inci
#> 9 epirapo_covid19inci
#> 10 epirapo_omaolosymp
#> 11 epirapo_omaolosymp
#> 12 epirapo_omaolosymp
#> 13 epirapo_respinfcare
#> 14 epirapo_respinfcare
#> 15 epirapo_respinfcare
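The columns above can be recovered from an href with a single regular expression. The function below is a minimal sketch of that parsing, assuming every cube href has the form `<base_url><lang>/<subject>/<hydra>/fact_<cube>.json`. parse_cube_href() is a hypothetical helper, not the code inside thlDatasets().

```r
# Hypothetical sketch of the href parsing behind thlDatasets(), assuming every
# cube href has the form <base_url><lang>/<subject>/<hydra>/fact_<cube>.json
parse_cube_href <- function(href) {
  pattern <- "^(https?://.+/pivot/[^/]+/)([^/]+)/([^/]+)/([^/]+)/fact_(.+)\\.json$"
  m <- regmatches(href, regexec(pattern, href))
  if (any(lengths(m) == 0)) {
    stop("href does not match the expected cube URL layout")
  }
  parts <- do.call(rbind, m)  # one row per href: full match + 5 groups
  data.frame(base_url = parts[, 2], lang = parts[, 3], subject = parts[, 4],
             hydra = parts[, 5], cube = parts[, 6], stringsAsFactors = FALSE)
}

parse_cube_href(
  "https://sampo.thl.fi/pivot/prod/en/epirapo/covid19care/fact_epirapo_covid19care.json"
)
```

Because the function takes a plain character vector, a version like this could accept the subject's hrefs from any source. This would make it easier for thlDatasets() to accept a subject name directly.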
thlHydra() takes a subject name and a hydra name and returns hrefs to the cubes (datasets) in that hydra. For example, the hydra covid19case is translated into fi, en and sv, so it has 3 cubes.
thlHydra(subject, hydra = "covid19case")
#> # A tibble: 3 × 3
#> href label class
#> * <chr> <chr> <chr>
#> 1 https://sampo.thl.fi/pivot/prod/en/epirapo/covid19case/fact_epira… COVI… data…
#> 2 https://sampo.thl.fi/pivot/prod/fi/epirapo/covid19case/fact_epira… Tart… data…
#> 3 https://sampo.thl.fi/pivot/prod/sv/epirapo/covid19case/fact_epira… Anta… data…
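If the cube hrefs of a subject are already available (for example from thlSubject()), selecting the cubes of one hydra amounts to filtering on the hydra path component. The following is a minimal base R sketch. hydra_hrefs() is a hypothetical helper, and thlHydra() itself may query the API instead.

```r
# Hypothetical sketch, not thlHydra() itself: keep the cube hrefs of one hydra
# by matching the "/<hydra>/" path component.
hydra_hrefs <- function(hrefs, hydra) {
  hrefs[grepl(paste0("/", hydra, "/"), hrefs, fixed = TRUE)]
}

hrefs <- c(
  "https://sampo.thl.fi/pivot/prod/en/epirapo/covid19case/fact_epirapo_covid19case.json",
  "https://sampo.thl.fi/pivot/prod/fi/epirapo/covid19case/fact_epirapo_covid19case.json",
  "https://sampo.thl.fi/pivot/prod/en/epirapo/covid19care/fact_epirapo_covid19care.json"
)
hydra_hrefs(hrefs, "covid19case")  # the first two hrefs
```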
Development thoughts:
Main functionality source file location: retrieve.R
A single dimension in the cube is a hierarchical structure with multiple stages. An example is Area with stages hospital district (stage 1) and municipality (stage 2).
The thlCube() function now returns an object which includes
The hierarchical dimension information is presented as a wide-format data frame. The prefix in each column name indicates the stage. There is a single row per unique label at the most detailed (highest-numbered) stage. In the example below that stage is municipality, so the dimension data frame has 310 rows, one per municipality.
str(cube$dimensions[[1]]) # first dimension
#> Classes 'hydra_dimension_df' and 'data.frame': 310 obs. of 22 variables:
#> $ stage1_id : chr "169722" "169722" "169722" "169722" ...
#> $ stage1_sid : int 292857 292857 292857 292857 292857 292857 292857 292857 292857 292857 ...
#> $ stage1_label : chr "Etelä-Suomen AVI" "Etelä-Suomen AVI" "Etelä-Suomen AVI" "Etelä-Suomen AVI" ...
#> $ stage1_stage : chr "avi" "avi" "avi" "avi" ...
#> $ stage1_code : chr "1" "1" "1" "1" ...
#> $ stage1_sort : int 375719 375719 375719 375719 375719 375719 375719 375719 375719 375719 ...
#> $ stage1_uri : chr "http://meta.thl.fi/codes/wild/dimension/area/avi/1" "http://meta.thl.fi/codes/wild/dimension/area/avi/1" "http://meta.thl.fi/codes/wild/dimension/area/avi/1" "http://meta.thl.fi/codes/wild/dimension/area/avi/1" ...
#> $ stage1_parent_id: chr "400" "400" "400" "400" ...
#> $ stage2_id : chr "169729" "169729" "169729" "169729" ...
#> $ stage2_sid : int 210938 210938 210938 210938 210938 210938 210938 210938 210938 210938 ...
#> $ stage2_label : chr "Uusimaa" "Uusimaa" "Uusimaa" "Uusimaa" ...
#> $ stage2_stage : chr "region" "region" "region" "region" ...
#> $ stage2_code : chr "01" "01" "01" "01" ...
#> $ stage2_sort : int 375662 375662 375662 375662 375662 375662 375662 375662 375662 375662 ...
#> $ stage2_uri : chr "http://meta.thl.fi/codes/wild/dimension/area/region/01" "http://meta.thl.fi/codes/wild/dimension/area/region/01" "http://meta.thl.fi/codes/wild/dimension/area/region/01" "http://meta.thl.fi/codes/wild/dimension/area/region/01" ...
#> $ stage3_id : chr "169752" "169759" "169774" "169780" ...
#> $ stage3_sid : int 301842 301863 302004 175740 301961 301768 302003 301995 301759 301971 ...
#> $ stage3_label : chr "Askola" "Espoo" "Hanko" "Helsinki" ...
#> $ stage3_stage : chr "municipality" "municipality" "municipality" "municipality" ...
#> $ stage3_code : chr "018" "049" "078" "091" ...
#> $ stage3_sort : int 374731 374738 374753 374760 374767 374778 374793 374804 374810 374818 ...
#> $ stage3_uri : chr "http://meta.thl.fi/codes/wild/dimension/area/municipality/018" "http://meta.thl.fi/codes/wild/dimension/area/municipality/049" "http://meta.thl.fi/codes/wild/dimension/area/municipality/078" "http://meta.thl.fi/codes/wild/dimension/area/municipality/091" ...
#> - attr(*, "nstage")= num 3
#> - attr(*, "root")='data.frame': 1 obs. of 9 variables:
#> ..$ id : chr "400"
#> ..$ sid : int 2079
#> ..$ label : chr "AVI yhteensä"
#> ..$ stage : chr "root"
#> ..$ code : chr "Avi_yht"
#> ..$ sort : int 1
#> ..$ uri : chr "http://meta.thl.fi/codes/wild/dimension/area/root"
#> ..$ properties:'data.frame': 1 obs. of 1 variable:
#> .. ..$ is: chr "area/root"
#> ..$ parent_id : chr "area"
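The wide layout can be produced by a recursive walk over the dimension tree. The sketch below is not getHierarchy(). It makes several assumptions for illustration: the node structure (id, label, stage, children), the helper names and the ids are invented, and the labels are borrowed from the str() output above. It also assumes every branch reaches the same depth, as the municipality rows above do.

```r
# Hypothetical toy dimension tree (ids made up, labels borrowed from the
# area dimension above). Each node has an id, label, stage and children.
node <- function(id, label, stage, children = list()) {
  list(id = id, label = label, stage = stage, children = children)
}
area <- node("root", "AVI yhteens\u00e4", "root", list(
  node("a1", "Etel\u00e4-Suomen AVI", "avi", list(
    node("r1", "Uusimaa", "region", list(
      node("m1", "Askola", "municipality"),
      node("m2", "Espoo", "municipality")
    ))
  ))
))

# Flatten a list of sibling nodes into one row per leaf; the node at depth k
# contributes the columns stage<k>_id, stage<k>_label and stage<k>_stage.
flatten_hierarchy <- function(children, depth = 1) {
  rows <- lapply(children, function(ch) {
    own <- data.frame(id = ch$id, label = ch$label, stage = ch$stage,
                      stringsAsFactors = FALSE)
    names(own) <- paste0("stage", depth, "_", names(own))
    if (length(ch$children) == 0) {
      return(own)  # leaf: the most detailed stage
    }
    below <- flatten_hierarchy(ch$children, depth + 1)
    # repeat this node's columns once for every leaf underneath it
    cbind(own[rep(1, nrow(below)), , drop = FALSE], below)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

wide <- flatten_hierarchy(area$children)  # the root is kept out of the rows
wide
```

The root is left out of the rows, which matches the output above, where it is stored separately in the "root" attribute. A tree with branches of different depths would need missing stage columns filled with NA before rbind().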
The function get_dimensions() queries the API for dimension information.
url <- "https://sampo.thl.fi/pivot/prod/en/epirapo/covid19case/fact_epirapo_covid19case.json"
dimensions <- openthl:::get_dimensions(url) # list
The function parse_dimensions() parses all dimensions into a list of data frames (each with S3 class ‘hydra_dimension_df’).
# parse all dimensions as a list of data frames (each with S3 class 'hydra_dimension_df')
DF <- openthl:::parse_dimensions(dimensions)
names(DF)
#> [1] "hcdmunicipality2020" "dateweek20200101" "ttr10yage"
#> [4] "sex" "measure"
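Each element of this list carries the S3 class ‘hydra_dimension_df’ and attributes such as nstage, which are visible in the str() output earlier. A small constructor could keep that bookkeeping in one place. This is a minimal sketch. new_hydra_dimension_df() is hypothetical, and it assumes the stage number can always be read from the stage<k>_ column prefix.

```r
# Hypothetical constructor, not part of openthl: tag a wide dimension data
# frame with the "hydra_dimension_df" class and derive the "nstage" attribute
# from the stage<k>_ column prefixes.
new_hydra_dimension_df <- function(df, root = NULL) {
  stage_cols <- grep("^stage[0-9]+_", names(df), value = TRUE)
  stopifnot(length(stage_cols) > 0)
  stage_nums <- as.integer(sub("^stage([0-9]+)_.*$", "\\1", stage_cols))
  out <- structure(df, nstage = max(stage_nums),
                   class = c("hydra_dimension_df", "data.frame"))
  attr(out, "root") <- root  # a NULL root leaves no "root" attribute
  out
}

df <- data.frame(stage1_label = c("Uusimaa", "Uusimaa"),
                 stage2_label = c("Askola", "Espoo"),
                 stringsAsFactors = FALSE)
x <- new_hydra_dimension_df(df)
class(x)           # "hydra_dimension_df" "data.frame"
attr(x, "nstage")  # 2
```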
parse_dimensions() uses getHierarchy(), which parses a single dimension as a data frame.
# parse a single dimension as a data frame
df <- openthl:::getHierarchy(dimensions$children[[1]], parent_id = dimensions$id[[1]])
str(df)
#> 'data.frame': 310 obs. of 15 variables:
#> $ stage1_id : chr "hcdmunicipality20202" "hcdmunicipality20202" "hcdmunicipality20202" "hcdmunicipality20202" ...
#> $ stage1_sid : int 445131 445131 445131 445131 445131 445131 445131 445131 445131 445131 ...
#> $ stage1_label : chr "Åland" "Åland" "Åland" "Åland" ...
#> $ stage1_stage : chr "hcd" "hcd" "hcd" "hcd" ...
#> $ stage1_code : chr "hcdmunicipality20202" "hcdmunicipality20202" "hcdmunicipality20202" "hcdmunicipality20202" ...
#> $ stage1_sort : int 2 2 2 2 2 2 2 2 2 2 ...
#> $ stage1_uri : chr "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/hcd/2" "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/hcd/2" "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/hcd/2" "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/hcd/2" ...
#> $ stage1_parent_id: chr "hcdmunicipality20201" "hcdmunicipality20201" "hcdmunicipality20201" "hcdmunicipality20201" ...
#> $ stage2_id : chr "hcdmunicipality202023" "hcdmunicipality202024" "hcdmunicipality202025" "hcdmunicipality202026" ...
#> $ stage2_sid : int 445268 444988 445090 445231 445227 445124 445080 445266 445153 445091 ...
#> $ stage2_label : chr "Brändö" "Eckerö" "Finström" "Föglö" ...
#> $ stage2_stage : chr "municipality" "municipality" "municipality" "municipality" ...
#> $ stage2_code : chr "hcdmunicipality202023" "hcdmunicipality202024" "hcdmunicipality202025" "hcdmunicipality202026" ...
#> $ stage2_sort : int 23 24 25 26 27 28 29 30 31 32 ...
#> $ stage2_uri : chr "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/municipality/23" "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/municipality/24" "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/municipality/25" "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/municipality/26" ...
#> - attr(*, "nstage")= num 2
#> - attr(*, "root")='data.frame': 1 obs. of 8 variables:
#> ..$ id : chr "hcdmunicipality20201"
#> ..$ sid : int 445222
#> ..$ label : chr "All areas"
#> ..$ stage : chr "root"
#> ..$ code : chr "hcdmunicipality20201"
#> ..$ sort : int 1
#> ..$ uri : chr "http://meta.thl.fi/codes/wild/dimension/hcdmunicipality2020/1/root/1"
#> ..$ parent_id: chr "hcdmunicipality2020"
Data retrieval and parsing are largely unimplemented. getFromAPI() with type ‘data’ should be tested and the parsing implemented.
Methods could be written which use the object returned by openthl::thlCube(), which contains the dimension metadata, to build queries which
For example, the following methods could be implemented:
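The following sketch illustrates what such a query builder might produce by appending row, column and filter parameters to a cube URL. The query syntax here is an assumption: the dimension id and a node sid are joined by a hyphen, as in `hcdmunicipality2020-445222`, which uses the sid of the "All areas" root shown above. build_data_url() is hypothetical, and the parameter format should be checked against the THL open data API documentation.

```r
# Hypothetical query builder. ASSUMPTION: the data endpoint accepts 'row',
# 'column' and 'filter' query parameters whose values are "<dimension>-<sid>".
build_data_url <- function(cube_url, row = NULL, column = NULL, filter = NULL) {
  params <- c(
    if (length(row) > 0) paste0("row=", row),
    if (length(column) > 0) paste0("column=", column),
    if (length(filter) > 0) paste0("filter=", filter)
  )
  if (length(params) == 0) {
    return(cube_url)
  }
  paste0(cube_url, "?", paste(params, collapse = "&"))
}

build_data_url(
  "https://sampo.thl.fi/pivot/prod/en/epirapo/covid19case/fact_epirapo_covid19case.json",
  row = "hcdmunicipality2020-445222"
)
```

A method built on thlCube() would look up the sids in the dimension data frames and pass them here, so the user never has to type them.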
Some choices need to be made regarding how the user refers to the dimensions and measures, i.e. whether to use IDs or labels. It may make sense to use IDs when calling select(). A stage could be referred to by, for example,
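One possibility, sketched here as an assumption rather than a design decision, is to refer to a stage by its stage name (such as "municipality"). The wide format already stores this in the stage<k>_stage columns. The helper stage_columns() is hypothetical.

```r
# Hypothetical helper: return the stage<k>_* columns of the stage whose name
# matches, e.g. "municipality", in a wide dimension data frame.
stage_columns <- function(df, stage_name) {
  stage_cols <- grep("^stage[0-9]+_stage$", names(df), value = TRUE)
  hit <- stage_cols[vapply(stage_cols, function(col) {
    stage_name %in% df[[col]]
  }, logical(1))]
  if (length(hit) != 1) {
    stop("stage '", stage_name, "' not found (or ambiguous)")
  }
  prefix <- sub("stage$", "", hit)  # e.g. "stage2_"
  df[, startsWith(names(df), prefix), drop = FALSE]
}

df <- data.frame(
  stage1_label = c("Uusimaa", "Uusimaa"), stage1_stage = c("region", "region"),
  stage2_label = c("Askola", "Espoo"),
  stage2_stage = c("municipality", "municipality"),
  stringsAsFactors = FALSE
)
stage_columns(df, "municipality")
```

Referring to stages by name keeps user code stable if a dimension gains or loses a stage and the stage numbers shift.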