This sotkanet R package provides access to data from the Sotkanet portal. Your contributions and bug reports and other feedback are welcome.
The Sotkanet portal provides over 2000 demographic indicators across Finland and Europe. It is maintained by the National Institute for Health and Welfare (THL). For more information, see Information about Sotkanet and API description.
The sotkanet
R package enables access to the Sotkanet
API using R facilitating the use of the data from the API. This package
is part of rOpenGov.
To install latest release version from CRAN, use:
To install development version from GitHub, use:
Test the installation by loading the package:
Load sotkanet and other packages used in the vignette.
List available Sotkanet indicators using
sotkanet_indicators()
:
# Using a preset list of indicators to avoid a large download
indicators <- sotkanet_indicators(id = c(4, 5, 6, 127, 10012, 10027),
type = "table", lang = "en")
kable(head(indicators))
List geographical regions with available indicators using
sotkanet_regions()
:
To download the data, we need to know the indicator for it. You can
look for the right indicator using aforementioned
sotkanet_indicators()
or by browsing the Sotkanet website. For
example, the indicator no. 10012 responds to the (EU) GPD per capita in
Purchasing Power Standards (PPS) dataset. The data can be downloaded
with get_sotkanet()
function. If we want, for example, the
GPD data from Finland for 2000-2010, the function call is:
# Get the indicator data
dat <- get_sotkanet(indicators = 10012, years = 2000:2010,
genders = c("total"), lang = "en", regions = "Finland")
# The first few lines of the data
kable(head(dat)) %>%
kable_styling() %>%
scroll_box(width = "100%")
The data can also be downloaded by using interactive function
sotkanet_interactive()
. It gives user interactive
alternative for downloading the dataset. This function can also print
dataset citation, code for the get_sotkanet()
call and
fixity checksum.
Dataset citation can be printed for any indicator using the function
sotkanet_cite()
. The citation for the GPD data is:
Let’s now demonstrate the use of the package with two examples. For the first example we will use the GPD data from Nordic countries (Pohjoismaat) for 2000-2010 and draw a time series of the data comparing the countries.
# Get indicator data
dat <- get_sotkanet(indicators = 10012, years = 2000:2010,
genders = "total", lang = "en", region.category = "POHJOISMAAT")
indicator_name <- as.character(unique(dat$indicator.title))
indicator_source <- as.character(unique(dat$indicator.organization.title))
# Retrive metadata
dat_meta <- sotkanet_indicator_metadata(id = 10012)
# Visualize
library(ggplot2)
p <- ggplot(dat, aes(x = year, y = primary.value,
group = region.title, color = region.title)) +
geom_line() + ggtitle(paste0(indicator_name, " \n", indicator_source)) +
labs(x = "Year", y = "Value",caption = paste0(
"Data source: https://sotkanet.fi/sotkanet", "\n", "Data date: ", dat_meta$`data-updated`)) +
scale_x_continuous(breaks = seq(2000,2010, by = 1)) +
theme(title = element_text(size = 10)) +
theme(axis.title.x = element_text(size = 15)) +
theme(axis.title.y = element_text(size = 15)) +
theme(legend.title = element_text(size = 15))
print(p)
For the second example we will plot the population of Finnish municipalities against a measure of educational level.
# Get the data for the two indicators
dat <- get_sotkanet(indicators = c(127, 180),
years = 2022, lang = "en",
genders = c("total"), region.category = c("KUNTA"))
# Pick the fields of interest and remove duplicates
datf <- dat[,c("region.title", "indicator.title", "primary.value")]
datf <- datf[!duplicated(datf),]
dw <- reshape(datf, idvar = "region.title",
timevar = "indicator.title", direction = "wide")
names(dw) <- c("Municipality", "Population", "Education_level")
# Vizualise
p <- ggplot(dw, aes(x = log(Population), y = Education_level)) + geom_point(size = 3) +
ggtitle("Education level vs. population size") +
theme(title = element_text(size = 10)) +
labs(y = "Education level", caption = "Data source: https://sotkanet.fi/sotkanet") +
theme(axis.title.x = element_text(size = 15)) +
theme(axis.title.y = element_text(size = 15)) +
theme(legend.title = element_text(size = 15))
plot(p)
Cite Sotkanet and link to https://sotkanet.fi/sotkanet/fi/index. Also mention indicator provider.
Central points:
This work can be freely used, modified and distributed under the Two-clause BSD license.
citation("sotkanet")
#> Kindly cite the sotkanet R package as follows:
#>
#> Leo Lahti, Einari Happonen, Juuso Parkkinen, Joona Lehtomaki, Vesa
#> Saaristo, Aleksi Lahtinen and Pyry Kantanen (rOpenGov 2024).
#> sotkanet: Sotkanet Open Data Access and Analysis. R package version
#> 0.10.1 https://github.com/rOpenGov/sotkanet
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Misc{,
#> title = {sotkanet: Sotkanet Open Data Access and Analysis},
#> author = {Leo Lahti and Einari Happonen and Joona Lehtomäki and Juuso Parkkinen and Joona Lehtomaki and Vesa Saaristo and Pyry Kantanen and Aleksi Lahtinen},
#> url = {https://github.com/rOpenGov/sotkanet},
#> year = {2024},
#> note = {R package version 0.10.1},
#> }
#>
#> Many thanks for all contributors!
You can check the package GitHub page for known issues. You can can also use it to report new bugs and to make suggestions for improving the package.