--- title: "Joining attribute data with geofi data" author: "Markus Kainu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Joining attribute data with geofi data} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, fig.height = 7, fig.width = 7, dpi = 75 ) ``` This vignettes provides few examples on how to join attribute data from common sources of attribute data. Here we are using data from [THL Sotkanet](https://sotkanet.fi/sotkanet/en/index) and [Paavo (Open data by postal code area)](https://pxdata.stat.fi/PXWeb/pxweb/en/Postinumeroalueittainen_avoin_tieto/). **Installation** `geofi` can be installed from CRAN using ```{r, eval = FALSE} # install from CRAN install.packages("geofi") # Install development version from GitHub remotes::install_github("ropengov/geofi") ``` ```{r include = FALSE, eval = TRUE} # Let's first create a function that checks if the suggested # packages are available check_namespaces <- function(pkgs){ return(all(unlist(sapply(pkgs, requireNamespace,quietly = TRUE)))) } apiacc <- geofi::check_api_access() pkginst <- check_namespaces(c("dplyr","tidyr","janitor","ggplot2")) apiacc_pkginst <- all(apiacc,pkginst) ``` ## Municipalities Municipality data provided by `get_municipalities()`-function contains 77 indicators variables from each of 309 municipalities. Variables can be used either for aggregating data or as keys for joining attribute data. ### Population data from Sotkanet In this first example we join municipality level indicators of *Swedish-speaking population at year end* from Sotkanet [population data](https://sotkanet.fi/sotkanet/en/haku?g=219), Dataset is provided as part of geofi package as `geofi::sotkadata_swedish_speaking_pop`. ```{r municipality_map, eval = apiacc_pkginst} library(geofi) muni <- get_municipalities(year = 2023) library(dplyr) sotkadata_swedish_speaking_pop <- geofi::sotkadata_swedish_speaking_pop ``` This is not obvious to all, but have the municipality names in Finnish among other regional breakdowns which allows us to combine the data with spatial data using `municipality_name_fi`-variable. ```{r, bind_data, eval = apiacc_pkginst} map_data <- right_join(muni, sotkadata_swedish_speaking_pop, by = c("municipality_code" = "municipality_code")) ``` Now we can plot a map showing `Share of Swedish-speakers of the population, %` and `Share of foreign citizens of the population, %` on two panels sharing a scale. ```{r plot1, fig.width = 10, fig.height = 7, eval = apiacc_pkginst} library(ggplot2) map_data %>% ggplot(aes(fill = primary.value)) + geom_sf() + labs(title = unique(sotkadata_swedish_speaking_pop$indicator.title.fi)) + theme(legend.position = "top") ``` ## Zipcode level You can download data from [Paavo (Open data by postal code area)](https://pxdata.stat.fi/PXWeb/pxweb/en/Postinumeroalueittainen_avoin_tieto/) using [`pxweb`](https://ropengov.github.io/pxweb/)-package. In this example we use dataset that can be downloaded preformatted in `csv` format directly from Statistics Finland. Population data is provided as part of geofi package as `geofi::statfi_zipcode_population`. ```{r zipcode_with_statistics_finland, eval = apiacc_pkginst} statfi_zipcode_population <- geofi::statfi_zipcode_population ``` Before we can join the data, we must extract the numerical postal code from `postal_code_area`-variable. ```{r get_zipcodes, eval = apiacc_pkginst} # Lets join with spatial data and plot the area of each zipcode zipcodes19 <- get_zipcodes(year = 2019) zipcodes_map <- left_join(zipcodes19, statfi_zipcode_population) ggplot(zipcodes_map) + geom_sf(aes(fill = X2022), color = alpha("white", 1/3)) + labs(title = "Total number of inhabitants, 2022", fill = NULL) ```