--- title: "Helsinki open data R tools" author: "Juuso Parkkinen, Pyry Kantanen" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Helsinki open data R tools} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include = FALSE} NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true") knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, fig.height = 7, fig.width = 7, dpi = 75, purl = NOT_CRAN, eval = NOT_CRAN ) ``` helsinki - tutorial =========== helsinki R package provides tools to access open data from the Helsinki region in Finland. For contact information, source code and bug reports, see the project's [GitHub page](https://github.com/rOpenGov/helsinki). For other similar packages and related blog posts, see the [rOpenGov](https://ropengov.org) project website. ## Installation Release version for most users: ```{r install_default, eval=FALSE} install.packages("helsinki") ``` Development version for developers and other interested parties: ```{r install_remotes, eval=FALSE} library(remotes) remotes::install_github("ropengov/helsinki") ``` Load the package. ```{r load, message=FALSE, warning=FALSE, results='hide'} library(helsinki) ``` ## API Access The package has basic functions for interacting with WFS APIs, courtesy of [FMI2-package](https://ropengov.github.io/fmi2/): `wfs_api()` for returning "wfs_api" and `to_sf()` for turning these objects into sf-objects. All available features of a given API can be easily listed with the `get_feature_list()` function. The API functions can, however, be used with a wide variety of different `base.url` parameters. ```{r wfs1} input_url <- "https://kartta.hsy.fi/geoserver/wfs" hsy_features <- get_feature_list(base.url = input_url) # Select only features which are related to water utilities and services hsy_vesihuolto <- hsy_features[which(hsy_features$Namespace == "vesihuolto"), ] hsy_vesihuolto # We select our feature of interest from this list: Location of waterposts feature_of_interest <- "vesihuolto:VH_Vesipostit_HSY" ``` When the wanted feature and its Name (in other words: Namespace:Title combination) is known, it can be downloaded with `get_feature()` by providing the correct `base.url` and the Name as the `typename` parameter. ```{r wfs2} input_url <- "https://kartta.hsy.fi/geoserver/wfs" feature_of_interest <- "vesihuolto:VH_Vesipostit_HSY" # downloading a feature waterposts <- get_feature(base.url = input_url, typename = feature_of_interest) # Visualizing the location of waterposts if (exists("waterposts")) { if (!is.null(waterposts)) { plot(waterposts$geom) } } ``` Dots on a blank canvas do not make much sense and therefore helsinki-package has `get_city_map()` function for downloading city district boundaries. An example of this is provided in the [Helsinki region district maps](#helsinki-region-district-maps) section of this vignette. Helsinki-package provides an easy-to-use menu-driven `select_feature()` function that effectively combines `get_feature_list()` and `get_feature()`. At default it only returns the Name of the wanted function, but if `get` parameter is set to TRUE, it returns an sf_object which can be easily visualized. ```{r wfs3, eval = FALSE} input_url <- "https://kartta.hsy.fi/geoserver/wfs" # Interactive example with select_feature selected_feature <- select_feature(base.url = input_url) feature <- get_feature(base.url = input_url, typename = selected_feature) # Skipping a redundant step with parameter get = TRUE feature <- select_feature(base.url = input_url, get = TRUE) ``` ### Helsinki Region Environmental Services HSY open data The above example shows a general use case which can easily be applied to Helsinki Region Environmental Services (HSY) WFS API as well as other service providers' APIs. For legacy reasons, helsinki-package has also some specialized functions that aim to make downloading often used data as easy as possible. Specifically, there are two new functions that replace deprecated functionalities from `get_hsy()` function: `get_vaestotietoruudukko()` (population grid) and `get_rakennustietoruudukko()` (building information grid). ```{r hsy_examples} library(ggplot2) pop_grid <- get_vaestotietoruudukko(year = 2018) building_grid <- get_rakennustietoruudukko(year = 2020) # Logarithmic scales to make the visualizations more discernible if (!all(is.null(pop_grid), is.null(building_grid))) { ggplot(pop_grid) + geom_sf(aes(colour = log(asukkaita), fill = log(asukkaita))) ggplot(building_grid) + geom_sf(aes(colour = log(kerala_yht), fill = log(kerala_yht))) } ``` With the previous version of the helsinki package, years 2015 to 2020 were supported. In 2022 a new year was added, 2011, demonstrating how the API may be updated more regularly than the package. The `get_feature_list()` function can be used to download datasets that are not baked into included functions. In addition to the datasets listed in the API getting updated, there are also legacy datasets that were never included in the API. We have added the functionality to download datasets from a wider selection of years, as zip files from a different file repository. These files may differ slightly from those downloaded via API and have different column names and larger grid squares and so on. ```{r hsy_examples2} library(ggplot2) pop_grid2 <- get_vaestotietoruudukko(year = 2011) building_grid2 <- get_rakennustietoruudukko(year = 2011) if (!all(is.null(pop_grid2), is.null(building_grid2))) { ggplot(pop_grid2) + geom_sf(aes(colour = log(ASUKKAITA), fill = log(ASUKKAITA))) ggplot(building_grid2) + geom_sf(aes(colour = log(ASVALJYYS), fill = log(ASVALJYYS))) } ``` While easy enough to build, specialized functions such as these are probably not something that power users want to rely on in their work flows. They also add more manual phases to package maintenance and therefore are probably not the direction we're heading in the future. If you feel differently about this and there is a dataset that gets a lot of use, feel free to drop us a suggestion in [GitHub](https://github.com/rOpenGov/helsinki/issues). ## Service and event information Function `get_servicemap()` retrieves regional service data from city of Helsinki [Service Map API](http://api.hel.fi/servicemap/v2/), that contains data from the [Service Map](https://palvelukartta.hel.fi/fi/). ```{r servicemap, message=FALSE, warning=FALSE} # Search for "puisto" (park) (specify q="query") search_puisto <- get_servicemap(query = "search", q = "puisto") # Study results: 47 variables in the data frame str(search_puisto, max.level = 1) ``` We can see that this search returns a large number of results, over 2000. The results are returned as pages, where each page has 20 results by default. By giving no additional search parameters, we get 20 results from the first page of search results. ```{r search_example1} # Get names for the first 20 results search_puisto$results$name.fi # See what kind of data is given for services names(search_puisto$results) ``` More results could be retrieved and viewed by giving additional `search` parameters. ```{r search_example2} search_puisto <- get_servicemap(query = "search", q = "puisto", page_size = 30, page = 2) str(search_puisto) search_puisto$results$name.fi ``` As we could see from above example, the returned data frame had 30 observations with 29 variables. At full width this output can be messy to handle in R console. One possible option would be to turn it into a more easily manageable tibble (which often is not a bad idea), another is to limit the extent of the query at the start. Function `get_linkedevents()` retrieves regional event data from the new [Linked Events API](http://api.hel.fi/linkedevents/v1/). ```{r linkedevents, message=FALSE, warning=FALSE} # Search for current events events <- get_linkedevents(query = "event") # Get names for the first 20 results events$data$name$fi # See what kind of data is given for events names(events$data) ``` ## Helsinki region district maps Helsinki region geographic data can be accessed from a WFS API by using the get_city_map() function. Data is available for all 4 cities in the capital region: Helsinki, Espoo, Vantaa and Kauniainen. Administrative divisions can be accessed on 3 distinct levels: "suuralue", "tilastoalue" and "pienalue". Literal, completely unofficial translations for these could be "grand district", "statistical area" and "(minor) district". The naming convention of these levels is sometimes confusing even in Finnish documents and different names can vary by city and time. The main takeaway is that "suuralue" is the highest-level division and "pienalue" is the most granular level of division. "Tilastoalue" is somewhere between these two. These are the names to be used even if the city of interest might not use them in their Finnish or English website. As promised earlier in [API Access](#api-access), the following example gives an idea on how to visualize waterpost locations (and, of course, other types of spatial data as well) on capital region map. ```{r mapping_example1, eval = FALSE} helsinki <- get_city_map(city = "helsinki", level = "suuralue") espoo <- get_city_map(city = "espoo", level = "suuralue") vantaa <- get_city_map(city = "vantaa", level = "suuralue") kauniainen <- get_city_map(city = "kauniainen", level = "suuralue") library(ggplot2) if (!all(is.null(helsinki), is.null(espoo), is.null(vantaa), is.null(kauniainen), is.null(waterposts))) { ggplot() + geom_sf(data = helsinki) + geom_sf(data = espoo) + geom_sf(data = vantaa) + geom_sf(data = kauniainen) + geom_sf(data = waterposts) } ``` In addition, it is possible to download "aanestysalue" (voting district) divisions for the city of Helsinki. Currently this data is not available for other cities and it must be accessed from other sources. ```{r mapping_example2, eval=FALSE} map <- get_city_map(city = "helsinki", level = "suuralue") voting_district <- get_city_map(city = "helsinki", level = "aanestysalue") ``` ```{r mapping_example2_plot, eval=FALSE} library(sf) plot(sf::st_geometry(map)) plot(sf::st_geometry(voting_district)) ``` For other cities than Helsinki voting districts are currently not available. ## Helsinki Region Infoshare statistics API Function `get_hri_stats()` retrieves data from the [Helsinki Region Infoshare statistics API](http://dev.hel.fi/stats/). Specify a dataset to retrieve. In this specific example we will download the first item on the stats_list object. The output is a three-dimensional array. ```{r hri_stats1, message=FALSE, warning=FALSE} # Retrieve list of available data stats_list <- get_hri_stats(query = "") # Show first results head(stats_list) if (!is.null(stats_list)) { # Retrieve a specific dataset stats_res <- get_hri_stats(query = stats_list[1]) # Show the structure of the results str(stats_res) } ``` ## Licensing and Citations ### Citing the data See `help()` to get citation information for each function and related data sources. If no such information is explicitly stated, see data provider's website for more information. ### Citing the R package ```{r citation, comment=NA} citation("helsinki") ``` ### Session info This vignette was created with ```{r sessioninfo, message=FALSE, warning=FALSE} sessionInfo() ```