Title: | Tools for Eurostat Open Data |
---|---|
Description: | Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities. |
Authors: | Leo Lahti [aut, cre], Janne Huovari [aut], Markus Kainu [aut], Przemyslaw Biecek [aut], Daniel Antal [ctb], Diego Hernangomez [ctb], Joona Lehtomaki [ctb], Francois Briatte [ctb], Reto Stauffer [ctb], Paul Rougieux [ctb], Anna Vasylytsya [ctb], Oliver Reiter [ctb], Pyry Kantanen [ctb], Enrico Spinielli [ctb] |
Maintainer: | Leo Lahti <[email protected]> |
License: | BSD_2_clause + file LICENSE |
Version: | 4.0.0 |
Built: | 2024-08-13 13:01:20 UTC |
Source: | https://github.com/rOpenGov/eurostat |
Tools to download data from the Eurostat database https://ec.europa.eu/eurostat together with search and manipulation utilities.
Package | eurostat |
Type | Package |
Version | 4.0.0 |
Date | 2014-2023 |
License | BSD_2_clause + file LICENSE |
LazyLoad | yes |
Eurostat website: https://ec.europa.eu/eurostat
Eurostat database: https://ec.europa.eu/eurostat/web/main/data/database
Information about the data update schedule from Eurostat: "Eurostat datasets are updated twice a day at 11:00 and 23:00 CET, if newer data is available or for structural changes, for example for the dimensions in the dataset.
The Eurostat database always contains the latest version of the datasets, meaning that there is no versioning or documentation of past versions of the data."
Data is downloaded from the Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query
The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API
See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.
For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf
Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although filtering datasets downloaded through the SDMX Dissemination API is technically also supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering
Data is downloaded from Eurostat API Statistics. See Eurostat documentation for more information about data queries in API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query
This replaces the old JSON Web Services that were used by Eurostat before February 2023 and by eurostat R package versions before 3.7.13. See Eurostat documentation about the migration from JSON web service to API Statistics for more information about the differences between the old and the new service: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+migrating+from+JSON+web+service+to+API+Statistics
To see which filtering options are available in addition to the default ones (time and language), the Eurostat Web services Query builder tool may be useful: https://ec.europa.eu/eurostat/web/query-builder
When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.
Filter parameters are optional, but the dimension codes used must be present in the data product that is being queried. Dimension codes can vary between data products, so it may be useful to examine new datasets in the Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so the geo and time dimension codes can usually be used.
<DIMENSION_CODE> and <VALUE> are case-insensitive and can be written in lowercase or uppercase in the query.
Parameters are passed on to the eurostat package functions get_eurostat() and get_eurostat_json() as list items. If an individual item contains multiple values, as is often the case with geo parameters and other optional items, they must be given as a vector: c("FI", "SE").
For examples on how to use these parameters, see function examples below.
time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. The Eurostat documentation states that "Using more than one Time parameter in the same query is not accepted", but in practice the Eurostat API allows multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.
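The expansion described above can be sketched in base R. The helper build_filter_query() is hypothetical and not part of the eurostat package; it only illustrates how a named list of filters might map to repeated query parameters, and it deliberately skips URL encoding.

```r
# Hypothetical helper (an illustration, not the package's implementation):
# expand a named list of filters into "name=value" pairs joined by "&".
build_filter_query <- function(filters) {
  pairs <- unlist(lapply(names(filters), function(name) {
    # A vector value yields one "name=value" pair per element.
    paste0(name, "=", filters[[name]])
  }))
  paste(pairs, collapse = "&")
}

# The colon operator produces one time parameter per year.
query <- build_filter_query(list(geo = c("FI", "SE"), time = 2015:2018))
print(query)
# "geo=FI&geo=SE&time=2015&time=2016&time=2017&time=2018"
```

A real request would also need the dataset URL and properly encoded values; the package functions take care of that when filters are passed as a list.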
The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use the time=2015-Q1&time=2015-Q2 style in the query URL, but this makes it infeasible to use the colon operator and requires a lot of manual typing. Because of this, it is useful to know about other time parameters as well:
untilTimePeriod: return dataset items from the oldest record up until the set time, for example "all data until 2000": untilTimePeriod = 2000
sinceTimePeriod: return dataset items starting from the set time, for example "all data starting from 2008": sinceTimePeriod = 2008
lastTimePeriod: starting from the most recent time period, how many preceding time periods should be returned? For example, the 10 most recent observations: lastTimePeriod = 10
Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the usage of the R colon operator unnecessary. In the case of quarterly data, using untilTimePeriod and sinceTimePeriod parameters also works, unlike the colon operator, so it is generally safer to use them.
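As a minimal sketch of a bounded time range, the filter list below combines sinceTimePeriod and untilTimePeriod with the nama_10_gdp filters used elsewhere in this documentation. The specific years and countries are arbitrary illustration values. The download itself is guarded so that it only runs in an interactive session with the package installed and a network connection available.

```r
# Filter list with an explicit time range instead of the colon operator.
# Years and countries are arbitrary examples.
gdp_filters <- list(
  geo = c("FI", "SE"),
  na_item = "B1GQ",
  unit = "CLV_I10",
  sinceTimePeriod = 2008,
  untilTimePeriod = 2015
)

# For quarterly datasets the bounds would presumably be written in the
# dataset's own period notation (an assumption, not verified here).

# Only attempt the download interactively, since it needs network access.
if (interactive() && requireNamespace("eurostat", quietly = TRUE)) {
  gdp <- eurostat::get_eurostat_json("nama_10_gdp", filters = gdp_filters)
  print(head(gdp))
}
```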
In get_eurostat_json() examples the nama_10_gdp dataset is filtered with two additional filter parameters:
na_item = "B1GQ"
unit = "CLV_I10"
Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion. By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".
Different dimension codes can be translated to a natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat() which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".
All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.
Example:
lang = "fr"
For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest
The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de
See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
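The three language variants differ only in the lang query parameter, which can be sketched as follows. The helper toc_url() is hypothetical and not part of the eurostat package; it only assembles the URLs listed above and does not download anything.

```r
# Hypothetical helper: build the TOC download URL for a supported language.
toc_url <- function(lang = "en") {
  # Only English, French and German variants are listed in the documentation.
  lang <- match.arg(lang, c("en", "fr", "de"))
  paste0(
    "https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=",
    lang
  )
}

toc_url()      # English (default)
toc_url("de")  # German
```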
"Eurostat's general copyright notice and licence policy is applicable and can be consulted here: https://ec.europa.eu/eurostat/about-us/policies/copyright
Please also be aware of the European Commission's general conditions: https://commission.europa.eu/legal-notice_en
Moreover, there are specific provisions applicable to some of the following datasets available for downloading. The download and usage of these data is subject to their acceptance:
Administrative Units / Statistical Units
Population distribution / Demography
Transport Networks
Land Cover
Elevation (DEM)"
Of the abovementioned datasets, Administrative Units / Statistical Units is applicable if the user wants to draw maps with borders provided by GISCO / EuroGeographics.
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the GISCO website: GISCO: Geographical information and maps - Administrative units/statistical units
"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:
The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").
The permission to use the data is granted on condition that:
the data will not be used for commercial purposes;
the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page.
When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:
EN: © EuroGeographics for the administrative boundaries
FR: © EuroGeographics pour les limites administratives
DE: © EuroGeographics bezüglich der Verwaltungsgrenzen
For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.
If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.
Most Eurostat datasets are relatively manageable, at least on a machine with 16 GB of RAM. The largest dataset in Eurostat database, at the time of writing this, had 148362539 (148 million) values, which results in an object with 148 million rows in tidy data (long) format. The test machine with 16 GB of RAM was able to handle the second largest dataset in the database with 91 million values (rows).
There are still some methods to make data fetching functions perform faster:
turn caching off: get_eurostat(cache = FALSE)
turn cache compression off (may result in rather large cache files!): get_eurostat(compress_file = FALSE)
if you want faster caching with manageable file sizes, use stringsAsFactors: get_eurostat(cache = TRUE, compress_file = TRUE, stringsAsFactors = TRUE)
use faster data.table functions: get_eurostat(use.data.table = TRUE)
keep column processing to a minimum: get_eurostat(time_format = "raw", type = "code") etc.
read the get_eurostat() function documentation carefully so you understand what different arguments do
filter the dataset so that you fetch only the parts you need!
For working with sub-national statistics the basic functions of the regions package are imported https://regions.dataobservatory.eu/.
Leo Lahti, Janne Huovari, Markus Kainu, Przemyslaw Biecek
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
A BibTeX entry for LaTeX users is
@Article{10.32614/RJ-2017-019,
  title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
  journal = {The R Journal},
  volume = {9},
  number = {1},
  pages = {385--392},
  year = {2017},
  doi = {10.32614/RJ-2017-019},
  url = {https://doi.org/10.32614/RJ-2017-019},
}
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
A BibTeX entry for LaTeX users is
@Misc{eurostat,
  title = {eurostat: Tools for Eurostat Open Data},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek and Diego Hernangomez and Daniel Antal and Pyry Kantanen},
  url = {https://github.com/rOpenGov/eurostat},
  type = {Computer software},
  year = {2023},
  note = {R package version 4.0.0},
}
When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.
help("regions"), https://regions.dataobservatory.eu/
library(eurostat)
Check if R has access to resources at http://ec.europa.eu
check_access_to_data()
a logical.
Markus Kainu [email protected]
check_access_to_data()
Delete all .rds files from the eurostat cache directory.
See get_eurostat() for more on cache.
clean_eurostat_cache(cache_dir = NULL, config = FALSE)
cache_dir |
A path to cache directory. If |
config |
Logical |
Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Diego Hernangómez
Other cache utilities:
set_eurostat_cache_dir()
## Not run:
clean_eurostat_cache()
## End(Not run)
Categorises a numeric vector into automatic or manually defined categories and polishes the labels ready for use in mapping with ggplot2.
cut_to_classes(
  x,
  n = 5,
  style = "equal",
  manual = FALSE,
  manual_breaks = NULL,
  decimals = 0,
  nodata_label = "No data"
)
x |
A numeric vector, eg. |
n |
A numeric. number of classes/categories |
style |
chosen style: one of "fixed", "sd", "equal", "pretty", "quantile", "kmeans", "hclust", "bclust", "fisher", "jenks", "dpih", "headtails", "maximum", or "box" |
manual |
Logical. If manual breaks are being used |
manual_breaks |
Numeric vector with manual threshold values |
decimals |
Number of decimals to include with labels |
nodata_label |
String. Text label for NA category. |
a factor.
Markus Kainu [email protected]
Other helpers: dic_order(), eurotime2date(), eurotime2num(), harmonize_country_code(), label_eurostat()
# lp <- get_eurostat("nama_aux_lp")
lp <- get_eurostat("nama_10_lp_ulc")
lp$class <- cut_to_classes(lp$values, n = 5, style = "equal", decimals = 1)
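To make the idea of equal-interval classes with a separate "No data" category concrete, here is a rough base R sketch on made-up numbers. The function classify_equal() is hypothetical; it is not how cut_to_classes() is implemented, and its labels are not polished in the same way.

```r
# Hypothetical sketch: equal-width classes plus a label for missing values.
classify_equal <- function(x, n = 5, nodata_label = "No data") {
  # n classes need n + 1 equally spaced break points.
  breaks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  classes <- cut(x, breaks = breaks, include.lowest = TRUE)
  # Turn NA into a regular factor level and give it a readable label.
  classes <- addNA(classes)
  levels(classes)[is.na(levels(classes))] <- nodata_label
  classes
}

x <- c(0, 10, 20, 30, 40, 50, NA)
cls <- classify_equal(x, n = 5)
table(cls)
```

The resulting factor has the requested number of numeric classes plus one level for missing values, which is convenient when a map legend should show regions without data.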
Orders the factor levels.
dic_order(x, dic, type)
x |
a variable (code or labelled) to get order for. |
dic |
a name of the dictionary. Correspond a variable name in the
data_frame from |
type |
a type of the x. Could be |
Some variables, like classifications, have a logical or conventional ordering. Eurostat data tables are not necessarily ordered in this way. The function dic_order() gets the ordering from Eurostat classification dictionaries. The function label_eurostat() can also order factor levels of labels with argument eu_order = TRUE.
A numeric vector of orders.
Przemyslaw Biecek, Leo Lahti, Janne Huovari and Markus Kainu
Other helpers: cut_to_classes(), eurotime2date(), eurotime2num(), harmonize_country_code(), label_eurostat()
Countries and country codes in EU, Euro area, EFTA and EU candidate countries.
eu_countries
ea_countries
efta_countries
eu_candidate_countries
A data_frame:
code: Country code in the Eurostat database.
name: Country name in English.
label: Country name in the Eurostat database.
An object of class tbl_df (inherits from tbl, data.frame) with 19 rows and 3 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 4 rows and 3 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 7 rows and 3 columns.
https://ec.europa.eu/eurostat/statistics-explained/index.php/Tutorial:Country_codes_and_protocol_order, https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Euro_area
Other datasets: eurostat_geodata_60_2016, tgs00026
Geospatial data of Europe from GISCO in 1:60 million scale from year 2016
sf object
The dataset contains 2016 observations (rows) and 12 variables (columns).
The object contains the following columns:
id: JSON id code, the same as NUTS_ID. See NUTS_ID below for further clarification.
LEVL_CODE: NUTS level code: 0 (national level), 1 (major socio-economic regions), 2 (basic regions for the application of regional policies) or 3 (small regions).
NUTS_ID: NUTS ID code, consisting of country code and numbers (1 for NUTS 1, 2 for NUTS 2 and 3 for NUTS 3)
CNTR_CODE: Country code: two-letter ISO code (ISO 3166 alpha-2), except in the case of Greece (EL).
NAME_LATN: NUTS name in local language, transliterated to Latin script
NUTS_NAME: NUTS name in local language, in local script.
MOUNT_TYPE: Mountain typology for NUTS 3 regions.
1: "where more than 50 % of the surface is covered by topographic mountain areas"
2: "in which more than 50 % of the regional population lives in topographic mountain areas"
3: "where more than 50 % of the surface is covered by topographic mountain areas and where more than 50 % of the regional population lives in these mountain areas"
4: non-mountain region / other region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 and non-EU countries)
URBN_TYPE: Urban-rural typology for NUTS 3 regions.
1: predominantly urban region
2: intermediate region
3: predominantly rural region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 regions)
COAST_TYPE: Coastal typology for NUTS 3 regions.
1: coastal (on coast)
2: coastal (>= 50% of population living within 50km of the coastline)
3: non-coastal region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 regions)
FID: Same as NUTS_ID.
geo: Same as NUTS_ID, added for easier joins with dplyr. However, it is recommended to use other identical fields for this purpose.
geometry: geospatial information.
Dataset updated: 2023-06-29. For a more recent version, please use the giscoR::gisco_get_nuts() function.
Data source: Eurostat via giscoR::gisco_get_nuts().
© EuroGeographics for the administrative boundaries
Data downloaded from: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: GISCO: Geographical information and maps - Administrative units/statistical units
"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:
The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").
The permission to use the data is granted on condition that:
the data will not be used for commercial purposes;
the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page.
When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:
EN: © EuroGeographics for the administrative boundaries
FR: © EuroGeographics pour les limites administratives
DE: © EuroGeographics bezüglich der Verwaltungsgrenzen
For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.
If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."
giscoR::gisco_get_nuts()
and
Eurostat. (2019). Methodological manual on territorial typologies – 2018 edition. Manuals and guidelines.
Other datasets: eu_countries, tgs00026
Other geospatial: get_eurostat_geospatial()
eurostat_geodata_60_2016 <- eurostat::eurostat_geodata_60_2016

# Manipulate and plot
if (require(sf)) {
  library(sf)
  # Filter NUTS3 from select countries like in a regular data frame
  example_nuts <- subset(
    eurostat_geodata_60_2016,
    LEVL_CODE == 3 & CNTR_CODE %in% c("DK", "DE", "PL")
  )
  plot(example_nuts["CNTR_CODE"])
}
This list of defunct functions is maintained to document changes to eurostat functions in a transparent manner.
grepEurostatTOC(...)
... |
Generic representation of old arguments |
The following functions are defunct:
grepEurostatTOC: Use search_eurostat instead
Date conversion from Eurostat time format. A function to convert Eurostat time values to objects of class Date() representing calendar dates.
eurotime2date(x, last = FALSE)
x |
a character string with time information in Eurostat time format. |
last |
a logical. If |
Available patterns are YYYY (year), YYYY-SN (semester), YYYY-QN (quarter), YYYY-MM (month), YYYY-WNN (week) and YYYY-MM-DD (day).
an object of class Date().
Janne Huovari [email protected]
Other helpers: cut_to_classes(), dic_order(), eurotime2num(), harmonize_country_code(), label_eurostat()
na_q <- get_eurostat("namq_10_pc", time_format = "raw")
na_q$TIME_PERIOD <- eurotime2date(x = na_q$TIME_PERIOD)
unique(na_q$TIME_PERIOD)

## Not run:
# Test for weekly data
get_eurostat(
  id = "lfsi_abs_w",
  select_time = c("W"),
  time_format = "date"
)
## End(Not run)
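For readers who want to see what the conversion amounts to for one pattern, the following base R sketch handles only the YYYY-QN (quarter) format and returns the first day of each quarter. The function quarter_to_date() is hypothetical and much narrower than eurotime2date(); it assumes well-formed input and performs no validation.

```r
# Hypothetical sketch: convert "YYYY-QN" strings to the first day of the quarter.
quarter_to_date <- function(x) {
  year <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 7, 7))
  # Quarters 1 to 4 start in months 1, 4, 7 and 10.
  month <- as.integer(3 * (quarter - 1) + 1)
  as.Date(sprintf("%04d-%02d-01", year, month))
}

quarter_to_date(c("2015-Q1", "2015-Q2", "2015-Q4"))
# "2015-01-01" "2015-04-01" "2015-10-01"
```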
A conversion of a Eurostat time format to numeric.
eurotime2num(x)
x |
a character string with time information in Eurostat time format. |
Bi-annual (semester), quarterly, monthly and weekly data can be presented as a fraction of the year at the beginning of the period. Conversion of daily data is not supported.
see as.numeric().
Janne Huovari [email protected], Pyry Kantanen
Other helpers: cut_to_classes(), dic_order(), eurotime2date(), harmonize_country_code(), label_eurostat()
na_q <- get_eurostat("namq_10_pc", time_format = "raw")
na_q$TIME_PERIOD <- eurotime2num(x = na_q$TIME_PERIOD)
unique(na_q$TIME_PERIOD)
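The "fraction of the year at the beginning of the period" idea can be illustrated for quarterly data with a small base R sketch. The function quarter_to_num() is hypothetical and covers only the YYYY-QN pattern; it is not the package's implementation.

```r
# Hypothetical sketch: express "YYYY-QN" as a year plus the fraction of the
# year that has elapsed at the start of the quarter.
quarter_to_num <- function(x) {
  year <- as.numeric(substr(x, 1, 4))
  quarter <- as.numeric(substr(x, 7, 7))
  year + (quarter - 1) / 4
}

quarter_to_num(c("2015-Q1", "2015-Q2", "2015-Q3"))
# 2015.00 2015.25 2015.50
```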
Creates a bibliography from selected Eurostat data files, including last Eurostat update, URL access data, and optional keywords set by the user.
get_bibentry(code, keywords = NULL, format = "Biblatex", lang = "en")
code |
A Eurostat data code or a vector of Eurostat data codes as character or factor. |
keywords |
A list of keywords to be added to the entries. Defaults
to |
format |
Default is |
lang |
2-letter language code, default is " |
a bibentry, Bibtex or Biblatex object.
For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.
Daniel Antal, Przemyslaw Biecek
utils::bibentry RefManageR::toBiblatex
## Not run:
my_bibliography <- get_bibentry(
  code = c("tran_hv_frtra", "tec00001"),
  keywords = list(
    c("transport", "freight", "multimodal data", "GDP"),
    c("economy and finance", "annual", "national accounts", "GDP")
  ),
  format = "Biblatex"
)
my_bibliography
## End(Not run)
Download data sets from Eurostat https://ec.europa.eu/eurostat
get_eurostat(
  id,
  time_format = "date",
  filters = NULL,
  type = "code",
  select_time = NULL,
  lang = "en",
  cache = TRUE,
  update_cache = FALSE,
  cache_dir = NULL,
  compress_file = TRUE,
  stringsAsFactors = FALSE,
  keepFlags = FALSE,
  use.data.table = FALSE,
  ...
)
id |
A unique identifier / code for the dataset of interest. If code is not
known |
time_format |
a string giving a type of the conversion of the time column from the
eurostat format. The default argument " |
filters |
A named list of filters. Names of list objects are Eurostat
variable codes and values are vectors of observation codes. If |
type |
A type of variables, " |
select_time |
a character symbol for a time frequency or |
lang |
2-letter language code, default is " |
cache |
a logical whether to do caching. Default is |
update_cache |
a logical whether to update cache. Can be set also with
|
cache_dir |
a path to a cache directory. |
compress_file |
a logical whether to compress the RDS-file in caching. Default is |
stringsAsFactors |
if |
keepFlags |
a logical whether the flags (e.g. "confidential",
"provisional") should be kept in a separate column or if they
can be removed. Default is |
use.data.table |
Use faster data.table functions? Default is FALSE. On Windows requires that RTools is installed. |
... |
Arguments passed on to
|
Datasets are downloaded from the Eurostat SDMX 2.1 API in TSV format or from the Eurostat API Statistics JSON API. If only the table id is given, the whole table is downloaded from the SDMX API. If any filters are given, the JSON API is used instead.
The bulk download facility is the fastest method to download whole datasets. It is also often the only way, as the JSON API has a limit of at most 50 sub-indicators at a time and whole datasets usually exceed that. Also, it seems that multi-frequency datasets can only be retrieved via the bulk download facility, and select_time is not available for the JSON API method.
If your connection is through a proxy, you may have to set proxy parameters to use the JSON API, see get_eurostat_json().
By default datasets are cached to reduce load on Eurostat services and because some datasets can be quite large. Cache files are stored in a temporary directory by default or in a named directory (see set_eurostat_cache_dir()). The cache can be emptied with clean_eurostat_cache().
The id, a code, for the dataset can be searched with search_eurostat() or from the Eurostat database https://ec.europa.eu/eurostat/data/database. The Eurostat database gives codes in the Data Navigation Tree after every dataset in parentheses.
a tibble.
One column for each dimension in the data, the time column for a time dimension and the values column for numerical values. Eurostat data does not include all missing values, and the treatment of missing values depends on the source. In the bulk download facility, missing values are dropped if all dimensions are missing at a particular time. In the JSON API, missing values are dropped only if all dimensions are missing at all times. The data from the bulk download facility can be completed, for example, with tidyr::complete().
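As a minimal sketch of completing dropped observations with tidyr::complete(), consider the toy data frame below. Its column names follow the description above (geo, time, values), but the numbers are hypothetical and do not come from any Eurostat dataset.

```r
library(tidyr)

# Toy data (hypothetical values): the SE observation for 2021 is absent,
# as it would be if Eurostat had dropped it as missing
toy <- data.frame(
  geo = c("FI", "FI", "SE"),
  time = c(2020, 2021, 2020),
  values = c(1.5, 1.7, 2.1)
)

# Expand to every geo/time combination; the cells that were absent become NA
toy_complete <- complete(toy, geo, time)
print(toy_complete)
```

The result has one row per geo/time pair, so the missing observation shows up as an explicit NA rather than as an absent row.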
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.
Filter parameters are optional, but the dimension codes used must be present in the data product that is being queried. Dimension codes can vary between different data products, so it may be useful to examine new datasets in the Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so the geo and time dimension codes can usually be used.
<DIMENSION_CODE> and <VALUE> are case-insensitive and can be written in lowercase or uppercase in the query.
Parameters are passed to the eurostat package functions get_eurostat() and get_eurostat_json() as list items. If an individual item contains multiple values, as is often the case with geo parameters and other optional items, the values must be given as a vector: c("FI", "SE").
For examples on how to use these parameters, see function examples below.
time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. The Eurostat documentation states that "Using more than one Time parameter in the same query is not accepted", but practice has shown that the Eurostat API actually allows multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.
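The translation from a filter list to repeated query parameters can be illustrated with a few lines of base R. The helper build_query() below is hypothetical and written only for this illustration; it is not the function the eurostat package uses internally.

```r
# Hypothetical helper: turn a named list of filters into a query string,
# repeating the parameter name once per value
build_query <- function(filters) {
  parts <- unlist(lapply(names(filters), function(nm) {
    paste0(nm, "=", filters[[nm]])
  }))
  paste(parts, collapse = "&")
}

query <- build_query(list(geo = c("FI", "SE"), time = 2015:2018))
print(query)
```

Each element of a vector-valued filter becomes its own name=value pair, which is why the colon operator is a convenient way to request consecutive years.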
The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use the time=2015-Q1&time=2015-Q2 style in the query URL, but the colon operator cannot be used and the periods have to be typed out manually. Because of this, it is useful to know about other time parameters as well:
untilTimePeriod: return dataset items from the oldest record up until the set time, for example "all data until 2000": untilTimePeriod = 2000
sinceTimePeriod: return dataset items starting from the set time, for example "all data starting from 2008": sinceTimePeriod = 2008
lastTimePeriod: starting from the most recent time period, how many preceding time periods should be returned? For example 10 most recent observations: lastTimePeriod = 10
Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the usage of the R colon operator unnecessary. In the case of quarterly data, using untilTimePeriod and sinceTimePeriod parameters also works, as opposed to the colon operator, so it is generally safer to use them as well.
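A sketch of a filter list that bounds the time range with sinceTimePeriod and untilTimePeriod is given below. The dataset and the na_item and unit codes are the ones used elsewhere in this documentation; the choice of countries and years is only illustrative. The download itself needs network access and the eurostat package, so it is guarded and only runs in an interactive session.

```r
# Filters for GDP data from 2008 through 2015 for two countries
filters <- list(
  geo = c("FI", "SE"),
  na_item = "B1GQ",
  unit = "CLV_I10",
  sinceTimePeriod = 2008,
  untilTimePeriod = 2015
)
str(filters)

# Guarded download: skipped in non-interactive runs or if eurostat is missing
if (interactive() && requireNamespace("eurostat", quietly = TRUE)) {
  gdp <- eurostat::get_eurostat_json("nama_10_gdp", filters = filters)
  print(head(gdp))
}
```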
In get_eurostat_json() examples the nama_10_gdp dataset is filtered with two additional filter parameters:
na_item = "B1GQ"
unit = "CLV_I10"
Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion.
By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".
Different dimension codes can be translated to a natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat() which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".
All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.
Example:
lang = "fr"
For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest
For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that rely on in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
Avoid associating different entities (e.g. Eurostat, National Statistical Offices, other data providers) with the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.
Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although filtering datasets downloaded through the SDMX Dissemination API is technically also supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering
Most Eurostat datasets are relatively manageable, at least on a machine with 16 GB of RAM. The largest dataset in Eurostat database, at the time of writing this, had 148362539 (148 million) values, which results in an object with 148 million rows in tidy data (long) format. The test machine with 16 GB of RAM was able to handle the second largest dataset in the database with 91 million values (rows).
There are still some methods to make data fetching functions perform faster:
turn caching off: get_eurostat(cache = FALSE)
turn cache compression off (may result in rather large cache files!): get_eurostat(compress_file = FALSE)
if you want faster caching with manageable file sizes, use stringsAsFactors: get_eurostat(cache = TRUE, compress_file = TRUE, stringsAsFactors = TRUE)
use faster data.table functions: get_eurostat(use.data.table = TRUE)
keep column processing to a minimum: get_eurostat(time_format = "raw", type = "code") etc.
read the get_eurostat() function documentation carefully so you understand what different arguments do
filter the dataset so that you fetch only the parts you need!
Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Pyry Kantanen
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
A BibTeX entry for LaTeX users is
@Article{10.32614/RJ-2017-019,
  title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
  journal = {The R Journal},
  volume = {9},
  number = {1},
  pages = {385--392},
  year = {2017},
  doi = {10.32614/RJ-2017-019},
  url = {https://doi.org/10.32614/RJ-2017-019},
}
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
A BibTeX entry for LaTeX users is
@Misc{eurostat,
  title = {eurostat: Tools for Eurostat Open Data},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek and Diego Hernangomez and Daniel Antal and Pyry Kantanen},
  url = {https://github.com/rOpenGov/eurostat},
  type = {Computer software},
  year = {2023},
  note = {R package version 4.0.0},
}
When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.
search_eurostat(), label_eurostat()
## Not run:
k <- get_eurostat("nama_10_lp_ulc")
k <- get_eurostat("nama_10_lp_ulc", time_format = "num")
k <- get_eurostat("nama_10_lp_ulc", update_cache = TRUE)

k <- get_eurostat("nama_10_lp_ulc",
  cache_dir = file.path(tempdir(), "r_cache")
)
options(eurostat_update = TRUE)
k <- get_eurostat("nama_10_lp_ulc")
options(eurostat_update = FALSE)

set_eurostat_cache_dir(file.path(tempdir(), "r_cache2"))
k <- get_eurostat("nama_10_lp_ulc")
k <- get_eurostat("nama_10_lp_ulc", cache = FALSE)
k <- get_eurostat("avia_gonc", select_time = "Y", cache = FALSE)

dd <- get_eurostat("nama_10_gdp",
  filters = list(
    geo = "FI",
    na_item = "B1GQ",
    unit = "CLV_I10"
  )
)

# A dataset with multiple time series in one
dd2 <- get_eurostat("AVIA_GOR_ME",
  select_time = c("A", "M", "Q"),
  time_format = "date_last"
)

# An example of downloading whole dataset from JSON API
dd3 <- get_eurostat("AVIA_GOR_ME",
  filters = list()
)

# Filtering a dataset from a local file
dd3_filter <- get_eurostat("AVIA_GOR_ME",
  filters = list(
    tra_meas = "FRM_BRD"
  )
)
## End(Not run)
Download a Eurostat dictionary.
get_eurostat_dic(dictname, lang = "en")
dictname |
A character, dictionary for the variable to be downloaded. |
lang |
A character, language code. Options: "en" (default), "fr", "de". |
Downloads the dictionary for a given coded variable from Eurostat https://ec.europa.eu/eurostat/. The dictionaries link codes with human-readable labels. To translate codes to labels, use label_eurostat().
tibble with two columns: code names and full names.
Przemyslaw Biecek and Leo Lahti [email protected]. Thanks to Wietse Dol for contributions. Updated by Pyry Kantanen to support XML codelists.
See citation("eurostat"):
# Kindly cite the eurostat R package as follows:
#
# Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
# analysis of Eurostat open data with the eurostat package. The R
# Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
#
# A BibTeX entry for LaTeX users is
#
# @Article{10.32614/RJ-2017-019,
#   title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
#   author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
#   journal = {The R Journal},
#   volume = {9},
#   number = {1},
#   pages = {385--392},
#   year = {2017},
#   doi = {10.32614/RJ-2017-019},
#   url = {https://doi.org/10.32614/RJ-2017-019},
# }
#
# Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
# and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
# [Computer software]. R package version 4.0.0.
# https://github.com/rOpenGov/eurostat
#
# A BibTeX entry for LaTeX users is
#
# @Misc{eurostat,
#   title = {eurostat: Tools for Eurostat Open Data},
#   author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek and Diego Hernangomez and Daniel Antal and Pyry Kantanen},
#   url = {https://github.com/rOpenGov/eurostat},
#   type = {Computer software},
#   year = {2023},
#   note = {R package version 4.0.0},
# }
label_eurostat(), get_eurostat(), search_eurostat().
get_eurostat_dic("crop_pro")

# Try another language
get_eurostat_dic("crop_pro", lang = "fr")
Loops over all files in a Eurostat database folder, downloads the data and assigns the datasets to an environment.
get_eurostat_folder(code, env = .EurostatEnv)
code |
Folder code from Eurostat Table of Contents. |
env |
Name of the environment where downloaded datasets are assigned. Default is .EurostatEnv. If NULL, datasets are returned as a list object. |
The datasets are assigned into .EurostatEnv by default, using dataset codes as object names. The datasets are downloaded from SDMX API as TSV files, meaning that they are returned without filtering. No filters can be provided using this function.
Please do not attempt to download too many datasets or the whole database at once. The number of datasets that can be downloaded at once is hardcoded to 20. The function also asks the user for confirmation if the number of datasets in a folder is more than 10. This is by design to discourage straining Eurostat API.
The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de
See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
Data is downloaded from Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query
The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API
See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.
For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf
Pyry Kantanen
get_eurostat_toc()
toc_count_children()
toc_determine_hierarchy()
toc_list_children()
toc_count_whitespace()
Downloads either a simple features (sf) object or a data_frame of NUTS regions. This function is a wrapper of giscoR::gisco_get_nuts().
This function requires the sf and giscoR packages to be installed.
get_eurostat_geospatial( output_class = "sf", resolution = "60", nuts_level = "all", year = "2016", cache = TRUE, update_cache = FALSE, cache_dir = NULL, crs = "4326", make_valid = "DEPRECATED", ... )
output_class |
Class of object returned,
either |
resolution |
Resolution of the geospatial data. One of
|
nuts_level |
Level of NUTS classification of the geospatial data. One of "0", "1", "2", "3" or "all" (mimics the original behaviour) |
year |
NUTS release year. One of "2003", "2006", "2010", "2013", "2016" or "2021" |
cache |
a logical whether to do caching. Default is TRUE. |
update_cache |
a logical whether to update cache. Can be set also with
|
cache_dir |
a path to a cache directory. See
|
crs |
projection of the map: 4-digit EPSG code. One of:
|
make_valid |
Deprecated |
... |
Arguments passed on to
|
The objects downloaded from GISCO should contain all or some of the following variable columns:
id: JSON id code, the same as NUTS_ID. See NUTS_ID below for further clarification.
LEVL_CODE: NUTS level code: 0 (national level), 1 (major socio-economic regions), 2 (basic regions for the application of regional policies) or 3 (small regions).
NUTS_ID: NUTS ID code, consisting of the country code followed by one character for NUTS 1, two characters for NUTS 2 and three characters for NUTS 3.
CNTR_CODE: Country code: two-letter ISO code (ISO 3166 alpha-2), except in the case of Greece (EL).
NAME_LATN: NUTS name in local language, transliterated to Latin script
NUTS_NAME: NUTS name in local language, in local script.
MOUNT_TYPE: Mountain typology for NUTS 3 regions.
1: "where more than 50 % of the surface is covered by topographic mountain areas"
2: "in which more than 50 % of the regional population lives in topographic mountain areas"
3: "where more than 50 % of the surface is covered by topographic mountain areas and where more than 50 % of the regional population lives in these mountain areas"
4: non-mountain region / other region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 and non-EU countries)
URBN_TYPE: Urban-rural typology for NUTS 3 regions.
1: predominantly urban region
2: intermediate region
3: predominantly rural region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 regions)
COAST_TYPE: Coastal typology for NUTS 3 regions.
1: coastal (on coast)
2: coastal (>= 50% of population living within 50km of the coastline)
3: non-coastal region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 regions)
FID: Same as NUTS_ID.
geo: Same as NUTS_ID, added for easier joins with dplyr. Consider the status of this column "questioning" and use other columns for joins when possible.
geometry: geospatial information.
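As a rough sketch of how these columns can be used, the example below joins a statistics table to a region table on NUTS_ID rather than on the geo column. Both tables are toy stand-ins with hypothetical values and no geometry column; with real data the region table would come from get_eurostat_geospatial() and the statistics from get_eurostat().

```r
# Toy stand-in for the geospatial table (no geometry column here)
regions <- data.frame(
  NUTS_ID = c("FI", "FI1", "FI19", "FI193"),
  LEVL_CODE = c(0, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Toy stand-in for a statistics table keyed by geo (hypothetical values)
stats <- data.frame(
  geo = c("FI", "FI19"),
  values = c(100, 42),
  stringsAsFactors = FALSE
)

# The NUTS level equals the number of characters after the two-letter country code
regions$level_from_id <- nchar(regions$NUTS_ID) - 2

# Left join on NUTS_ID: regions without statistics are kept with values = NA
joined <- merge(regions, stats, by.x = "NUTS_ID", by.y = "geo", all.x = TRUE)
print(joined)
```

Keeping all regions in the join makes gaps in the statistics visible as NA, which is usually what is wanted before drawing a map.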
An sf object or a data_frame.
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
"Eurostat's general copyright notice and licence policy is applicable and can be consulted here: https://ec.europa.eu/eurostat/about-us/policies/copyright
Please also be aware of the European Commission's general conditions: https://commission.europa.eu/legal-notice_en
Moreover, there are specific provisions applicable to some of the following datasets available for downloading. The download and usage of these data is subject to their acceptance:
Administrative Units / Statistical Units
Population distribution / Demography
Transport Networks
Land Cover
Elevation (DEM)"
Of the abovementioned datasets, Administrative Units / Statistical Units is applicable if the user wants to draw maps with borders provided by GISCO / EuroGeographics.
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the GISCO website: GISCO: Geographical information and maps - Administrative units/statistical units
"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:
The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").
The permission to use the data is granted on condition that:
the data will not be used for commercial purposes;
the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page."
When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:
EN: © EuroGeographics for the administrative boundaries
FR: © EuroGeographics pour les limites administratives
DE: © EuroGeographics bezüglich der Verwaltungsgrenzen
For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.
If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."
Markus Kainu [email protected], Diego Hernangomez https://github.com/dieghernan/
Data source: Eurostat
© EuroGeographics for the administrative boundaries
Data downloaded using giscoR
Other geospatial:
eurostat_geodata_60_2016
# Uses cached dataset
sf <- get_eurostat_geospatial(
  output_class = "sf",
  resolution = "60",
  nuts_level = "all"
)
# Downloads dataset from server
sf2 <- get_eurostat_geospatial(
  output_class = "sf",
  resolution = "20",
  nuts_level = "all"
)
df <- get_eurostat_geospatial(
  output_class = "df",
  nuts_level = "0"
)
A simple interactive helper function to go through the steps of downloading and/or finding suitable eurostat datasets.
get_eurostat_interactive(code = NULL)
code |
A unique identifier / code for the dataset of interest. If code is not
known |
This function is intended to enable easy exploration of different eurostat package functionalities and functions. In order not to drown the end user in endless menus, this function does not allow setting all possible get_eurostat() function arguments. It is possible to set time_format, type, lang, stringsAsFactors, keepFlags, and use.data.table in the interactive menus.
In some datasets setting these parameters may result in an "Error in label_eurostat" error, for example: "labels for XXXXXX includes duplicated labels in the Eurostat dictionary". In these cases, and with other more complex queries, please use the get_eurostat() function directly.
Retrieve data from Eurostat API in JSON format.
get_eurostat_json( id, filters = NULL, type = "code", lang = "en", stringsAsFactors = FALSE, proxy = FALSE, ... )
id |
A unique identifier / code for the dataset of interest. If code is not
known |
filters |
A named list of filters. Names of list objects are Eurostat
variable codes and values are vectors of observation codes. If |
type |
A type of variables, " |
lang |
2-letter language code, default is "en". |
stringsAsFactors |
if |
proxy |
Use proxy, TRUE or FALSE (default). |
... |
Arguments passed on to
|
Data to retrieve from The Eurostat Web Services can be specified with filters. Normally, it is better to use a JSON query through get_eurostat() than to use get_eurostat_json() directly.
Queries are limited to 50 sub-indicators at a time. Time can be filtered with a fixed "time" filter or with the "sinceTimePeriod" and "lastTimePeriod" filters. sinceTimePeriod = 2000 returns observations from 2000 to the last available one. lastTimePeriod = 10 returns the 10 most recent observations. See "Filtering datasets" section below for more detailed information about filters.
To use a proxy to connect, proxy arguments can be passed to httr2::req_perform() via httr2::req_proxy() - see the latter function's documentation for parameter names that can be passed with ....
A non-functional example: get_eurostat_json(id, filters, proxy = TRUE, url = "127.0.0.1", port = 80).
When retrieving data from the Eurostat JSON API the user may encounter errors. For end user convenience, we have provided a ready-made internal dataset sdmx_http_errors that contains descriptive labels and descriptions about the possible interpretation or cause of each error. These messages are returned if the API returns a status indicating an HTTP error (400 or greater).
The Eurostat implementation seems to be based on SDMX 2.1, which is the reason we've used SDMX Standards guidelines as a supplementary source that we have included in the dataset. What this means in practice is that the dataset contains error codes and their mappings that are not mentioned in the Eurostat website. We hope you never encounter them.
A dataset as an object of data.frame class.
Data is downloaded from Eurostat API Statistics. See Eurostat documentation for more information about data queries in API Statistics https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query
This replaces the old JSON Web Services that was used by Eurostat before February 2023 and by the eurostat R package versions before 3.7.13. See Eurostat documentation about the migration from JSON web service to API Statistics for more information about the differences between the old and the new service: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+migrating+from+JSON+web+service+to+API+Statistics
For easily viewing which filtering options are available - in addition to the default ones, time and language - Eurostat Web services Query builder tool may be useful: https://ec.europa.eu/eurostat/web/query-builder
When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.
Filter parameters are optional, but the dimension codes used must be present in the data product that is being queried. Dimension codes can vary between different data products, so it may be useful to examine new datasets in the Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so the geo and time dimension codes can usually be used.
<DIMENSION_CODE> and <VALUE> are case-insensitive and can be written in lowercase or uppercase in the query.
Parameters are passed to the eurostat package functions get_eurostat() and get_eurostat_json() as list items. If an individual item contains multiple values, as is often the case with geo parameters and other optional items, the values must be given as a vector: c("FI", "SE").
For examples on how to use these parameters, see function examples below.
time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. The Eurostat documentation states that "Using more than one Time parameter in the same query is not accepted", but practice has shown that the Eurostat API actually allows multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.
The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use the time=2015-Q1&time=2015-Q2 style in the query URL, but the colon operator cannot be used and the periods have to be typed out manually. Because of this, it is useful to know about other time parameters as well:
untilTimePeriod: return dataset items from the oldest record up until the set time, for example "all data until 2000": untilTimePeriod = 2000
sinceTimePeriod: return dataset items starting from the set time, for example "all data starting from 2008": sinceTimePeriod = 2008
lastTimePeriod: starting from the most recent time period, how many preceding time periods should be returned? For example 10 most recent observations: lastTimePeriod = 10
Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the usage of the R colon operator unnecessary. In the case of quarterly data, using untilTimePeriod and sinceTimePeriod parameters also works, as opposed to the colon operator, so it is generally safer to use them as well.
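If individual quarters are needed in a time filter, the period labels can be generated instead of typed by hand. The base R sketch below builds the labels in the 2015-Q1 style mentioned above; the years and the geo value are chosen only for illustration, and nothing is downloaded.

```r
# Generate the labels 2015-Q1 ... 2016-Q4: each year is repeated four times
# and paired with the quarter numbers 1 to 4
quarters <- paste0(rep(2015:2016, each = 4), "-Q", 1:4)
print(quarters)

# The vector can then be used as the time filter in a filter list
filters <- list(geo = "FI", time = quarters)
```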
In get_eurostat_json() examples the nama_10_gdp dataset is filtered with two additional filter parameters:
na_item = "B1GQ"
unit = "CLV_I10"
Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion.
By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".
Different dimension codes can be translated to a natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat() which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".
All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.
Example:
lang = "fr"
For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.
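The source-line guideline above can be automated with a small formatter. The helper name below is hypothetical and not part of the eurostat package; it only assembles the wording quoted in the guidelines.

```r
# Hypothetical helper (not part of the eurostat package): build a source note
# in the wording suggested by the guidelines above.
eurostat_source_note <- function(codes) {
  sprintf(
    "Source: Eurostat (online data code: %s)",
    paste(codes, collapse = ", ")
  )
}

note <- eurostat_source_note("namq_10_gdp")
print(note)
```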
Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although technically filtering datasets downloaded through the SDMX Dissemination API is also supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering
Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Pyry Kantanen
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
A BibTeX entry for LaTeX users is
@Article{10.32614/RJ-2017-019,
  title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
  journal = {The R Journal},
  volume = {9},
  number = {1},
  pages = {385--392},
  year = {2017},
  doi = {10.32614/RJ-2017-019},
  url = {https://doi.org/10.32614/RJ-2017-019},
}
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
A BibTeX entry for LaTeX users is
@Misc{eurostat,
  title = {eurostat: Tools for Eurostat Open Data},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek and Diego Hernangomez and Daniel Antal and Pyry Kantanen},
  url = {https://github.com/rOpenGov/eurostat},
  type = {Computer software},
  year = {2023},
  note = {R package version 4.0.0},
}
When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.
## Not run:
# Generally speaking these queries would be done through get_eurostat
tmp <- get_eurostat_json("nama_10_gdp")
yy <- get_eurostat_json("nama_10_gdp",
  filters = list(
    geo = c("FI", "SE", "EU28"),
    time = c(2015:2023),
    lang = "FR",
    na_item = "B1GQ",
    unit = "CLV_I10"
  )
)
# TIME_PERIOD filter works also with the new JSON API
yy2 <- get_eurostat_json("nama_10_gdp",
  filters = list(
    geo = c("FI", "SE", "EU28"),
    TIME_PERIOD = c(2015:2023),
    lang = "FR",
    na_item = "B1GQ",
    unit = "CLV_I10"
  )
)
# An example from get_eurostat
dd <- get_eurostat("nama_10_gdp",
  filters = list(
    geo = "FI",
    na_item = "B1GQ",
    unit = "CLV_I10"
  )
)
## End(Not run)
Download data from the eurostat database through the new dissemination API.
get_eurostat_raw(id, use.data.table = FALSE)
id |
A unique identifier / code for the dataset of interest. If code is not
known |
use.data.table |
Use faster data.table functions? Default is FALSE. On Windows requires that RTools is installed. |
A dataset in tibble format. The first column contains comma-separated codes of cases. Other columns usually correspond to years, and column names are years with a preceding X. Data is in character format as it contains values together with Eurostat flags for data.
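Because raw cells mix values and flags in one character string, a post-processing step is usually needed. The sketch below shows one possible way to separate them with base R regular expressions. The cell contents and the comma-separated key are toy inputs invented for illustration, and the approach assumes flags are lowercase letters following the value.

```r
# Toy cells in the style described above; real raw data may differ.
cells <- c("102.3", "98.7 p", ":", ": c")

# Numeric part: keep a leading number if present, otherwise an empty string,
# which as.numeric() turns into NA.
number_text <- sub("^\\s*(-?[0-9.]+)?.*$", "\\1", cells)
values <- as.numeric(number_text)

# Flag part: drop everything up to the first lowercase letter.
flags <- sub("^[^a-z]*", "", cells)

# First-column keys are comma-separated dimension codes (toy example).
key_parts <- strsplit("A,CLV_I10,B1GQ,FI", ",", fixed = TRUE)[[1]]

print(data.frame(cell = cells, value = values, flag = flags))
print(key_parts)
```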
Data is downloaded from Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query
The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API
See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.
For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.
Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although technically filtering datasets downloaded through the SDMX Dissemination API is also supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering
Przemyslaw Biecek, Leo Lahti, Janne Huovari and Pyry Kantanen
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
A BibTeX entry for LaTeX users is
@Article{10.32614/RJ-2017-019,
  title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
  journal = {The R Journal},
  volume = {9},
  number = {1},
  pages = {385--392},
  year = {2017},
  doi = {10.32614/RJ-2017-019},
  url = {https://doi.org/10.32614/RJ-2017-019},
}
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
A BibTeX entry for LaTeX users is
@Misc{eurostat,
  title = {eurostat: Tools for Eurostat Open Data},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek and Diego Hernangomez and Daniel Antal and Pyry Kantanen},
  url = {https://github.com/rOpenGov/eurostat},
  type = {Computer software},
  year = {2023},
  note = {R package version 4.0.0},
}
eurostat:::get_eurostat_raw("educ_iste")
Download table of contents (TOC) of eurostat datasets.
get_eurostat_toc(lang = "en")
lang |
2-letter language code, default is "en". |
In the downloaded Eurostat Table of Contents, the values in the 'code' column correspond to the 'id' argument that certain functions use when downloading datasets.
A tibble with nine columns:
Dataset title in English (default)
Each item (dataset, table and folder) of the TOC has a unique code which allows it to be identified in the API. Used in the get_eurostat() and get_eurostat_raw() functions to retrieve datasets.
dataset, folder or table
Date, indicates the last time the dataset/table was updated (format DD.MM.YYYY or %d.%m.%Y)
Date, indicates the last time the dataset/table structure was modified (format DD.MM.YYYY or %d.%m.%Y)
Date of the oldest value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)
Date of the most recent value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)
Number of actual values included in the dataset
Hierarchy of the data navigation tree, represented in the original txt file by a 4-spaces indentation prefix in the title
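The indentation rule for the hierarchy column can be illustrated with a few lines of base R. This is only a sketch of the idea with invented titles; whether the package counts the top level as 0 or 1 is not stated here, so the zero-based result below is an assumption.

```r
# Illustrative titles in the style of the TOC text file (4 spaces per level).
raw_titles <- c(
  "Database by themes",
  "    General and regional statistics",
  "        Some dataset title"
)

# Count leading spaces, then convert them to a nesting depth.
leading_spaces <- nchar(raw_titles) - nchar(sub("^ +", "", raw_titles))
hierarchy <- leading_spaces %/% 4

# Titles without the indentation prefix.
titles <- trimws(raw_titles)

print(data.frame(title = titles, hierarchy = hierarchy))
```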
The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de
See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
Przemyslaw Biecek, Leo Lahti and Pyry Kantanen [email protected]
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
A BibTeX entry for LaTeX users is
@Article{10.32614/RJ-2017-019,
  title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
  journal = {The R Journal},
  volume = {9},
  number = {1},
  pages = {385--392},
  year = {2017},
  doi = {10.32614/RJ-2017-019},
  url = {https://doi.org/10.32614/RJ-2017-019},
}
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
A BibTeX entry for LaTeX users is
@Misc{eurostat,
  title = {eurostat: Tools for Eurostat Open Data},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek and Diego Hernangomez and Daniel Antal and Pyry Kantanen},
  url = {https://github.com/rOpenGov/eurostat},
  type = {Computer software},
  year = {2023},
  note = {R package version 4.0.0},
}
When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.
get_eurostat(), search_eurostat()
tmp <- get_eurostat_toc()
head(tmp)
# Convert columns containing dates as character into Date class
# Last update of data
tmp[[4]] <- as.Date(tmp[[4]], format = c("%d.%m.%Y"))
# Last table structure change
tmp[[5]] <- as.Date(tmp[[5]], format = c("%d.%m.%Y"))
# Data start, contains several formats (date, week, month, quarter, semester)
# Unfortunately semesters are not directly supported so they need to be
# changed into quarters
tmp$data.start <- gsub("S2", "Q3", tmp$data.start)
tmp$data.start <- lubridate::as_date(
  x = tmp$data.start,
  format = c("%Y", "%Y-Q%q", "%Y-W%W", "%Y-S%q", "%Y-%m-%d", "%Y-%m")
)
# Data end, same as data start
tmp$data.end <- gsub("S2", "Q3", tmp$data.end)
tmp$data.end <- lubridate::as_date(
  x = tmp$data.end,
  format = c("%Y", "%Y-Q%q", "%Y-W%W", "%Y-S%q", "%Y-%m-%d", "%Y-%m")
)
The European Commission and Eurostat generally use ISO 3166-1 alpha-2 codes with two exceptions: EL (not GR) is used to represent Greece, and UK (not GB) is used to represent the United Kingdom. This function converts Eurostat country codes to ISO 3166-1 alpha-2.
harmonize_country_code(x)
x |
A character or a factor vector of Eurostat country codes. |
a vector.
Janne Huovari [email protected]
Other helpers: cut_to_classes(), dic_order(), eurotime2date(), eurotime2num(), label_eurostat()
lp <- get_eurostat("nama_10_lp_ulc")
lp$geo <- harmonize_country_code(lp$geo)
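For readers without network access, the two exceptions described for this function can be reproduced offline. The function below is an illustrative re-implementation of the idea under a hypothetical name, not the package's own code, and it only handles the two codes named in the description.

```r
# Illustrative re-implementation of the idea; not the package's own code.
to_iso2 <- function(x) {
  x <- as.character(x)
  x[x %in% "EL"] <- "GR" # Greece
  x[x %in% "UK"] <- "GB" # United Kingdom
  x
}

converted <- to_iso2(c("FI", "EL", "UK", "EU28"))
print(converted)
```

Codes that are not countries, such as EU28, pass through unchanged in this sketch.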
Get definitions for Eurostat codes from Eurostat dictionaries.
label_eurostat(
  x,
  dic = NULL,
  code = NULL,
  eu_order = FALSE,
  lang = "en",
  countrycode = NULL,
  countrycode_nomatch = NULL,
  custom_dic = NULL,
  fix_duplicated = FALSE
)
label_eurostat_vars(x = NULL, id, lang = "en")
label_eurostat_tables(x, lang = "en")
x |
A character or a factor vector or a data_frame. |
dic |
A string (vector) naming eurostat dictionary or dictionaries.
If |
code |
For data_frames names of the column for which also code columns should be retained. The suffix "_code" is added to code column names. |
eu_order |
Logical. Should Eurostat ordering be used for label levels? Affects only factors. |
lang |
2-letter language code, default is "en". |
countrycode |
A |
countrycode_nomatch |
What to do when using the countrycode to label
a "geo" and countrycode fails to find a match, for example other than
country codes like EU28. The original code is used with
a |
custom_dic |
a named vector or named list of named vectors to give an own dictionary for (part of) codes. Names of the vector should be codes and values labels. List can be used to specify dictionaries and then list names should be dictionary codes. |
fix_duplicated |
A logical. If TRUE, the code is added to the duplicated label values. If FALSE (default), an error is raised if labeling produces duplicates. |
id |
A unique identifier / code for the dataset of interest. If code is not
known |
A character or a factor vector of codes returns a corresponding vector of definitions. label_eurostat() also labels data_frames from get_eurostat(). For vectors a dictionary name has to be supplied. For data_frames dictionary names are taken from column names. "time" and "values" columns are returned as they were, so you can supply a data_frame from get_eurostat() and get a data_frame with definitions instead of codes.
Some Eurostat dictionaries include duplicated labels. By default duplicated labels cause an error, but they can be fixed automatically with fix_duplicated = TRUE.
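The effect of adding the code to duplicated labels can be pictured with a toy dictionary. The codes, labels and the exact "code label" format below are assumptions made for illustration; the package may format the disambiguated labels differently.

```r
# Toy dictionary with a duplicated label; codes and labels are made up.
dic <- c(A1 = "Total", A2 = "Total", B1 = "Other")

# Mark every label that occurs more than once.
dup <- duplicated(dic) | duplicated(dic, fromLast = TRUE)

# Prefix duplicated labels with their code so that all labels become unique.
fixed <- dic
fixed[dup] <- paste(names(dic)[dup], dic[dup])

print(fixed)
```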
a vector or a data_frame.
label_eurostat_vars(): Get definitions for variable (column) names.
label_eurostat_tables(): Get definitions for table names.
Janne Huovari [email protected]
Other helpers: cut_to_classes(), dic_order(), eurotime2date(), eurotime2num(), harmonize_country_code()
## Not run:
lp <- get_eurostat("nama_10_lp_ulc")
lpl <- label_eurostat(lp)
str(lpl)
lpl_order <- label_eurostat(lp, eu_order = TRUE)
lpl_code <- label_eurostat(lp, code = "unit")
# Note that the dataset id must be provided in label_eurostat_vars
label_eurostat_vars(id = "nama_10_lp_ulc", x = "geo", lang = "en")
label_eurostat_tables("nama_10_lp_ulc")
label_eurostat(c("FI", "DE", "EU28"), dic = "geo")
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  custom_dic = c(DE = "Germany")
)
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  countrycode = "country.name",
  custom_dic = c(EU28 = "EU")
)
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  countrycode = "country.name"
)
# In Finnish
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  countrycode = "cldr.short.fi"
)
## End(Not run)
Parses cache_list.json file and returns a data.frame
list_eurostat_cache_items(cache_dir = NULL)
cache_dir |
a path to a cache directory. |
A data.frame object with 3 columns: dataset code, download date and query md5 hash
Lists datasets from eurostat table of contents with the particular pattern in item titles.
search_eurostat( pattern, type = "dataset", column = "title", fixed = TRUE, lang = "en" )
pattern |
Text string that is used to search from dataset, folder or table titles, depending on the type argument. |
type |
Selection for types of datasets to be searched. Default is "dataset". |
column |
Selection for the column of TOC where search is done. Default is "title". |
fixed |
logical. If TRUE (default), pattern is a string to be matched as is.
See |
lang |
2-letter language code, default is "en". |
Downloads the list of all datasets available on Eurostat and returns the datasets whose description contains a particular pattern, for example all datasets related to education or teaching.
If you wish to perform searches on fields other than the item title, you can download the Eurostat Table of Contents manually using the get_eurostat_toc() function and use the grep() function normally. The data browser on the Eurostat website may also return useful results.
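A manual search of this kind might look like the sketch below. To keep it runnable offline, a small made-up data frame stands in for the result of get_eurostat_toc(); the column names title, code and type are assumed to match the real table of contents, and the rows are invented.

```r
# Toy stand-in for the tibble returned by get_eurostat_toc(); rows are made up.
toc <- data.frame(
  title = c("Live births (total) by NUTS 3 region", "GDP and main components"),
  code = c("code_a", "code_b"),
  type = c("dataset", "dataset"),
  stringsAsFactors = FALSE
)

# Search the code column instead of the title.
hits <- toc[grepl("_b", toc$code, fixed = TRUE), ]

# With fixed = TRUE, parentheses are matched literally and need no escaping.
births <- toc[grepl("(total)", toc$title, fixed = TRUE), ]

print(hits)
print(births)
```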
A tibble with nine columns:
Dataset title in English (default)
Each item (dataset, table and folder) of the TOC has a unique code which allows it to be identified in the API. Used in the get_eurostat() and get_eurostat_raw() functions to retrieve datasets.
dataset, folder or table
Date, indicates the last time the dataset/table was updated (format DD.MM.YYYY or %d.%m.%Y)
Date, indicates the last time the dataset/table structure was modified (format DD.MM.YYYY or %d.%m.%Y)
Date of the oldest value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)
Date of the most recent value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)
Number of actual values included in the dataset
Hierarchy of the data navigation tree, represented in the original txt file by a 4-spaces indentation prefix in the title
The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de
See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
Przemyslaw Biecek and Leo Lahti [email protected]
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
A BibTeX entry for LaTeX users is
@Article{10.32614/RJ-2017-019,
  title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
  journal = {The R Journal},
  volume = {9},
  number = {1},
  pages = {385--392},
  year = {2017},
  doi = {10.32614/RJ-2017-019},
  url = {https://doi.org/10.32614/RJ-2017-019},
}
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
A BibTeX entry for LaTeX users is
@Misc{eurostat,
  title = {eurostat: Tools for Eurostat Open Data},
  author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek and Diego Hernangomez and Daniel Antal and Pyry Kantanen},
  url = {https://github.com/rOpenGov/eurostat},
  type = {Computer software},
  year = {2023},
  note = {R package version 4.0.0},
}
When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.
get_eurostat(), search_eurostat()
tmp <- search_eurostat("education")
head(tmp)
# Use "fixed = TRUE" when pattern has characters that would need escaping.
# Here, parentheses would normally need to be escaped in regex
tmp <- search_eurostat("Live births (total) by NUTS 3 region", fixed = TRUE)
This function stores your cache_dir path on your local machine and loads it in future sessions. Type Sys.getenv("EUROSTAT_CACHE_DIR") to find your cached path.
Alternatively, you can store the cache_dir manually with the following options:
Run Sys.setenv(EUROSTAT_CACHE_DIR = "cache_dir"). You would need to run this command in each session (similar to install = FALSE).
Set options(eurostat_cache_dir = "cache_dir"). Similar to the previous option. This is provided for backwards compatibility purposes.
Write this line in your .Renviron file: EUROSTAT_CACHE_DIR = "value_for_cache_dir" (same behavior as install = TRUE). This would store your cache_dir permanently.
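The two session-only options above can be tried safely with a throwaway directory. This is a minimal sketch that uses only base R; the directory name is arbitrary, and it does not show how the package itself reads these settings.

```r
# Use a throwaway directory so nothing permanent is changed.
cache_dir <- file.path(tempdir(), "eurostat_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Session-only setting, comparable to install = FALSE.
Sys.setenv(EUROSTAT_CACHE_DIR = cache_dir)
print(Sys.getenv("EUROSTAT_CACHE_DIR"))

# Backwards-compatible alternative, also valid for this session only.
options(eurostat_cache_dir = cache_dir)
print(getOption("eurostat_cache_dir"))
```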
set_eurostat_cache_dir( cache_dir, overwrite = FALSE, install = FALSE, verbose = TRUE )
cache_dir |
A path to a cache directory. On missing value the function
would store the cached files on a temporary dir (See
|
overwrite |
If this is set to |
install |
if |
verbose |
Logical, displays information. Useful for debugging,
default is |
An (invisible) character with the path to your cache_dir.
Diego Hernangómez
Other cache utilities: clean_eurostat_cache()
# Don't run this! It would modify your current state
## Not run:
set_eurostat_cache_dir(verbose = TRUE)
## End(Not run)
Sys.getenv("EUROSTAT_CACHE_DIR")
Auxiliary Data Sets
tgs00026
data_frame
Disposable income of private households by NUTS 2 regions
Retrieved with: tgs00026 <- get_eurostat("tgs00026", time_format = "raw")
Data retrieval date: 2022-06-27
Other datasets: eu_countries, eurostat_geodata_60_2016