Title: | Processing Regional Statistics |
---|---|
Description: | Validating sub-national statistical typologies, re-coding across standard typologies of sub-national statistics, and making valid aggregate level imputation, re-aggregation, re-weighting and projection down to lower hierarchical levels to create meaningful data panels and time series. |
Authors: | Daniel Antal [aut, cre] , Kasia Kulma [ctb] , Istvan Zsoldos [ctb] , Leo Lahti [ctb] |
Maintainer: | Daniel Antal <[email protected]> |
License: | GPL-3 |
Version: | 0.1.8 |
Built: | 2024-12-04 03:27:28 UTC |
Source: | https://github.com/rOpenGov/regions |
A dataset containing all recognised geo codes in the EU NUTS correspondence tables. This is re-arranged from nuts_changes.
all_valid_nuts_codes
A data frame with 3 variables:
NUTS geo identifier
country, NUTS1, NUTS2 or NUTS3
The NUTS definition where the geo code can be found.
https://ec.europa.eu/eurostat/web/nuts/history/
nuts_recoded, nuts_changes, nuts_exceptions
A dataset containing the states and territories of Australia.
australia_states
A data frame with 8 rows and 3 variables:
ISO 3166-1 country codes
subdivision codes within Australia (states and territories)
subdivision names within Australia (states and territories)
The Online Browsing Platform of the International Organization for Standardization https://www.iso.org/obp/ui/#iso:code:3166:AU
Create the nuts_lau_2019 correspondence table. May be used to create similar historical correspondence tables.
create_nuts_lau_2019()
A data.frame which is also saved and can be retrieved with data(nuts_lau_2019).
Use this function as a template to obtain historical correspondence tables.
A dataset containing the percentage of individuals who used the Internet on a daily basis in the European countries and regions.
daily_internet_users
A data frame with 3 variables:
National and sub-national geographical codes from Eurostat
Time, coded as a numeric variable of the year, 2006-2019
The numeric statistical values
The fresh version of this statistic can be obtained by eurostat::get_eurostat("isoc_r_iuse_i", time_format = "num") and filtered for the indic_is = "I_IDAY" indicator and the unit = "PC_IND" unit.
The eventual source of the data is the Eurostat table isoc_r_iuse_i
https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=isoc_r_iuse_i&lang=en
The function identifies the sub-national geographical identifiers from known typologies and returns the ISO 3166-1 alpha-2 country codes.
get_country_code(geo, typology = "NUTS")
geo |
A character variable with geo codes. |
typology |
Currently the following typologies are supported:
|
The ISO 3166-1 alpha-2 codes of the countries as a character vector.
Other recode functions:
recode_nuts()
{ get_country_code (c("EL", "GR", "DED", "HU102")) }
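For NUTS-style codes, the country can be read off the first two characters of the geo code. The base R sketch below only illustrates that idea and is not the package's implementation: the helper name country_prefix() is hypothetical, and mapping the Eurostat-specific EL and UK to the ISO codes GR and GB is an assumption about the desired output.

```r
# Hypothetical helper (not part of the regions package): derive an
# ISO 3166-1 alpha-2 code from a NUTS-style geo code.
country_prefix <- function(geo) {
  prefix <- substr(geo, 1, 2)
  # Assumption: Eurostat's EL and UK should be returned as ISO GR and GB.
  eurostat_to_iso <- c(EL = "GR", UK = "GB")
  needs_recode <- prefix %in% names(eurostat_to_iso)
  prefix[needs_recode] <- unname(eurostat_to_iso[prefix[needs_recode]])
  prefix
}

country_codes <- country_prefix(c("EL", "GR", "DED", "HU102"))
country_codes
```

The lookup vector keeps the exceptions in one place, so further non-ISO prefixes could be added without touching the rest of the helper.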
A dataset containing the correspondence table between the EU NUTS 2016 typology and the typology used by Google in the Google Mobility Reports.
google_nuts_matchtable
A data frame with 817 rows and 6 variables:
ISO 3166-1 alpha2 code
Hierarchical level in the Google Mobility Reports
The name used by Google.
NUTS code in the 2016 definition
country, NUTS1, NUTS2 or NUTS3, nuts_level_3_lau, nuts_level_3_iso-3166-2
Logical variable, if the coding is valid in NUTS2016
In some cases a full correspondence is not possible. In these cases we created pseudo-NUTS codes, which have a FALSE valid_2016 value. These pseudo-NUTS codes can help approximate the underlying regions.
Pseudo-NUTS codes were used in Estonia, Italy, Portugal, Slovenia and in parts of Latvia.
In Latvia and Slovenia, the pseudo-NUTS code is a combination of the containing NUTS3 code and the municipality's LAU code.
In Estonia, they are a combination of the NUTS3 code and the ISO-3166-2 LAU code (county level). This is the case in most of Portugal and the United Kingdom, too. In these cases the pseudo-codes refer to quasi-NUTS4 units, which are smaller than the containing NUTS3 region, therefore they should be aggregated.
A special case is ITD_IT-32, which is a combination of two NUTS2 statistical regions, but under ISO-3166-2 (ITD_IT-32) it forms a single unit, the autonomous region of Trentino and South Tyrol. In this case, the data should be disaggregated. A similar solution is required for the United Kingdom.
Istvan Zsoldos, Daniel Antal
https://ec.europa.eu/eurostat/web/nuts/history/
This is a generic function to impute data from broader hierarchical geographical areas to smaller ones. It requires the exact specification of the geographical typology.
impute_down( upstream_data = NULL, downstream_data = NULL, country_var = "country_code", regional_code = "geo_code", values_var = "values", time_var = NULL, upstream_method_var = NULL, downstream_method_var = NULL )
upstream_data |
An upstream data frame containing data of the larger geographical units to be projected down, for example, country-level data. |
downstream_data |
A downstream data frame containing the smaller level missing data observations. It must contain all the necessary structural information for imputation. |
country_var |
The geographical ID of the upstream data, defaults to "country_code". |
regional_code |
The geographical ID of the downstream data, defaults to "geo_code". |
values_var |
The variable that contains the upstream data to be imputed to the downstream data, defaults to "values". |
time_var |
The time component, if present, defaults to NULL. |
upstream_method_var |
The name of the variable that contains the potentially applied imputation methods. Defaults to NULL. |
downstream_method_var |
The name of the variable that will contain the metadata of the potentially applied imputation methods. Defaults to NULL. |
The more general impute_down function requires typology information from the higher and lower level typologies. This is not needed when the EU vocabulary is used, and the hierarchy can be established from the EU vocabularies.
The upstream data frame (containing data of a larger unit) and the downstream data (containing data of smaller sub-divisional units) are joined; whenever data is missing in the downstream sub-divisional column, it is imputed with the corresponding values from the upstream data frame. The 'method' metadata column explains if the actual downstream data or the imputed data can be found in the downstream value column.
Other impute functions:
impute_down_nuts()
{
  upstream <- data.frame(
    country_code = rep("AU", 3),
    year = c(2018:2020),
    my_var = c(10, 12, 11),
    description = c("note1", NA_character_, "note3")
  )

  downstream <- australia_states

  impute_down(
    upstream_data = upstream,
    downstream_data = downstream,
    country_var = "country_code",
    regional_code = "geo_code",
    values_var = "my_var",
    time_var = "year"
  )
}
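The join-and-fill logic described for impute_down can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code: the toy values and the labels written to the method column ("actual", "imputed from country") are assumptions.

```r
# Toy inputs; the values and method labels are illustrative assumptions.
upstream <- data.frame(
  country_code = "AU",
  values = 10,
  stringsAsFactors = FALSE
)
downstream <- data.frame(
  country_code = c("AU", "AU", "AU"),
  geo_code     = c("AU-NSW", "AU-QLD", "AU-TAS"),
  values       = c(9, NA, NA),
  stringsAsFactors = FALSE
)

# Join the larger unit's value to each of its sub-divisions.
joined <- merge(
  downstream, upstream,
  by = "country_code", suffixes = c("", "_upstream")
)

# Fill only the gaps and record where each value came from.
missing_value <- is.na(joined$values)
joined$method <- ifelse(missing_value, "imputed from country", "actual")
joined$values[missing_value] <- joined$values_upstream[missing_value]
joined$values_upstream <- NULL
joined
```

Observed downstream values are never overwritten; only missing cells receive the upstream value, and the method column keeps the two cases distinguishable.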
This is a special case of impute_down for the EU NUTS hierarchical typologies. All valid actual rows will be projected down to all smaller constituent typologies where data is missing.
impute_down_nuts( dat, geo_var = "geo", values_var = "values", method_var = NULL, nuts_year = 2016 )
dat |
A data frame with exactly two or three columns: |
geo_var |
The variable that contains the geographical codes in the NUTS typologies, defaults to "geo". |
values_var |
The variable that contains the upstream data to be imputed to the downstream data, defaults to "values". |
method_var |
The variable that contains the metadata on various processing information, defaults to NULL. |
nuts_year |
The year of the NUTS typology to use, it defaults to the currently valid 2016. |
The more general impute_down function requires typology information from the higher and lower level typologies. This is not needed when the EU vocabulary is used, and the hierarchy can be established from the EU vocabularies.
Be mindful that while all possible imputations are made, imputations beyond one hierarchical level will result in very crude estimates.
The imputed dataset dat must refer to a single time unit, i.e. panel data is not supported.
An augmented version of the dat imputed data frame with all possible projections to valid smaller units, i.e. NUTS0 = country values imputed to all missing NUTS1 units, NUTS1 values imputed to all missing NUTS2 units, NUTS2 values imputed to all missing NUTS3 units.
Other impute functions:
impute_down()
data(mixed_nuts_example)
impute_down_nuts(mixed_nuts_example, nuts_year = 2016)
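Because NUTS codes are hierarchical (each extra character adds one level below the two-character country code), the parent of a code can be found by dropping its last character. The base R sketch below uses that property to cascade values downwards. It is an illustration under assumptions, not the package's algorithm: the values are made up, the codes serve only as examples, and the method labels are hypothetical.

```r
# Toy data for a single time unit; values are made up for illustration.
dat <- data.frame(
  geo    = c("HU", "HU1", "HU11", "HU12", "HU110"),
  values = c(5, NA, NA, 7, NA),
  stringsAsFactors = FALSE
)

# Process shorter codes first, so that a value imputed at one level
# can be passed further down in a later step.
dat <- dat[order(nchar(dat$geo)), ]
dat$method <- ifelse(is.na(dat$values), NA_character_, "actual")

for (i in seq_len(nrow(dat))) {
  if (is.na(dat$values[i]) && nchar(dat$geo[i]) > 2) {
    # The parent code is the code without its last character.
    parent <- substr(dat$geo[i], 1, nchar(dat$geo[i]) - 1)
    hit <- match(parent, dat$geo)
    if (!is.na(hit) && !is.na(dat$values[hit])) {
      dat$values[i] <- dat$values[hit]
      dat$method[i] <- paste("imputed from", parent)
    }
  }
}
dat
```

In this toy run the country value travels three levels down, which is exactly the kind of chained imputation that yields very crude estimates.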
This data frame is a fictitious example that packs many potential typological problems into a small, easy-to-review dataset. It is used to test imputation functions and to create examples with them.
mixed_nuts_example
A data frame with 22 rows and 3 variables:
NUTS geo identifier, mixed from 4 typology levels.
Random numbers.
Descriptive metadata.
https://ec.europa.eu/eurostat/web/nuts/history/
nuts_changes, all_valid_nuts_codes, impute_down_nuts
A dataset containing the joined correspondence tables of the EU NUTS typologies.
nuts_changes
A data frame with 3097 rows and 22 variables:
country, NUTS1, NUTS2 or NUTS3
The year when the code was first used
The year when the code was last used
NUTS code in the 1999 definition
NUTS code in the 2003 definition
NUTS code in the 2006 definition
NUTS code in the 2010 definition
NUTS code in the 2013 definition
NUTS code in the 2016 definition
NUTS code in the 2021 definition
NUTS territorial name in the 2003 definition
NUTS territorial name in the 2006 definition
NUTS territorial name in the 2010 definition
NUTS territorial name in the 2013 definition
NUTS territorial name in the 2016 definition
NUTS territorial name in the 2021 definition
Change described in the 2003 correspondence table
Change described in the 2006 correspondence table
Change described in the 2010 correspondence table
Change described in the 2013 correspondence table
Change described in the 2016 correspondence table
Change described in the 2021 correspondence table
https://ec.europa.eu/eurostat/web/nuts/history/
nuts_recoded, all_valid_nuts_codes
A dataset containing exceptions to the NUTS geographical codes.
nuts_exceptions
A data frame with 2 variables:
National and sub-national geographical codes from Eurostat
Short description of exception
It contains non-EU regions that are consistent with NUTS, but not defined within the NUTS.
It also contains European country codes that do not conform with NUTS.
Eurostat NUTS history: https://ec.europa.eu/eurostat/web/nuts/history/
nuts_recoded, nuts_changes, all_valid_nuts_codes
A dataset containing the joined correspondence tables of the EU NUTS and local administration units (LAU) typologies.
nuts_lau_2019
A data frame with 99140 rows and 22 variables:
NUTS3 code of the local administrative unit, 2016 definition
Local Administrative Unit code
LAU name, official in national language(s)
LAU name, official Latin alphabet version
Change in name in the year before?
Population
Area in square meters
Degree of urbanization
Change in degree of urbanization?
Part of coastal area classification?
Change in coastal area classification
NUTS territorial name in the 2006 definition
NUTS territorial name in the 2010 definition
Name of the city
Containing metro area ID, if applicable
Change in metro area ID
Name of containing greater city (metropolitan) area, if applicable
FUA ID
Change of FUA ID since last year
Name in FUA database
NUTS country code with exceptions: EL for Greece, UK for United Kingdom
GISCO ID
This is also the authoritative vocabulary for local administration names, including city and metropolitan area names.
https://ec.europa.eu/eurostat/web/nuts/local-administrative-units
nuts_recoded, all_valid_nuts_codes
A dataset containing all recoded NUTS units from the European Union. This is re-arranged from nuts_changes.
nuts_recoded
A data frame with 8 rows and 3 variables:
NUTS geo identifier
country, NUTS1, NUTS2 or NUTS3
year of the NUTS definition or version
when the geo code changed
Two character ISO standard country codes.
https://ec.europa.eu/eurostat/web/nuts/history/
nuts_changes, all_valid_nuts_codes
Validate your geo codes, pair them with the appropriate standard typology, look up potential causes of invalidity in the EU correspondence tables, and look up the appropriate geographical codes in the other (target) typology.
recode_nuts(dat, geo_var = "geo", nuts_year = 2016)
dat |
A data frame with a 3-5 character |
geo_var |
Defaults to "geo". |
nuts_year |
The year of the NUTS typology to use. You can select any valid NUTS definition, i.e. 1999, 2003, 2006, 2010, 2013, 2016 or 2021. |
A usual task is, for example, to validate geo codes in the 'NUTS2016' typology and translate them to the now obsolete 'NUTS2010' typology to join current data with historical data sets.
The original data frame with a 'geo_var' column is extended with a 'typology' column that states in which typology the 'geo_var' is a valid code. For invalid codes, the function looks up potential reasons of invalidity and adds them to the 'typology_change' column, and finally it adds a character column containing the desired codes in the target typology, for example, in the NUTS2013 typology.
Other recode functions:
get_country_code()
{
  foo <- data.frame(
    geo = c("FR", "DEE32", "UKI3", "HU12", "DED", "FRK"),
    values = runif(6, 0, 100),
    stringsAsFactors = FALSE
  )

  recode_nuts(foo, nuts_year = 2013)
}
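The lookup behind such a recoding can be pictured as matching each geo code against a correspondence table and reading off the change description and the code in the target typology. The base R sketch below uses a made-up correspondence table with invented codes (XX1, XXA and so on); the column names and change labels are assumptions and do not reproduce nuts_changes or the package's own logic.

```r
# Toy correspondence table; all codes and labels are invented.
correspondence <- data.frame(
  code_2013 = c("XX1", "XX2", NA),
  code_2016 = c("XX1", "XXA", "XXB"),
  change    = c("unchanged", "recoded", "new region"),
  stringsAsFactors = FALSE
)

dat <- data.frame(
  geo = c("XX1", "XXA", "XXB", "XX9"),
  stringsAsFactors = FALSE
)

# Find each code in the source typology (here the toy 2016 column).
hit <- match(dat$geo, correspondence$code_2016)

# Describe what happened to the code and fetch its target-typology code.
dat$typology_change <- ifelse(is.na(hit), "not found", correspondence$change[hit])
dat$code_2013 <- correspondence$code_2013[hit]
dat
```

Codes without a counterpart in the target typology end up as NA, which makes them easy to filter before joining with historical data.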
A subset of the Eurostat dataset R&D personnel and researchers by sector of performance, sex and NUTS 2 regions.
regional_rd_personnel
A data frame with 956 observations of 7 variables:
National and sub-national geographical codes from Eurostat
Time, coded as a numeric variable of the year, 2006-2019
The numeric statistical values
Unit of measurement, contains only FTE
Sex of researchers, contains only both sexes as T
Professional position, contains all R&D employees not only researchers
Sector of performance, filtered for all sectors as TOTAL
Mapping Regional Data, Mapping Metadata Problem
The fresh version of this statistic can be obtained by
eurostat::get_eurostat_json (id = "rd_p_persreg",
filters = list (sex = "T", prof_pos = "TOTAL",sectperf = "TOTAL", unit = "FTE" ))
https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=rd_p_persreg&lang=en
recode_nuts
The regions package provides four categories of functions: validate, recode, impute and aggregate.
The validate functions validate the conformity of a typological (geographical) label with a certain typology. Currently the EU statistical NUTS typologies and countries are implemented.
The recode functions correct the geo coding of sub-national statistics, or bring them to a consistent format.
The impute functions impute data from one regional unit to a different level of regional unit, such as country-level data to province- or state-level data.
impute_down provides imputation functions from higher aggregation hierarchy levels to lower ones, for example from ISO-3166-1 to ISO-3166-2.
impute_down_nuts provides the same functionality with the EU typologies, but with far less work, because it relies on the internal hierarchical structure of these metadata, for example, from NUTS1 to NUTS2.
Aggregation functions from lower hierarchy levels to higher ones, for example from NUTS3 to NUTS1 or from ISO-3166-2 to ISO-3166-1.
Disaggregation functions from higher hierarchy levels to lower ones, for example from NUTS1 to NUTS2 or from ISO-3166-1 to ISO-3166-2.
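As an illustration of the aggregation direction mentioned above, NUTS3-level values can be rolled up to NUTS1 by grouping on the first three characters of the code. The sketch below uses base R only, with invented codes and values; it assumes an additive indicator (such as a head count), for which summing is meaningful, and it does not show any function of the package.

```r
# Toy NUTS3-style data; codes and values are invented.
nuts3 <- data.frame(
  geo    = c("XX111", "XX112", "XX121", "XX211"),
  values = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# The containing NUTS1 code is the first three characters of the NUTS3 code.
nuts3$nuts1 <- substr(nuts3$geo, 1, 3)

# Sum the values within each NUTS1 unit (valid for additive indicators only).
nuts1 <- aggregate(values ~ nuts1, data = nuts3, FUN = sum)
nuts1
```

Rates and percentages would need a weighted mean instead of a sum, with weights such as population that are not part of this toy example.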
Validate that geo conforms with the NUTS1, NUTS2, or NUTS3 typologies.
validate_geo_code(geo, nuts_year = 2016)
geo |
A vector of geographical codes to validate. |
nuts_year |
A valid NUTS edition year. |
While country codes are technically not part of the NUTS typologies, Eurostat de facto uses a NUTS0 typology to identify countries. This de facto typology has three exceptions, which are handled by the validate_nuts_countries function.
NUTS typologies have different versions, therefore the conformity is validated with one specific version, which can be any of these: 1999, 2003, 2006, 2010, 2013, the currently used 2016 and the already announced and defined 2021.
The NUTS typology was codified with the NUTS2003, and the pre-1999 NUTS typologies may confuse programmatic data processing, given that some NUTS1 regions were identified with country codes in smaller countries that had no NUTS1 divisions.
Currently the 2016 definition is used by Eurostat, but many datasets still contain 2013 and sometimes earlier metadata.
A character list with the valid typology, or 'invalid' in the cases when the geo coding is not valid.
my_reg_data <- data.frame(
  geo = c("BE1", "HU102", "FR1", "DED", "FR7", "TR", "DED2", "EL", "XK", "GB"),
  values = runif(10)
)

validate_geo_code(my_reg_data$geo)
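Conceptually, this kind of validation combines two checks: the length of the code suggests the hierarchical level (two characters for a country, three to five for NUTS1 to NUTS3), and a membership test against the codes of the chosen NUTS edition confirms that the code exists. The base R sketch below illustrates this with a hypothetical helper, classify_geo(), and a toy set of valid codes standing in for a table such as all_valid_nuts_codes; the returned labels are assumptions and may differ from what the package returns.

```r
# Hypothetical helper: label each code by level if it is in `valid_codes`.
classify_geo <- function(geo, valid_codes) {
  level_names <- c("country", "NUTS1", "NUTS2", "NUTS3")
  # Codes of 2 to 5 characters map to a level; other lengths give NA.
  level <- level_names[match(nchar(geo), 2:5)]
  ifelse(geo %in% valid_codes & !is.na(level), level, "invalid")
}

# Toy stand-in for the valid codes of one NUTS edition.
toy_valid_codes <- c("BE", "BE1", "DED", "DED2", "HU110")

geo_labels <- classify_geo(c("BE1", "DED2", "HU110", "XX9", "B"), toy_valid_codes)
geo_labels
```

Using match() on the code length avoids out-of-range indexing for codes that are too short or too long, which simply fall through to "invalid".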
This function is mainly a wrapper around the well-known countrycode function, with three exceptions that are particular to the European Union statistical nomenclature.
validate_nuts_countries(dat, geo_var = "geo")
dat |
A data frame with a 2-character geo variable to be validated |
geo_var |
Defaults to "geo". |
All ISO-3166-1 country codes are validated, and also the three exceptions.
EL is treated as valid, because NUTS uses EL instead of GR for Greece since 2010.
UK is treated as valid, because NUTS uses UK instead of GB for the United Kingdom.
XK is used for Kosovo, because Eurostat uses this code, too.
The original data frame extended with the column 'typology'. This column states 'country' for valid country typology coding, or an appropriate label for invalid ISO-3166-alpha-2 and ISO-3166-alpha-3 codes.
Other validate functions:
validate_nuts_regions()
{
  my_dat <- data.frame(
    geo = c("AL", "GR", "XK", "EL", "UK", "GB", "NLD", "ZZ"),
    values = runif(8)
  )

  ## NLD is an ISO 3-character code and is not validated.
  validate_nuts_countries(my_dat)
}
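The rule described for this function (accept ISO 3166-1 alpha-2 codes plus the three Eurostat exceptions) can be sketched without any dependency as a simple membership test. The helper label_country_codes() and the "invalid" label are hypothetical, and iso_codes is a deliberately tiny toy subset of the ISO list, so this is an illustration rather than the package's behaviour.

```r
# Hypothetical helper: label two-character codes as country codes.
label_country_codes <- function(geo, iso_codes) {
  # Exceptions used in Eurostat data: Greece, United Kingdom, Kosovo.
  eurostat_exceptions <- c("EL", "UK", "XK")
  ifelse(geo %in% c(iso_codes, eurostat_exceptions), "country", "invalid")
}

# Toy subset of ISO 3166-1 alpha-2 codes, for illustration only.
toy_iso_codes <- c("AL", "GR", "GB", "NL")

country_labels <- label_country_codes(
  c("AL", "GR", "XK", "EL", "UK", "GB", "NLD", "ZZ"),
  iso_codes = toy_iso_codes
)
country_labels
```

The three-character NLD and the unassigned ZZ both fail the membership test, mirroring the comment in the example above that three-character codes are not validated.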
Validate that geo_var conforms with the NUTS1, NUTS2, or NUTS3 typologies.
While country codes are technically not part of the NUTS typologies, Eurostat de facto uses a NUTS0 typology to identify countries. This de facto typology has three exceptions, which are handled by the validate_nuts_countries function.
validate_nuts_regions(dat, geo_var = "geo", nuts_year = 2016)
dat |
A data frame with a 3-5 character |
geo_var |
Defaults to "geo". |
nuts_year |
The year of the NUTS typology to use. Defaults to 2016. |
NUTS typologies have different versions, therefore the conformity is validated with one specific version, which can be any of these: 1999, 2003, 2006, 2010, 2013, the currently used 2016 and the already announced and defined 2021.
The NUTS typology was codified with the NUTS2003, and the pre-1999 NUTS typologies may confuse programmatic data processing, given that some NUTS1 regions were identified with country codes in smaller countries that had no NUTS1 divisions.
Currently the 2016 definition is used by Eurostat, but many datasets still contain 2013 and sometimes earlier metadata.
Returns the original dat data frame with a column that specifies the conformity with the NUTS definition of the year nuts_year.
Other validate functions:
validate_nuts_countries()
my_reg_data <- data.frame(
  geo = c("BE1", "HU102", "FR1", "DED", "FR7", "TR", "DED2", "EL", "XK", "GB"),
  values = runif(10)
)

validate_nuts_regions(my_reg_data)
validate_nuts_regions(my_reg_data, nuts_year = 2013)
validate_nuts_regions(my_reg_data, nuts_year = 2003)
Assertions are made to give early and precise error messages for wrong API call parameters.
validate_parameters(typology = NULL, param = NULL, param_name = NULL)
typology |
Currently the following typologies are supported:
|
param |
A parameter value that must not be |
param_name |
The name of the parameter that must not have a value of |
These assertions are called from various wrapper functions. However, you can also call this function directly to make sure that you are adding (programmatically) the correct parameters to a call.
All validate_parameters parameters default to NULL. Asserts the correct parameter values for any values that are not NULL.
A logical value indicating whether the parameter calls are valid.