Eurostat offers so-called correspondence tables to follow boundary changes, recoding and relabelling for all NUTS changes since the formalization of the NUTS typology. Unfortunately, these Excel tables do not conform with the requirements of tidy data, and their vocabulary is not standardized, either. For example, recoding changes are variously labelled as recoding, recoding and renaming, code change, Code change, etc.
The data-raw directory contains these Excel tables and very long data wrangling code that unifies the relevant vocabulary of these Excel files and brings the tables into a single, tidy format, starting with the NUTS1999 definition. The resulting data file nuts_changes is included in the regions package. It already contains the changes that will come into force in 2021.
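The kind of vocabulary unification described above can be illustrated with a minimal base R sketch. The helper `normalize_change_label()` and its target labels are hypothetical and are not the code found in data-raw; they only show how differently worded labels can be collapsed into one controlled vocabulary.

```r
# Hypothetical helper (not part of the regions package): collapse free-text
# change labels into a small controlled vocabulary.
normalize_change_label <- function(x) {
  x <- tolower(trimws(x))
  out <- x  # labels that match no pattern are kept in lower case
  is_recode  <- grepl("recod|code change", x)
  is_relabel <- grepl("relabel|renam", x)
  out[is_recode & is_relabel]  <- "recoded and relabelled"
  out[is_recode & !is_relabel] <- "recoded"
  out[!is_recode & is_relabel] <- "relabelled"
  out
}

# The label variants mentioned in the text
raw_labels <- c("recoding", "recoding and renaming", "code change", "Code change")
normalized <- normalize_change_label(raw_labels)
print(data.frame(raw = raw_labels, normalized = normalized))
```

Pattern matching on lower-cased text is a deliberately simple design choice here; a real correspondence table is likely to need more patterns and manual checks.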
Let’s review a few changes.
data(nuts_changes)
nuts_changes %>%
  mutate(geo_16 = .data$code_2016,
         geo_13 = .data$code_2013) %>%
  filter(code_2016 %in% c("FRB", "HU11") |
           code_2013 %in% c("FR7", "HU10", "FR24")) %>%
  select(all_of(c("typology", "geo_16", "geo_13", "start_year",
                  "code_2013", "change_2013",
                  "code_2016", "change_2016"))) %>%
  # Stack the code_* columns into definition / code pairs
  pivot_longer(cols = starts_with("code"),
               names_to = "definition",
               values_to = "code") %>%
  # Stack the change_* columns into change / description pairs
  pivot_longer(cols = starts_with("change"),
               names_to = "change",
               values_to = "description") %>%
  filter(!is.na(.data$description),
         !is.na(.data$code)) %>%
  # all_of() replaces .data$ here, which is deprecated inside select()
  select(-all_of("change")) %>%
  knitr::kable()
typology | geo_16 | geo_13 | start_year | definition | code | description |
---|---|---|---|---|---|---|
nuts_level_1 | FRB | NA | 2016 | code_2016 | FRB | new nuts 1 region, identical to ex-nuts 2 region fr24 |
nuts_level_1 | FRK | FR7 | NA | code_2013 | FR7 | relabelled and recoded |
nuts_level_1 | FRK | FR7 | NA | code_2016 | FRK | relabelled and recoded |
nuts_level_1 | NA | FR7 | NA | code_2013 | FR7 | discontinued |
nuts_level_2 | FRB0 | FR24 | NA | code_2013 | FR24 | recoded and relabelled |
nuts_level_2 | FRB0 | FR24 | NA | code_2016 | FRB0 | recoded and relabelled |
nuts_level_2 | HU11 | NA | 2016 | code_2016 | HU11 | new region, equals ex-nuts 3 region hu101 |
nuts_level_2 | NA | HU10 | NA | code_2013 | HU10 | discontinued; split into new hu11 and hu12 |
You will not find the geo identifier FRB in any statistical data that was released before France changed its administrative boundaries and the NUTS2016 boundary definition came into force. However, as the description says, you may find historical data elsewhere, in a historical NUTS2-level product, for the FRB Centre-Val de Loire NUTS1 region, because it is identical to the earlier NUTS2-level region FR24, i.e. Central France, which was known as Centre for many years before the transition to NUTS2016. The size and importance of this territorial unit are more similar to NUTS1 than to NUTS2 units.
Because FRB contains only one NUTS2 region, FRB0 (the earlier FR24), it is technically identified as a NUTS2-level region, too. You find the same data in the NUTS2 typology.
With statistical products on NUTS2 level, you can simply recode historical FR24 data to FRB0, since neither the aggregation level nor the boundaries have changed. Furthermore, you can project this data to any NUTS1-level panel, either under the earlier FR2 NUTS1 label, if you use the old definition, or under the new FRB label, if you use the current NUTS2016 typology.
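A minimal base R sketch of such a recoding, assuming a hypothetical NUTS2-level data frame `historical` with made-up values. The lookup vector `recode_map` is only an illustration and is not part of the regions package.

```r
# Hypothetical historical NUTS2-level data that still uses NUTS2013 codes
historical <- data.frame(
  geo    = c("FR24", "FR10", "FR24"),
  year   = c(2010, 2010, 2011),
  values = c(1.5, 2.0, 1.7),  # made-up numbers
  stringsAsFactors = FALSE
)

# Named lookup vector: old code -> new code
recode_map <- c(FR24 = "FRB0")

# Recode only the rows whose code appears in the lookup;
# all other codes are carried over unchanged
matched <- historical$geo %in% names(recode_map)
historical$geo_2016 <- historical$geo
historical$geo_2016[matched] <- unname(recode_map[historical$geo[matched]])

print(historical)
```

Carrying other codes over unchanged is only safe for regions whose code and boundaries are the same in both definitions, so in practice the lookup should cover every changed region in the data.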
Let’s see a hypothetical data frame with random variables. (Usually a data frame does not have so many issues, but a more detailed example can be constructed this way.)
example_df <- data.frame(
  geo = c("FR", "DEE32", "UKI3",
          "HU12", "DED",
          "FRK"),
  values = runif(6, 0, 100),
  stringsAsFactors = FALSE)
recode_nuts(dat = example_df,
            nuts_year = 2013) %>%
  select(geo, values, code_2013) %>%
  knitr::kable()
geo | values | code_2013 |
---|---|---|
FR | 45.25616 | FR |
UKI3 | 36.32497 | UKI3 |
DED | 25.17747 | DED |
FRK | 70.65230 | FR7 |
HU12 | 28.06531 | NA |
DEE32 | 30.50220 | NA |
In this hypothetical example we are creating backward compatibility with the NUTS2013 definition. There are three types of observations: those whose code is unchanged in NUTS2013, those that can be recoded to a NUTS2013 equivalent, and those that have no equivalent in the NUTS2013 dataset.
recode_nuts(example_df, nuts_year = 2013) %>%
  select(all_of(c("geo", "values", "typology_change", "code_2013"))) %>%
  knitr::kable()
geo | values | typology_change | code_2013 |
---|---|---|---|
FR | 45.25616 | unchanged | FR |
UKI3 | 36.32497 | unchanged | UKI3 |
DED | 25.17747 | unchanged | DED |
FRK | 70.65230 | Recoded from FRK [used in NUTS 2016-2021] | FR7 |
HU12 | 28.06531 | Used in NUTS 2016-2021 | NA |
DEE32 | 30.50220 | Used in NUTS 1999-2003 | NA |
The first three observations are comparable with a NUTS2013 dataset. The fourth observation is comparable, too, but when joining with a NUTS2013 dataset or map, it is likely that FRK needs to be recoded to FR7.
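The three groups can also be told apart programmatically. The sketch below uses base R on a hand-written data frame that imitates the `geo` and `code_2013` columns shown above; the `status` column and its labels are hypothetical and are not produced by `recode_nuts()`.

```r
# Hand-written stand-in for the geo and code_2013 columns of the table above
recoded <- data.frame(
  geo       = c("FR", "UKI3", "DED", "FRK", "HU12", "DEE32"),
  code_2013 = c("FR", "UKI3", "DED", "FR7", NA, NA),
  stringsAsFactors = FALSE
)

# Classify each observation relative to NUTS2013:
# no NUTS2013 code, same code, or a different (recoded) code
recoded$status <- ifelse(
  is.na(recoded$code_2013),
  "not in NUTS2013",
  ifelse(recoded$geo == recoded$code_2013, "unchanged", "recoded")
)

print(recoded)
```

Only the rows labelled unchanged or recoded can be joined with a NUTS2013 dataset, and the recoded rows must use `code_2013` as the join key.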
The following data can be joined with a NUTS2013 dataset or map:
recode_nuts(example_df, nuts_year = 2013) %>%
  # all_of() and a quoted name replace .data$, which is deprecated in select() and rename()
  select(all_of(c("code_2013", "values", "typology_change"))) %>%
  rename(geo = "code_2013") %>%
  filter(!is.na(.data$geo)) %>%
  knitr::kable()
geo | values | typology_change |
---|---|---|
FR | 45.25616 | unchanged |
UKI3 | 36.32497 | unchanged |
DED | 25.17747 | unchanged |
FR7 | 70.65230 | Recoded from FRK [used in NUTS 2016-2021] |
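Once the codes are harmonized, the join itself is an ordinary key-based merge. The following base R sketch assumes a hypothetical NUTS2013-coded dataset `nuts2013_data` with made-up `population` values; the `recoded` data frame imitates the table above with rounded numbers.

```r
# Stand-in for the harmonized table above (values rounded)
recoded <- data.frame(
  geo    = c("FR", "UKI3", "DED", "FR7"),
  values = c(45.3, 36.3, 25.2, 70.7),
  stringsAsFactors = FALSE
)

# Hypothetical dataset that uses NUTS2013 codes; the numbers are made up
nuts2013_data <- data.frame(
  geo        = c("FR", "FR7", "DED", "HU10"),
  population = c(100, 20, 10, 5),
  stringsAsFactors = FALSE
)

# Left join on geo: every harmonized observation is kept,
# and codes missing from nuts2013_data get NA
joined <- merge(recoded, nuts2013_data, by = "geo", all.x = TRUE)
print(joined)
```

Without the recoding step, the FRK observation would find no partner in the NUTS2013 dataset, because that dataset knows the region only as FR7.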
Reassuringly, these data will be compatible with the next NUTS typology, too:
recode_nuts(example_df, nuts_year = 2021) %>%
  # all_of() and a quoted name replace .data$, which is deprecated in select() and rename()
  select(all_of(c("code_2021", "values", "typology_change"))) %>%
  rename(geo = "code_2021") %>%
  filter(!is.na(.data$geo)) %>%
  knitr::kable()
geo | values | typology_change |
---|---|---|
FR | 45.25616 | unchanged |
UKI3 | 36.32497 | unchanged |
HU12 | 28.06531 | unchanged |
DED | 25.17747 | unchanged |
FRK | 70.65230 | unchanged |
What about HU12?
data(nuts_changes)
nuts_changes %>%
  # all_of() replaces .data$, which is deprecated inside select()
  select(all_of(c("code_2016", "geo_name_2016", "change_2016"))) %>%
  filter(code_2016 == "HU12") %>%
  filter(complete.cases(.)) %>%
  knitr::kable()
code_2016 | geo_name_2016 | change_2016 |
---|---|---|
HU12 | Pest | new region, equals ex-nuts 3 region hu102 |
The description in the correspondence tables clarifies that historical data may in fact be assembled for HU12 (Pest county): look for data under the identifier HU-PE, for NUTS3 data (as HU102), or for NUTS1 region data. That will be the topic of a later vignette on aggregation and re-aggregation.