Recoding & Relabelling

Eurostat offers so-called correspondence tables to follow boundary changes, recoding and relabelling for all NUTS changes since the formalization of the NUTS typology. Unfortunately, these Excel tables do not conform to the requirements of tidy data, and their vocabulary is not standardized, either. For example, recoding changes are variously labelled as recoding, recoding and renaming, code change, Code change, etc.
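To illustrate the kind of vocabulary unification this calls for, here is a minimal base R sketch. The lookup table and the target labels are hypothetical, chosen only for illustration; they are not the vocabulary used by the package's own data wrangling code.

```r
# Raw change labels as they might appear across correspondence tables
raw_labels <- c("recoding", "recoding and renaming", "code change",
                "Code change", " Recoding ")

# Hypothetical lookup from a normalised raw label to a unified vocabulary
unified_vocabulary <- c(
  "recoding"              = "recoded",
  "code change"           = "recoded",
  "recoding and renaming" = "recoded and relabelled"
)

# Normalise case and surrounding whitespace first, then look the label up;
# labels missing from the lookup come back as NA and can be reviewed by hand
normalise_label <- function(x) {
  key <- tolower(trimws(x))
  unname(unified_vocabulary[key])
}

unified <- normalise_label(raw_labels)
```

Returning NA for unknown labels, rather than guessing, makes it easy to spot wording variants that the lookup does not yet cover.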

The data-raw folder contains these Excel tables and very long data wrangling code that unifies the relevant vocabulary of these Excel files and brings the tables into a single, tidy format, starting with the NUTS1999 definition. The resulting data file nuts_changes is included in the regions package. It already contains the changes that will come into force in 2021.

Let’s review a few changes.

# Packages used in this section
library(regions)
library(dplyr)
library(tidyr)

data(nuts_changes)

nuts_changes %>%
  mutate ( geo_16 = .data$code_2016, 
           geo_13 = .data$code_2013 ) %>%
  filter ( code_2016 %in% c("FRB", "HU11") | 
             code_2013 %in% c("FR7", "HU10", "FR24")) %>%
  select ( all_of(c("typology", "geo_16", "geo_13", "start_year",
           "code_2013", "change_2013",
           "code_2016", "change_2016")) 
           ) %>%
  pivot_longer ( cols = starts_with("code"), 
                 names_to = 'definition', 
                 values_to = 'code') %>%
  pivot_longer ( cols = starts_with("change"), 
                 names_to = 'change', 
                 values_to = 'description')  %>%
  filter (!is.na(.data$description), 
          !is.na(.data$code)) %>%
  # .data is deprecated inside tidyselect; select by column name instead
  select ( -all_of("change") ) %>%
  knitr::kable ()
| typology | geo_16 | geo_13 | start_year | definition | code | description |
|---|---|---|---|---|---|---|
| nuts_level_1 | FRB | NA | 2016 | code_2016 | FRB | new nuts 1 region, identical to ex-nuts 2 region fr24 |
| nuts_level_1 | FRK | FR7 | NA | code_2013 | FR7 | relabelled and recoded |
| nuts_level_1 | FRK | FR7 | NA | code_2016 | FRK | relabelled and recoded |
| nuts_level_1 | NA | FR7 | NA | code_2013 | FR7 | discontinued |
| nuts_level_2 | FRB0 | FR24 | NA | code_2013 | FR24 | recoded and relabelled |
| nuts_level_2 | FRB0 | FR24 | NA | code_2016 | FRB0 | recoded and relabelled |
| nuts_level_2 | HU11 | NA | 2016 | code_2016 | HU11 | new region, equals ex-nuts 3 region hu101 |
| nuts_level_2 | NA | HU10 | NA | code_2013 | HU10 | discontinued; split into new hu11 and hu12 |

You will not find the geo identifier FRB in any statistical data released before France changed its administrative boundaries and the NUTS2016 boundary definition came into force. However, as the description says, you may find historical data elsewhere, in a historical NUTS2-level product for the FRB CENTRE — VAL DE LOIRE NUTS1 region, because it is identical to the earlier NUTS2-level region FR24, i.e. Central France, which was known as Centre for many years before the transition to NUTS2016. The size and importance of this territorial unit are more similar to those of NUTS1 units than to those of NUTS2 units.

Because FRB contains only one NUTS2 region, FRB0 (the earlier FR24), it is technically identified as a NUTS2-level region, too, and you find the same data in the NUTS2 typology. With statistical products at NUTS2 level, you can simply recode historical FR24 data to FRB0, since neither the aggregation level nor the boundaries have changed. Furthermore, you can project these data to any NUTS1-level panel, either under the earlier FR2 NUTS1 label, if you use the old definition, or under the new FRB label, if you use the current NUTS2016 typology.
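The recoding described above can be sketched in base R as follows. The data frame `historical_df` and its values are made up for illustration; only the codes FR24, FRB0 and FRB come from the correspondence table.

```r
# Hypothetical NUTS2-level panel that still uses the NUTS2013 code FR24
historical_df <- data.frame(
  geo    = c("FR24", "FR24", "FR10"),
  year   = c(2012, 2013, 2013),
  values = c(10.5, 11.2, 50.1),   # made-up values
  stringsAsFactors = FALSE
)

# Recode FR24 to FRB0: only the label changes, the values stay untouched
recoded_df <- historical_df
recoded_df$geo[recoded_df$geo == "FR24"] <- "FRB0"

# FRB contains only FRB0, so the same rows can be relabelled for a
# NUTS1-level panel that follows the NUTS2016 definition
nuts1_df <- recoded_df[recoded_df$geo == "FRB0", ]
nuts1_df$geo <- "FRB"
```

This works only because the boundaries are identical; a region whose boundaries changed cannot be handled by relabelling alone.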

Let’s see a hypothetical data frame with random variables. (A real data frame usually does not have this many issues; the example is constructed this way to show more cases at once.)

example_df <- data.frame ( 
  geo  =  c("FR", "DEE32", "UKI3" ,
            "HU12", "DED", 
            "FRK"), 
  values = runif(6, 0, 100 ),
  stringsAsFactors = FALSE )

recode_nuts(dat = example_df, 
            nuts_year = 2013) %>%
  select ( geo, values, code_2013) %>%
  knitr::kable()
| geo | values | code_2013 |
|---|---|---|
| FR | 59.95523 | FR |
| UKI3 | 51.11289 | UKI3 |
| DED | 70.69878 | DED |
| FRK | 42.06438 | FR7 |
| HU12 | 58.32588 | NA |
| DEE32 | 91.24384 | NA |

In this hypothetical example we are creating backward compatibility with the NUTS2013 definition. There are three types of observations:

  • Observations about typologies that did not change. Nothing further needs to be done to make the data comparable across time.
  • Typologies whose geo codes changed but which are not affected by boundary changes, i.e. the data are comparable and are only found under a different geographical label.
  • Typologies that are not comparable, so we cannot compare them meaningfully with a NUTS2013 dataset.

recode_nuts(example_df, nuts_year = 2013) %>%
  select ( all_of(c("geo", "values", "typology_change", "code_2013")) ) %>%
  knitr::kable()
| geo | values | typology_change | code_2013 |
|---|---|---|---|
| FR | 59.95523 | unchanged | FR |
| UKI3 | 51.11289 | unchanged | UKI3 |
| DED | 70.69878 | unchanged | DED |
| FRK | 42.06438 | Recoded from FRK [used in NUTS 2016-2021] | FR7 |
| HU12 | 58.32588 | Used in NUTS 2016-2021 | NA |
| DEE32 | 91.24384 | Used in NUTS 1999-2003 | NA |

The first three observations are comparable with a NUTS2013 dataset. The fourth observation is comparable, too, but before joining it with a NUTS2013 dataset or map, FRK needs to be recoded to FR7.

The following data can be joined with a NUTS2013 dataset or map:

recode_nuts(example_df, nuts_year = 2013) %>%
  # select and rename by column name; .data is deprecated in tidyselect
  select ( all_of(c("code_2013", "values", "typology_change")) ) %>%
  rename ( geo = "code_2013" ) %>% 
  filter ( !is.na(.data$geo) ) %>%
  knitr::kable()
| geo | values | typology_change |
|---|---|---|
| FR | 59.95523 | unchanged |
| UKI3 | 51.11289 | unchanged |
| DED | 70.69878 | unchanged |
| FR7 | 42.06438 | Recoded from FRK [used in NUTS 2016-2021] |
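A join with a NUTS2013 dataset could then look like the following base R sketch. The `recoded` data frame copies the output shown above, while `nuts2013_df` and its `indicator` column are hypothetical stand-ins for whatever NUTS2013 data you want to combine it with.

```r
# Recoded observations, copied from the output above
recoded <- data.frame(
  geo    = c("FR", "UKI3", "DED", "FR7"),
  values = c(59.95523, 51.11289, 70.69878, 42.06438),
  stringsAsFactors = FALSE
)

# Hypothetical NUTS2013 dataset; the indicator values are made up
nuts2013_df <- data.frame(
  geo       = c("FR", "FR7", "DED", "UKI3", "HU10"),
  indicator = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Left join: keep every recoded observation, attach the NUTS2013 indicator
joined <- merge(recoded, nuts2013_df, by = "geo", all.x = TRUE)
```

Because FRK was recoded to FR7 beforehand, every row finds its match; joining on the original FRK code would have left that row without an indicator.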

Reassuringly, these data will be compatible with the next NUTS typology, too.

recode_nuts(example_df, nuts_year = 2021) %>%
  # select and rename by column name; .data is deprecated in tidyselect
  select ( all_of(c("code_2021", "values", "typology_change")) ) %>%
  rename ( geo = "code_2021" ) %>% 
  filter ( !is.na(.data$geo) ) %>%
  knitr::kable()
| geo | values | typology_change |
|---|---|---|
| FR | 59.95523 | unchanged |
| UKI3 | 51.11289 | unchanged |
| HU12 | 58.32588 | unchanged |
| DED | 70.69878 | unchanged |
| FRK | 42.06438 | unchanged |

What about HU12?

data(nuts_changes) 
nuts_changes %>% 
  # select by column name; .data is deprecated in tidyselect
  select( all_of(c("code_2016", "geo_name_2016", "change_2016")) ) %>%
  filter( code_2016 == "HU12") %>%
  filter( complete.cases(.) ) %>%
  knitr::kable()
| code_2016 | geo_name_2016 | change_2016 |
|---|---|---|
| HU12 | Pest | new region, equals ex-nuts 3 region hu102 |

The description in the correspondence tables clarifies that historical data can in fact be assembled for HU12 (Pest county).

  • It can be accessed from national LAU sources (as HU-PE) or from NUTS3 data (as HU102).
  • It can be calculated by deducting the Budapest data from the data of the former Central Hungary region (HU10 at NUTS2 level), which was split into HU11 and HU12.

That will be the topic of a later vignette on aggregation and re-aggregation.
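As a rough preview of the deduction approach, the arithmetic can be sketched in base R. The figures below are invented for illustration, and the sketch assumes an additive indicator such as a population count; rates or averages cannot be deducted this way.

```r
# Hypothetical historical values for an additive indicator
central_hungary <- data.frame(
  year  = c(2010, 2011),
  HU10  = c(2950000, 2970000),  # former Central Hungary region, made-up values
  HU101 = c(1720000, 1730000)   # Budapest at NUTS3 level, made-up values
)

# Pest (HU12) is what remains of HU10 after Budapest is deducted
central_hungary$HU12 <- central_hungary$HU10 - central_hungary$HU101
```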