---
title: "Metadata Vocabularies for Input–Output Analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Metadata Vocabularies for Input–Output Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(iotables)
```
## Import and Normalisation Workflow
The five Eurostat vocabularies (`ind_ava`, `ind_use`, `prd_ava`, `prd_use`, and `cpa2_1`) were imported directly from the official Eurostat metadata registry. Each dataset mirrors the structure of its corresponding SDMX codelist and preserves Eurostat’s identifiers and validity information.
### Data Sources
| Vocabulary | Description | Source URL |
|------------------------|------------------------|------------------------|
| `ind_ava` | Industries, adjustments and value added (rows for industry × industry SIOTs) | |
| `ind_use` | Industry uses (columns for industry × industry SIOTs) | |
| `prd_ava` | Products, adjustments and value added (rows for product × product SIOTs) | |
| `prd_use` | Product uses (columns for product × product SIOTs) | |
| `cpa2_1` | Statistical Classification of Products by Activity (CPA 2.1) | |
### Import Steps
1. **Raw Download**\
Each vocabulary was retrieved as an Excel export from EIONET’s vocabulary registry.
2. **Column Standardisation**\
Columns were renamed to a unified schema: `id, label, status, status_modified, notation, group, quadrant, numeric_order, iotables_label, block, uri`
3. **Quadrant and Block Assignment**\
    Each item was assigned a `quadrant` and a semantic `block` consistent across vocabularies:

    - `10` = intermediate (Quadrant 1)
    - `20` = primary_inputs (Quadrant 3)
    - `30` = final_use (Quadrant 2)
    - `50` = extension / diagnostic

    Control totals such as *“Total supply at basic prices”* were retained as `block = "control_total"`.
4. **Ordinal Ordering**\
`numeric_order` was reindexed within each quadrant with consistent gaps (10, 20, …) to ensure reproducible ordering for matrix construction.
5. **URI Generation**\
Each code was linked to its SKOS concept using:
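```{r uri, eval=FALSE}
# `df` is one vocabulary table; `vocabulary_id` is its registry name, e.g. "ind_ava".
# sprintf() is vectorised, so one URI is built per row of `df$notation`.
df$uri <- sprintf(
  "https://dd.eionet.europa.eu/vocabularyconcept/eurostat/%s/%s",
  vocabulary_id,
  df$notation
)
```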
6. **Validation**\
    Each table was checked for:

    - missing or duplicate IDs
    - monotonically increasing `numeric_order`
    - alignment of quadrant ↔ block semantics
7. **Storage and Naming**\
The cleaned tibbles were stored as exported data objects:
```
data/ind_ava.rda
data/ind_use.rda
data/prd_ava.rda
data/prd_use.rda
```
Each dataset can be loaded directly with `data()`.
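
The block assignment, reindexing, and validation steps above can be sketched in base R. This is a minimal illustration on a toy table, not the code that built the package data. The helper names `assign_block()`, `reindex_order()`, and `validate_vocabulary()` are hypothetical, the toy rows are made up, and the sketch assumes that the codes 10, 20, 30, and 50 are stored in `quadrant` while `block` holds the matching label.

```r
# Toy vocabulary; rows and order values are illustrative, not registry content.
vocab <- data.frame(
  id            = c("CPA_A01", "CPA_B", "B1G", "D1", "P3_S14"),
  quadrant      = c(10, 10, 20, 20, 30),
  numeric_order = c(3, 1, 7, 4, 2),
  stringsAsFactors = FALSE
)

# Assumed lookup from quadrant code to semantic block label.
block_labels <- c(
  "10" = "intermediate",
  "20" = "primary_inputs",
  "30" = "final_use",
  "50" = "extension"
)

# Hypothetical helper: derive `block` from `quadrant`.
# Unknown quadrant codes yield NA and are caught by the validation below.
assign_block <- function(df) {
  df$block <- unname(block_labels[as.character(df$quadrant)])
  df
}

# Hypothetical helper: sort by quadrant and existing order, then renumber
# numeric_order within each quadrant as 10, 20, 30, ...
reindex_order <- function(df) {
  df <- df[order(df$quadrant, df$numeric_order), ]
  df$numeric_order <- ave(
    df$numeric_order, df$quadrant,
    FUN = function(x) seq_along(x) * 10
  )
  rownames(df) <- NULL
  df
}

# Hypothetical helper: stop with an error if a basic check fails.
validate_vocabulary <- function(df) {
  stopifnot(
    !anyNA(df$id),          # no missing IDs
    !anyDuplicated(df$id),  # no duplicate IDs
    !anyNA(df$block),       # every quadrant code mapped to a block
    # numeric_order strictly increasing within each quadrant
    all(tapply(df$numeric_order, df$quadrant, function(x) all(diff(x) > 0)))
  )
  invisible(df)
}

vocab_clean <- validate_vocabulary(reindex_order(assign_block(vocab)))
```

Keeping each step in its own small function means the same checks can be run on every vocabulary in turn. Control totals, which carry `block = "control_total"`, would need an explicit exception in such a lookup.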
### Adjustments to Vocabularies
Although the four Eurostat vocabularies (`ind_ava`, `ind_use`, `prd_ava`, `prd_use`) were imported directly from the official Eurostat metadata registry, some modifications were necessary to make them compatible with the actual Eurostat input–output datasets. The main data sources, in particular `naio_10_cp1750` and `naio_10_cp1700`, occasionally include variables that are not coded according to the published, standardised vocabularies. A human reader can usually resolve these inconsistencies at a glance, but they create ambiguity in a reproducible workflow that relies on automated matching.
For example, the *product × product* SIOTs for the Slovak Republic contain a more detailed breakdown than the one defined in `prd_ava` and `prd_use`. To maintain alignment across datasets, all 0-, 1-, and 2-digit codes from the `cpa2_1` vocabulary were added to the four vocabularies. Each entry carries a validity flag in the `status` column, indicating whether the code is valid in the official Eurostat vocabulary or was adopted from codes that appear in the data but are non-standard. This approach preserves reproducibility while covering every code encountered in current Eurostat data releases.
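
The imputation can be pictured as an anti-join followed by a row bind. The sketch below uses base R and toy tables only. The helper name `impute_codes()`, the status values `"valid"` and `"imputed"`, and the toy rows are assumptions made for illustration and do not reproduce the flags or the contents stored in the package.

```r
# Toy stand-in for one of the four vocabularies (real tables have more columns).
prd_use_toy <- data.frame(
  id     = c("CPA_A01", "CPA_A03"),
  label  = c("toy label A01", "toy label A03"),
  status = "valid",
  stringsAsFactors = FALSE
)

# Toy stand-in for the already filtered 0-, 1- and 2-digit cpa2_1 codes.
cpa_toy <- data.frame(
  id    = c("CPA_A01", "CPA_A02", "CPA_A03"),
  label = c("toy label A01", "toy label A02", "toy label A03"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append codes found in `cpa` but absent from `vocab`,
# marking them with `flag` in the status column.
impute_codes <- function(vocab, cpa, flag = "imputed") {
  absent <- cpa[!cpa$id %in% vocab$id, c("id", "label"), drop = FALSE]
  if (nrow(absent) == 0) {
    return(vocab)
  }
  absent$status <- flag
  out <- rbind(vocab, absent[, names(vocab), drop = FALSE])
  rownames(out) <- NULL
  out
}

prd_use_ext <- impute_codes(prd_use_toy, cpa_toy)
```

Because existing rows are never modified, the officially valid codes keep their original status, and the added rows can be filtered out again by their flag.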
### Versioning
All four vocabularies correspond to the 2025 Eurostat CPA 2.1 / ESA 2010 edition.