---
title: "Metadata Vocabularies for Input–Output Analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Metadata Vocabularies for Input–Output Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(iotables)
```

## Import and Normalisation Workflow

Five Eurostat vocabularies, `ind_ava`, `ind_use`, `prd_ava`, `prd_use` and `cpa2_1`, were imported directly from the official Eurostat metadata registry (). Each dataset mirrors the structure of its corresponding SDMX codelist and preserves Eurostat’s identifiers and validity information.

### Data Sources

| Vocabulary | Description | Source URL |
|------------------------|------------------------|------------------------|
| `ind_ava` | Industries, adjustments and value added (rows for industry × industry SIOTs) | |
| `ind_use` | Industry uses (columns for industry × industry SIOTs) | |
| `prd_ava` | Products, adjustments and value added (rows for product × product SIOTs) | |
| `prd_use` | Product uses (columns for product × product SIOTs) | |
| `cpa2_1` | Statistical Classification of Products by Activity (CPA 2.1) | |

### Import Steps

1. **Raw Download**\
   Each vocabulary was retrieved as an Excel export from EIONET’s vocabulary registry.

2. **Column Standardisation**\
   Columns were renamed to a unified schema:
   `id, label, status, status_modified, notation, group, quadrant, numeric_order, iotables_label, block, uri`

3. **Quadrant and Block Assignment**\
   Each item was assigned a `quadrant` code and a semantic `block`, used consistently across all vocabularies:
   - `10` = intermediate (Quadrant 1)
   - `20` = primary_inputs (Quadrant 3)
   - `30` = final_use (Quadrant 2)
   - `50` = extension / diagnostic

   Control totals such as *“Total supply at basic prices”* were retained with `block = "control_total"`.

4. **Ordinal Ordering**\
   `numeric_order` was reindexed within each quadrant with uniform gaps (10, 20, …), so that matrices are always constructed in a reproducible row and column order.

5. **URI Generation**\
   Each code was linked to its SKOS concept using:

   ```{r uri, eval=FALSE}
   # `df` is a cleaned vocabulary table and `vocabulary_id` holds the
   # vocabulary's identifier in the registry (for example "prd_ava").
   df$uri <- sprintf(
     "https://dd.eionet.europa.eu/vocabularyconcept/eurostat/%s/%s",
     vocabulary_id, df$notation
   )
   ```

6. **Validation**\
   Each table was checked for:
   - missing or duplicate IDs
   - monotonically increasing `numeric_order`
   - consistency between the `quadrant` and `block` codes

7. **Storage and Naming**\
   The cleaned tibbles were stored as exported data objects:

   ```
   data/ind_ava.rda
   data/ind_use.rda
   data/prd_ava.rda
   data/prd_use.rda
   ```

   Each dataset can be loaded directly with `data()`, for example `data("prd_ava", package = "iotables")`.

### Adjustments to Vocabularies

Although the four Eurostat vocabularies (`ind_ava`, `ind_use`, `prd_ava`, `prd_use`) were imported directly from the official Eurostat metadata registry, some modifications were necessary to make them compatible with the Eurostat input–output datasets themselves. The main data sources, in particular `naio_10_cp1750` and `naio_10_cp1700`, occasionally use codes that do not follow the published, standardised vocabularies. Such inconsistencies are usually obvious to a human user, but they create ambiguity in a reproducible workflow that relies on automated code matching.

For example, the *product × product* SIOTs for the Slovak Republic contain a more detailed breakdown than the one defined in `prd_ava` and `prd_use`. To keep all datasets aligned, every 0-, 1- and 2-digit code from the `cpa2_1` vocabulary was imputed into the four vocabularies. Each entry carries a validity flag in the `status` column, indicating whether the code is valid in the official Eurostat vocabulary or was adopted from observed but non-standard codes in the data. This approach preserves reproducibility while covering the codes encountered in current Eurostat data releases.
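The adjustment described above, together with the re-ordering (step 4) and validation (step 6) rules from the import workflow, can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used to build the package data. The helper names `impute_codes()`, `reindex_order()` and `validate_vocab()`, the `status` value `"imputed"`, the rule that imputed codes go to the end of their quadrant, the per-quadrant ordering check, and the toy codes are all hypothetical.

```{r imputation-sketch}
# Minimal base-R sketch. Helper names, the "imputed" status value and the
# toy codes below are hypothetical, not part of the iotables API.

# Append codes observed in a data release but absent from the vocabulary.
# Imputed rows are flagged in `status`; other unknown columns stay NA.
impute_codes <- function(vocab, observed_codes, quadrant, block) {
  missing_codes <- setdiff(observed_codes, vocab$id)
  if (length(missing_codes) == 0) return(vocab)
  # Indexing with NA rows yields all-NA rows with the same columns and types
  extra <- vocab[rep(NA_integer_, length(missing_codes)), , drop = FALSE]
  extra$id <- missing_codes
  extra$status <- "imputed"
  extra$quadrant <- quadrant
  extra$block <- block
  rbind(vocab, extra)
}

# Step 4: sort by quadrant, then by the existing numeric_order (NA values,
# i.e. imputed rows, sort last), and renumber each quadrant 10, 20, 30, ...
reindex_order <- function(vocab, step = 10) {
  vocab <- vocab[order(vocab$quadrant, vocab$numeric_order), , drop = FALSE]
  position <- ave(seq_len(nrow(vocab)), vocab$quadrant, FUN = seq_along)
  vocab$numeric_order <- position * step
  rownames(vocab) <- NULL
  vocab
}

# Step 6: stop with an informative message if any check fails.
validate_vocab <- function(vocab) {
  expected_block <- c("10" = "intermediate", "20" = "primary_inputs",
                      "30" = "final_use")
  q <- as.character(vocab$quadrant)
  # Control totals and quadrants without a fixed block are not checked
  checked <- q %in% names(expected_block) & !(vocab$block %in% "control_total")
  stopifnot(
    "missing ids" = !anyNA(vocab$id) && all(nzchar(vocab$id)),
    "duplicate ids" = !anyDuplicated(vocab$id),
    "numeric_order not increasing within a quadrant" = all(tapply(
      vocab$numeric_order, vocab$quadrant,
      function(x) !is.unsorted(x, strictly = TRUE)
    )),
    "quadrant and block disagree" =
      all(vocab$block[checked] == expected_block[q[checked]])
  )
  invisible(TRUE)
}

# Toy vocabulary (hypothetical codes and values)
prd_toy <- data.frame(
  id            = c("CPA_A01", "CPA_A02", "CPA_C10-12", "D1", "P3_S14"),
  status        = "valid",
  quadrant      = c(10, 10, 10, 20, 30),
  block         = c(rep("intermediate", 3), "primary_inputs", "final_use"),
  numeric_order = c(10, 20, 30, 10, 10)
)
# Codes assumed to appear in a hypothetical data release
observed <- c("CPA_A01", "CPA_C10", "CPA_C11")

prd_fixed <- reindex_order(
  impute_codes(prd_toy, observed, quadrant = 10, block = "intermediate")
)
validate_vocab(prd_fixed)
prd_fixed[, c("id", "status", "numeric_order")]
```

Placing imputed codes at the end of their quadrant is only one possible rule; where the official CPA order of the extra codes is known, sorting by it before renumbering would keep the matrices closer to the published layout.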
### Versioning

All four vocabularies correspond to the 2025 Eurostat CPA 2.1 / ESA 2010 edition.