Metadata Vocabularies for Input–Output Analysis

library(iotables)

Import and Normalisation Workflow

The five Eurostat vocabularies (ind_ava, ind_use, prd_ava, prd_use, and cpa2_1) were imported directly from the official Eurostat metadata registry (https://dd.eionet.europa.eu/vocabulary/eurostat/).
Each dataset mirrors the structure of its corresponding SDMX codelist and preserves Eurostat’s identifiers and validity information.

Data Sources

Vocabulary | Description | Source URL
-----------|-------------|-----------
ind_ava | Industries, adjustments and value added (rows for industry × industry SIOTs) | https://dd.eionet.europa.eu/vocabulary/eurostat/ind_ava/
ind_use | Industry uses (columns for industry × industry SIOTs) | https://dd.eionet.europa.eu/vocabulary/eurostat/ind_use/
prd_ava | Products, adjustments and value added (rows for product × product SIOTs) | https://dd.eionet.europa.eu/vocabulary/eurostat/prd_ava/
prd_use | Product uses (columns for product × product SIOTs) | https://dd.eionet.europa.eu/vocabulary/eurostat/prd_use/
cpa2_1 | Statistical Classification of Products by Activity (CPA 2.1) | https://dd.eionet.europa.eu/vocabulary/eurostat/cpa2_1/

Import Steps

  1. Raw Download
    Each vocabulary was retrieved as an Excel export from EIONET’s vocabulary registry.

  2. Column Standardisation
    Columns were renamed to a unified schema: id, label, status, status_modified, notation, group, quadrant, numeric_order, iotables_label, block, uri.

  3. Quadrant and Block Assignment
    Each item was assigned a quadrant and a semantic block consistent across vocabularies:

    • 10 = intermediate (Quadrant 1)

    • 20 = primary_inputs (Quadrant 3)

    • 30 = final_use (Quadrant 2)

    • 50 = extension / diagnostic

    Control totals such as “Total supply at basic prices” were retained as block = "control_total".

  4. Ordinal Ordering
    numeric_order was reindexed within each quadrant with consistent gaps (10, 20, …) to ensure reproducible ordering for matrix construction.

  5. URI Generation
    Each code was linked to its SKOS concept using:

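# df is the standardised vocabulary table; vocabulary_id is the
# vocabulary's identifier in the registry (for example "ind_ava").
df$uri <- sprintf(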
  "https://dd.eionet.europa.eu/vocabularyconcept/eurostat/%s/%s",
  vocabulary_id,
  df$notation
)
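Steps 3 and 4 (quadrant assignment and reindexing of numeric_order) can be sketched in base R as follows. This is a minimal illustration, not the code used to build the package data: the toy codes, the assumption that the quadrant column stores the numeric codes 10, 20, 30 and 50, and the block name "extension" are all illustrative.

```r
# Toy vocabulary; the ids are illustrative and not copied from the registry.
vocab <- data.frame(
  id    = c("CPA_A01", "CPA_B", "D1", "B2A3G", "P3", "P6"),
  block = c("intermediate", "intermediate", "primary_inputs",
            "primary_inputs", "final_use", "final_use"),
  stringsAsFactors = FALSE
)

# Assumed lookup from semantic block to quadrant code (see step 3).
quadrant_codes <- c(intermediate = 10, primary_inputs = 20,
                    final_use = 30, extension = 50)
vocab$quadrant <- unname(quadrant_codes[vocab$block])

# Reindex numeric_order within each quadrant in steps of 10,
# keeping the current row order inside each quadrant.
vocab$numeric_order <- ave(
  seq_len(nrow(vocab)), vocab$quadrant,
  FUN = function(i) seq_along(i) * 10
)
```

Leaving gaps of 10 means a later code can be slotted between two existing ones without renumbering the whole quadrant.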
  6. Validation
    Each table was checked for:

    • missing or duplicate IDs

    • monotone numeric order

    • alignment of quadrant ↔ block semantics

  7. Storage and Naming

The cleaned tibbles were stored as exported data objects:

data/ind_ava.rda
data/ind_use.rda
data/prd_ava.rda
data/prd_use.rda

Each dataset can be loaded directly with data(<name>).
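The validation checks listed in the import steps could be expressed along the following lines. The helper validate_vocabulary() is hypothetical and is not part of iotables; it assumes that the quadrant column holds the codes 10, 20 and 30 for the three main quadrants, that "monotone" means strictly increasing numeric_order within a quadrant, and that rows with block = "control_total" are exempt from the quadrant/block check.

```r
# Hypothetical helper: returns one named logical per check.
validate_vocabulary <- function(df) {
  # Assumed mapping between quadrant codes and block names.
  expected_block <- c("10" = "intermediate",
                      "20" = "primary_inputs",
                      "30" = "final_use")
  q <- as.character(df$quadrant)
  # Only rows in a mapped quadrant that are not control totals are compared.
  checked <- q %in% names(expected_block) & df$block != "control_total"

  c(
    ids_complete   = !anyNA(df$id) && all(nzchar(as.character(df$id))),
    ids_unique     = !anyDuplicated(df$id),
    order_monotone = all(tapply(df$numeric_order, df$quadrant,
                                function(x) all(diff(x) > 0))),
    blocks_aligned = all(df$block[checked] == expected_block[q[checked]])
  )
}

# Toy table with two deliberate problems: a duplicated id and a
# final_use row placed in quadrant 20.
toy <- data.frame(
  id            = c("A", "B", "B"),
  quadrant      = c(10, 10, 20),
  block         = c("intermediate", "intermediate", "final_use"),
  numeric_order = c(10, 20, 10),
  stringsAsFactors = FALSE
)
checks <- validate_vocabulary(toy)
```

On the toy table, ids_unique and blocks_aligned come back FALSE while the other two checks pass, which is the kind of signal that would stop a table from being saved.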

Adjustments to Vocabularies

Although the four Eurostat vocabularies (ind_ava, ind_use, prd_ava, prd_use) were imported directly from the official Eurostat metadata registry, some modifications were necessary to ensure compatibility with the actual Eurostat input–output datasets. The main data sources, in particular naio_10_cp1750 and naio_10_cp1700, occasionally include variables that are not coded according to the published, standardised vocabularies. A person inspecting the tables by hand can usually resolve these inconsistencies, but they create ambiguity in a reproducible workflow that relies on automated matching.

For example, the product × product SIOTs for the Slovak Republic contain a more detailed industry breakdown than that defined in prd_ava and prd_use. To maintain alignment across datasets, all 0-, 1-, and 2-digit codes from the cpa2_1 vocabulary were imputed into the four vocabularies. Each entry includes a validity flag in the status column, indicating whether the code is valid in the official Eurostat vocabulary or was adopted from observed but non-standard codes in the data. This approach preserves reproducibility while ensuring complete coverage of all codes encountered in current Eurostat data releases.
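The imputation described above might look roughly like this in base R. The sketch is illustrative only: the toy tables, their labels and the status value "imputed_from_cpa2_1" are assumptions, and the filter that restricts cpa2_1 to its 0-, 1- and 2-digit codes is left out because its exact rule is not given here.

```r
# Toy stand-ins for a product vocabulary and for cpa2_1.
prd <- data.frame(
  id     = c("CPA_A01", "CPA_B"),
  label  = c("Products of agriculture", "Mining and quarrying"),
  status = "valid",
  stringsAsFactors = FALSE
)
cpa <- data.frame(
  id    = c("CPA_A01", "CPA_C10", "CPA_C11"),
  label = c("Products of agriculture", "Food products", "Beverages"),
  stringsAsFactors = FALSE
)

# Keep only the CPA codes that the vocabulary does not already contain.
missing_codes <- cpa[!cpa$id %in% prd$id, , drop = FALSE]

# Flag the additions so they stay distinguishable from official entries.
# The flag value is hypothetical.
missing_codes$status <- "imputed_from_cpa2_1"

# Append the new rows using the vocabulary's column order.
extended <- rbind(prd, missing_codes[, names(prd)])
rownames(extended) <- NULL
```

Filtering on the status column afterwards recovers the official vocabulary unchanged.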

Versioning

All four vocabularies correspond to the 2025 Eurostat CPA 2.1 / ESA 2010 edition.