This R package provides tools to access PX-WEB APIs. Contributions, bug reports, and other feedback are welcome!
We can find more information on the PX-Web/PC-Axis API here.
PXWEB is an API structure developed by Statistics Sweden and other national statistical institutions (NSI) to disseminate public statistics in a structured way. The API makes it possible to download and use data from statistical agencies directly over HTTP/HTTPS, without a web browser.
The pxweb R package connects any PXWEB API to R and
facilitates the access, use and referencing of data from PXWEB APIs.
A number of organizations use PXWEB to distribute hierarchical data. You can browse the available data sets at:
The data in PXWEB APIs consists of metadata and data parts. Metadata is structured in a hierarchical node tree, where each node contains information about its subnodes. The leaf nodes contain information on which dimensions are available for the data at that leaf node.
To install the latest stable release version from CRAN, use install.packages("pxweb").
To install the latest version from GitHub, use remotes::install_github("ropengov/pxweb").
Test the installation by loading the library with library(pxweb).
A tutorial is included with the package as a vignette; open it with vignette(topic = "pxweb").
There are two ways of using the pxweb R package to
access data: interactively or through the core functions. Accessing
data requires two parts: a URL to the data table in the API and a
query specifying what data is of interest.
The simplest way of using pxweb is interactively:
navigate the API to the data of interest and then set up
the query. When selecting values for a table variable, use
* to select all available values, separate multiple choices
with a comma (,), and use a colon (:) to select a range of
choices. If a variable can be eliminated from the query, use
e to eliminate it.
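The selection syntax described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name expand_selection() is hypothetical, and it assumes that choices are entered as positions in the list of available values.

```r
# Hypothetical helper (not part of pxweb): turn an interactive-style
# selection string into the selected values.
# Assumption: choices refer to positions in `values`.
expand_selection <- function(input, values) {
  input <- gsub("\\s+", "", input)          # ignore whitespace
  if (identical(input, "*")) {
    return(values)                           # "*" selects everything
  }
  parts <- strsplit(input, ",", fixed = TRUE)[[1]]
  idx <- unlist(lapply(parts, function(p) {
    bounds <- as.integer(strsplit(p, ":", fixed = TRUE)[[1]])
    # "a:b" expands to the range a..b, a single number stays as it is
    if (length(bounds) == 2) seq(bounds[1], bounds[2]) else bounds
  }))
  stopifnot(!anyNA(idx), all(idx >= 1), all(idx <= length(values)))
  values[unique(idx)]
}

years <- c("2013", "2014", "2015", "2016", "2017")
expand_selection("*", years)       # all five years
expand_selection("1,3:4", years)   # first, third and fourth year
```

The eliminate option (e) is left out of the sketch because it removes the variable from the query rather than selecting values.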
# Navigate through all pxweb api:s in the R package API catalogue
d <- pxweb_interactive()
# Get data from SCB (Statistics Sweden)
d <- pxweb_interactive("api.scb.se")
# Fetching data from statfi (Statistics Finland)
d <- pxweb_interactive("statfin.stat.fi")
# Fetching data from StatBank (Statistics Norway)
d <- pxweb_interactive("data.ssb.no")
# To see all available PXWEB APIs use
pxweb_apis <- pxweb_api_catalogue()

In the example above, we use the interactive functionality from the PXWEB API root, but we could use any path to the API.
# Start with a specific path.
d <- pxweb_interactive("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A")

This functionality also means that we can navigate any PXWEB API, regardless of whether it is part of the R package API catalogue. Just supply a URL to somewhere in the API and navigate from there.
Due to new CRAN policies, it is not possible to use an R function to
edit the API catalogue of the R package, but the catalogue can be edited
quickly from R using file.edit().
However, if pxweb is reinstalled, the installation will
overwrite the edited API catalogue. The easiest way is therefore to add a PXWEB
API to the global catalogue. To do this, open a pull request at the pxweb
GitHub page here.
Under the hood, the pxweb package uses the pxweb_get()
function to access data from the PXWEB API. It also keeps track of the
API’s time limits and splits big queries into optimal downloadable
chunks. If we use pxweb_get() without a query, the function
returns either a PXWEB LEVELS object or a PXWEB METADATA object,
depending on whether the URL points to a table in the API. Here
is an example of a PXWEB LEVELS object.
# Get PXWEB levels
px_levels <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/")
px_levels
## PXWEB LEVELS
## BefolkningNy (t): Population by region, marital status, age and sex. Year 1968 - 2022
## FolkmangdNov (t): Population 1 November by region, age and sex. Year 2002 - 2023
## FolkmangdDistrikt (t): Population by district, Landscape or Part of the country by sex. Year 2015 - 2022
## BefolkManad (t): Population per month by region, age and sex. Year 2000M01 - 2023M11
## BefolkningR1860N (t): Population by age and sex. Year 1860 - 2022
And if we use pxweb_get() for a table, a PXWEB METADATA
object is returned.
# Get PXWEB metadata about a table
scb_table_url <- paste0(
"https", "://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
)
px_meta <- pxweb_get(scb_table_url)
px_meta
## PXWEB METADATA
## Population by region, marital status, age, sex, observations and year
## variables:
## [[1]] Region: region
## [[2]] Civilstand: marital status
## [[3]] Alder: age
## [[4]] Kon: sex
## [[5]] ContentsCode: observations
## [[6]] Tid: year
To download data, we need both the URL to the table and a query
specifying what parts of the table are of interest. A URL to a table is
a URL that returns a metadata object when no query is supplied.
Creating a query can be done in three main ways. The first and most
straightforward approach is to use pxweb_interactive() to
explore the table URL and create a query interactively.
The interactive function returns an invisible list. The list always
contains the table URL and the query, even if the data is not
downloaded. If you choose to download data before exiting the
interactive session, the list also contains a data element
with the downloaded data. This is different from
pxweb_get(), which returns the API response itself.
## [1] "url" "query"
## [1] "http://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy"
## PXWEB QUERY
## query:
## [[1]] Region (item):
## 00
## [[2]] Civilstand (item):
## OG, G, ÄNKL, SK
## [[3]] Alder (item):
## tot
## [[4]] ContentsCode (item):
## BE0101N1
## [[5]] Tid (item):
## 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
We can also turn the query into a JSON query that we can use outside R.
## {
## "query": [
## {
## "code": "Region",
## "selection": {
## "filter": "item",
## "values": ["00"]
## }
## },
## {
## "code": "Civilstand",
## "selection": {
## "filter": "item",
## "values": ["OG", "G", "ÄNKL", "SK"]
## }
## },
## {
## "code": "Alder",
## "selection": {
## "filter": "item",
## "values": ["tot"]
## }
## },
## {
## "code": "ContentsCode",
## "selection": {
## "filter": "item",
## "values": ["BE0101N1"]
## }
## },
## {
## "code": "Tid",
## "selection": {
## "filter": "item",
## "values": ["2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017"]
## }
## }
## ],
## "response": {
## "format": "json"
## }
## }
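The structure of the JSON query above can also be reproduced by hand. The following base R sketch assumes every variable uses the "item" filter; the helper to_json_query() is hypothetical, is not the package's own conversion, and does no escaping of special characters.

```r
# Hypothetical helper (not part of pxweb): build a PXWEB-style JSON query
# string from a named list of selected value codes.
# Assumptions: every variable uses the "item" filter and no value needs
# JSON escaping.
to_json_query <- function(selections, format = "json") {
  items <- vapply(names(selections), function(code) {
    values <- paste0('"', selections[[code]], '"', collapse = ", ")
    sprintf(
      '{"code": "%s", "selection": {"filter": "item", "values": [%s]}}',
      code, values
    )
  }, character(1))
  sprintf(
    '{"query": [%s], "response": {"format": "%s"}}',
    paste(items, collapse = ", "), format
  )
}

json <- to_json_query(list(Kon = c("1", "2"), Tid = c("2015", "2016")))
cat(json, "\n")
```

A string like this can be saved to a file or pasted into any HTTP client that can send a POST request to the table URL.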
The second approach is to specify the query either as an R list or as a JSON object. Some statistical agencies, such as Statistics Sweden, supply queries directly as JSON objects on their web pages, and we can use these queries as they are. Below is another example of a JSON query for the table above. For details on setting up a JSON query, see the PXWEB API documentation.
{
"query": [
{
"code": "Civilstand",
"selection": {
"filter": "item",
"values": ["OG", "G", "ÄNKL", "SK"]
}
},
{
"code": "Kon",
"selection": {
"filter": "item",
"values": ["1", "2"]
}
},
{
"code": "ContentsCode",
"selection": {
"filter": "item",
"values": ["BE0101N1"]
}
},
{
"code": "Tid",
"selection": {
"filter": "item",
"values": ["2015", "2016", "2017"]
}
}
],
"response": {
"format": "json"
}
}
To use this JSON query, we store it in a file and supply
the path to the file to the pxweb_query() function.
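As a minimal sketch of this step, the code below writes a shortened version of the JSON query to a temporary file and passes the file path to pxweb_query(). The temporary file and the shortened query are assumptions made for illustration, and the pxweb call only runs if the package is installed.

```r
# A shortened JSON query (illustrative subset of the query shown above).
json_text <- '{
  "query": [
    {"code": "Kon", "selection": {"filter": "item", "values": ["1", "2"]}},
    {"code": "Tid", "selection": {"filter": "item", "values": ["2015", "2016", "2017"]}}
  ],
  "response": {"format": "json"}
}'

# Store the query in a file; a temporary file is used here for illustration.
json_path <- tempfile(fileext = ".json")
writeLines(json_text, json_path)

# Create the query object from the file path (skipped if pxweb is missing).
if (requireNamespace("pxweb", quietly = TRUE)) {
  pxq <- pxweb::pxweb_query(json_path)
  print(pxq)
}
```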
Finally, we can create a PXWEB query from an R list where each list element is named after a variable and contains the selected values for that variable.
pxweb_query_list <-
list(
"Civilstand" = c("*"), # Use "*" to select all
"Kon" = c("1", "2"),
"ContentsCode" = c("BE0101N1"),
"Tid" = c("2015", "2016", "2017")
)
pxq <- pxweb_query(pxweb_query_list)
pxq
## PXWEB QUERY
## query:
## [[1]] Civilstand (all):
## *
## [[2]] Kon (item):
## 1, 2
## [[3]] ContentsCode (item):
## BE0101N1
## [[4]] Tid (item):
## 2015, 2016, 2017
We can validate the query against the metadata object to check that
the query can be used. This validation is done automatically when the
data is fetched with pxweb_get() but can also be done
manually.
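The idea behind such a validation can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's validation: the metadata is reduced to a hypothetical list of allowed value codes per variable, and check_query() is a made-up helper.

```r
# Hypothetical, simplified metadata: variable code -> allowed value codes.
allowed <- list(
  Civilstand = c("OG", "G", "SK"),
  Kon = c("1", "2"),
  Tid = c("2015", "2016", "2017")
)

# Hypothetical helper: return a character vector of problems.
# An empty result means the query is consistent with the metadata.
check_query <- function(query, allowed) {
  problems <- character(0)
  for (code in names(query)) {
    if (!code %in% names(allowed)) {
      problems <- c(problems, sprintf("unknown variable: %s", code))
      next
    }
    requested <- setdiff(query[[code]], "*")   # "*" selects everything
    bad <- setdiff(requested, allowed[[code]])
    if (length(bad) > 0) {
      problems <- c(
        problems,
        sprintf("unknown value(s) for %s: %s", code, paste(bad, collapse = ", "))
      )
    }
  }
  problems
}

ok <- check_query(list(Kon = c("1", "2"), Tid = "2016"), allowed)
bad <- check_query(list(Kon = "3", Sex = "1"), allowed)
print(bad)
```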
When we have the URL to a data table and a query, we can download the
data with pxweb_get(). The function returns a
pxweb_data object that contains the downloaded data.
## PXWEB DATA
## With 4 variables and 24 observations.
If we instead want a JSON-stat object, we change the response format to JSON-stat, and we will get a JSON-stat object returned.
## {
## "dataset": {
## "dimension": {
## "Civilstand": {
## "label": ["marital status"],
## "category": {
## "index": {
## "OG": [0],
## "G": [1],
## "ÄNKL": [2],
## "SK": [3]
## },
## "label": {
## "OG": ["single"],
## "G": ["married"],
## "ÄNKL": ["widowers/widows"],
## "SK": ["divorced"]
## }
## },
## "extension": {
## "show": ["value"]
## }
## },
## "Kon": {
## "label": ["sex"],
## "category": {
## "index": {
## "1": [0],
## "2": [1]
## },
## "label": {
## "1": ["men"],
## "2": ["women"]
## }
## },
## "link": {
## "describedby": [
## {
## "extension": {
## "Kon": ["Kön"]
## }
## }
## ]
## },
## "extension": {
## "show": ["value"]
## }
## },
## "ContentsCode": {
## "label": ["observations"],
## "category": {
## "index": {
## "BE0101N1": [0]
## },
## "label": {
## "BE0101N1": ["Population"]
## },
## "unit": {
## "BE0101N1": {
## "base": ["number"],
## "decimals": [0]
## }
## }
## },
## "extension": {
## "show": ["value"]
## }
## },
## "Tid": {
## "label": ["year"],
## "category": {
## "index": {
## "2015": [0],
## "2016": [1],
## "2017": [2]
## },
## "label": {
## "2015": ["2015"],
## "2016": ["2016"],
## "2017": ["2017"]
## }
## },
## "extension": {
## "show": ["code"]
## }
## },
## "id": [
## ["Civilstand"],
## ["Kon"],
## ["ContentsCode"],
## ["Tid"]
## ],
## "size": [
## [4],
## [2],
## [1],
## [3]
## ],
## "role": {
## "metric": [
## ["ContentsCode"]
## ],
## "time": [
## ["Tid"]
## ]
## }
## },
## "label": ["Population by marital status, sex, observations and year"],
## "source": ["Statistics Sweden"],
## "updated": ["2023-02-09T07:57:00Z"],
## "value": [
## [2762601],
## [2820248],
## [2870477],
## [2394842],
## [2437315],
## [2477012],
## [1651482],
## [1672460],
## [1687016],
## [1639519],
## [1657129],
## [1671381],
## [99751],
## [99654],
## [99682],
## [345008],
## [340709],
## [335961],
## [417132],
## [420985],
## [425487],
## [540682],
## [546653],
## [553226]
## ],
## "extension": {
## "px": {
## "infofile": ["BE0101"],
## "tableid": ["TAB638"],
## "decimals": [0]
## }
## }
## }
## }
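To read a JSON-stat object like the one above, it helps to know that the flat "value" array is stored in row-major order over the dimensions listed in "id", so the last dimension varies fastest. The base R sketch below rebuilds a table from the dimension codes and values copied from the output above; the object names are chosen for illustration.

```r
# Dimension codes in the order given by "id" (copied from the output above).
dims <- list(
  Civilstand = c("OG", "G", "\u00c4NKL", "SK"),
  Kon = c("1", "2"),
  ContentsCode = "BE0101N1",
  Tid = c("2015", "2016", "2017")
)

# The flat "value" array, in the order it is returned.
values <- c(
  2762601, 2820248, 2870477, 2394842, 2437315, 2477012,
  1651482, 1672460, 1687016, 1639519, 1657129, 1671381,
  99751, 99654, 99682, 345008, 340709, 335961,
  417132, 420985, 425487, 540682, 546653, 553226
)

# expand.grid() varies its first argument fastest, while JSON-stat varies
# the last dimension fastest, so the dimensions are reversed first.
grid <- expand.grid(rev(dims), stringsAsFactors = FALSE)
grid <- grid[, names(dims)]   # restore the original column order
grid$value <- values

head(grid)
```

The first rows match the data frame shown later in this document, for example single men in 2015 with the value 2762601.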
Some response formats return files. These responses are stored in
the R tempdir() folder, and the file paths are returned by
pxweb_get(). Currently, px and
sdmx formats can be downloaded as files; file an issue
if you need other response formats.
## [1] "/var/folders/x9/dsgck_4s5mx2nrzzs8zd64rc0000gq/T//RtmpFdmiD7/50026bd2b2d8df2e3f190ca568b3b587d8207465.px"
If the queries are large (contain more values than the PXWEB API
maximum allowed values), the query is chunked into optimal chunks and is
then downloaded sequentially. PXWEB data objects are then combined into
one large PXWEB data object, while JSON-stat objects are returned as a
list of JSON-stat objects, and other files are stored in
tempdir() as separate files.
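The chunking idea can be sketched in base R. This is a simplified illustration, not the package's algorithm: chunk_query() is a hypothetical helper that splits only along the variable with the most values, and the limit of 30 values is made up.

```r
# Hypothetical helper (not part of pxweb): split a query into smaller
# queries so that each requests at most `max_values` cells.
# Simplification: only the variable with the most values is split, so a
# chunk can still exceed the limit if the remaining variables alone do.
chunk_query <- function(query, max_values) {
  n <- lengths(query)
  if (prod(n) <= max_values) {
    return(list(query))                       # small enough, no chunking
  }
  split_var <- names(n)[which.max(n)]
  other <- prod(n[names(n) != split_var])     # cells per value of split_var
  per_chunk <- max(1, floor(max_values / other))
  pos <- seq_along(query[[split_var]])
  groups <- split(query[[split_var]], ceiling(pos / per_chunk))
  lapply(unname(groups), function(v) {
    q <- query
    q[[split_var]] <- v
    q
  })
}

query <- list(
  Region = sprintf("%02d", 1:10),             # 10 made-up region codes
  Tid = as.character(2010:2017)               # 8 years
)
chunks <- chunk_query(query, max_values = 30)
length(chunks)
```

Each chunk would then be downloaded on its own and the results combined afterwards, as described above.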
For more advanced connections to the API, the
pxweb_advanced_get() function gives the flexibility to access the
underlying HTTP calls using httr and to log the HTTP calls for
debugging.
PXWEB API v2 tables use table identifiers and separate metadata and
data endpoints. The pxweb_get() function handles this
internally: when a query is supplied for a v2 table, the request is sent
to the table’s /data endpoint and JSON-stat2 is requested
by default.
Use pxweb_search() to find v2 tables from an API
root.
search_results <- pxweb_search(
"population",
api_url = paste0("https:", "//", "statistikdatabasen.scb.se", "/api/v2"),
lang = "en",
page_size = 5
)
search_results[, c("id", "label", "metadata_url")]

v2_url <- paste0(
"https:", "//", "statistikdatabasen.scb.se",
"/api/v2/tables/TAB638/metadata?lang=sv"
)
v2_query <- list(
Region = "00",
Civilstand = "OG",
Alder = "0",
Kon = "1",
ContentsCode = "BE0101N1",
Tid = "2024"
)
v2_data <- pxweb_get(v2_url, query = v2_query)
as.data.frame(v2_data, column.name.type = "code", variable.value.type = "code")

PXWEB API v2 tables may expose value sets and aggregation codelists,
for example age groups, regional groupings or other ready-made
classifications. These are represented in the API with extra URL
parameters such as codelist[Variable] and
outputValues[Variable]. The helper functions below let you
express these choices directly in the R query instead of writing those
API details by hand.
The helpers can be mixed with ordinary character values:
v2_query <- list(
Region = pxweb_all(),
Alder = pxweb_aggregation("agg_Ålder5år_1"),
Kon = c("1", "2"),
ContentsCode = "BE0101N1",
Tid = pxweb_latest()
)
v2_df <- pxweb_get_data(
v2_url,
query = v2_query,
column.name.type = "code",
variable.value.type = "code"
)

The most common helpers are:
pxweb_all() # all values for a variable
pxweb_latest() # latest time period, resolved from metadata
pxweb_top(10) # top 10 values
pxweb_bottom(10) # bottom 10 values
pxweb_aggregation("agg_Ålder5år_1") # use an aggregation codelist
pxweb_valueset("vs_Ålder1årG") # use a value set codelist

If only some values from a codelist should be requested, supply them
with value_codes.
v2_query <- list(
Alder = pxweb_aggregation(
"agg_Ålder5år_1",
value_codes = c("0-4", "5-9", "10-14")
),
Kon = "1",
ContentsCode = "BE0101N1",
Tid = pxweb_latest()
)

The exact aggregation and value set identifiers are found in the v2
metadata for a table. Use pxweb_codelists() to list them
without inspecting the raw API response by hand.
pxweb_latest() is resolved against the table metadata
before the data request is sent. By default it uses the placeholder
"9999" in the query object and replaces it with the last
available value for that variable from the metadata. It is mainly
intended for time variables.
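The placeholder mechanism described above can be sketched in base R. This is an illustration only, not the package's internal code: resolve_latest() is a hypothetical helper, and the metadata is reduced to a list of available value codes per variable.

```r
# Hypothetical helper (not part of pxweb): replace the "9999" placeholder
# with the last available value for that variable.
# Assumption: `metadata_values` lists available codes in chronological order.
resolve_latest <- function(query, metadata_values, placeholder = "9999") {
  for (code in names(query)) {
    if (identical(query[[code]], placeholder)) {
      available <- metadata_values[[code]]
      query[[code]] <- available[length(available)]
    }
  }
  query
}

resolved <- resolve_latest(
  query = list(Kon = "1", Tid = "9999"),
  metadata_values = list(Kon = c("1", "2"), Tid = c("2022", "2023", "2024"))
)
resolved$Tid
```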
If you want a data frame directly, use
pxweb_get_data().
v2_df <- pxweb_get_data(
v2_url,
query = v2_query,
column.name.type = "code",
variable.value.type = "code"
)

In v2 JSON-stat2 output, the content variable is retained as a
dimension, such as ContentsCode, and observations are
returned in a generic value column.
We can then convert the downloaded PXWEB data objects to a
data.frame or to a character matrix. The character matrix
contains the “raw” data, while as.data.frame() returns an R
data.frame in a tidy format. This conversion means that missing
values (such as “..”) are converted to NA in the
data.frame. Using the arguments
variable.value.type and column.name.type, we
can choose whether we want the codes or the text for column names and
values.
## marital status sex year Population
## 1 single men 2015 2762601
## 2 single men 2016 2820248
## 3 single men 2017 2870477
## 4 single women 2015 2394842
## 5 single women 2016 2437315
## 6 single women 2017 2477012
## Civilstand Kon Tid BE0101N1
## 1 OG 1 2015 2762601
## 2 OG 1 2016 2820248
## 3 OG 1 2017 2870477
## 4 OG 2 2015 2394842
## 5 OG 2 2016 2437315
## 6 OG 2 2017 2477012
Similarly, we can access the raw data as a character matrix with
as.matrix.
## Civilstand Kon Tid BE0101N1
## [1,] "OG" "1" "2015" "2762601"
## [2,] "OG" "1" "2016" "2820248"
## [3,] "OG" "1" "2017" "2870477"
## [4,] "OG" "2" "2015" "2394842"
## [5,] "OG" "2" "2016" "2437315"
## [6,] "OG" "2" "2017" "2477012"
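The difference between the raw character matrix and the data.frame can be sketched in base R. This is not the package's conversion code; the second row below contains a made-up missing cell ("..") to show how it becomes NA.

```r
# A small raw character matrix in the same layout as the output above.
# The ".." in the second row is a made-up missing value for illustration.
raw <- matrix(
  c(
    "OG", "1", "2015", "2762601",
    "OG", "1", "2016", ".."
  ),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("Civilstand", "Kon", "Tid", "BE0101N1"))
)

# Convert to a data.frame, mark ".." as missing, and make the
# observation column numeric.
df <- as.data.frame(raw, stringsAsFactors = FALSE)
df$BE0101N1[df$BE0101N1 == ".."] <- NA
df$BE0101N1 <- as.numeric(df$BE0101N1)

str(df)
```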
In addition to the data, the PXWEB DATA object may also contain
comments for the data. These can be accessed using
the pxweb_data_comments() function.
## NO PXWEB DATA COMMENTS
In this case, we did not have any comments. If we have comments, we
can turn the comments into a data.frame with one comment
per row.
Finally, if we use the data, we can easily create a citation for a
pxweb_data object using the pxweb_cite()
function. For full reproducibility, please also cite the package.
## Statistics Sweden (2024). “Population by region, marital status, age,
## sex, observations and year.” [Data accessed 2024-01-27 16:19:42.712139
## using pxweb R package 0.16.3],
## <https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy>.
##
## A BibTeX entry for LaTeX users is
##
## @Misc{,
## title = {Population by region, marital status, age, sex, observations and year},
## author = {{Statistics Sweden}},
## organization = {Statistics Sweden},
## address = {Stockholm, Sweden},
## year = {2024},
## url = {https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy},
## note = {[Data accessed 2024-01-27 16:19:42.712139 using pxweb R package 0.16.3]},
## }
## Kindly cite the pxweb R package as follows:
##
## Mans Magnusson, Markus Kainu, Janne Huovari, and Leo Lahti
## (rOpenGov). pxweb: R tools for PXWEB API. URL:
## http://github.com/ropengov/pxweb
##
## A BibTeX entry for LaTeX users is
##
## @Misc{,
## title = {pxweb: R tools for PX-WEB API},
## author = {Mans Magnusson and Markus Kainu and Janne Huovari and Leo Lahti},
## year = {2019},
## }
See TROUBLESHOOTING.md for a list of current known issues.
This work can be freely used, modified and distributed under the open license specified in the DESCRIPTION file.
We created this vignette with
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 26.04 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] pxweb_0.18.0 rmarkdown_2.31
##
## loaded via a namespace (and not attached):
## [1] backports_1.5.1 digest_0.6.39 R6_2.6.1 fastmap_1.2.0
## [5] xfun_0.59 maketools_1.3.2 cachem_1.1.0 knitr_1.51
## [9] htmltools_0.5.9 buildtools_1.0.0 lifecycle_1.0.5 cli_3.6.6
## [13] sass_0.4.10 jquerylib_0.1.4 compiler_4.6.0 sys_3.4.3
## [17] tools_4.6.0 checkmate_2.3.4 evaluate_1.0.5 bslib_0.11.0
## [21] yaml_2.3.12 otel_0.2.0 jsonlite_2.0.0 rlang_1.2.0