---
title: "rqog-package for R"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{rqog-package for R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
editor_options:
  chunk_output_type: console
---

```{r include = FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, cache = FALSE, out.width = "100%")
```

`compiled at` `r Sys.time()`

*Download data from the Quality of Government Institute*

Quotation from the [Quality of Government Institute website](http://www.qog.pol.gu.se/):

> The QoG Institute was founded in 2004 by Professor Bo Rothstein and Professor Sören Holmberg. It is an independent research institute within the Department of Political Science at the University of Gothenburg. We conduct and promote research on the causes, consequences and nature of Good Governance and the Quality of Government (QoG) - that is, trustworthy, reliable, impartial, uncorrupted and competent government institutions.

> The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained. A second objective is to study the effects of Quality of Government on a number of policy areas, such as health, the environment, social policy, and poverty. We approach these problems from a variety of different theoretical and methodological angles.

The **Quality of Government Institute** provides data in five different data sets, both in cross-sectional and longitudinal versions:

1. [QoG Basic Data](http://www.qog.pol.gu.se/data/datadownloads/qogbasicdata/)
2. [QoG Standard Data](http://www.qog.pol.gu.se/data/datadownloads/qogstandarddata/)
3. [QoG OECD Data](http://qog.pol.gu.se/data/datadownloads/qogoecddata)
4. [QoG Expert Survey Data](http://www.qog.pol.gu.se/data/datadownloads/qogexpertsurveydata/)
5. [QoG EU Regional Data](http://www.qog.pol.gu.se/data/datadownloads/qogeuregionaldata/)

The **rqog** package provides access to the **Basic**, **Standard** and **OECD** datasets through the function `read_qog()`. The **Standard** data has all the same indicators as the **Basic** data (367 variables) and an additional ~1600 indicators. Both the **Basic** and **Standard** datasets have 194 countries. The **OECD** dataset has 1020 indicators from 35 countries. By default, **rqog** uses the *longitudinal* datasets, which have time series of varying length for the majority of indicators and countries.

The Quality of Government Institute provides codebooks for all datasets:

1. [Basic data codebook](http://www.qogdata.pol.gu.se/data/qog_bas_jan18.pdf)
2. [Standard data codebook](http://www.qogdata.pol.gu.se/data/qog_std_jan18.pdf)
3. [OECD data codebook](http://www.qogdata.pol.gu.se/data/qog_oecd_jan18.pdf)

Consult the codebooks for descriptions of the data and indicators.

## Installation

```{r, eval=FALSE, message=FALSE}
library(devtools)
install_github("ropengov/rqog")
library(rqog)
```

## Examples

### Download data and plot numeric indicators

**Basic Data**

Basic data has a selection of the most common indicators, 344 indicators from 211 countries. Below is an example of how to extract the Polity score (`p_polity2`) and the share of the population aged 15-64 (`wdi_pop1564`) for the BRIC countries from 1990 to 2010 and plot them.

```{r ExampleBasic, cache=FALSE, message=FALSE, warning=FALSE, eval=TRUE, fig.width=12, fig.height=10}
library(rqog)
library(dplyr)
library(ggplot2)
library(tidyr)

# Download a local copy of the file
basic <- read_qog(which_data = "basic", data_type = "time-series")

# Subset the data
dat.l <- basic %>%
  # filter years and countries
  filter(year %in% 1990:2010,
         cname %in% c("Russia", "China", "India", "Brazil")) %>%
  # select variables
  select(cname, year, p_polity2, wdi_pop1564) %>%
  # gather to long format
  gather(., var, value, 3:4) %>%
  # remove NA values
  filter(!is.na(value))

# Plot
ggplot(dat.l, aes(x = year, y = value, color = cname)) +
  geom_point() +
  geom_line() +
  # label each series at its last available year within each facet
  geom_text(data = dat.l %>% group_by(var, cname) %>% filter(year == max(year)),
            aes(x = year, y = value, label = cname),
            hjust = 1, vjust = -1, size = 3, alpha = .8) +
  facet_wrap(~var, scales = "free") +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Plotting QoG basic data",
       caption = "Data: QoG Basic data")
```

**Standard data**

Standard data includes 2190 indicators from 211 countries. Below is an example of how to extract data on *Economic Performance* and *GINI index (World Bank estimate)* for the BRIC countries and plot it.

```{r ExampleStandard, cache=FALSE, message=FALSE, warning=FALSE, eval=TRUE, fig.width=12, fig.height=10}
library(rqog)

# Download a local copy of the file
standard <- read_qog("standard", "time-series")

# Subset the data
dat.l <- standard %>%
  # filter years and countries
  filter(year %in% 1990:2020,
         cname %in% c("Russia", "China", "India", "Brazil")) %>%
  # select variables
  select(cname, year, bti_ep, wdi_gini) %>%
  # gather to long format
  gather(., var, value, 3:4) %>%
  # remove NA values
  filter(!is.na(value))

# Plot the data
ggplot(dat.l, aes(x = year, y = value, color = cname)) +
  geom_point() +
  geom_line() +
  # label each series at its last available year within each facet
  geom_text(data = dat.l %>% group_by(var, cname) %>% filter(year == max(year)),
            aes(x = year, y = value, label = cname),
            hjust = 1, vjust = -1, size = 3, alpha = .8) +
  facet_wrap(~var, scales = "free") +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Plotting QoG Standard data",
       caption = "Data: QoG Standard data")
```

**OECD data**

OECD data includes 1006 variables, but from a smaller set of 36 wealthier countries. The example below uses four indicators:

1. Total expenditure on health `oecd_pphlthxp_t1c`
2. Income inequality: GINI index (World Bank estimate) `wdi_gini`
3. Gross National Income per Capita `oecd_natinccap_t1`
4. Adjusted general government debt-to-GDP (excl. unfunded pension liability) `oecd_govdebt_t1`

We will include all the countries and all the years in the data.

```{r ExampleSocialPolicy, cache=FALSE, message=FALSE, warning=FALSE, eval=TRUE, fig.width=12, fig.height=10}
library(rqog)

# Download a local copy of the file
oecd <- read_qog("oecd", "time-series")

# Subset the data
dat.l <- oecd %>%
  # select variables
  select(cname, year, oecd_pphlthxp_t1c, wdi_gini, oecd_natinccap_t1, oecd_govdebt_t1) %>%
  # gather to long format
  gather(., var, value, 3:6) %>%
  # remove NA values
  filter(!is.na(value))

# Plot the data
ggplot(dat.l, aes(x = year, y = value, color = cname)) +
  geom_point() +
  geom_line() +
  geom_text(data = dat.l %>% group_by(var, cname) %>% filter(year == max(year)),
            aes(x = year, y = value, label = cname),
            hjust = 1, vjust = -1, size = 3, alpha = .8) +
  facet_wrap(~var, scales = "free") +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "Plotting QoG OECD data",
       caption = "Data: QoG OECD data")
```

### Work with metadata and factor indicators

The package ships with metadata for each year (2016-2022), six data frames per year. For 2022 these are `meta_basic_cs_2022`, `meta_basic_ts_2022`, `meta_std_cs_2022`, `meta_std_ts_2022`, `meta_oecd_cs_2022` and `meta_oecd_ts_2022`. The data frames are generated from the original `spss` versions of the data using the `tidymetadata::create_metadata()` function.

**Browsing metadata**

You can browse the content by applying `grepl` to the `name` column. Let's find indicators whose name contains the term `Corruption`, ignoring case.

```{r}
library(rqog)
meta_basic_ts_2022[grepl("Corruption", meta_basic_ts_2022$name, ignore.case = TRUE),]
```

**Assigning labels to values with metadata**

The data that **rqog** imports to R is in `.csv` format, without the labels and names shipped with the `spss` or `Stata` formats. This is a convenient format to work with in R, especially for numeric indicators. However, many of the indicators in QoG are factors, meaning that they have discrete values with corresponding labels. You can use the metadata to assign labels to the values of such indicators.

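The lookup itself can be sketched on toy data. This is a minimal sketch, assuming metadata with `code`, `value` and `label` columns as in the shipped data frames; `toy_meta`, `toy_data` and the helper `label_values()` are hypothetical names used for illustration and are not part of **rqog**.

```r
# Toy metadata in the same shape as the shipped metadata (columns `code`,
# `value`, `label`); the codes and labels are made up for illustration.
toy_meta <- data.frame(
  code  = c("x_type", "x_type", "x_type", "y_flag", "y_flag"),
  value = c(1, 2, 3, 0, 1),
  label = c("1. Low", "2. Medium", "3. High", "0. No", "1. Yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the label for each value of one indicator.
label_values <- function(x, meta, indicator) {
  # keep only the metadata rows that describe this indicator
  m <- meta[meta$code == indicator, ]
  # match() gives, for each element of x, the position of the same value
  # in the metadata (NA when absent); indexing the labels by it yields labels
  m$label[match(x, m$value)]
}

# Toy data with one coded column, including a missing value
toy_data <- data.frame(x_type = c(3, 1, 2, NA, 1))
toy_data$x_type_lab <- label_values(toy_data$x_type, toy_meta, "x_type")
toy_data
```

Values that have no entry in the metadata, including `NA`, come back as `NA` labels, so unlabelled rows stay visible instead of being dropped silently.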
Let's take `ccp_cc` as an example and first print the `value` and `label` columns of the metadata.

```{r}
meta_basic_ts_2022 %>%
  filter(code == "ccp_cc") %>%
  select(value, label)
```

The basic data is currently in R in an object called `basic`. Let's see the frequencies of each value.

```{r}
basic %>%
  count(ccp_cc)
```

Now, using the metadata, we can assign the corresponding labels to the values.

```{r}
basic %>%
  count(ccp_cc) %>%
  mutate(ccp_cc_lab = meta_basic_ts_2022[meta_basic_ts_2022$code == "ccp_cc",]$label[
    match(ccp_cc, meta_basic_ts_2022[meta_basic_ts_2022$code == "ccp_cc",]$value)])
```

Next, let's find factor variables with a few more values in the *cross-sectional* data.

```{r}
meta_basic_cs_2022 %>%
  filter(class == "factor") %>%
  group_by(code) %>%
  summarise(n_of_values = n()) %>%
  arrange(desc(n_of_values))
```

Let's take two of these factors, `ht_region` and `ht_colonial`, and summarise colonial origin by region.

```{r}
meta_basic_cs_2022 %>%
  filter(code %in% c("ht_region", "ht_colonial")) %>%
  distinct(code, .keep_all = TRUE)
```

```{r}
# let's download the cross-sectional data first
basic_cs <- read_qog(which_data = "basic", data_type = "cross-sectional")

plot_d <- basic_cs %>%
  # group by region
  group_by(ht_region) %>%
  # count the frequency of each colonial origin within each region
  count(ht_colonial) %>%
  ungroup() %>%
  # label the values using the metadata
  mutate(ht_region_lab = meta_basic_ts_2022[meta_basic_ts_2022$code == "ht_region",]$label[
           match(ht_region, meta_basic_ts_2022[meta_basic_ts_2022$code == "ht_region",]$value)],
         ht_colonial_lab = meta_basic_ts_2022[meta_basic_ts_2022$code == "ht_colonial",]$label[
           match(ht_colonial, meta_basic_ts_2022[meta_basic_ts_2022$code == "ht_colonial",]$value)]) %>%
  na.omit()

head(plot_d)

# let's abbreviate option '0. Never colonized by a Western overseas colonial power' to '0. Never'
plot_d$ht_colonial_lab[plot_d$ht_colonial_lab == "0. Never colonized by a Western overseas colonial power"] <- '0. Never'
```

Then we can create a simple bar plot.

```{r fig.height = 12, fig.width = 12}
# indicator names from metadata
ind_name <- unique(meta_basic_cs_2022[meta_basic_cs_2022$code == "ht_colonial",]$name)
group_name <- unique(meta_basic_cs_2022[meta_basic_cs_2022$code == "ht_region",]$name)

ggplot(plot_d, aes(x = ht_colonial_lab, y = n)) +
  geom_col() +
  facet_wrap(~ht_region_lab, scales = "free") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, size = 7)) +
  labs(title = paste0(ind_name, " by ", group_name),
       caption = "Data: Quality of Government institute",
       x = NULL, y = "number of countries") +
  coord_flip()
```

```{r}
sessionInfo()
```