--- title: "Introduction" author: "Dhrumin Shah" date: "2015-08-22" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Introduction This package provides easy access to the API provided by [Open Government Data Platform - India](https://data.gov.in) to download datasets from R. Here's the list of [Datasets available through API](https://data.gov.in/catalogs#path=is_api/1). ## Basic Usage ```r library(ogdindiar) ``` ``` ## Welcome to ogdindiar ``` When calling OGD India API, at minimum, you need to provide 2 parameters. * `res_id`: Resource id of the dataset you want to access * `api_key`: Your personal API key (See [Package Installation instructions](https://github.com/steadyfish/ogdindiar/blob/master/README.md#prerequisite)) Resource id for the datasets can be found on the data specific page on the data portal. For example, this page has the resource id information for [Annual And Seasonal Mean Temperature Of India](https://data.gov.in/resources/annual-and-seasonal-mean-temperature-india/api). The resource is a string that's part of Datastore API URL. * The URL for this dataset as shown on the page is - https://data.gov.in/api/datastore/resource.json?resource_id=98fe9271-a59d-4834-b05b-fd5ddb94ac01&api-key=OGDINDIA_API_KEY * The resource id is the highlighted part in the above URL. The main function this package provides is `fetch_data()`. Once you have figured out the resource id, you can download that dataset as follows: ```r mean_temp_data = fetch_data(res_id = "98fe9271-a59d-4834-b05b-fd5ddb94ac01") ``` This function returns a list of 2 elements. * The first element is the data ```r knitr::kable(head(mean_temp_data[[1]])) ``` |id | timestamp| year| annual| jan_feb| mar_may| jun_sep| oct_dec| |:----|----------:|----:|------:|-------:|-------:|-------:|-------:| |1123 | 1424778424| 1957| 23| 18| 25| 27| 21| |1423 | 1424778424| 1972| 24| 18| 25| 27| 21| |1443 | 1424778424| 1973| 24| 19| 26| 27| 21| |1463 | 1424778424| 1974| 24| 18| 26| 27| 21| |1483 | 1424778424| 1975| 23| 18| 25| 26| 21| |1503 | 1424778424| 1976| 24| 18| 25| 26| 22| * The second element is a dataframe containing metadata about the columns. ```r knitr::kable(mean_temp_data[[2]]) ``` |.id |type |size |unsigned |not null |description | |:---------|:------|:------|:--------|:--------|:--------------------------------| |id |serial |normal |TRUE |TRUE | | |timestamp |int |normal |TRUE |FALSE |The Unix timestamp for the data. | |year |int |normal |NA |FALSE | | |annual |int |normal |NA |FALSE | | |jan_feb |int |normal |NA |FALSE | | |mar_may |int |normal |NA |FALSE | | |jun_sep |int |normal |NA |FALSE | | |oct_dec |int |normal |NA |FALSE | | ## Advanced Usage Instead of downloading entire datasets you can conditionally download specific data elements. This functionality is achieved using additional arguments to `fetch_data()` function. Currently you can use - * `filter` to filter the dataset using __equality constraints__ on specific columns. * `select` to select specific set of columns to be downloaded. * `sort` to sort the resulting dataset based on multiple columns. Following example illustrates this - ```r mean_temp_25 = fetch_data(res_id = "98fe9271-a59d-4834-b05b-fd5ddb94ac01", filter = c("annual" = "25"), select = c("year", "annual", "jan_feb", "mar_may", "jun_sep", "oct_dec"), sort = c("jan_feb" = "asc", "mar_may" = "desc") ) ``` The returned dataset - ```r knitr::kable(head(mean_temp_25[[1]])) ``` | year| annual| jan_feb| mar_may| jun_sep| oct_dec| |----:|------:|-------:|-------:|-------:|-------:| | 2002| 25| 19| 27| 27| 22| | 2010| 25| 20| 27| 27| 22| | 1995| 25| 20| 26| 28| 23| | 2009| 25| 20| 26| 27| 22| | 2006| 25| 21| 26| 27| 22| Metadata about the returned dataset ```r knitr::kable(mean_temp_25[[2]]) ``` |.id |type |size |not null |description | |:-------|:----|:------|:--------|:-----------| |year |int |normal |FALSE | | |annual |int |normal |FALSE | | |jan_feb |int |normal |FALSE | | |mar_may |int |normal |FALSE | | |jun_sep |int |normal |FALSE | | |oct_dec |int |normal |FALSE | | There's one more argument that is passed to `fetch_data()` function, `field_type_correction`. The data fetch process inadvertently treats all the columns as `character`. The default setting `field_type_correction = TRUE` converts these columns back to `numeric` type based on accompanying metadata.