Title: | API Access to Datasets on Open Government Data - India Portal |
---|---|
Description: | Provides API access to selected datasets on Open Government Data - India Portal. |
Authors: | Dhrumin Shah [aut, cre], Sainath Adapa [aut] |
Maintainer: | Dhrumin Shah <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.0.9005 |
Built: | 2024-11-02 02:41:07 UTC |
Source: | https://github.com/rOpenGov/ogdindiar |
Given a download link, obtained by using either 'search_for_datasets' or 'get_datasets_from_a_catalog', this function will download the file.
download_dataset(urllink, filepath = NULL)
urllink | Download link/url |
filepath | If specified, the file is downloaded to that location. If unspecified, it is saved in the temporary directory. |
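The `filepath` fallback described above can be sketched in base R. This is only an illustration, not the package's implementation: `resolve_download_path` is a hypothetical helper, and taking the file name from the end of the URL is an assumption.

```r
# Hypothetical helper (not part of ogdindiar): decide where a download is saved.
resolve_download_path <- function(urllink, filepath = NULL) {
  if (!is.null(filepath)) {
    return(filepath)                      # caller chose the location
  }
  # Assumption: the file name is the last URL segment, minus any query string
  file_name <- basename(sub("[?#].*$", "", urllink))
  file.path(tempdir(), file_name)         # fall back to the session temp directory
}

resolve_download_path("https://example.org/files/data.csv")
resolve_download_path("https://example.org/files/data.csv", "out/data.csv")
```

The resolved path could then be passed as `destfile` to `utils::download.file()`.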
fetch_data is the main function of this package; it loads an entire data set from the Government of India API.
fetch_data( res_id, filter = NULL, select = NULL, sort = NULL, field_type_correction = TRUE, max_obs = 500 )
res_id | a string, JSON data resource id |
filter | a named vector, specifying equality constraints of the form "variable" = "condition" |
select | a vector, specifying variables/fields to be selected |
sort | a named vector, specifying sort order in the form "variable" = "order" |
field_type_correction | boolean, whether to apply field type correction. All data fields are downloaded as character and then corrected (if at all) based on the accompanying metadata |
max_obs | an integer, specifying the maximum number of observations to fetch (will be rounded UP to the nearest 100) |
list | a list of 2 elements: the data from the Government of India API, and metadata (additional information about the fields) |
## Not run:
### Fetch a dataset using its resource id and your personal API key
# Basic use:
fetch_data(res_id = "60a68cec-7d1a-4e0e-a7eb-73ee1c7f29b7")
# Advanced use, specifying additional parameters
fetch_data(res_id = "60a68cec-7d1a-4e0e-a7eb-73ee1c7f29b7",
           filter = c("state" = "Maharashtra"),
           select = c("s_no_", "constituency", "state"),
           sort = c("s_no_" = "asc", "constituency" = "desc"))
## End(Not run)
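The rounding of `max_obs` can be illustrated with a small base R sketch. It assumes requests are made in pages of 100 records with a page-based offset; `plan_requests` is a hypothetical helper, not a function of this package.

```r
# Hypothetical helper: plan the requests needed to cover max_obs,
# assuming pages of 100 records and an offset counted in pages.
plan_requests <- function(max_obs = 500, page_size = 100) {
  n_pages <- ceiling(max_obs / page_size)   # round UP to whole pages
  data.frame(
    offset = seq_len(n_pages) - 1L,         # 0, 1, 2, ... one unit per page
    no_elements = rep(page_size, n_pages)
  )
}

plan <- plan_requests(max_obs = 250)
nrow(plan)             # number of requests
sum(plan$no_elements)  # observations requested after rounding up
```

With `max_obs = 250` the plan holds three requests covering 300 observations, which is 250 rounded up to the nearest 100.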
This will return the number of elements that were returned by a JSON data query.
get_count(x)
x | a list, i.e. a JSON data object |
no_elements | an integer, the number of elements to download, a value between 1 and 100 |
## Not run:
### Return the number of elements from a JSON data object (obtained using get_JSON_doc())
get_count(x = JSON_doc)
## End(Not run)
This will return the data from the JSON data object.
get_data(x)
x | a list, i.e. a JSON data object |
data | a list, data from the JSON data object |
## Not run:
### Return data from a JSON data object (obtained using get_JSON_doc())
get_data(x = JSON_doc)
## End(Not run)
Gets the list of data sets and related information for a catalog.
get_datasets_from_a_catalog( catalog_link, limit_dataset_pages = 5L, limit_datasets = 10L )
catalog_link | Link to the catalog |
limit_dataset_pages | Limit the number of pages that should be requested and parsed, to acquire the datasets. Default is 5. Set to Inf to request all. |
limit_datasets | Request more pages until the number of datasets obtained reaches this limit. Default is 10. Set to Inf to request all. |
See also: search_for_datasets
## Not run:
get_datasets_from_a_catalog(
  'https://data.gov.in/catalog/session-wise-statistical-information-relating-questions-rajya-sabha',
  limit_dataset_pages = 7,
  limit_datasets = 10)
## End(Not run)
This will return field names from the JSON data object.
get_field_names(x)
x | a list, i.e. a JSON data object |
field_names | a vector/list of field names for the JSON data object |
## Not run:
### Return field names from a JSON data object (obtained using get_JSON_doc())
get_field_names(x = JSON_doc)
## End(Not run)
This will return field types from the JSON data object.
get_field_type(x)
x | a list, i.e. a JSON data object |
field_types | a list/vector, the field type of each of the fields |
## Not run:
### Return field types from a JSON data object (obtained using get_JSON_doc())
get_field_type(x = JSON_doc)
## End(Not run)
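To make the four accessors above (get_count, get_data, get_field_names, get_field_type) more concrete, here is a self-contained sketch on a mock object. The element names `count`, `records` and `fields`, the `id`/`type` entries, and the `*_of` stand-in functions are all assumptions made for illustration; the real JSON data object may be laid out differently.

```r
# Mock JSON data object; the element names are assumptions, not the API's contract.
JSON_doc <- list(
  count = 2L,
  records = list(
    list(s_no_ = "1", state = "Maharashtra"),
    list(s_no_ = "2", state = "Goa")
  ),
  fields = list(
    list(id = "s_no_", type = "int"),
    list(id = "state", type = "text")
  )
)

# Hypothetical stand-ins for the accessors, written against the assumed layout
count_of       <- function(x) x$count
data_of        <- function(x) x$records
field_names_of <- function(x) vapply(x$fields, function(f) f$id, character(1))
field_types_of <- function(x) vapply(x$fields, function(f) f$type, character(1))

count_of(JSON_doc)
field_names_of(JSON_doc)
field_types_of(JSON_doc)
```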
get_JSON_doc will return information about the requested resource. Ideally, it is only used internally.
get_JSON_doc( link = "https://data.gov.in/api/datastore/resource.json?", res_id, offset, no_elements, filter, select, sort, verbose = FALSE )
link | a string, general JSON data link |
res_id | a string, JSON data resource id |
offset | an integer, offset of 1 corresponds to 100 elements |
no_elements | an integer, the number of elements to download, a value between 1 and 100 |
filter | a named vector, specifying equality constraints of the form "variable" = "condition" |
select | a vector, specifying variables/fields to be selected |
sort | a named vector, specifying sort order in the form "variable" = "asc" |
verbose | a boolean, specifying whether to print verbose messages |
JSON data object, i.e. a list
## Not run:
library(RCurl)
library(RJSONIO)
# Return 100 elements from a hotels data resource
JSON_doc <- get_JSON_doc(link = "http://data.gov.in/api/datastore/resource.json?",
                         res_id = "0749068c-a590-4a07-a571-e9df5dddcc8a",
                         offset = 0,
                         no_elements = 100)
## End(Not run)
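How the arguments above might be combined into a single request URL can be sketched in base R. Everything about the query string here is an assumption made for illustration: the parameter names (`resource_id`, `api-key`, `limit`, `filters[...]`, `fields`, `sort[...]`), the `api_key` argument, and the helper `build_query_url` itself are hypothetical and should be checked against the data.gov.in API documentation.

```r
# Hypothetical URL builder; the query parameter names are assumptions.
build_query_url <- function(link, res_id, api_key, offset = 0, no_elements = 100,
                            filter = NULL, select = NULL, sort = NULL) {
  # Percent-encode each value individually
  enc <- function(x) {
    vapply(as.character(x), utils::URLencode, character(1),
           reserved = TRUE, USE.NAMES = FALSE)
  }
  parts <- c(
    paste0("resource_id=", enc(res_id)),
    paste0("api-key=", enc(api_key)),
    paste0("offset=", offset),
    paste0("limit=", no_elements)
  )
  if (!is.null(filter)) {   # named vector: "variable" = "condition"
    parts <- c(parts, paste0("filters[", names(filter), "]=", enc(filter)))
  }
  if (!is.null(select)) {   # plain vector of field names
    parts <- c(parts, paste0("fields=", paste(enc(select), collapse = ",")))
  }
  if (!is.null(sort)) {     # named vector: "variable" = "asc" or "desc"
    parts <- c(parts, paste0("sort[", names(sort), "]=", enc(sort)))
  }
  paste0(link, paste(parts, collapse = "&"))
}

build_query_url("https://data.gov.in/api/datastore/resource.json?",
                res_id = "0749068c-a590-4a07-a571-e9df5dddcc8a",
                api_key = "YOUR-KEY",
                filter = c("state" = "Tamil Nadu"))
```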
The API wrapper functions in this package all rely on an Open Government Data India API key residing in the environment variable OGDINDIA_API_KEY. The easiest way to accomplish this is to set it in the '.Renviron' file in your home directory.
ogdindia_api_key(force = FALSE)
force | Force setting a new Open Government Data India API key for the current environment? |
atomic character vector containing the Open Government Data India API key
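Reading a key from an environment variable can be sketched with base R alone. `read_api_key` is a hypothetical helper and not the package's `ogdindia_api_key`; the demo uses a made-up variable name so that a real key is never overwritten.

```r
# Hypothetical helper: read an API key from an environment variable.
read_api_key <- function(var = "OGDINDIA_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set ", var, " in your ~/.Renviron file (a line of the form ",
         var, "=your-key) and restart R.", call. = FALSE)
  }
  key
}

# Demo with a made-up variable name and a placeholder value
Sys.setenv(DEMO_OGD_KEY = "not-a-real-key")
read_api_key("DEMO_OGD_KEY")
```

Keeping the key in '.Renviron' rather than in scripts means it is loaded at startup and stays out of shared code.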
The ogdindiar package provides three categories of important functions: downloading entire datasets, downloading specific elements based on certain conditions, and searching for data sets.
See also: fetch_data, search_datasets
rectify_field_type will convert selected fields to numeric based on the accompanying metadata.
rectify_field_type(d_in, d_fields)
d_in | a data.frame on which the correction is to be applied. |
d_fields | a data.frame containing fields metadata |
data | corrected data.frame |
## Not run:
rectify_field_type(data_stage2, data_field_type)
## End(Not run)
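The idea of metadata-driven type correction can be sketched as follows. This is not the package's implementation: `correct_types` is a hypothetical function, and the metadata columns `id` and `type` as well as the set of numeric type labels are assumptions.

```r
# Hypothetical sketch: convert character columns to numeric when the metadata
# marks them as numeric. Column names `id`/`type` and the labels are assumed.
correct_types <- function(d_in, d_fields,
                          numeric_types = c("int", "float", "double", "numeric")) {
  numeric_ids <- d_fields$id[d_fields$type %in% numeric_types]
  for (col in intersect(numeric_ids, names(d_in))) {
    converted <- suppressWarnings(as.numeric(d_in[[col]]))
    # Keep the conversion only if no non-missing value would become NA
    if (!any(is.na(converted) & !is.na(d_in[[col]]))) {
      d_in[[col]] <- converted
    }
  }
  d_in
}

d_in <- data.frame(s_no_ = c("1", "2"), state = c("Maharashtra", "Goa"),
                   stringsAsFactors = FALSE)
d_fields <- data.frame(id = c("s_no_", "state"), type = c("int", "text"),
                       stringsAsFactors = FALSE)
str(correct_types(d_in, d_fields))
```

Leaving a column as character when conversion would lose values is one conservative choice; it mirrors the "corrected (if at all)" wording used for field_type_correction.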
This function scrapes the data.gov.in search results and returns most of the information available for the datasets. As this function doesn't use the API and just parses the web pages, there needs to be a delay between successive requests, and there should be limits on the number of pages that the function downloads from the web.
For a particular search input, there may be multiple pages of search results. Each result page contains a list of catalogs, and each catalog contains multiple pages, with each page containing a list of data sets. There are default limits at each one of these stages. Set them to 'Inf' if you need to get all the results or if you don't expect a large number of results. Please refer to the vignette for a detailed overview.
search_for_datasets( search_terms, limit_catalog_pages = 5L, limit_catalogs = 10L, return_catalog_list = FALSE, limit_dataset_pages = 5L, limit_datasets = 10L )
search_terms | Either one string with multiple words separated by spaces, or a character vector with all the search terms |
limit_catalog_pages | Number of pages of search results to request. Default is 5. Set to Inf to get all. |
limit_catalogs | Number of catalogs that the function should parse to get the data sets. Default is 10. Set to Inf to get all. |
return_catalog_list | Default is FALSE. If TRUE, the function will not look for data sets, and will only return the list of catalogs found. |
limit_dataset_pages | Limit the number of pages that should be requested and parsed, to acquire the datasets. Default is 5. Set to Inf to request all. |
limit_datasets | Request more pages until the number of datasets obtained reaches this limit. Default is 10. Set to Inf to request all. |
See also: get_datasets_from_a_catalog
## Not run:
# Basic use:
search_for_datasets('train usage')
# Advanced use, specifying additional parameters
search_for_datasets(search_terms = c('state', 'gdp'),
                    limit_catalog_pages = 1,
                    limit_catalogs = 3,
                    limit_dataset_pages = 2)
search_for_datasets(search_terms = c('state', 'gdp'),
                    limit_catalog_pages = 2,
                    return_catalog_list = TRUE)
## End(Not run)
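The interplay of a page limit, an item limit and a delay between requests can be sketched with a stubbed page fetcher. `collect_with_limits` and `fake_page` are hypothetical and make no web requests; the sketch only illustrates the stopping rules described above, not the package's scraping code.

```r
# Hypothetical sketch: request pages until a page limit or an item limit is hit,
# pausing between successive requests.
collect_with_limits <- function(fetch_page, limit_pages = 5L, limit_items = 10L,
                                delay = 1) {
  items <- character(0)
  page <- 0
  while (page < limit_pages && length(items) < limit_items) {
    page <- page + 1
    new_items <- fetch_page(page)
    if (length(new_items) == 0) break   # no more results available
    items <- c(items, new_items)
    Sys.sleep(delay)                    # be polite to the web server
  }
  items
}

# Stub standing in for a web request: 4 pages with 3 entries each
fake_page <- function(page) {
  if (page <= 4) paste0("dataset-", page, letters[1:3]) else character(0)
}

length(collect_with_limits(fake_page, limit_pages = 5, limit_items = 10, delay = 0))
length(collect_with_limits(fake_page, limit_pages = Inf, limit_items = Inf, delay = 0))
```

Both limits accept `Inf`, in which case the loop ends only when a page comes back empty.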
to_data_frame will convert data from a 'list' to a 'data.frame'.
to_data_frame(lst_elmnt)
lst_elmnt | a list of data from a JSON data object |
data | a data.frame, data from the JSON data object |
## Not run:
### Convert a list to a data.frame
to_data_frame(lst_elmnt = get_data(JSON_list))
## End(Not run)
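A list-to-data.frame conversion of this kind can be sketched in base R. `records_to_df` is a hypothetical function, and the sketch assumes every record is a named list carrying the same fields.

```r
# Hypothetical sketch: bind a list of records (named lists) into a data.frame.
# Assumes all records share the same field names.
records_to_df <- function(lst_elmnt) {
  rows <- lapply(lst_elmnt, function(rec) {
    as.data.frame(rec, stringsAsFactors = FALSE)   # one-row data.frame per record
  })
  do.call(rbind, rows)
}

records <- list(
  list(s_no_ = "1", state = "Maharashtra"),
  list(s_no_ = "2", state = "Goa")
)
df <- records_to_df(records)
df
```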