Title: | Scrape Data from the European Parliament's Website |
---|---|
Description: | Scrape data from the European Parliament's website. |
Authors: | Szymon Gorka |
Maintainer: | The package maintainer <[email protected]> |
License: | GPL |
Version: | 0.1.0 |
Built: | 2025-01-30 11:24:43 UTC |
Source: | https://github.com/rOpenGov/europarl |
Function create_database creates a database.
create_database(dbname, user, password, host)
Get national parties, EU groups, and positions
deputies_get_history_of_services(home_page, deputy_id)
Get data about all deputies
get_all_deputies(term = 1)
Get data about all deputies
get_eurogroup(deputy_id, date = Sys.Date())
Get parties, EU groups, and positions
get_history(home_page, deputy_id)
Get all languages in europarl
get_languages()
Get nationality, date of birth, place of birth and/or date of death
get_more_info(home_page)
Get data about all deputies
get_nationalparty(deputy_id, date = Sys.Date())
Get all statements for P8
get_statements(deputy_id, browser)
The file should contain:
dbname = "dbname"
host = "host"
username = "username"
password = "password"
read_config(file = system.file("config/db_config.txt", package = "europarl"), delim = " ")
delim |
the delim parameter passed to read_delim |
file |
file name or path to the file |
A tibble with dbname, host, username and password for the database connection.
## Not run:
read_config()
read_config(file = "path/name.txt", delim = " ")
## End(Not run)
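As an illustration of the file layout described above, here is a minimal sketch that writes a config file with those four keys and parses it with base R. It is not the package's implementation: `read_config()` is documented as passing `delim` to `read_delim`, whereas this sketch uses `read.table`. The values and the names `cfg_path`, `raw` and `cfg` are made up for the example.

```r
# Hypothetical sketch: write a config file in the documented layout
# and parse it with base R (not the package's own code).
cfg_path <- tempfile(fileext = ".txt")
writeLines(c(
  'dbname = "europarl"',
  'host = "localhost"',
  'username = "reader"',
  'password = "secret"'
), cfg_path)

# Splitting on a single space yields three fields per line:
# the key, the "=" sign, and the (unquoted) value.
raw <- read.table(cfg_path, sep = " ", quote = "\"",
                  col.names = c("key", "eq", "value"),
                  colClasses = "character")

# Collect the key/value pairs into a named list.
cfg <- setNames(as.list(raw$value), raw$key)
cfg$dbname
```

A named list is used here only to keep the sketch dependency-free; the documented return value of `read_config()` is a tibble.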
Function safe_html
tries to download the URL several times.
safe_html(page, time = 60, attempts = 10)
page |
requested URL |
time |
sleep interval after each failure |
attempts |
max number of tries (if there is a problem with connection) |
Function safe_html
performs 10 attempts (by default) to download the URL
and waits 60 seconds (by default) after each failure.
character vector
Przemyslaw Biecek
## Not run:
page <- paste0('http://www.sejm.gov.pl/Sejm7.nsf/',
               'wypowiedz.xsp?posiedzenie=15&dzien=1&wyp=008')
safe_html(page)
## End(Not run)
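The retry behaviour described for `safe_html` (a fixed number of attempts with a pause after each failure) can be sketched roughly as follows. This is an assumption about the general pattern, not the package's source: `retry_fetch` and its `fetch` argument are hypothetical names, and the download is replaced by a stand-in function so the example runs offline.

```r
# Hypothetical sketch of a retry loop; not the safe_html source code.
# `fetch` is any zero-argument function that returns the page content
# or signals an error. A NULL result is treated as a failure.
retry_fetch <- function(fetch, attempts = 10, time = 60) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(fetch(), error = function(e) NULL)
    if (!is.null(result)) {
      return(result)
    }
    # Pause before the next try; this sketch skips the pause
    # after the final failed attempt.
    if (i < attempts) Sys.sleep(time)
  }
  stop("all ", attempts, " attempts failed")
}

# Stand-in for a flaky download: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "<html>ok</html>"
}

page_source <- retry_fetch(flaky, attempts = 5, time = 0)
```

Setting `time = 0` keeps the demonstration instant; with the documented defaults the loop would wait 60 seconds between attempts.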
Function statements_core
downloads the content and details (language, time) of the statement.
statements_core(url)
Get all statements for P8
statements_get_all_statements(deputy_id, browser, term_of_office = 8)
Returns links, titles, and dates of the statements of a given deputy.
statements_get_list_of_statements(deputy_id, browser, term_of_office = 8)
Function statements_core
downloads the content and details (language, time) of the statement.
statements_get_statement(url)
Get time of statements
statements_get_time(url)
url |
URL of the statement |
Returns duration, start time and end time of statement
Update statements in the database
statements_update_statements(deputy_id, browser, term_of_office = 8, db)
Remove whitespace and compare two strings
strings_identical(x, y)
Returns TRUE or FALSE
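A whitespace-insensitive comparison of this kind might look like the sketch below. It assumes that "whitespace" means any run of spaces, tabs, or newlines and that the comparison is case-sensitive; the package's actual rules may differ, and `strings_identical_sketch` is a hypothetical name.

```r
# Hypothetical sketch; not the package's strings_identical.
strings_identical_sketch <- function(x, y) {
  # Drop every whitespace character before comparing.
  strip <- function(s) gsub("[[:space:]]+", "", s)
  identical(strip(x), strip(y))
}

strings_identical_sketch("European  Parliament", "EuropeanParliament")  # TRUE
strings_identical_sketch("Parliament", "parliament")                    # FALSE
```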