Sends a request to the Google Scholar service and retrieves results (title, authors, source, and year of each publication, plus its total number of citations).
As Google Scholar provides no API (except the one for authors with a
Google Scholar ID), this function scrapes the service using the RSelenium package.
To work around Google IP bans, the IP address and the user agent are changed whenever a ban occurs.
scrap_gscholar(
search_terms,
exact = TRUE,
exclude_terms = NULL,
search_author = NULL,
search_source = NULL,
metadata = FALSE,
where = NULL,
years = NULL,
lang = NULL,
start = 0,
n_max = NULL,
include_patents = FALSE,
include_citations = FALSE,
ovpn_country,
agent = TRUE,
verbose = TRUE,
keep_html = FALSE,
output_path = "."
)
search_terms: a character of length 1. Terms to search papers for (optional).
exact: a logical. If TRUE, search for the exact terms; otherwise, search for at least one of the terms.
exclude_terms: a character of length 1. Terms to exclude from the search (optional).
search_author: a character of length 1. Authors to search for (optional).
search_source: a character of length 1. Publication sources to search for (optional).
metadata: a logical. If TRUE, all publication data are extracted. Otherwise, only the total number of publications is returned.
where: a character of length 1. Either 'any' (search in the whole document) or 'title' (search only in the title).
years: an integer of length 1 or 2. Year(s) specifying the temporal extent of the search.
lang: a character of length 1. The ISO-2 code of the language to search for. Use get_languages() to get a list (optional).
start: a numeric of length 1. The index of the first result to extract (default is 0, i.e. start from the first result).
n_max: a numeric of length 1. The number of results to extract.
include_patents: a logical. If TRUE, patents are included in the search results.
include_citations: a logical. If TRUE, citations are included in the search results.
ovpn_country: a character vector. The ISO-2 code of the country from which to pick a VPN server. Use get_countries() to get a list.
agent: a logical. If TRUE, the web browser user agent will be randomly changed.
verbose: a logical. If TRUE, connection and scraping information are printed.
keep_html: a logical. If TRUE, raw HTML pages are kept.
output_path: a character of length 1. The path to the folder in which to save data.
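The `years` argument accepts one or two values. As a minimal sketch of how such an input could be turned into a start/end range, consider the helper below. `normalize_years()` is a hypothetical name that is not part of the package, and treating a single year as a one-year range is an assumption rather than documented behaviour.

```r
# Hypothetical helper (not part of the package): turn `years` into a
# c(from, to) range. Assumption: a single year means a one-year range.
normalize_years <- function(years) {
  if (is.null(years)) {
    return(NULL)  # no temporal restriction
  }
  stopifnot(is.numeric(years), length(years) %in% c(1, 2), !anyNA(years))
  years <- as.integer(years)
  if (length(years) == 1) {
    years <- rep(years, 2)
  }
  sort(years)  # make sure the earliest year comes first
}

normalize_years(2015)
normalize_years(c(2020, 2010))
```

Sorting the two values means the caller does not need to care about the order in which the years are given.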
No return value.
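if (FALSE) {
  # Not run. The search terms and country code below are placeholders.
  scrap_gscholar(search_terms = "coral reef", ovpn_country = "FR")
}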
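To illustrate how `start` and `n_max` relate to result pages, the sketch below computes the page offsets needed to cover a request. `page_offsets()` is a hypothetical helper that is not part of the package, and the page size of 10 results is an assumption about the default Google Scholar layout.

```r
# Hypothetical helper (not part of the package): offsets of the result pages
# needed to extract `n_max` results beginning at `start`.
# Assumption: each result page holds `page_size` results (10 by default).
page_offsets <- function(start = 0, n_max = 10, page_size = 10) {
  stopifnot(start >= 0, n_max >= 1, page_size >= 1)
  n_pages <- ceiling(n_max / page_size)
  start + page_size * (seq_len(n_pages) - 1)
}

page_offsets(start = 0, n_max = 25)   # three pages are needed
page_offsets(start = 20, n_max = 10)  # a single page, starting at result 20
```

When `n_max` is not a multiple of the page size, the last page contains more results than requested, so the extra rows would have to be dropped afterwards.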