Sends a request to the Google Scholar service and retrieves the results (title, authors, source and year of publication, and total number of citations).

As Google Scholar provides no API (except one for authors who have a Google Scholar ID), this function scrapes the service using the RSelenium package.

To work around Google IP bans, the IP address (through a VPN connection) and the user agent are changed whenever a ban occurs.
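The rotation behaviour can be pictured as a retry loop. The sketch below is a generic illustration under stated assumptions, not the package's implementation: `fetch_with_rotation()`, `fetch_page()` and `rotate_identity()` are hypothetical names, and a ban is assumed to be signalled by a logical `banned` attribute on the fetched page.

```r
# Hypothetical retry loop (not exported by the package): fetch a page and,
# if it looks like a ban, change identity (e.g. VPN server and/or user agent)
# before trying again.
fetch_with_rotation <- function(fetch_page, rotate_identity, max_tries = 3) {
  for (attempt in seq_len(max_tries)) {
    page <- fetch_page()
    # Assumption: a ban is flagged by a logical "banned" attribute
    if (!isTRUE(attr(page, "banned"))) {
      return(page)
    }
    rotate_identity()
  }
  stop("Still banned after ", max_tries, " attempts")
}

# Usage with stand-in functions: the first fetch is "banned", the second succeeds.
calls <- 0
rotations <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  structure(paste("page", calls), banned = calls == 1)
}
fake_rotate <- function() rotations <<- rotations + 1

result <- fetch_with_rotation(fake_fetch, fake_rotate)
```

Passing the fetch and rotation steps as functions keeps the retry logic separate from the browser and VPN details, which makes it easy to exercise without a network connection.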

scrap_gscholar(
  search_terms,
  exact = TRUE,
  exclude_terms = NULL,
  search_author = NULL,
  search_source = NULL,
  metadata = FALSE,
  where = NULL,
  years = NULL,
  lang = NULL,
  start = 0,
  n_max = NULL,
  include_patents = FALSE,
  include_citations = FALSE,
  ovpn_country,
  agent = TRUE,
  verbose = TRUE,
  keep_html = FALSE,
  output_path = "."
)
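A typical query might look like the sketch below, which uses only the documented arguments. The search terms, year range and country code are illustrative, and the lowercase country code format is an assumption (check get_countries() for valid values). The call is wrapped in `if (FALSE)`, like the package example, because it needs a browser and a VPN connection.

```r
# Illustrative arguments only: the query, years and country code are made up.
# Assumption: country codes are lowercase ISO-2 codes; see get_countries().
args <- list(
  search_terms = "species distribution models",
  exact        = TRUE,
  where        = "title",
  years        = c(2015L, 2020L),
  metadata     = TRUE,
  n_max        = 50,
  ovpn_country = "fr",
  output_path  = tempdir()
)

# Not run: scraping needs RSelenium, a running browser and a VPN configuration.
if (FALSE) {
  do.call(scrap_gscholar, args)
}
```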

Arguments

search_terms

a character of length 1. Terms to search papers for (optional).

exact

a logical. If TRUE, searches for the exact terms; otherwise, searches for at least one of the terms.

exclude_terms

a character of length 1. Terms to exclude from the search (optional).

search_author

a character of length 1. Authors to search for (optional).

search_source

a character of length 1. Publication sources to search for (optional).

metadata

a logical. If TRUE, the metadata of all publications are extracted. Otherwise, only the total number of publications is retrieved.

where

a character of length 1. One of 'any' (search in the whole document) or 'title' (search only in the title).

years

an integer of length 1 or 2. Year(s) specifying the time span of the search.
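The documented constraint on years can be checked before launching a long scraping session. The helper below is hypothetical (it is not part of the package), and it assumes that a two-year range is given as c(from, to) in increasing order.

```r
# Hypothetical helper (not part of the package): checks that `years` is NULL
# or an integer-valued numeric vector of length 1 or 2.
check_years <- function(years) {
  if (is.null(years)) return(invisible(TRUE))
  stopifnot(
    is.numeric(years),
    length(years) %in% 1:2,
    all(years == round(years)),
    !is.unsorted(years)  # assumption: a range is written as c(from, to)
  )
  invisible(TRUE)
}

check_years(c(2015, 2020))  # passes silently
```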

lang

a character of length 1. The ISO-2 code of the language to search for. Use get_languages() to get the list of available languages (optional).

start

a numeric of length 1. The index of the first result to extract (default is 0, i.e. start from the first result).

n_max

a numeric of length 1. The number of results to extract.
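Together, start and n_max define which result pages have to be visited. The helper below is a hypothetical sketch, not the package's code: it assumes Google Scholar's default page size of 10 results and uses the 0-based indexing implied by the default start = 0.

```r
# Hypothetical helper (not part of the package): result offsets a scraper
# would request to collect `n_max` results starting at 0-based index `start`,
# assuming 10 results per page.
page_offsets <- function(start = 0, n_max, per_page = 10) {
  stopifnot(start >= 0, n_max >= 1, per_page >= 1)
  first_page <- (start %/% per_page) * per_page  # offset of the page holding `start`
  last_index <- start + n_max - 1                # 0-based index of the last result wanted
  seq(from = first_page, to = last_index, by = per_page)
}

page_offsets(start = 0, n_max = 25)   # 0 10 20
page_offsets(start = 15, n_max = 10)  # 10 20
```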

include_patents

a logical. If TRUE, patents are included in the search results.

include_citations

a logical. If TRUE, citations are included in the search results.

ovpn_country

a character vector. The ISO-2 code(s) of the country (or countries) from which to pick a VPN server. Use get_countries() to get the list of available countries.

agent

a logical. If TRUE, the web browser user agent is changed randomly.

verbose

a logical. If TRUE, connection and scraping information is printed.

keep_html

a logical. If TRUE, raw HTML pages are kept.

output_path

a character of length 1. The path to the folder where data are saved.

Value

No return value. Results are saved in the folder given by output_path.

Examples

if (FALSE) {
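scrap_gscholar(search_terms = "species distribution models",
               where        = "title",
               years        = c(2015, 2020),
               metadata     = TRUE,
               ovpn_country = "fr")  # see get_countries() for valid codes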
}