Scrapes the selected elements of a web page and returns them as a tibble.

tidy_scrap(link, nodes, colnames, clean = FALSE, askRobot = FALSE)

Arguments

link

the link of the web page to scrape

nodes

the vector of HTML or CSS elements to consider. The SelectorGadget tool is highly recommended for identifying them.

colnames

the names of the expected columns.

clean

logical. Should the function clean the extracted tibble? Default is FALSE.

askRobot

logical. Should the function check the site's robots.txt to confirm that scraping the web page is allowed? Default is FALSE.

Value

a tidy data frame (tibble).

Examples

# \donttest{
# Extracting imdb movie titles and rating
link <- "https://www.imdb.com/chart/top/"
my_nodes <- c(".titleColumn a", "strong")
names <- c("title", "rating")
tidy_scrap(link, my_nodes, names)
# }
#> # A tibble: 250 x 2
#>    title                                             rating
#>    <chr>                                             <chr>
#>  1 The Shawshank Redemption                          9.2
#>  2 The Godfather                                     9.1
#>  3 The Godfather: Part II                            9.0
#>  4 The Dark Knight                                   9.0
#>  5 12 Angry Men                                      8.9
#>  6 Schindler's List                                  8.9
#>  7 The Lord of the Rings: The Return of the King     8.9
#>  8 Pulp Fiction                                      8.8
#>  9 Il buono, il brutto, il cattivo                   8.8
#> 10 The Lord of the Rings: The Fellowship of the Ring 8.8
#> # ... with 240 more rows
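
The example above leaves clean and askRobot at their defaults. The following is a minimal sketch of a call with both switched on. It assumes tidy_scrap() is exported by the ralger package (the page itself does not name the package), and scrape_politely is a hypothetical wrapper introduced only for illustration. No output is shown because the result depends on the live page.

```r
# Hypothetical helper (not part of the documented API): wraps tidy_scrap()
# with both optional flags enabled.
scrape_politely <- function(link, nodes, colnames) {
  # Assumption: tidy_scrap() lives in the ralger package.
  if (!requireNamespace("ralger", quietly = TRUE)) {
    stop("The ralger package is required for this sketch.")
  }
  ralger::tidy_scrap(
    link = link,
    nodes = nodes,
    colnames = colnames,
    clean = TRUE,    # ask the function to clean the extracted tibble
    askRobot = TRUE  # check robots.txt before scraping
  )
}

# Example call (needs network access, so it is left commented out):
# scrape_politely(
#   "https://www.imdb.com/chart/top/",
#   c(".titleColumn a", "strong"),
#   c("title", "rating")
# )
```

Using named arguments here keeps the call readable when the two logical flags are set, since both come after the three positional arguments in the signature.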