Scrapes the titles (h1, h2 and h3 HTML tags) from a web page. Useful for collecting the headlines of online daily newspapers.
titles_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)
link: the URL of the web page to scrape.
contain: an optional character string; only titles containing it are kept. Defaults to NULL (no filtering).
case_sensitive: logical. Should the contain argument be matched case-sensitively? Defaults to FALSE.
askRobot: logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Defaults to FALSE.
Returns a character vector containing the scraped titles.
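To make the behaviour described above more concrete, here is a minimal sketch of the same idea: collect the text of the h1, h2 and h3 elements, then optionally filter by a string. It assumes an rvest-based approach and a plain substring match; the package's actual internals may differ. The helper `filter_titles()` and the inline HTML are hypothetical and exist only for illustration.

```r
library(rvest)

# Inline HTML stands in for a downloaded page, so no network access is needed.
page <- minimal_html('
  <h1>Top Stories</h1>
  <p>Body text that is not a title.</p>
  <h2>World News</h2>
  <h3>local news</h3>
')

# Collect the text of every h1, h2 and h3 element, in document order.
titles <- html_text2(html_elements(page, "h1, h2, h3"))

# Hypothetical helper (not part of the package) that mimics the
# contain / case_sensitive arguments with a plain substring match.
filter_titles <- function(titles, contain = NULL, case_sensitive = FALSE) {
  if (is.null(contain)) return(titles)
  if (case_sensitive) {
    keep <- grepl(contain, titles, fixed = TRUE)
  } else {
    # Lower-case both sides so the match ignores case.
    keep <- grepl(tolower(contain), tolower(titles), fixed = TRUE)
  }
  titles[keep]
}

filter_titles(titles, contain = "news")
filter_titles(titles, contain = "News", case_sensitive = TRUE)
```

With case sensitivity off, both "World News" and "local news" are kept; with it on, only "World News" matches "News".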
# Extracting the current titles of the New York Times
link <- "https://www.nytimes.com/"
titles_scrap(link)
#> [1] "New York Times - Top Stories" "More News"
#> [3] "The AthleticSports coverage" "Well"
#> [5] "Culture and Lifestyle" "AudioPodcasts and narrated articles"
#> [7] "GamesDaily puzzles" "Site Index"
#> [9] "Site Information Navigation" "Sections"
#> [11] "Top Stories" "Newsletters"
#> [13] "Podcasts" "Sections"
#> [15] "Top Stories" "Newsletters"
#> [17] "Sections" "Top Stories"
#> [19] "Newsletters" "Podcasts"
#> [21] "Sections" "Recommendations"
#> [23] "Newsletters" "Podcasts"
#> [25] "Sections" "Columns"
#> [27] "Newsletters" "Podcasts"
#> [29] "Sections" "Topics"
#> [31] "Columnists" "Podcasts"
#> [33] "Audio" "Listen"
#> [35] "Featured" "Newsletters"
#> [37] "Games" "Play"
#> [39] "Community" "Newsletters"
#> [41] "Cooking" "Recipes"
#> [43] "Editors' Picks" "Newsletters"
#> [45] "Wirecutter" "Reviews"
#> [47] "The Best..." "Newsletters"
#> [49] "The Athletic" "Leagues"
#> [51] "Top Stories" "Newsletters"
#> [53] "Play" "Sections"
#> [55] "Top Stories" "Newsletters"
#> [57] "Podcasts" "Sections"
#> [59] "Top Stories" "Newsletters"
#> [61] "Sections" "Top Stories"
#> [63] "Newsletters" "Podcasts"
#> [65] "Sections" "Recommendations"
#> [67] "Newsletters" "Podcasts"
#> [69] "Sections" "Columns"
#> [71] "Newsletters" "Podcasts"
#> [73] "Sections" "Topics"
#> [75] "Columnists" "Podcasts"
#> [77] "Audio" "Listen"
#> [79] "Featured" "Newsletters"
#> [81] "Games" "Play"
#> [83] "Community" "Newsletters"
#> [85] "Cooking" "Recipes"
#> [87] "Editors' Picks" "Newsletters"
#> [89] "Wirecutter" "Reviews"
#> [91] "The Best..." "Newsletters"
#> [93] "The Athletic" "Leagues"
#> [95] "Top Stories" "Newsletters"
#> [97] "Play"
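The output above repeats many navigation labels. Because the return value is a plain character vector, base R tools can tidy it. The sketch below uses a handful of values copied from the printed output rather than a live call, so the vector `titles` here is an illustrative stand-in for the real result.

```r
# A few values copied from the printed output above.
titles <- c("Top Stories", "Newsletters", "Podcasts", "Sections",
            "Top Stories", "Newsletters", "Sections", "Top Stories")

# Drop repeated labels while keeping first-seen order.
unique_titles <- unique(titles)
unique_titles

# Count how often each label occurs, most frequent first.
title_counts <- sort(table(titles), decreasing = TRUE)
title_counts
```

On a live result the same two calls would apply directly, for example `unique(titles_scrap(link))`.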