Scrapes titles (the h1, h2 and h3 HTML tags) from a web page. Useful for collecting the headlines of daily online newspapers.

titles_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)

Arguments

link

the link of the web page to scrape

contain

character string used to filter the titles. Defaults to NULL (no filtering).

case_sensitive

logical. Should the contain argument be case sensitive? Defaults to FALSE.

askRobot

logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Defaults to FALSE.
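
The interaction between contain and case_sensitive can be illustrated offline. The sketch below is not the package's implementation: filter_titles() is a hypothetical helper, the titles vector is a made-up stand-in for a result of titles_scrap(), and it assumes the filter behaves like a grepl() pattern match.

```r
# Hypothetical stand-in for a vector returned by titles_scrap()
titles <- c("Top Stories", "Newsletters", "Podcasts", "top picks")

# Hypothetical helper (an assumption, not the package's code):
# keep only the titles that match `contain`
filter_titles <- function(titles, contain = NULL, case_sensitive = FALSE) {
  # No filter requested: return everything unchanged
  if (is.null(contain)) return(titles)
  # grepl() treats `contain` as a regular expression
  titles[grepl(contain, titles, ignore.case = !case_sensitive)]
}

insensitive <- filter_titles(titles, contain = "top")
sensitive   <- filter_titles(titles, contain = "top", case_sensitive = TRUE)

print(insensitive)
print(sensitive)
```

With case_sensitive = FALSE both "Top Stories" and "top picks" are kept; with case_sensitive = TRUE only "top picks" remains.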

Value

a character vector

Examples

# Extracting the current titles of the New York Times

link <- "https://www.nytimes.com/"

titles_scrap(link)
#>  [1] "New York Times - Top Stories"        "More News"                          
#>  [3] "The AthleticSports coverage"         "Well"                               
#>  [5] "Culture and Lifestyle"               "AudioPodcasts and narrated articles"
#>  [7] "GamesDaily puzzles"                  "Site Index"                         
#>  [9] "Site Information Navigation"         "Sections"                           
#> [11] "Top Stories"                         "Newsletters"                        
#> [13] "Podcasts"                            "Sections"                           
#> [15] "Top Stories"                         "Newsletters"                        
#> [17] "Sections"                            "Top Stories"                        
#> [19] "Newsletters"                         "Podcasts"                           
#> [21] "Sections"                            "Recommendations"                    
#> [23] "Newsletters"                         "Podcasts"                           
#> [25] "Sections"                            "Columns"                            
#> [27] "Newsletters"                         "Podcasts"                           
#> [29] "Sections"                            "Topics"                             
#> [31] "Columnists"                          "Podcasts"                           
#> [33] "Audio"                               "Listen"                             
#> [35] "Featured"                            "Newsletters"                        
#> [37] "Games"                               "Play"                               
#> [39] "Community"                           "Newsletters"                        
#> [41] "Cooking"                             "Recipes"                            
#> [43] "Editors' Picks"                      "Newsletters"                        
#> [45] "Wirecutter"                          "Reviews"                            
#> [47] "The Best..."                         "Newsletters"                        
#> [49] "The Athletic"                        "Leagues"                            
#> [51] "Top Stories"                         "Newsletters"                        
#> [53] "Play"                                "Sections"                           
#> [55] "Top Stories"                         "Newsletters"                        
#> [57] "Podcasts"                            "Sections"                           
#> [59] "Top Stories"                         "Newsletters"                        
#> [61] "Sections"                            "Top Stories"                        
#> [63] "Newsletters"                         "Podcasts"                           
#> [65] "Sections"                            "Recommendations"                    
#> [67] "Newsletters"                         "Podcasts"                           
#> [69] "Sections"                            "Columns"                            
#> [71] "Newsletters"                         "Podcasts"                           
#> [73] "Sections"                            "Topics"                             
#> [75] "Columnists"                          "Podcasts"                           
#> [77] "Audio"                               "Listen"                             
#> [79] "Featured"                            "Newsletters"                        
#> [81] "Games"                               "Play"                               
#> [83] "Community"                           "Newsletters"                        
#> [85] "Cooking"                             "Recipes"                            
#> [87] "Editors' Picks"                      "Newsletters"                        
#> [89] "Wirecutter"                          "Reviews"                            
#> [91] "The Best..."                         "Newsletters"                        
#> [93] "The Athletic"                        "Leagues"                            
#> [95] "Top Stories"                         "Newsletters"                        
#> [97] "Play"
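
Because navigation menus repeat the same headings, the returned vector can contain many duplicates, as the output above shows. A minimal sketch of removing them with base R's unique(), using a short made-up vector as a stand-in for the real result:

```r
# Hypothetical stand-in for part of a titles_scrap() result
titles <- c("Sections", "Top Stories", "Newsletters", "Podcasts",
            "Sections", "Top Stories", "Newsletters")

# unique() drops repeats and keeps the first occurrence of each title
distinct_titles <- unique(titles)

print(distinct_titles)
```

The same call can be applied directly to the scraped vector, for example unique(titles_scrap(link)).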