Scrapes the titles (the h1, h2 and h3 HTML tags) from a web page. This is useful, for example, for collecting the headlines of daily online newspapers.

titles_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)

Arguments

link

the link (URL) of the web page to scrape

contain

an optional character string used to filter the titles; only titles containing it are returned. Defaults to NULL (no filtering)

case_sensitive

logical. Should the contain argument be case sensitive? Defaults to FALSE

askRobot

logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Defaults to FALSE
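
To make the interplay of contain and case_sensitive concrete, here is a minimal base R sketch of that kind of filtering. It is an approximation under stated assumptions, not necessarily how the package implements it: filter_titles is a hypothetical helper, the titles vector is made-up sample data, and contain is assumed to be matched as a pattern against each title.

```r
# Made-up sample data standing in for the result of titles_scrap(link).
titles <- c("Top Stories", "Newsletters", "Podcasts", "top picks")

# Hypothetical helper approximating the `contain` / `case_sensitive` arguments.
# Assumption: `contain` is matched as a pattern against each title.
filter_titles <- function(titles, contain = NULL, case_sensitive = FALSE) {
  if (is.null(contain)) {
    return(titles)  # no filter requested: keep every title
  }
  # ignore.case is the inverse of case_sensitive
  titles[grepl(contain, titles, ignore.case = !case_sensitive)]
}

filter_titles(titles, contain = "top")
#> [1] "Top Stories" "top picks"

filter_titles(titles, contain = "top", case_sensitive = TRUE)
#> [1] "top picks"
```

With the default case_sensitive = FALSE both spellings match; setting it to TRUE keeps only the exact-case match.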

Value

a character vector of the scraped titles

Examples

# Extracting the current titles of the New York Times

link     <- "https://www.nytimes.com/"

titles_scrap(link)
#>  [1] "New York Times - Top Stories"        "What to Watch and Read"             
#>  [3] "More News"                           "The AthleticSports coverage"        
#>  [5] "Well"                                "Culture and Lifestyle"              
#>  [7] "AudioPodcasts and narrated articles" "GamesDaily puzzles"                 
#>  [9] "Site Index"                          "Site Information Navigation"        
#> [11] "Sections"                            "Top Stories"                        
#> [13] "Newsletters"                         "Podcasts"                           
#> [15] "Sections"                            "Top Stories"                        
#> [17] "Newsletters"                         "Sections"                           
#> [19] "Top Stories"                         "Newsletters"                        
#> [21] "Podcasts"                            "Sections"                           
#> [23] "Recommendations"                     "Newsletters"                        
#> [25] "Podcasts"                            "Sections"                           
#> [27] "Columns"                             "Newsletters"                        
#> [29] "Podcasts"                            "Sections"                           
#> [31] "Topics"                              "Columnists"                         
#> [33] "Podcasts"                            "Audio"                              
#> [35] "Listen"                              "Featured"                           
#> [37] "Newsletters"                         "Games"                              
#> [39] "Play"                                "Community"                          
#> [41] "Newsletters"                         "Cooking"                            
#> [43] "Recipes"                             "Editors' Picks"                     
#> [45] "Newsletters"                         "Wirecutter"                         
#> [47] "Reviews"                             "The Best..."                        
#> [49] "Newsletters"                         "The Athletic"                       
#> [51] "Leagues"                             "Top Stories"                        
#> [53] "Newsletters"                         "Play"                               
#> [55] "Sections"                            "Top Stories"                        
#> [57] "Newsletters"                         "Podcasts"                           
#> [59] "Sections"                            "Top Stories"                        
#> [61] "Newsletters"                         "Sections"                           
#> [63] "Top Stories"                         "Newsletters"                        
#> [65] "Podcasts"                            "Sections"                           
#> [67] "Recommendations"                     "Newsletters"                        
#> [69] "Podcasts"                            "Sections"                           
#> [71] "Columns"                             "Newsletters"                        
#> [73] "Podcasts"                            "Sections"                           
#> [75] "Topics"                              "Columnists"                         
#> [77] "Podcasts"                            "Audio"                              
#> [79] "Listen"                              "Featured"                           
#> [81] "Newsletters"                         "Games"                              
#> [83] "Play"                                "Community"                          
#> [85] "Newsletters"                         "Cooking"                            
#> [87] "Recipes"                             "Editors' Picks"                     
#> [89] "Newsletters"                         "Wirecutter"                         
#> [91] "Reviews"                             "The Best..."                        
#> [93] "Newsletters"                         "The Athletic"                       
#> [95] "Leagues"                             "Top Stories"                        
#> [97] "Newsletters"                         "Play"
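
The call above returns every heading on the page, including repeated navigation labels. A possible follow-up, sketched here with only the arguments documented above, narrows the result and checks robots.txt first. It assumes the function is provided by the ralger package and that the page is reachable; the search string "Top" is only an illustration, and the output is not shown because it depends on the live page.

```r
# Assumption: titles_scrap() comes from the ralger package.
library(ralger)

link <- "https://www.nytimes.com/"

# Keep only titles containing "Top", matching case exactly,
# and ask robots.txt whether scraping this page is allowed.
top_titles <- titles_scrap(
  link,
  contain        = "Top",
  case_sensitive = TRUE,
  askRobot       = TRUE
)

# Drop repeated entries; the result is a character vector.
unique(top_titles)
```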