Scrape the values of an attribute from HTML elements on a web page

attribute_scrap(link, node, attr, askRobot = FALSE)

Arguments

link

the link of the web page to scrape

node

the HTML element to consider

attr

the attribute to scrape

askRobot

logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Default is FALSE.

Value

a character vector.
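
The example output further down contains NA entries, which suggests that matched elements lacking the requested attribute are reported as NA rather than dropped. The sketch below reproduces that shape with rvest on a small inline page. It is only an illustration under the assumption that attribute_scrap behaves like rvest's html_attr(); the HTML snippet and the variable names are hypothetical and not taken from the package.

```r
library(rvest)

# Hypothetical inline page, so no network access is needed
page <- read_html('<html><body>
  <a class="nav-link" href="/a">A</a>
  <a href="/b">B</a>
  <a class="btn btn-primary" href="/c">C</a>
</body></html>')

# Select every anchor, then read its class attribute;
# anchors without a class attribute yield NA
classes <- html_attr(html_elements(page, "a"), "class")
classes
```

The result has one entry per matched element, so its length equals the number of anchors on the page.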

Examples

# \donttest{
# Extracting the class attributes of the anchor elements on the rOpenSci home page

link <- "https://ropensci.org/"

# scraping the class attribute values from all the anchor elements

attribute_scrap(link = link, node = "a", attr = "class")
#>   [1] "navbar-brand logo"              "dropdown-item lang-nav"        
#>   [3] "dropdown-item lang-nav"         "dropdown-item lang-nav"        
#>   [5] "dropdown-item lang-nav"         "nav-link"                      
#>   [7] NA                               NA                              
#>   [9] NA                               "nav-link"                      
#>  [11] NA                               "nav-link"                      
#>  [13] NA                               NA                              
#>  [15] NA                               NA                              
#>  [17] "nav-link"                       NA                              
#>  [19] "nav-link"                       NA                              
#>  [21] NA                               NA                              
#>  [23] NA                               NA                              
#>  [25] NA                               NA                              
#>  [27] NA                               NA                              
#>  [29] NA                               NA                              
#>  [31] NA                               NA                              
#>  [33] NA                               NA                              
#>  [35] NA                               "nav-link"                      
#>  [37] NA                               NA                              
#>  [39] NA                               NA                              
#>  [41] "external-link"                  "external-link"                 
#>  [43] "nav-link"                       NA                              
#>  [45] NA                               "external-link"                 
#>  [47] NA                               "btn btn-primary"               
#>  [49] "btn btn-primary"                "btn btn-primary"               
#>  [51] "btn btn-primary"                "link-arrow package-cta"        
#>  [53] "card-link link-arrow"           "card-link link-arrow"          
#>  [55] "card-link link-arrow"           "link-arrow"                    
#>  [57] "card-link link-arrow"           "card-link link-arrow"          
#>  [59] "card-link link-arrow"           "link-arrow"                    
#>  [61] "card-link link-arrow mt-auto"   "card-link link-arrow mb-auto"  
#>  [63] "card-link link-arrow mb-auto"   "link-arrow"                    
#>  [65] "card-link link-arrow"           "card-link link-arrow"          
#>  [67] "card-link link-arrow"           "link-arrow"                    
#>  [69] "card-link link-arrow"           "card-link link-arrow"          
#>  [71] "card-link link-arrow"           "link-arrow package-cta"        
#>  [73] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [75] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [77] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [79] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [81] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [83] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [85] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [87] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [89] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [91] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [93] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [95] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [97] "packages-title"                 "link-arrow link-arrow-sm"      
#>  [99] "pack-card"                      "pack-card"                     
#> [101] "section-news__link"             "section-news__link"            
#> [103] "card-title"                     "card-link link-arrow"          
#> [105] NA                               "card-title"                    
#> [107] "card-link link-arrow"           "pack-card"                     
#> [109] "pack-card"                      NA                              
#> [111] NA                               NA                              
#> [113] NA                               NA                              
#> [115] "footer-nav__link"               "footer-nav__link"              
#> [117] "footer-nav__link"               "footer-nav__link"              
#> [119] "footer-nav__link"               "footer-nav__link"              
#> [121] "footer-nav__link"               "footer-nav__link"              
#> [123] "footer-nav__link"               "footer-nav__link"              
#> [125] "footer-nav__link"               "footer-nav__link external-link"
#> [127] "footer-nav__link"               "footer-nav__link"              
#> [129] "footer-nav__link"               "footer-nav__link"              
#> [131] "footer-nav__link"               "footer-nav__link external-link"
#> [133] "footer-nav__link"               "footer-nav__link"              
#> [135] NA                               NA                              
#> [137] NA                               NA                              
#> [139] NA                               NA                              
#> [141] NA                               NA                              
#> [143] NA                               NA                              
#> [145] NA                              
# }
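
Because the result is a plain character vector, it can be post-processed with base R. The following is a minimal sketch, not part of the package: it uses a handful of values copied from the output above (the variable names are illustrative) to drop the NA entries and count the individual class tokens.

```r
# A few values copied from the output above
classes <- c("navbar-brand logo", "nav-link", NA,
             "btn btn-primary", NA, "card-link link-arrow")

# Keep only the anchors that actually carry a class attribute
present <- classes[!is.na(classes)]

# A class attribute may hold several space-separated tokens; split them apart
tokens <- unlist(strsplit(present, " ", fixed = TRUE))

# Frequency of each individual class token
counts <- table(tokens)
counts
```

Splitting on spaces matters here because a value such as "btn btn-primary" names two classes, not one.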