This function scrapes an HTML table from a web page.
table_scrap(link, choose = 1, header = TRUE, fill = FALSE, askRobot = FALSE)
Argument | Description |
---|---|
link | the URL of the web page containing the table to scrape. |
choose | an integer indicating which table on the page to scrape (the first table by default). |
header | logical. Should the first row of the table be used as column names? Default is TRUE. |
fill | logical. Should be set to TRUE when the table rows have an inconsistent number of columns. Default is FALSE. |
askRobot | logical. Should the function check the site's robots.txt to see whether scraping the page is allowed? Default is FALSE. |
A data frame containing the selected table.
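The behaviour of the `choose` and `header` arguments can be approximated with the rvest package's `html_table()`. The snippet below is a minimal offline sketch, assuming rvest is installed; it is not necessarily how `table_scrap()` is implemented internally. It parses a made-up two-table HTML string, reads every table with `header = TRUE`, and then picks one by position, which is the job `choose` does. The variable name `which_table` is hypothetical (it avoids masking base R's `choose()` function).

```r
library(rvest)

# A made-up page with two tables, given as a literal HTML string so the
# sketch runs offline (read_html() accepts literal HTML as well as URLs)
page <- read_html('
<html><body>
  <table>
    <tr><th>Team</th><th>Points</th></tr>
    <tr><td>A</td><td>10</td></tr>
  </table>
  <table>
    <tr><th>Player</th><th>Goals</th></tr>
    <tr><td>X</td><td>3</td></tr>
    <tr><td>Y</td><td>2</td></tr>
  </table>
</body></html>')

# html_table() on a whole document returns a list with one data frame per
# <table>; header = TRUE uses each table's first row as column names
tables <- html_table(page, header = TRUE)

# Positional selection, playing the role of table_scrap()'s `choose` argument
which_table <- 2
chosen <- tables[[which_table]]
print(chosen)
```

On a live page, the URL would replace the inline HTML string. In rvest 1.0 and later, `html_table()` fills ragged rows with `NA` automatically and its own `fill` argument is deprecated, which is why the sketch leaves it out.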
    # \donttest{
    # Extracting Premier League top scorers
    link <- "https://www.topscorersfootball.com/premier-league"
    table_scrap(link)
    # }
    #>     #                Player              Team    Nationality Goals
    #> 1   1         Mohamed Salah         Liverpool          Egypt    13
    #> 2   2         Son Heung-Min         Tottenham    South Korea    12
    #> 3   3       Bruno Fernandes Manchester United       Portugal    11
    #> 4  NA Dominic Calvert-Lewin           Everton        England    11
    #> 5  NA           Jamie Vardy         Leicester        England    11
    #> 6   6            Harry Kane         Tottenham        England    10
    #> 7  NA       Patrick Bamford             Leeds        England    10
    #> 8   8         Callum Wilson         Newcastle        England     8
    #> 9  NA         Wilfried Zaha    Crystal Palace        England     8
    #> 10 10   Alexandre Lacazette           Arsenal         France     7
    #> 11 NA            Danny Ings       Southampton        England     7
    #> 12 NA       Marcus Rashford Manchester United        England     7
    #> 13 13           Neal Maupay          Brighton         France     6
    #> 14 NA         Ollie Watkins       Aston Villa        England     6
    #> 15 NA            Sadio Mané         Liverpool        Senegal     6
    #> 16 NA         Tammy Abraham           Chelsea        England     6
    #> 17 17        Anwar El Ghazi       Aston Villa    Netherlands     5
    #> 18 NA            Diogo Jota         Liverpool       Portugal     5
    #> 19 NA         Harvey Barnes         Leicester        England     5
    #> 20 NA         Jack Grealish       Aston Villa        England     5
    #> 21 NA       Roberto Firmino         Liverpool         Brazil     5
    #> 22 NA          Tomas Soucek          West Ham Czech Republic     5