Fixing an error when extracting city info for a list of IPs from an IP checker website (HTML table) into a data frame


I have a list of IPs that I need to plot on a map, and I would like to use the website Ip Checker to extract the city info and build a data frame.

The structure of the table that the website generates is the following: complete table from the site

Row 7 contains the city, if present; if not, it contains the latitude and longitude, but that is a problem for later (a rough check I have in mind for that case is sketched below).
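
This is only a rough idea for that later step: a check that guesses whether the row 7 value is a coordinate pair rather than a city name. The assumption that coordinates appear as two comma-separated numbers (e.g. "38.98, -84.66") is mine, not something I have confirmed from the site.

    # Sketch only: guess whether the value from row 7 is a "lat, long" pair
    # instead of a city name. The coordinate format is an assumption.
    looks_like_coords <- function(x) {
      grepl("^\\s*-?[0-9]+(\\.[0-9]+)?\\s*,\\s*-?[0-9]+(\\.[0-9]+)?\\s*$", x)
    }

    looks_like_coords("Covington")      # FALSE
    looks_like_coords("38.98, -84.66")  # TRUE (assumed format)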

The final result that I'm trying to achieve

    IP               CITY        Number of Occurrences
    64.4.99.114      Covington   2
    201.105.211.222  Xalapa      2
    73.6.19.32       Texas City  1

and so on. (A sketch of how I imagine getting the occurrence counts is below.)
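
This part is only how I imagine it working: assuming the scraping loop further down eventually produces ip_geo with one row per unique IP and columns V1 (IP) and V2 (city), as in the manual result, the counts would have to come from the original, non-deduplicated report and a join.

    # Sketch: counts come from the original report, the city comes from ip_geo.
    # ip_geo is assumed to end up with columns V1 (IP) and V2 (city).
    library(dplyr)

    ip_counts <- threedsecure_authentication_report_tidy %>%
      count(shopper_ip, name = "Number of Occurrences")

    ip_geo %>%
      rename(IP = V1, CITY = V2) %>%
      left_join(ip_counts, by = c("IP" = "shopper_ip"))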

So far, this is the result that I have obtained with manual insertion of a single IP:

    url_ip <- "https://ipleak.net/?q=64.4.99.114"

    page <- read_html(url_ip)

    tbls <- html_nodes(page, "table")

    tbls2 <- html_table(tbls)

    # first table on the page; row 1 holds the IP, row 7 the city
    table2 <- tbls2[[1]][c(1, 7), 2]

    table2 <- t(table2)

    city_table <- as.data.frame(table2)

The final result of the manual insertion

    V1            V2
    64.4.99.114   Covington

This is what I have produced so far for the full list of IPs:

if (!require("pacman"))
  install.packages("pacman")
pacman::p_load(
  dplyr,
  readr,
  ggplot2,
  hrbrthemes,
  RColorBrewer,
  lubridate,
  countrycode,
  stringi,
  tidyverse,
  rvest,
  purrr,
  tidyr,
  openxlsx
)


ip_range <- threedsecure_authentication_report_tidy %>% pull(shopper_ip) # this comes from our payment system's report generator

ip_range <- unique(ip_range)

ip_range <- trimws(ip_range)

purrr::map_df(ip_range, ~ {

  url_reviews <- paste0("https://ipleak.net/?q=", .x)

  page <- read_html(url_reviews)

  # extract every table on the page
  tbls <- html_nodes(page, "table")

  tbls2 <- html_table(tbls)

  # first table; row 1 holds the IP, row 7 the city
  table2 <- tbls2[[1]][c(1, 7), 2]

  table2 <- t(table2)

  as.data.frame(table2) # I'm not sure about this part
}) -> ip_geo

but I get this: Error in open.connection(x, "rb") : HTTP error 504

This is the traceback

12.
open.connection(x, "rb") 
11.
open(x, "rb") 
10.
read_xml.connection(con, encoding = encoding, ..., as_html = as_html, 
    base_url = x, options = options) 
9.
read_xml.character(x, encoding = encoding, ..., as_html = TRUE, 
    options = options) 
8.
read_xml(x, encoding = encoding, ..., as_html = TRUE, options = options) 
7.
withCallingHandlers(expr, warning = function(w) if (inherits(w, 
    classes)) tryInvokeRestart("muffleWarning")) 
6.
suppressWarnings(read_xml(x, encoding = encoding, ..., as_html = TRUE, 
    options = options)) 
5.
read_html.default(url_reviews) 
4.
read_html(url_reviews) 
3.
.f(.x[[i]], ...) 
2.
map(.x, .f, ...) 
1.
purrr::map_df(ip_range, ~{
    url_reviews <- paste0("https://ipleak.net/?q=", .x)
    page <- read_html(url_reviews)
    tbls <- html_nodes(page, "table") ... 
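
My current guess is that the 504 comes from the site refusing a burst of back-to-back requests, so one thing I'm considering is wrapping each request in a small throttled retry helper, roughly like the sketch below. read_html_politely is just a name I made up, and the 2-second pause and 3 attempts are arbitrary.

    # Sketch: pause before each request and retry a few times on failure,
    # so a single transient 504 does not abort the whole map_df() run.
    read_html_politely <- function(url, attempts = 3, pause = 2) {
      for (i in seq_len(attempts)) {
        Sys.sleep(pause)                # be gentle with the site
        page <- tryCatch(read_html(url), error = function(e) NULL)
        if (!is.null(page)) return(page)
      }
      NULL                              # give up on this IP
    }

Inside the map_df() call I would then use read_html_politely(url_reviews) instead of read_html(url_reviews) and skip the IPs that come back NULL, but I don't know whether throttling is actually the issue here.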

I have also seen R packages that do this, but they do not seem to work for me: ip2location
