RSelenium - For loop through multiple webpages, grab data and paste it into data.frame

I'm trying to loop through a job listing website to collect the job listings and run a text analysis on them. I'm using RSelenium for this. The code I'm working on is as follows:

#### REMOTE.COM ####
remDR$navigate('https://remote.com/jobs/all?query=marketing&country=anywhere')
# click on the cookies policy
remDR$findElement(using = 'xpath', '//*[@id="ccc-notify-accept"]')$clickElement()
# print all job listings
num_links <- 20
for(i in 1:num_links){
  remDR$findElement(using = 'xpath', 
                    paste('/html/body/div[2]/main/div/div/div[3]/article[',i,']', sep = ''))$clickElement()
  print(remDR$getCurrentUrl())
  remDR$goBack()
}

When I run the loop, two problems occur.

First, print(remDR$getCurrentUrl()) returns the original URL (https://remote.com/jobs/all?query=marketing&country=anywhere), not the URL of the page that was clicked on earlier in the same iteration. Second, when remDR$goBack() executes, it takes me back to the previous blank page, as if no link had been clicked.

In short, I think the loop is running faster than RSelenium can find and click on the element.
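One way to check that hypothesis is to poll until the URL actually changes, rather than reading it immediately after the click. The following is only a minimal sketch: it assumes `remDR` is the already-open remote driver used above, and `wait_until` and `click_and_get_url` are hypothetical helper names introduced here, not part of RSelenium.

```r
# Hypothetical helper: poll a condition until it is TRUE or the timeout expires.
# Returns TRUE if the condition was met, FALSE if the timeout was reached.
wait_until <- function(condition, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat errors (e.g. the page is mid-navigation) as "not ready yet"
    ok <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Hypothetical wrapper: click an element, then wait for the URL to change.
click_and_get_url <- function(driver, xpath, timeout = 10) {
  start_url <- driver$getCurrentUrl()[[1]]
  driver$findElement(using = 'xpath', xpath)$clickElement()
  changed <- wait_until(
    function() driver$getCurrentUrl()[[1]] != start_url,
    timeout = timeout
  )
  if (!changed) warning('URL did not change within ', timeout, ' seconds')
  driver$getCurrentUrl()[[1]]
}

# Usage with a live session (not run here):
# click_and_get_url(remDR, '/html/body/div[2]/main/div/div/div[3]/article[1]')
```

If the warning fires, the click probably did not trigger a navigation at all, which would point to the XPath or the element rather than to timing.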

EDIT

I found a solution thanks to a recommendation:

for(i in 1:5){
  remDR$findElement(using = 'xpath', 
                    paste('/html/body/div[2]/main/div/div/div[3]/article[',i,']', sep = ''))$clickElement()
  Sys.sleep(2) # add time for page to load
  print(remDR$getCurrentUrl())
  remDR$navigate('https://remote.com/jobs/all?query=marketing&country=anywhere') # navigate() works better than goBack() because it reloads the listing page
  Sys.sleep(2) # add time for page to load
}

The fix was to give Chrome time to load each page with Sys.sleep(2) and to use remDR$navigate() instead of remDR$goBack(), because navigate() reloads the content in the browser. Important: the loop won't work without the final Sys.sleep(2), because the listing page has to load completely before the loop clicks on the next item.
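Since the goal in the title is to end up with a data.frame, the same loop can store each URL instead of printing it. This is a rough sketch rather than a tested scraper: `scrape_job_urls` is a hypothetical helper name, the `position` and `url` column names are my own choice, and it assumes the cookie banner has already been dismissed and that `remDR` is on the listing page.

```r
# Hypothetical helper: click each listing, record the URL it leads to,
# then reload the listing page before the next click.
scrape_job_urls <- function(driver, list_url, n, pause = 2) {
  urls <- character(n)  # preallocate instead of growing inside the loop
  for (i in seq_len(n)) {
    xpath <- paste0('/html/body/div[2]/main/div/div/div[3]/article[', i, ']')
    driver$findElement(using = 'xpath', xpath)$clickElement()
    Sys.sleep(pause)                        # give the job page time to load
    urls[i] <- driver$getCurrentUrl()[[1]]  # getCurrentUrl() returns a list
    driver$navigate(list_url)               # reload the listing page
    Sys.sleep(pause)                        # wait before the next click
  }
  data.frame(position = seq_len(n), url = urls, stringsAsFactors = FALSE)
}

# Usage with a live session (not run here):
# jobs <- scrape_job_urls(
#   remDR,
#   'https://remote.com/jobs/all?query=marketing&country=anywhere',
#   n = 5
# )
```

Building the data.frame once at the end avoids calling rbind() on every iteration, and further columns (for example the job description text) could be collected in the same way.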
