Tuesday 15 August 2023

How to Delay an Action until Webpage Loads

I am using Selenium within R (via the RSelenium package).

I have the following script which searches Google Maps for all pizza restaurants around a given geographical coordinate - and then keeps scrolling until all restaurants are loaded.

First, I navigate to the starting page:

library(RSelenium)
library(wdman)
library(netstat)

selenium()
selenium_object <- selenium(retcommand = TRUE, check = FALSE)

remote_driver <- rsDriver(browser = "chrome", chromever = "114.0.5735.90", verbose = FALSE, port = free_port())

remDr <- remote_driver$client

lat <- 40.7484
lon <- -73.9857

# Create the URL using the paste0 function
URL <- paste0("https://www.google.com/maps/search/pizza/@", lat, ",", lon, ",17z/data=!3m1!4b1!4m6!2m5!3m4!2s", lat, ",", lon, "!4m2!1d", lon, "!2d", lat, "?entry=ttu")

# Navigate to the URL
remDr$navigate(URL)

Then, I use the following code to keep scrolling until all entries have been loaded:

while (TRUE) {
    # Find the result entries that are currently loaded (this call does not wait)
    elements <- remDr$findElements(using = "css selector", "div.qjESne")

    if (length(elements) > 0) {
        # Pick the last element in the list - this is the one we want to scroll to
        last_element <- elements[[length(elements)]]
        # Scroll to the last element
        remDr$executeScript("arguments[0].scrollIntoView(true);", list(last_element))
    }

    # Fixed 10-second pause to give new entries time to load
    Sys.sleep(10)

    # Stop once the "You've reached the end of the list." message is present
    if (length(remDr$findElements(using = "css selector", "span.HlvSq")) > 0) {
        print("No more elements")
        break
    }
}

Finally, I use this code to extract the names and addresses of all restaurants:

titles <- c()
addresses <- c()

# Only parse once the "You've reached the end of the list." message is present
if (length(remDr$findElements(using = "css selector", "span.HlvSq")) > 0) {
    # Now we can parse the data since all the elements have loaded
    for (data in remDr$findElements(using = "css selector", "div.lI9IFe")) {
        title <- data$findElement(using = "css selector", "div.qBF1Pd.fontHeadlineSmall")$getElementText()[[1]]
        address <- data$findElement(using = "css selector", ".W4Efsd > span:nth-of-type(2)")$getElementText()[[1]]

        titles <- c(titles, title)
        addresses <- c(addresses, address)
    }

    # Combine the vectors of titles and addresses into a data frame
    df <- data.frame(title = titles, address = addresses)
    print(df)
}

My Question: Instead of using Sys.sleep() in R, I am trying to change my code so that it only scrolls (i.e. delays the action) once the previous action has been completed. I am noticing that my existing code often freezes halfway through, and I suspect that this is because I am trying to load new content when the existing page is not fully loaded. I think it might be better to somehow delay the action and wait for the page to be fully loaded before proceeding.

Can someone please show me how I might be able to delay my script and force it to wait for the existing page to load before loading new content? (e.g. as in the question "R - Waiting for page to load in RSelenium with PhantomJS")

Thanks!

Note: I am also open to a Python solution.
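
For reference, here is a minimal sketch of the kind of wait described above, written in base R. The helper name wait_until and its timeout and interval arguments are hypothetical (they are not part of RSelenium); the helper only polls a condition instead of sleeping for a fixed time. The usage part assumes the remDr client and the CSS selectors from the script above, and is skipped when no such client exists.

```r
# Hypothetical helper (not part of RSelenium): call `condition` repeatedly
# until it returns TRUE or `timeout` seconds have elapsed.
# Returns TRUE on success and FALSE on timeout.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat errors (e.g. an element that is not there yet) as "not ready"
    ready <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ready) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Usage with the `remDr` client from the script above (skipped if no
# browser session exists in the current R session).
if (exists("remDr")) {
  # Wait until the browser reports that the document has finished loading
  wait_until(function() {
    remDr$executeScript("return document.readyState;")[[1]] == "complete"
  }, timeout = 30)

  # Count the entries, scroll to the last one, then wait until more entries
  # appear or the end-of-list message shows up (at most 10 seconds)
  entries <- remDr$findElements(using = "css selector", "div.qjESne")
  n_before <- length(entries)
  if (n_before > 0) {
    remDr$executeScript("arguments[0].scrollIntoView(true);", list(entries[[n_before]]))
  }
  wait_until(function() {
    n_now <- length(remDr$findElements(using = "css selector", "div.qjESne"))
    at_end <- length(remDr$findElements(using = "css selector", "span.HlvSq")) > 0
    n_now > n_before || at_end
  }, timeout = 10)
}
```

Note that document.readyState only covers the initial page load. Results that are loaded later while scrolling would likely need a condition on the elements themselves, as in the second call.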


