Monday, 19 August 2019

Selenium driver: find elements by xpath; how do I parse a level 2 table (i.e. a table within a table)

I asked an earlier question that got me to this point; since this is a separate, specific question I have posted it on its own, but let me know if this isn't the right place.

I have this script:

from selenium import webdriver
from bs4 import BeautifulSoup
import os
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options


options = Options()
options.binary_location = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options, executable_path='/mnt/c/Users/kela/Desktop/selenium/chromedriver.exe')

#get the url
driver.get('http://147.8.185.62/services/NutriChem-2.0/')


#find the food name
element = driver.find_element_by_id("input_food_name")
element.send_keys("22663")


#click food-disease association
element = Select(driver.find_element_by_css_selector('[name=food_search_section]'))
element.select_by_value('food_disease')


#click submit and click plant-disease associations
driver.find_element_by_css_selector('[value="Submit"]').click()
driver.switch_to.frame(driver.find_element_by_css_selector('frame'))
driver.find_element_by_css_selector('[onclick*="plant-disease"]').click()


#switch into the nested frames that hold the drop-down table rows
driver.switch_to.default_content()
driver.switch_to.frame(driver.find_element_by_name('mainFrame'))
driver.switch_to.frame(driver.find_element_by_name('ListWeb'))

This gets me to the page I want to scrape:

For the next stage, for each of the grey boxes I want to pull out (1) the PMID, (2) the plant, (3) the direction (signified by whether the image is up_arrow.png or down_arrow.png, so just printing the image name is fine), and (4) the disease.
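
For (3), what I have in mind is reading the arrow image's src attribute and keeping just the filename, something like the snippet below; I'm assuming each association row really does contain an img whose src ends in up_arrow.png or down_arrow.png, which I haven't verified yet:

#assumption: each Level2Table row holds an <img> whose src ends in up_arrow.png / down_arrow.png
for img in driver.find_elements_by_xpath("//table[@class='Level2Table']//img"):
    direction = img.get_attribute("src").rsplit("/", 1)[-1]   #e.g. 'up_arrow.png'
    print(direction)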

As you can see from my previous question, I am very new to Selenium, and I thought that once I got to this stage I could just loop through the table rows and print them with BeautifulSoup. The short version of my issue is that I just cannot get this to work.

Things I have tried:

Attempt 1:

rows = driver.find_elements_by_xpath("//table[@class='Level1Table']//tr[contains(@name,'hList')]")
test_row = rows[0]
print(test_row.text)

The above code prints 'Pomegranate Osteoartritis 3', but then I can't work out how to loop within that row (I just get empty data).

Attempt 2: I then tried looping through each r in rows, but that still only gives me the level 1 data (i.e. it just prints multiple lines like attempt 1).
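
What I think I need is a way to drill down from each level-1 row into the level-2 rows that belong to it, roughly like the sketch below; the relative XPath (following-sibling::tr) is only a guess at how the hidden detail rows are nested, so it may well be wrong:

level1_rows = driver.find_elements_by_xpath("//table[@class='Level1Table']//tr[contains(@name,'hList')]")
for l1 in level1_rows:
    print(l1.text)   #e.g. 'Pomegranate Osteoartritis 3'
    #guess: the detail rows sit in a Level2Table somewhere below this header row
    detail_rows = l1.find_elements_by_xpath("./following-sibling::tr//table[@class='Level2Table']//tr")
    for detail in detail_rows:
        cells = detail.find_elements_by_tag_name("td")
        print([c.text for c in cells])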

Attempt 3:

rows = Select(driver.find_elements_by_xpath("//table[@class='Level2Table']//tr[contains(@name,'hList')]"))
print(rows)

Here I was wondering why I can't just do the same as attempt 1, but loop through the level 2 tables instead of level 1. The output is empty, and I'm not sure why; I can see from inspecting the page that the Level2Table is there.
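
I also suspect the Select() wrapper is wrong here, since I had only used it for the select drop-down earlier; presumably I should just call find_elements_by_xpath directly and check whether it returns anything at all, e.g.:

level2_rows = driver.find_elements_by_xpath("//table[@class='Level2Table']//tr")
print(len(level2_rows))   #if this is 0, I'm guessing I'm in the wrong frame, or the rows only exist after expanding a level-1 row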

Attempt 4: This was the way I was originally thinking of doing it, but it doesn't work:

for row in rows.findAll('tr'):
        food_source = row.find_all('td')[1].text
        pmid = row.find_all('td')[0].text
        disease = row.find_all('td')[3].text
        #haven't figured out how to get the association direction yet
        print(food_source + '\t' + pmid + '\t' + disease + '\t' + association)

This is my first Selenium script, so at this point I'm out of my depth. Could someone please show me how to loop through the level 2 tables within the level 1 table and extract the required info (reference, plant, direction and disease)?
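
In case it helps, what I originally pictured with BeautifulSoup was roughly the following, parsing driver.page_source once I'm in the right frame; the class name and the cell order (pmid, plant, ..., disease) are assumptions from inspecting the page, so they may be wrong:

from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, "html.parser")
#assumption: each Level2Table row is one pmid / plant / direction / disease record
for table in soup.find_all("table", class_="Level2Table"):
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 4:
            continue
        pmid = cells[0].get_text(strip=True)
        plant = cells[1].get_text(strip=True)
        img = row.find("img")
        direction = img["src"].rsplit("/", 1)[-1] if img else ""
        disease = cells[3].get_text(strip=True)
        print(pmid + '\t' + plant + '\t' + direction + '\t' + disease)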



