I asked an earlier question to get me to this point; since this is a specific, separate question I'm posting it on its own, but let me know if this isn't the right place.
I have this script:
from selenium import webdriver
from bs4 import BeautifulSoup
import os
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
options = Options()
options.binary_location=r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options,executable_path='/mnt/c/Users/kela/Desktop/selenium/chromedriver.exe')
#get the url
driver.get('http://147.8.185.62/services/NutriChem-2.0/')
#find the food name
element = driver.find_element_by_id("input_food_name")
element.send_keys("22663")
#click food-disease association
element = Select(driver.find_element_by_css_selector('[name=food_search_section]'))
element.select_by_value('food_disease')
#click submit and click plant-disease associations
driver.find_element_by_css_selector('[value="Submit"]').click()
driver.switch_to.frame(driver.find_element_by_css_selector('frame'))
driver.find_element_by_css_selector('[onclick*="plant-disease"]').click()
# to click into each drop-down table row
driver.switch_to.default_content()
driver.switch_to.frame(driver.find_element_by_name('mainFrame'))
driver.switch_to.frame(driver.find_element_by_name('ListWeb'))
This gets me to the page I want to scrape:
The next stage: for each of the grey boxes, I want to pull out (1) the PMID, (2) the plant, (3) the direction (signified by whether the image is up_arrow.png or down_arrow.png, so just printing the image name is fine) and (4) the disease.
As you can see from my previous question, I am very new to Selenium, and I thought that once I got to this stage I could just loop through the table rows and print these fields with BeautifulSoup. The short version of my issue is that I just cannot get this to work.
Things I have tried:
Attempt 1:
rows = driver.find_elements_by_xpath("//table[@class='Level1Table']//tr[contains(@name,'hList')]")
test_row = rows[0]
print(test_row.text)
The code above prints 'Pomegranate Osteoartritis 3', but then I can't work out how to loop within this row (I just get empty data).
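To check my understanding of why looping inside a row fails, I tried the idea on a tiny made-up snippet with the stdlib parser (this is not the real NutriChem markup, just a stand-in). The point seems to be that a search rooted at a row element must start with `.` to stay inside that row; I believe the Selenium analogue would be `row.find_elements_by_xpath(".//table[@class='Level2Table']//tr")`, though I may be wrong about that.

```python
import xml.etree.ElementTree as ET

# Tiny made-up nested table (not the real page) to test relative searches:
# a path starting with './/' searches only beneath the given element.
html = """
<table><tr name="outer">
  <td><table class="Level2Table">
    <tr><td>inner cell</td></tr>
  </table></td>
</tr></table>
"""

row = ET.fromstring(html).find("tr")                  # the outer (level 1) row
level2 = row.find(".//table[@class='Level2Table']")   # relative search from the row
inner_text = level2.find(".//td").text
print(inner_text)
```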
Attempt 2: Then I tried to loop through each r in rows, but that still only gives me the level 1 data (i.e. it just prints multiple lines like attempt 1).
Attempt 3:
rows = driver.find_elements_by_xpath("//table[@class='Level2Table']//tr[contains(@name,'hList')]")
print(rows)
Above, I was wondering why I can't just run the same approach as attempt 1, looping through the Level2 tables instead of Level1. The output is empty, and I'm not sure why; I can see from inspecting the page that the Level2Table is there.
Attempt 4: This was the way I was originally thinking of doing it, but it doesn't work:
for row in rows.findAll('tr'):
    food_source = row.find_all('td')[1].text
    pmid = row.find_all('td')[0].text
    disease = row.find_all('td')[3].text
    # haven't figured out how to get the association direction yet
    print(food_source + '\t' + pmid + '\t' + disease + '\t' + association)
This is my first Selenium script, so at this point I'm just out of my depth. Could someone please show me how to loop through the Level2 tables within the Level1 table and extract the required info (reference, plant, direction and disease)?
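In case it helps show what I'm after, here is the kind of nested loop I imagine, run against a made-up snippet with the stdlib parser. The class names, column order, and arrow-image markup here are all assumptions on my part, not the real page; on the live page I would presumably feed driver.page_source into a parser instead.

```python
import xml.etree.ElementTree as ET

# Made-up markup mimicking what I *think* the nested structure is; the
# real NutriChem class names and column order are assumptions.
html = """
<table class="Level1Table">
  <tr name="hList1"><td>Pomegranate</td><td>Osteoarthritis</td>
    <td>
      <table class="Level2Table">
        <tr name="hList1"><td>11111111</td><td>Pomegranate</td>
          <td><img src="up_arrow.png"/></td><td>Osteoarthritis</td></tr>
        <tr name="hList2"><td>22222222</td><td>Pomegranate</td>
          <td><img src="down_arrow.png"/></td><td>Osteoarthritis</td></tr>
      </table>
    </td>
  </tr>
</table>
"""

root = ET.fromstring(html)
records = []
# Loop over the Level1 rows, then the Level2 rows nested inside each one.
for level1_row in root.findall("tr"):
    level2 = level1_row.find(".//table[@class='Level2Table']")
    if level2 is None:
        continue
    for row in level2.findall("tr"):
        tds = row.findall("td")
        pmid, plant, disease = tds[0].text, tds[1].text, tds[3].text
        direction = tds[2].find("img").get("src")  # up_arrow.png / down_arrow.png
        records.append((pmid, plant, direction, disease))

for pmid, plant, direction, disease in records:
    print(pmid + "\t" + plant + "\t" + direction + "\t" + disease)
```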