I have a scraping module in my application that uses Beautiful Soup and Selenium to get information from a website with this function:
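from typing import Optional

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `options` (a ChromeOptions instance) is assumed to be defined at module level.

def get_page(user: str) -> Optional[BeautifulSoup]:
    """Get a Beautiful Soup object that represents the user profile page in some website"""
    try:
        browser = webdriver.Chrome(options=options)
        wait = WebDriverWait(browser, 10)
        browser.get('https://somewebsite.com/' + user)
        # Wait until JavaScript has rendered at least one <article> element.
        wait.until(EC.presence_of_element_located((By.TAG_NAME, 'article')))
    except TimeoutException:
        print("User hasn't been found. Try another user.")
        return None
    return BeautifulSoup(browser.page_source, 'lxml')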
I need to test this function in two ways:
- whether it gets a page (the success case);
- whether it prints the warning and returns None when it doesn't get a page (the failure case).
I tried to test like this:
import unittest

from bs4 import BeautifulSoup

import scrape as sc


class ScrapeTests(unittest.TestCase):
    def test_get_page_success(self):
        """
        Test if get_page is getting a page
        """
        self.assertIsInstance(sc.get_page('myusername'), BeautifulSoup)

    def test_get_page_not_found(self):
        """
        Test if get_page returns None when looking for a user
        that doesn't exist
        """
        self.assertIsNone(sc.get_page('iwçl9239jaçklsdjf'))


if __name__ == '__main__':
    unittest.main()
Doing it like that makes the tests slow: get_page itself is slow in the success case, and in the failure case I'm forcing a timeout by looking up a nonexistent user. I have the impression that my approach to testing this function is not the right one. Probably the best way to test it is to fake a response, so get_page doesn't need to actually connect to the server and request anything.
So I have two questions:
- Is this "fake web response" idea the right approach to testing this function?
- If so, how can I achieve it for this function? Do I need to rewrite the get_page function so it becomes "testable"?
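One option is to rewrite get_page so its collaborators are passed in rather than created inside it (dependency injection). The sketch below is a hypothetical, stdlib-only version: the parameters `make_browser`, `wait_for_article` and `parse`, the `Browser` protocol, the `selenium_wait_for_article` adapter and the `FakeBrowser` test double are illustrative names, not part of the original module. Tests pass fakes and never start Chrome or touch the network; production code passes Selenium and Beautiful Soup.

```python
from typing import Callable, List, Optional, Protocol, TypeVar

T = TypeVar("T")


class Browser(Protocol):
    """The two WebDriver members that get_page relies on."""

    page_source: str

    def get(self, url: str) -> None: ...


def get_page(
    user: str,
    make_browser: Callable[[], Browser],
    wait_for_article: Callable[[Browser], bool],
    parse: Callable[[str], T],
) -> Optional[T]:
    """Load the user's profile page and parse it, or return None if it never shows up."""
    browser = make_browser()
    browser.get('https://somewebsite.com/' + user)
    if not wait_for_article(browser):
        print("User hasn't been found. Try another user.")
        return None
    return parse(browser.page_source)


def selenium_wait_for_article(browser, timeout: float = 10) -> bool:
    """Hypothetical production adapter: Selenium's explicit wait, timeout mapped to False."""
    # Imported lazily so tests that inject fakes never need Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, 'article'))
        )
    except TimeoutException:
        return False
    return True


class FakeBrowser:
    """Test double: records visited URLs and serves canned HTML, no network."""

    def __init__(self, html: str) -> None:
        self.page_source = html
        self.visited: List[str] = []

    def get(self, url: str) -> None:
        self.visited.append(url)
```

In production the call would look something like `get_page(user, lambda: webdriver.Chrome(options=options), selenium_wait_for_article, lambda html: BeautifulSoup(html, 'lxml'))`. A test can pass `lambda b: False` as the wait to exercise the not-found branch instantly. The cost is a wider signature; keeping the Selenium-specific timeout handling in the adapter is what lets get_page itself stay free of Selenium imports.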
EDIT:
I tried to write a test for get_page like this:
from unittest import TestCase, mock

from bs4 import BeautifulSoup

import scrape as sc


class ScrapeTests(TestCase):
    def setUp(self) -> None:
        self.driver = mock.patch(
            'scrape.webdriver.Chrome',
            autospec=True
        )
        self.driver.page_source.return_value = "<html><head></head><body><article>Yes</article></body></html>"
        self.driver.start()

    def tearDown(self) -> None:
        self.driver.stop()

    def test_get_page_success(self):
        """
        Test if get_page is getting a page
        """
        self.assertEqual(isinstance(sc.get_page('whatever'), BeautifulSoup), True)
The problem I'm facing is that the driver.page_source attribute is only created after the wait.until call. I need wait.until because the Selenium browser has to wait for JavaScript to create the article tags in the HTML before I can scrape them.
When I try to define a return value for page_source in setUp, I get this error: AttributeError: '_patch' object has no attribute 'page_source'
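The AttributeError most likely comes from `self.driver` holding the patcher object that `mock.patch` returns, not a mock: the mock only exists once `start()` is called, and it is `start()`'s return value. Because get_page calls `webdriver.Chrome(...)`, the browser it receives is that mock's `return_value`, so that is where `page_source` belongs. Since `page_source` is a property on Selenium's WebDriver rather than a method, it is set as a plain attribute instead of through `.return_value`, and a mock returns whatever the test configured regardless of when wait.until runs. Patching `WebDriverWait` as well lets the failure case raise `TimeoutException` immediately instead of waiting 10 seconds. To keep the sketch runnable without Selenium, `Chrome`, `WebDriverWait` and `TimeoutException` below are hypothetical stand-ins and `scrape` is a `SimpleNamespace`; in the real test the targets would presumably be `mock.patch('scrape.webdriver.Chrome', autospec=True)` and, assuming scrape.py imports WebDriverWait by name, `mock.patch('scrape.WebDriverWait', autospec=True)`.

```python
import types
import unittest
from unittest import mock


class TimeoutException(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


class Chrome:
    """Hypothetical stand-in for webdriver.Chrome; the real one starts a browser."""

    def __init__(self, options=None):
        raise RuntimeError("a real browser would start here")

    def get(self, url: str) -> None: ...

    @property
    def page_source(self) -> str: ...


class WebDriverWait:
    """Hypothetical stand-in for selenium.webdriver.support.ui.WebDriverWait."""

    def __init__(self, driver, timeout): ...

    def until(self, method): ...


# Hypothetical stand-in for the `scrape` module whose attributes get patched.
scrape = types.SimpleNamespace(
    webdriver=types.SimpleNamespace(Chrome=Chrome),
    WebDriverWait=WebDriverWait,
)


def get_page(user: str):
    """Same control flow as the question's get_page, returning raw HTML."""
    try:
        browser = scrape.webdriver.Chrome(options=None)
        wait = scrape.WebDriverWait(browser, 10)
        browser.get('https://somewebsite.com/' + user)
        wait.until(lambda driver: True)  # real code: EC.presence_of_element_located(...)
    except TimeoutException:
        print("User hasn't been found. Try another user.")
        return None
    return browser.page_source


class ScrapeTests(unittest.TestCase):
    def setUp(self) -> None:
        # mock.patch(...) returns a patcher; start() returns the actual mock.
        chrome_patcher = mock.patch.object(scrape.webdriver, 'Chrome', autospec=True)
        wait_patcher = mock.patch.object(scrape, 'WebDriverWait', autospec=True)
        self.mock_chrome = chrome_patcher.start()
        self.mock_wait = wait_patcher.start()
        self.addCleanup(chrome_patcher.stop)
        self.addCleanup(wait_patcher.stop)
        # get_page calls Chrome(...), so the browser it sees is return_value;
        # page_source is a property, so it is set as a plain attribute.
        self.mock_chrome.return_value.page_source = (
            "<html><head></head><body><article>Yes</article></body></html>"
        )

    def test_get_page_success(self):
        self.assertIn("<article>Yes</article>", get_page('whatever'))
        self.mock_chrome.return_value.get.assert_called_once_with(
            'https://somewebsite.com/whatever'
        )

    def test_get_page_not_found(self):
        # Raise the timeout immediately instead of waiting 10 seconds.
        self.mock_wait.return_value.until.side_effect = TimeoutException
        self.assertIsNone(get_page('nobody'))
```

In the real suite, the success test would assert on a BeautifulSoup result as the original test does.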
I've tried lots of ways to mock webdriver attributes with mock.patch, but it seems very difficult given my limited knowledge. I think the best way to achieve what I want (testing get_page without really connecting to a server) might be to mock an entire web server connection. But that's just a guess.
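If the test should still go through a real browser, serving fixture HTML from a local server is a possible middle ground. The sketch below uses only the standard library; `serve_pages` is a hypothetical helper, and using it with get_page assumes the hard-coded 'https://somewebsite.com/' becomes a parameter (for example a hypothetical `base_url`). It still launches Chrome, so it is slower than the mock-based approaches, and the not-found case still waits out WebDriverWait's full timeout unless that is made configurable too.

```python
import http.server
import threading
from contextlib import contextmanager
from typing import Dict, Iterator

NOT_FOUND_HTML = "<html><body>No such user</body></html>"


@contextmanager
def serve_pages(pages: Dict[str, str]) -> Iterator[str]:
    """Serve canned HTML keyed by URL path on a free localhost port; yield the base URL."""

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            html = pages.get(self.path)
            body = (html if html is not None else NOT_FOUND_HTML).encode("utf-8")
            self.send_response(200 if html is not None else 404)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args) -> None:
            pass  # keep test output quiet

    # Port 0 asks the OS for any free port, so parallel test runs don't collide.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        yield f"http://{host}:{port}"
    finally:
        server.shutdown()
        server.server_close()
```

Inside the `with serve_pages(...) as base:` block, a test would then call something like the hypothetical `get_page('alice', base_url=base + '/')` and assert on the parsed article.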
from What is the right approach to unittest this method in Python?