Thursday, 2 December 2021

What is the right approach to unittest this method in Python?

I have a scraping module in my application that uses Beautiful Soup and Selenium to get website info with this function:

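from typing import Optional

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `options` is defined elsewhere in the module.
def get_page(user: str) -> Optional[BeautifulSoup]:
    """Get a Beautiful Soup object that represents the user profile page on some website"""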

    try:
        browser = webdriver.Chrome(options=options)
        wait = WebDriverWait(browser, 10)
        browser.get('https://somewebsite.com/' + user)
        wait.until(EC.presence_of_element_located((By.TAG_NAME, 'article')))
    except TimeoutException:
        print("User hasn't been found. Try another user.")
        return None
    return BeautifulSoup(browser.page_source, 'lxml')

I need to test this function in two ways:

  • if it is getting a page (the success case);
  • and if it is printing the warning and returning None when it's not getting any page (the failure case).

I tried to test like this:

import unittest

from bs4 import BeautifulSoup

import scrape as sc


class ScrapeTests(unittest.TestCase):
    def test_get_page_success(self):
        """
        Test if get_page is getting a page
        """
        self.assertIsInstance(sc.get_page('myusername'), BeautifulSoup)

    def test_get_page_not_found(self):
        """
        Test if get_page returns None when looking for a user
        that doesn't exist
        """
        self.assertIsNone(sc.get_page('iwçl9239jaçklsdjf'))


if __name__ == '__main__':
    unittest.main()

Testing it this way makes the tests somewhat slow: get_page itself is slow in the success case, and in the failure case I'm forcing a timeout by looking for a nonexistent user. I have the impression that my approach to testing this function is not the right one. Probably the best way to test it is to fake a response, so that get_page doesn't need to actually connect to the server and request anything.

So I have two questions:

  1. Is this "fake web response" idea the right approach to test this function?
  2. If so, how can I achieve it for that function? Do I need to rewrite the get_page function so it can be "testable"?
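
One way to picture the "rewrite it so it is testable" option from question 2 is to pass the browser and the wait into the function instead of creating them inside it. The following is only a minimal sketch under stated assumptions: fetch_html, make_browser and wait_for_article are hypothetical names that do not exist in the original module, and the local TimeoutException class is a stand-in for selenium.common.exceptions.TimeoutException so that the sketch runs without Selenium installed.

```python
import io
from contextlib import redirect_stdout
from typing import Any, Callable, Optional
from unittest import mock


class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException (assumption)."""


def fetch_html(
    user: str,
    make_browser: Callable[[], Any],
    wait_for_article: Callable[[Any], None],
) -> Optional[str]:
    """Hypothetical variant of get_page: same control flow, dependencies injected."""
    try:
        browser = make_browser()
        browser.get('https://somewebsite.com/' + user)
        wait_for_article(browser)
    except TimeoutException:
        print("User hasn't been found. Try another user.")
        return None
    return browser.page_source


PAGE = "<html><head></head><body><article>Yes</article></body></html>"

# Success case: the fake browser already holds the rendered page.
fake_browser = mock.Mock(page_source=PAGE)
success_html = fetch_html('whatever', lambda: fake_browser, lambda browser: None)

# Failure case: the injected wait raises, as WebDriverWait.until does on timeout.
failing_wait = mock.Mock(side_effect=TimeoutException)
captured = io.StringIO()
with redirect_stdout(captured):
    failure_html = fetch_html('nobody', lambda: mock.Mock(), failing_wait)
warning = captured.getvalue()
```

In the real module, the two callables would presumably wrap webdriver.Chrome(options=options) and WebDriverWait(browser, 10).until(...), and get_page would pass the returned HTML to BeautifulSoup. Neither test opens a browser or waits for a timeout.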

EDIT:

I tried to create a test for get_page like this:

class ScrapeTests(TestCase):
    def setUp(self) -> None:
        self.driver = mock.patch(
            'scrape.webdriver.Chrome',
            autospec=True
        )
        self.driver.page_source.return_value = "<html><head></head><body><article>Yes</article></body></html>"
        self.driver.start()

    def tearDown(self) -> None:
        self.driver.stop()

    def test_get_page_success(self):
        """
        Test if get_page is getting a page
        """
        self.assertIsInstance(sc.get_page('whatever'), BeautifulSoup)

The problem I'm facing is that the driver.page_source attribute is created only after the wait.until call. I need wait.until because the Selenium browser has to wait for the JavaScript to create the article tags in the HTML before I can scrape them.

When I try to define a return value for page_source in setUp, I get an error: AttributeError: '_patch' object has no attribute 'page_source'
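
The error message points at the object returned by mock.patch: it is a patcher, not the mock, and the mock only exists once start() is called. The sketch below illustrates that with a hypothetical stand-in for the scrape module (a SimpleNamespace holding a dummy Chrome class) so that it runs without Selenium; it also sets page_source as a plain attribute on the mocked instance, since the function reads it as an attribute and does not call it.

```python
import types
from unittest import mock


class Chrome:
    """Hypothetical stand-in for selenium.webdriver.Chrome."""

    page_source = ""

    def __init__(self, options=None):
        raise RuntimeError("a real browser must not start during tests")


# Hypothetical stand-in for the `scrape` module and its `webdriver` import.
scrape = types.SimpleNamespace(webdriver=types.SimpleNamespace(Chrome=Chrome))

PAGE = "<html><head></head><body><article>Yes</article></body></html>"

# mock.patch.object(...) builds a patcher; it is not the mock itself.
patcher = mock.patch.object(scrape.webdriver, "Chrome")
patcher_has_page_source = hasattr(patcher, "page_source")

# start() installs the patch and returns the mock that replaces the class.
mock_chrome_cls = patcher.start()
try:
    # Calling the mocked class yields mock_chrome_cls.return_value,
    # which is the object that plays the role of the browser.
    mock_chrome_cls.return_value.page_source = PAGE
    browser = scrape.webdriver.Chrome(options=None)
    html = browser.page_source
finally:
    patcher.stop()

# After stop(), the original class is back in place.
restored = scrape.webdriver.Chrome is Chrome
```

In a test against the real module, the same pattern would presumably apply to mock.patch('scrape.webdriver.Chrome'), with a second patch on the name under which scrape imports WebDriverWait so that until returns immediately; that second patch target is an assumption about how the module is written.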

I tried many ways to mock the webdriver attributes with mock.patch, but it seems very difficult with my limited knowledge. I think that maybe the best way to achieve what I want (testing the get_page function without needing to really connect to a server) is to mock an entire web server connection. But this is just a guess.
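
For comparison, faking the web server itself could look roughly like the sketch below, which serves a canned page from a local HTTP server on a free port. It is an illustration under assumptions: ProfileHandler, fetch and the /knownuser path are hypothetical, and a plain HTTP client stands in for the browser. Using it with get_page would require the base URL to be configurable, which the original function does not support, and Selenium would still start a real Chrome.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGE = b"<html><head></head><body><article>Yes</article></body></html>"


class ProfileHandler(BaseHTTPRequestHandler):
    """Serves the canned page for one hypothetical user, 404 for anyone else."""

    def do_GET(self):
        found = self.path == "/knownuser"
        body = PAGE if found else b"<html><body>Not found</body></html>"
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch(port: int, path: str):
    """Plain HTTP client standing in for the browser in this sketch."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


# Port 0 asks the operating system for any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), ProfileHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    port = server.server_address[1]
    found_status, found_body = fetch(port, "/knownuser")
    missing_status, _ = fetch(port, "/nobody")
finally:
    server.shutdown()
    server.server_close()
```

This removes the dependency on the remote site but not on the browser, so it behaves more like an integration test than a fast unit test.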



