Thursday, 7 October 2021

Selenium + Flask/Falcon in Python - 502 Bad Gateway Error

I'm using Selenium to scrape a website headlessly from inside an API endpoint built with Flask in Python. I ran several tests, and my Selenium scraping code works perfectly both as a standalone script and when the API runs on localhost. However, when I deploy the code to a remote server, the requests always return a 502 Bad Gateway error. This is strange because the logs show that the scraping is working correctly, but the server responds with 502 before the scraping finishes, as if it were trying to set up a proxy and failing. I also noticed that removing the time.sleep from my code makes it return a 200, although the result could be wrong because Selenium doesn't get enough time to load the whole page before scraping.

I also tried using Falcon instead of Flask and got a similar error. This is a sample of my latest code using Falcon:

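import json
import time

import falcon
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options


class GetUrl(object):

    def on_get(self, req, resp):
        """
        Check whether the page at the requested URL contains an outbound link.
        :param req: Falcon request; its JSON body carries the "url" key
        :param resp: Falcon response, filled with {"result": bool}
        :return:
        """

        # read the "url" parameter from the JSON request body
        req_body = req.bounded_stream.read()
        json_data = json.loads(req_body.decode('utf8'))
        url = json_data.get("url")

        # start a headless Firefox and load the url
        options = Options()
        options.add_argument("--headless")
        driver = webdriver.Firefox(options=options)

        try:
            driver.get(url)
            # fixed pause so the page can finish loading
            time.sleep(5)
            result = False

            # check for outbound links
            content = driver.find_elements(By.XPATH, "//a[@class='_52c6']")
            if len(content) > 0:
                href = content[0].get_attribute("href")
                result = True
        finally:
            # always close the browser, even if the scraping raised
            driver.quit()

        # build the JSON response
        return_doc = {"result": result}
        resp.body = json.dumps(return_doc, sort_keys=True, indent=2)
        resp.content_type = 'application/json'
        resp.append_header('Access-Control-Allow-Origin', "*")
        resp.status = falcon.HTTP_200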

I saw some other similar issues like this, but even though I can see that gunicorn is running on my server, I don't have nginx, or at least it is not running where it should be. And I don't think Falcon uses it. So, what exactly am I doing wrong? Any light on this issue is highly appreciated, thank you!
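
For reference, regarding the fixed time.sleep mentioned above: one way to avoid always pausing for the full five seconds is to poll for the elements and stop as soon as they appear. The following is only a minimal sketch of that idea, not a fix for the 502 itself. The helper name wait_until and its parameters are hypothetical and not part of Selenium or Falcon.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within {} seconds".format(timeout)
            )
        # short pause between attempts instead of one long fixed sleep
        time.sleep(poll_interval)
```

In the handler, the condition could be a lambda wrapping the lookup on the //a[@class='_52c6'] XPath, with TimeoutError treated as "no outbound link found". Selenium ships the same pattern as WebDriverWait in selenium.webdriver.support.ui, which is probably the better choice in real code.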


