Friday, 28 June 2019

Click display button in Scrapy-Splash

I am scraping the following webpage with scrapy-splash, http://www.starcitygames.com/buylist/, which I have to log in to in order to get the data I need. The login works fine, but the data is not accessible until the display button is clicked, so I need to click that button before I can scrape anything.

I already got an answer saying that I cannot simply click the display button and scrape what shows up, and that I need to scrape the JSON page associated with that information instead. I am concerned that scraping the JSON directly will be a red flag to the owners of the site: most people never open the JSON data page, and it would take a human several minutes to find it, whereas a computer gets there much faster.

So my question is: is there any way to scrape the webpage by clicking Display and going from there, or do I have no choice but to scrape the JSON page? This is what I have so far, but it is not clicking the button.

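import scrapy
from ..items import NameItem


class LoginSpider(scrapy.Spider):
    name = "LoginSpider"
    start_urls = ["http://www.starcitygames.com/buylist/"]

    def parse(self, response):
        # Submit the login form found on the start page.
        return scrapy.FormRequest.from_response(
            response,
            formcss='#existing_users form',
            formdata={'ex_usr_email': 'abc@example.com', 'ex_usr_pass': 'password'},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Look up the href of the "Display>>" link. Following it only issues a
        # plain HTTP request; it does not run any JavaScript bound to the link.
        display_button = response.xpath('//a[contains(., "Display>>")]/@href').get()
        if display_button:
            yield response.follow(display_button, self.parse_results)

    def parse_results(self, response):
        # Extract the first result title from the page the link points to.
        item = NameItem()
        item["Name"] = response.css("div.bl-result-title::text").get()
        yield item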
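
For comparison, an actual click through Splash would presumably need a Lua script sent to Splash's execute endpoint, rather than a followed href. The sketch below only builds the arguments for such a request. The selector "a.display-button" and the helper name build_splash_args are hypothetical, not taken from the site or from any library, and the login session is not handled here.

```python
# Lua script for Splash's "execute" endpoint: load the page, click one
# element, wait for the page to update, and return the rendered HTML.
CLICK_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  local button = splash:select(args.selector)
  assert(button, "button not found")
  button:mouse_click()
  assert(splash:wait(args.wait))
  return {html = splash:html()}
end
"""


def build_splash_args(selector, wait=1.0):
    """Return the args dict for a Splash 'execute' request (hypothetical helper)."""
    return {"lua_source": CLICK_SCRIPT, "selector": selector, "wait": wait}


# "a.display-button" is an assumed selector; the real one depends on the page.
splash_args = build_splash_args("a.display-button")
```

The dictionary could then be passed to scrapy_splash.SplashRequest(url, callback, endpoint="execute", args=splash_args). Whether the site's button responds to a synthetic click this way, and how the logged-in cookies reach Splash, would still have to be checked.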

Snapshot of website HTML code


