Tuesday 14 January 2020

Scrapy splash not working correctly when searching for items loaded with JS

I'm using Scrapy with scrapy-splash to get data from some URLs, such as this product URL or this second product URL.

I have a Lua script that waits for a fixed time and then returns the HTML:

script = """
function main(splash)
  assert(splash:go(splash.args.url))
  assert(splash:wait(4))
  return splash:html()
end
"""

Then I execute it:

yield SplashRequest(url, self.parse_item, endpoint='execute', args={'lua_source': script})

From the response I need three elements: the three product prices, all of which are loaded with JS.

I have the XPath expressions for the three elements, but the problem is that they only work some of the time:

    price_strikethrough = response.xpath('//div[@class="price-selector"]/div[@class="prices"]/span[contains(@class,"active-price strikethrough")]/span[1]/text()').extract_first() 
    price_offer1 = response.xpath('//div[@class="price-selector"]/div[@class="prices"]/div[contains(@class,"precioDescuento")][1]/text()').extract_first()
    price_offer2 = response.xpath('//div[@class="price-selector"]/div[@class="prices"]/div[contains(@class,"precioDescuento")][2]/text()').extract_first()
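
To see which of the three fields comes back empty on a failing run, a small check like the one below could be called from `parse_item`. This is only a sketch: the helper name `missing_prices` and the sample values are hypothetical and not part of the spider above.

```python
from typing import Dict, List, Optional


def missing_prices(prices: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of price fields that are None or blank."""
    return [
        name
        for name, value in prices.items()
        if value is None or not value.strip()
    ]


# Hypothetical values, standing in for the three extract_first() results.
example = {
    "price_strikethrough": "19,99",
    "price_offer1": None,
    "price_offer2": "  ",
}
print(missing_prices(example))  # ['price_offer1', 'price_offer2']
```

Logging the returned list together with `response.url` would show whether the same field is always the one missing or whether the whole price block was not rendered yet.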

I don't know what else to do to make it work properly. I have tried changing the wait value, but the result is the same: sometimes it works, and sometimes the prices are missing. How can I make sure I always get the data I need?
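
One thing that might make the result less timing-dependent is to replace the fixed `splash:wait(4)` with a loop that polls until a price element exists. The following is a minimal sketch, not a confirmed fix. It assumes a Splash version that provides `splash:select` (3.0 or later), and the names `POLL_SCRIPT`, `build_splash_args`, `css`, `max_wait` and `poll_interval` are made up for this example. The CSS selector in the usage comment is guessed from the XPath above.

```python
# Lua source for Splash's `execute` endpoint. Instead of a fixed wait, it
# polls for an element matching `splash.args.css` and gives up after
# `splash.args.max_wait` seconds, returning whatever HTML exists by then.
POLL_SCRIPT = """
function main(splash)
  assert(splash:go(splash.args.url))
  local waited = 0
  while not splash:select(splash.args.css) and waited < splash.args.max_wait do
    splash:wait(splash.args.poll_interval)
    waited = waited + splash.args.poll_interval
  end
  return splash:html()
end
"""


def build_splash_args(css, max_wait=10.0, poll_interval=0.25):
    """Build the `args` dict for a SplashRequest using POLL_SCRIPT.

    `css`, `max_wait` and `poll_interval` are custom arguments that the Lua
    script reads from `splash.args`; they are not built-in Splash options.
    """
    if max_wait <= 0 or poll_interval <= 0:
        raise ValueError("max_wait and poll_interval must be positive")
    return {
        "lua_source": POLL_SCRIPT,
        "css": css,
        "max_wait": max_wait,
        "poll_interval": poll_interval,
    }


# Possible use inside the spider (selector is an assumption):
#   yield SplashRequest(url, self.parse_item, endpoint='execute',
#                       args=build_splash_args('div.prices .precioDescuento'))
```

`max_wait` should stay below the Splash render timeout, otherwise Splash may abort the script before the loop gives up. If the element never appears, the script still returns the HTML, so `parse_item` should keep handling `None` prices.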


