Monday, 17 May 2021

Scrapy-splash Can't find image source url

I am trying to scrape a product page from ZARA. Like this one :https://www.zara.com/us/en/fitted-houndstooth-blazer-p07808160.html?v1=108967877&v2=1718115

My scrapy-splash container is running. In the shell I fetch the page

fetch('http://localhost:8050/render.html?url=https://www.zara.com/us/en/fitted-houndstooth-blazer-p07808160.html?v1=108967877&v2=1718115')
2021-05-14 14:30:42 [scrapy.core.engine] INFO: Spider opened
2021-05-14 14:30:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://localhost:8050/render.html?url=https://www.zara.com/us/en/fitted-houndstooth-blazer-p07808160.html?v1=108967877&v2=1718115> (referer: None)

Everything is working so far, and I am able to get the header and price. However, I want to get image URLs of the product.

I try to reach it by

response.css('img.media-image__image::attr(src)').getall()

But response is this:

['https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png']

Which is all background image and not the real one. I can display images on the browser and I see that images coming in the network requests. Is it because it is loaded with AJAX requests? How do I solve this?



from Scrapy-splash Can't find image source url

No comments:

Post a Comment