The spider is set up to read the links to scrape, then make a POST request and parse the returned data.
It collects data correctly when run locally, but when deployed to Zyte it fails with the error shown below.
```
yield scrapy.Request(
    url=STORE_URL.format(zip_code),
    headers=headers_1,
    meta={"item_id": item_id, "zip_code": zip_code},
    dont_filter=True,
    callback=self.parse_a,
)
```
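`STORE_URL` is not shown in the question. The sketch below assumes it matches the URL that appears in the log output, and that ZIP codes may arrive as integers: the log shows `address=2125`, which looks like a ZIP code that lost its leading zero (this is a guess, and it would not by itself explain the 403 on `30308`). The helper name `build_store_url` is hypothetical.

```python
# Assumed to match the URL pattern seen in the log output; not shown in the question.
STORE_URL = (
    "https://www.homedepot.com/StoreSearchServices/v2/storesearch"
    "?address={}&radius=50&pagesize=30"
)


def build_store_url(zip_code):
    """Format the store-search URL, left-padding the ZIP code to 5 digits."""
    # str() accepts both int and str input; zfill restores a dropped leading zero.
    return STORE_URL.format(str(zip_code).zfill(5))
```

With this helper, the request above would use `url=build_store_url(zip_code)` instead of `STORE_URL.format(zip_code)`.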
```
yield scrapy.Request(
    url=API_URL,
    method="POST",
    headers=headers,
    body=json.dumps(payload(item_id, zip_code, store_id)),
    meta={"prod_code": item_id, "zip_code": zip_code},
    dont_filter=True,
    callback=self.parse,
)
```

```
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36'
```
```
2023-06-18 03:10:58 INFO [scrapy.extensions.telnet] Telnet console listening on 0.0.0.0:6023
2023-06-18 03:10:58 INFO [scrapy.spidermiddlewares.httperror] Ignoring response <403 https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=30308&radius=50&pagesize=30>: HTTP status code is not handled or not allowed
2023-06-18 03:11:04 INFO [scrapy.spidermiddlewares.httperror] Ignoring response <403 https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=2125&radius=50&pagesize=30>: HTTP status code is not handled or not allowed
2023-06-18 03:11:11 INFO [scrapy.spidermiddlewares.httperror] Ignoring response <403 https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=60607&radius=50&pagesize=30>: HTTP status code is not handled or not allowed
```
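The log shows Scrapy's `HttpErrorMiddleware` dropping the 403 responses before they reach `parse_a`, so the body of the blocked response is never seen. A minimal diagnostic sketch, assuming the goal is only to inspect what the site returns on Zyte: the `HTTPERROR_ALLOWED_CODES` setting lets 403 responses through to the callback, and the helper `summarize_response` (a hypothetical name) condenses a response into one log line. This does not fix the block itself.

```python
# settings.py (diagnostic only): pass 403 responses to the spider callbacks
# instead of letting HttpErrorMiddleware ignore them.
HTTPERROR_ALLOWED_CODES = [403]


def summarize_response(status, url, body, limit=200):
    """Return a one-line summary of a response for the spider log.

    Hypothetical helper: collapses whitespace in the body and truncates it
    so a block page can be recognised without flooding the log.
    """
    snippet = " ".join(body.split())[:limit]
    return f"{status} {url} :: {snippet}"
```

Inside `parse_a`, this could be called as `self.logger.info(summarize_response(response.status, response.url, response.text))` before any parsing, so the Zyte job log shows whether the 403 body is a bot-detection page or something else.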
from scrapy spider working locally but resulting in 403 error when running on Zyte