Wednesday, 2 February 2022

The list of items I scrape from a webpage differs from the source of the page

I'm trying to scrape a list of zpids from this webpage (https://www.zillow.com/ct/9_p/) using the requests module. The zpids are in a list right next to searchListZpids in the page source (Ctrl+U). There are 40 of them.
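Because the zpids sit inside data embedded in the page source, one option is to read that array as JSON instead of slicing it out with a regex. Below is a minimal sketch, assuming the key appears as a JSON object key of the form "searchListZpids":[...]. The function name extract_zpids and the sample markup are hypothetical. On its own this does not explain why the list changes between requests; it only removes the regex as a source of error.

```python
import json
import re


def extract_zpids(page_source, key="searchListZpids"):
    """Parse the JSON array that follows "<key>": in the page source.

    Assumption (hypothetical): the key appears as a JSON object key, e.g.
    "searchListZpids":[57912175, 177202011, ...]
    """
    # Locate '"searchListZpids":' and skip any whitespace around the colon.
    match = re.search(rf'"{re.escape(key)}"\s*:\s*', page_source)
    if match is None:
        raise ValueError(f"{key!r} not found in page source")
    # raw_decode reads exactly one JSON value starting at match.end()
    # and ignores the rest of the page, so no closing-bracket regex is needed.
    value, _ = json.JSONDecoder().raw_decode(page_source, match.end())
    if not isinstance(value, list):
        raise ValueError(f"{key!r} is not followed by a JSON array")
    return [int(z) for z in value]


sample = '<script>{"cat1":{"searchListZpids":[57912175, 177202011],"mapResults":[1]}}</script>'
print(extract_zpids(sample))
```

With a live response you would pass res.text instead of the sample string.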

The script below fetches the zpids without any errors. The problem is that the list it produces differs from the one on that webpage: only some of the zpids I receive exactly match those on the page.

Sometimes the list I get is accurate, but most of the time it differs.
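To pin down how the two lists differ, it can help to check whether the script returns the same zpids in a different order or genuinely different zpids. A small sketch of such a comparison; compare_zpids is a hypothetical helper, and "expected" would be the list copied from the page source in the browser.

```python
def compare_zpids(expected, received):
    """Report whether two zpid lists differ in order, in contents, or both."""
    expected_set, received_set = set(expected), set(received)
    return {
        "same_order": list(expected) == list(received),
        "same_items": expected_set == received_set,
        # On the page but absent from the script's output.
        "missing": sorted(expected_set - received_set),
        # In the script's output but absent from the page.
        "unexpected": sorted(received_set - expected_set),
        "in_both": len(expected_set & received_set),
    }


print(compare_zpids([1, 2, 3], [2, 3, 4]))
```

If same_items is true but same_order is false, the server is likely returning the same results in a different order; a long "unexpected" list points to a genuinely different result set.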

The script that I'm using:

import re
import requests

link = 'https://www.zillow.com/ct/9_p/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}

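# Fetch the page with a browser-like User-Agent
res = requests.get(link, headers=headers)
# Take the capture from the first match: the text between the first "[" after searchListZpids and the next "]"
zpids = re.findall(r"searchListZpids[\s\S]+?\[(.*?)\]", res.text)[0]
print(zpids)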

The output I get at the moment:

57912175, 177202011, 57838346, 57702376, 2083150985, 2091636205, 59028017, 2066602375, 57843835, 2066598335, 58845027, 58904562, 58118011, 58838731, 57930222, 2066611590, 59977275, 197747278, 57932219, 57893209, 58775017, 2066600444, 2066601022, 58059157, 177275234, 58819070, 59297439, 58859881, 2078457589, 58775318, 57790587, 57689409, 2066601997, 57394605, 177286302, 58133143, 59068957, 58096934, 240506947, 83121293
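The captured value is a single comma-separated string, not a list. A sketch that converts it to integers and checks it against the 40 zpids shown on the page (the count of 40 comes from the question; parse_zpids is a hypothetical helper). This only confirms the capture is well-formed; it cannot tell whether these are the right zpids.

```python
def parse_zpids(captured, expected_count=40):
    """Convert the captured "57912175, 177202011, ..." string into ints
    and sanity-check it against the number of zpids shown on the page."""
    zpids = [int(part) for part in captured.split(",") if part.strip()]
    if len(zpids) != expected_count:
        raise ValueError(f"expected {expected_count} zpids, got {len(zpids)}")
    if len(set(zpids)) != len(zpids):
        raise ValueError("the capture contains duplicate zpids")
    return zpids
```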

How can I scrape the exact list of zpids from that webpage using requests?
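One thing worth ruling out: re.findall(...)[0] keeps only the first match. If searchListZpids occurs more than once in the response, or is not immediately followed by a list, the lazy [\s\S]+? jumps to whichever "[" comes next. A sketch for inspecting every capture, reusing the script's pattern (zpid_captures is a hypothetical name):

```python
import re

# The same pattern the script above uses.
PATTERN = r"searchListZpids[\s\S]+?\[(.*?)\]"


def zpid_captures(page_source):
    """Return how often the key occurs and every string the pattern captures."""
    occurrences = page_source.count("searchListZpids")
    captures = re.findall(PATTERN, page_source)
    return occurrences, captures


# The key is followed by null here, so the pattern silently grabs the next array.
print(zpid_captures('"searchListZpids":null,"mapResults":[5]'))
```

Calling zpid_captures(res.text) and getting more than one occurrence, or a capture that does not look like zpids, would suggest the [0] index is picking the wrong array.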



