Hemant Vishwakarma: Getting a URL with an authenticity token using python

Friday, 10 September 2021

Getting a URL with an authenticity token using python

I am trying to read a web page using a GET request in Python. The original URL is given here. I found out that the information I am interested in is in a subpage with this URL (I replaced the authenticity token with XXX).

I tried using the second URL in my script, but I get a 406 error. Can you suggest what I am doing wrong? Is the authenticity token there to prevent scraping? If so, can I work around it?

import urllib.request

url = ...
# Send a browser-like User-Agent header with the request
agent = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'}
req = urllib.request.Request(url, headers=agent)
# This call is where the 406 error is raised
data = urllib.request.urlopen(req)
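
To see what the server actually sends back with the 406, the call can be wrapped so that the error status and body are returned instead of raising. This is only a minimal diagnostic sketch: the helper name `fetch_with_diagnostics` is hypothetical and not part of the script above.

```python
import urllib.error
import urllib.request


def fetch_with_diagnostics(req):
    """Return (status_code, body_bytes) for a request.

    Hypothetical helper: on an HTTP error such as 406, it returns the
    error status and the error body instead of raising, so the server's
    explanation (if any) can be read.
    """
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.getcode(), resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so its body is readable.
        return err.code, err.read()
```

Calling `fetch_with_diagnostics(req)` with the `req` built above would return the status code together with whatever body the server attached to the error.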

Thanks!

PS: This is how I get the URL using Chrome:

First, I browse to https://www.goodreads.com/book/show/385228.On_Liberty

Then I open Chrome's developer tools (three dots -> More tools -> Developer tools) and choose the Network tab.

Then I go to the bottom of the page (just after the last review) and click "next".

In the developer tools window, I select the request, and in its headers I find the Request URL: https://www.goodreads.com/book/reviews/385228?csm_scope=&hide_last_page=true&language_code=en&page=2&authenticity_token=XXX
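
The Request URL found in the steps above can be rebuilt in Python from its parts, which makes it easier to change the page number or swap in a fresh token. This is a sketch under assumptions, not a confirmed fix: the function name `build_reviews_request` is hypothetical, and the `Accept` and `X-Requested-With` headers are a guess at what the browser sends for this kind of in-page request. A 406 status means "Not Acceptable", so a missing or mismatched `Accept` header is one plausible cause, but the headers Chrome actually sent should be checked in the same Network tab.

```python
import urllib.parse
import urllib.request

# Path taken from the Request URL shown in Chrome's Network tab.
REVIEWS_URL = "https://www.goodreads.com/book/reviews/{book_id}"


def build_reviews_request(book_id, page, token):
    """Build (but do not send) a GET request for one page of reviews.

    Hypothetical helper; `token` is the authenticity token value copied
    from the browser, in its decoded (not percent-encoded) form.
    """
    params = {
        "csm_scope": "",
        "hide_last_page": "true",
        "language_code": "en",
        "page": page,
        "authenticity_token": token,
    }
    url = REVIEWS_URL.format(book_id=book_id) + "?" + urllib.parse.urlencode(params)
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3",
        # Assumption: ask for a script response, as an in-page request might.
        "Accept": "text/javascript, */*;q=0.1",
        # Assumption: mark the request as coming from page script.
        "X-Requested-With": "XMLHttpRequest",
    }
    return urllib.request.Request(url, headers=headers)


req = build_reviews_request(385228, 2, "XXX")
print(req.full_url)
```

With the placeholder token `XXX`, `req.full_url` matches the Request URL copied from Chrome. Nothing is sent until `req` is passed to `urllib.request.urlopen`.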
