Hemant Vishwakarma: Unable to parse an exact result from a webpage using requests

Sunday, 13 October 2019

I've created a script in Python to parse two fields from a webpage: total revenue and its corresponding date. The fields I'm after are rendered by JavaScript, but they are available in the page source within a JSON array. The script further down parses those two fields.
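To illustrate the general idea of reading values out of JSON embedded in page source, here is a minimal offline sketch. It is separate from my actual script: the sample page string, the helper name `extract_embedded_json`, and the trimmed payload are all hypothetical stand-ins, and the real page is far larger.

```python
import json
import re

# Hypothetical, heavily trimmed payload that mimics the nesting used in my script.
sample_payload = {
    'context': {'dispatcher': {'stores': {'QuoteSummaryStore': {
        'incomeStatementHistoryQuarterly': {'incomeStatementHistory': [
            {'totalRevenue': {'raw': 802000000},
             'endDate': {'fmt': '2019-06-30'}},
        ]},
    }}}},
}

# Hypothetical stand-in for the page source: a script tag assigning the JSON.
SAMPLE_HTML = '<script>\nroot.App.main = ' + json.dumps(sample_payload) + ';\n</script>'


def extract_embedded_json(html):
    """Return the JSON object assigned to root.App.main in the given HTML."""
    match = re.search(r'root\.App\.main\s*=\s*(\{.*\});', html)
    if match is None:
        raise ValueError('embedded JSON not found')
    return json.loads(match.group(1))


data = extract_embedded_json(SAMPLE_HTML)
store = data['context']['dispatcher']['stores']['QuoteSummaryStore']
latest = store['incomeStatementHistoryQuarterly']['incomeStatementHistory'][0]
print(latest['totalRevenue']['raw'], latest['endDate']['fmt'])
```

Using `re.search` with an explicit `None` check fails with a clear message if the page layout changes, instead of an `IndexError` from indexing an empty `findall` result.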

However, the problem is that the date visible on that page is different from the one available in the page source.

Webpage link

The date shown on the webpage corresponds to 2019-06-29, while the date in the page source is 2019-06-30.

There is clearly a difference of one day.

After opening that webpage, click on the Quarterly tab to see the results there.

I've tried with:

import json
import re

import requests

url = 'https://finance.yahoo.com/quote/GTX/financials?p=GTX'

res = requests.get(url)
# Grab the JSON object assigned to root.App.main in the page source
data = re.findall(r'root\.App\.main[^{]+(.*);', res.text)[0]
jsoncontent = json.loads(data)
# Drill down to the list of quarterly income statements
container = jsoncontent['context']['dispatcher']['stores']['QuoteSummaryStore']['incomeStatementHistoryQuarterly']['incomeStatementHistory']
# Take both fields from the first entry in that list
total_revenue = container[0]['totalRevenue']['raw']
concerning_date = container[0]['endDate']['fmt']
print(total_revenue, concerning_date)

Result I get (revenue in million):

802000000 2019-06-30

Result I wish to get:

802000000 2019-06-29

When I try with the ticker AAPL, I get the exact date, so subtracting or adding a day is not an option.
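One thing that may be worth checking, purely as an assumption and not something confirmed for this site, is whether the date is stored as a Unix timestamp that gets displayed in different time zones. The sketch below uses a hypothetical timestamp (midnight UTC on 2019-06-30) and a fixed UTC-4 offset as a rough stand-in for US Eastern daylight time. It only shows how one timestamp can map to two calendar dates; it does not establish that this is what happens here, nor why AAPL would behave differently.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value: 2019-06-30 00:00:00 UTC as seconds since the epoch.
# Not confirmed to be what the page source actually contains.
raw_end_date = 1561852800

# Interpret the timestamp in UTC.
utc_date = datetime.fromtimestamp(raw_end_date, tz=timezone.utc).date()

# Interpret the same timestamp with a fixed UTC-4 offset (assumed, approximates EDT).
eastern = timezone(timedelta(hours=-4))
eastern_date = datetime.fromtimestamp(raw_end_date, tz=eastern).date()

print(utc_date, eastern_date)  # 2019-06-30 2019-06-29
```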

How can I get the exact date from that site?

Btw, I know how to get them using Selenium, but I would like to stick to requests.
