I've created a script in Python to parse two fields from a webpage: total revenue and its corresponding date. The fields I'm after are rendered by JavaScript, but they are also available in the page source as JSON. The following script parses those two fields.
However, the problem is that the date visible on that page is different from the one available in the page source.
The date in that webpage is like this
The date in page source is like this
There is clearly a difference of one day.
After opening that webpage, click on the Quarterly tab to see the results there:
I've tried with:
import re
import json
import requests

url = 'https://finance.yahoo.com/quote/GTX/financials?p=GTX'

res = requests.get(url)

# Pull the JSON object assigned to root.App.main out of the page source
data = re.findall(r'root\.App\.main[^{]+(.*);', res.text)[0]
jsoncontent = json.loads(data)

# Quarterly income statements; the first entry is the most recent quarter
container = jsoncontent['context']['dispatcher']['stores']['QuoteSummaryStore']['incomeStatementHistoryQuarterly']['incomeStatementHistory']
total_revenue = container[0]['totalRevenue']['raw']
concerning_date = container[0]['endDate']['fmt']
print(total_revenue, concerning_date)
Result I get (raw revenue, i.e. 802 million):
802000000 2019-06-30
Result I wish to get:
802000000 2019-06-29
When I try with the ticker AAPL, I get the exact date, so subtracting or adding a day is not an option.
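One check that might help narrow this down is whether the one-day gap is a timezone effect. The sketch below assumes the endDate entry also carries a raw Unix timestamp next to fmt (I have not confirmed that), and it uses a made-up sample value rather than anything read from the page. The helper name date_at_offset is hypothetical. It only shows how a single timestamp can land on different calendar dates at different UTC offsets; it is a diagnostic, not a fix.

```python
from datetime import datetime, timedelta, timezone


def date_at_offset(epoch_seconds, offset_hours):
    """Return the ISO calendar date of a Unix timestamp at a fixed UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz).date().isoformat()


# Hypothetical sample value: midnight UTC on 2019-06-30 (not taken from the page).
sample_raw = 1561852800

# Print the calendar date this timestamp falls on at a few UTC offsets.
for offset in (0, -4, 9):
    print(offset, date_at_offset(sample_raw, offset))
```

If the assumption holds, the value could be read in my script with something like container[0]['endDate'].get('raw') and passed to the helper for both GTX and AAPL, to see whether any single offset reproduces the dates shown on the page for both tickers.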
How can I get the exact date from that site?
Btw, I know how to get them using Selenium, so I would like to stick to requests only.