Monday, 7 November 2022

Python: API request nested dictionaries to dataframe with datetime indexed values

I run a query on python to get hourly price data from an API, using the get function:

result = (requests.get(url_prices, headers=headers, params={'SpotKey':'1','Fields':'hours','FromDate':'2016-05-05','ToDate':'2016-12-05','Currency':'eur','SortType':'ascending'}).json())

where 'SpotKey' identifies the item I want to retrieve from the API, in this example '1' is hourly price timeseries (the other parameters are self explanatory).

The result from the query is:

{'SpotKey': '1',
 'SpotName': 'APX',
 'Denomination': 'eur/mwh',
 'Elements': [{'Date': '2016-05-05T00:00:00.0000000',
   'TimeSpans': [{'TimeSpan': '00:00-01:00', 'Value': 23.69},
    {'TimeSpan': '01:00-02:00', 'Value': 21.86},
    {'TimeSpan': '02:00-03:00', 'Value': 21.26},
    {'TimeSpan': '03:00-04:00', 'Value': 20.26},
    {'TimeSpan': '04:00-05:00', 'Value': 19.79},
    {'TimeSpan': '05:00-06:00', 'Value': 19.79},
...
    {'TimeSpan': '19:00-20:00', 'Value': 57.52},
    {'TimeSpan': '20:00-21:00', 'Value': 49.4},
    {'TimeSpan': '21:00-22:00', 'Value': 42.23},
    {'TimeSpan': '22:00-23:00', 'Value': 34.99},
    {'TimeSpan': '23:00-24:00', 'Value': 33.51}]}]}

where 'Elements' is the relevant list containing the timeseries, structured as nested dictionaries of 'Date' keys and 'TimeSpans' keys.

Each 'TimeSpans' keys contains other nested dictionaries for each hour of the day, with a 'TimeSpan' key for the hour and a 'Value' key for the price.

I would like to transform it to a dataframe like:

Datetime           eur/mwh
2016-05-05 00:00:00 23.69
2016-05-05 01:00:00 21.86
2016-05-05 02:00:00 21.26
2016-05-05 03:00:00 20.26
2016-05-05 04:00:00 19.79
... ...
2016-12-05 19:00:00 57.52
2016-12-05 20:00:00 49.40
2016-12-05 21:00:00 42.23
2016-12-05 22:00:00 34.99
2016-12-05 23:00:00 33.51

For the time being I managed to do so doing:

df = pd.concat([pd.DataFrame(x) for x in result['Elements']])
df['Date'] = pd.to_datetime(df['Date'] + ' ' + [x['TimeSpan'][:5] for x in df['TimeSpans']], errors='coerce')
df[result['Denomination']] = [x['Value'] for x in df['TimeSpans']]
df = df.set_index(df['Date'], drop=True).drop(columns=['Date','TimeSpans'])
df = df[~df.index.isnull()]

I did so because the daylight-saving-time is replacing the 'TimeSpan' hourly values with 'dts' string, giving ParseDate errors when creating the datetime index. Since I will request data very frequently and potentially for different granularities (e.g. half-hourly), is there a better / quicker / standard way to shape so many nested dictionaries into a dataframe with the format I look for, that allows to avoid the parsing date error for daylight-saving-time changes?

thank you in advance, cheers.



from Python: API request nested dictionaries to dataframe with datetime indexed values

No comments:

Post a Comment