Saturday, 17 July 2021

How to get proper JSON strings with double quotes using pd.read_json()

I am trying to import JSON data, including nested data, into a pandas DataFrame. It is illustrated here using a small public dataset: the GitHub API response listing a user's repositories (the URL is in the code below). Let's focus on the 'owner' column for the purpose of this illustration. Note that I'm explicitly casting the data to str by passing dtype=str. If you print the raw response data from response.content, it has proper JSON syntax with double quotes.

However, pd.read_json() seems to return the nested JSON under 'owner' as a string with single quotes, which is no longer valid JSON. Am I doing something wrong here, or should this be raised as an issue against read_json()? I see that a different issue relating to single/double quotes was fixed in the past on pandas-dev.

>>> import pandas as pd
>>> import requests
>>> response = requests.get(url='https://api.github.com/users/mralexgray/repos')
>>> df = pd.read_json(response.content, orient='records', dtype=str)
>>> df['owner'].iloc[1]
"{'login': 'mralexgray', 'id': 262517, 'node_id': 'MDQ6VXNlcjI2MjUxNw==', 'avatar_url': 'https://avatars.githubusercontent.com/u/262517?v=4',..."
>>> type(df['owner'].iloc[1])
str
>>> response.content
b'[{"id":6104546,"node_id":"MDEwOlJlcG9zaXRvcnk2MTA0NTQ2","name":"-REPONAME","full_name":"mralexgray/-REPONAME","private":false,"owner":{"login":"mralexgray","id":262517,"node_id":"MDQ6VXNlcjI2MjUxNw==","avatar_url":"https://avatars.githubusercontent.com/u/262517?v=4"...

My only observation is that pd.read_json() may be parsing the nested JSON into a Python dict first, so that the cast to str produces the dict's Python string representation (single quotes) rather than a JSON string.
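To see where single quotes could come from independently of pandas, here is a minimal sketch using only the standard library. The sample payload is a made-up, shortened stand-in for response.content (two keys copied from the output above), and the variable names are only for illustration. It shows that calling str() on a parsed dict gives Python's repr with single quotes, while json.dumps() gives a JSON string with double quotes.

```python
import json

# Hypothetical, shortened stand-in for response.content (not the full API payload).
raw = b'[{"id": 6104546, "owner": {"login": "mralexgray", "id": 262517}}]'

records = json.loads(raw)
owner = records[0]["owner"]   # the nested JSON object is now a Python dict

as_repr = str(owner)          # Python repr of the dict: single quotes, not valid JSON
as_json = json.dumps(owner)   # JSON serialization: double quotes

print(as_repr)   # {'login': 'mralexgray', 'id': 262517}
print(as_json)   # {"login": "mralexgray", "id": 262517}
```

If read_json() does build a dict before the cast, this would match the output I am seeing, though I have not confirmed that this is what happens internally.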

I'm running Python 3.8.10 with pandas 1.2.4.
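One possible workaround, offered as a sketch rather than as the intended pandas usage: skip dtype=str, let the nested objects arrive as dicts, and serialize that column explicitly with json.dumps(). The example below builds the DataFrame from json.loads() instead of pd.read_json() so that it runs without a network call. The two-row payload and its id values are invented for illustration.

```python
import json

import pandas as pd

# Hypothetical two-row stand-in for response.content; the repo ids are made up.
raw = (
    b'[{"id": 1, "owner": {"login": "mralexgray", "id": 262517}},'
    b' {"id": 2, "owner": {"login": "mralexgray", "id": 262517}}]'
)

# Parse first, so the nested 'owner' objects are kept as Python dicts.
df = pd.DataFrame(json.loads(raw))

# Serialize each dict back to a JSON string with double quotes.
df["owner"] = df["owner"].map(json.dumps)

owner_json = df["owner"].iloc[1]
print(owner_json)         # {"login": "mralexgray", "id": 262517}
print(type(owner_json))   # <class 'str'>
```

The resulting strings can be passed back through json.loads(), which is not the case for the single-quoted output above.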


