I am trying to import JSON data, including nested data, into a Pandas DataFrame. I illustrate the problem with a small public dataset, the GitHub API response requested in the code below. Let's focus on the 'owner' column for the purpose of this illustration. Note that I'm explicitly casting the data to str. If you print the response data from response.content, it has the proper JSON syntax with double quotes. However, pd.read_json() seems to turn the JSON object under 'owner' into a string with single quotes. Am I doing something wrong here, or should this be raised as a dev issue in read_json()? I see that a different issue relating to single/double quotes has been fixed in the past on pandas-dev.
>>> import pandas as pd
>>> import requests
>>> response = requests.get(url='https://api.github.com/users/mralexgray/repos')
>>> df = pd.read_json(response.content, orient='records', dtype=str)
>>> df['owner'].iloc[1, ]
"{'login': 'mralexgray', 'id': 262517, 'node_id': 'MDQ6VXNlcjI2MjUxNw==', 'avatar_url': 'https://avatars.githubusercontent.com/u/262517?v=4',..."
>>> type(df['owner'].iloc[1, ])
str
>>> response.content
b'[{"id":6104546,"node_id":"MDEwOlJlcG9zaXRvcnk2MTA0NTQ2","name":"-REPONAME","full_name":"mralexgray/-REPONAME","private":false,"owner":{"login":"mralexgray","id":262517,"node_id":"MDQ6VXNlcjI2MjUxNw==","avatar_url":"https://avatars.githubusercontent.com/u/262517?v=4"...
My only observation is that pd.read_json() may be importing the nested JSON data as a Python dict before it is cast to str.
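If that is what happens, the single quotes would simply be Python's dict repr rather than JSON. Here is a minimal sketch of that idea and one possible workaround, not a confirmed description of read_json() internals. The `raw` payload is a hypothetical, trimmed-down stand-in for the GitHub response, and `owner_json` is a column name made up for illustration.

```python
import json

import pandas as pd

# Hypothetical stand-in for response.content (trimmed to two records)
raw = (
    b'[{"id": 1, "owner": {"login": "mralexgray", "id": 262517}},'
    b' {"id": 2, "owner": {"login": "mralexgray", "id": 262517}}]'
)

# Parse with the json module so nested objects stay as Python dicts
records = json.loads(raw)
df = pd.DataFrame(records)

# str() on a dict yields Python's repr, which uses single quotes
as_repr = str(df["owner"].iloc[1])

# json.dumps() re-serializes each dict as valid JSON with double quotes
df["owner_json"] = df["owner"].apply(json.dumps)
as_json = df["owner_json"].iloc[1]

print(as_repr)
print(as_json)
```

The difference matters when the string is parsed again later: json.loads() accepts the `owner_json` value but rejects the repr form.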
I'm running Python 3.8.10 with Pandas 1.2.4.
from How to get proper JSON strings with double quotes using pd.read_json()