Wednesday, 1 March 2023

Flatten and extract keywords from json field in csv

I have a column named diff in my df, where the value is like a json string of the format:

{'info': {'version': {'from': '2.0.0', 'to': '2.3.4'}}, 'paths': {'modified': {'/dummy': {'operations': {'added': ['PUT']}}}}, 'endpoints': {'added': [{'method': 'PUT', 'path': '/dummy'}]}, 'components': {'schemas': {'added': ['ObjectOfObjects', 'inline_object', 'ObjectOfObjects_inner']}, 'requestBodies': {'added': ['inline_object', 'nested_response']}}}

Over here info, paths, endpoints and components represent the first set of nested elements. Like the first category, then we have the next category: such as info has different fields like: title, description etc, components has fields like: schemas and so on.

The df column looks something like this: enter image description here

I want to flatten the json,meaning divided all the parameters, so that means I get around 5-6 new columns( as these are all the parameters changing,first set of elements). I don't want to keep the changes like you can see in the pic from:... to:.., I only want the field, sub-field and the sub-sub-field changed

so I get output as something as follows:

info      paths      endpoints  components
version    modified   added      schemas:added
                                 requestBodies:added

I looked into json_normalize , flatten and jsonpath, but somehow neither of these work for the use case. It yields a completely different output to the one I want. It would be really great if someone could help me with this! I seem to be a bit stuck.



from Flatten and extract keywords from json field in csv

No comments:

Post a Comment