Tuesday 5 January 2021

Parse xml with sub-nodes and create a Pandas dataframe

I have the following xml format:

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <run>
      <information>
         <logfile>s.log</logfile>
         <version>33</version>
         <mach>1</mach>
         <problemname>mm1</problemname>
         <timestamp>20201218.165122.053486</timestamp>
      </information>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>

I wrote the sample code below to parse this file after reading the post "How to convert an XML file to nice pandas dataframe?", but it returns None. Is there a fast way to create a dataframe whose index comes from the value of <item> (i.e., VARS) and whose four columns are status, time, obj, and gap?

import pandas as pd
from xml.etree import ElementTree as et

root = (et.parse('test.xml').getroot()).getchildren()


tags = {"tags":[]}
for elem in root:
    tag = {}
    tag["status"] = elem.attrib['status']
    tag["time"] = elem.attrib['time']
    tag["obj"] = elem.attrib['obj']
    tag["gap"] = elem.attrib['gap']
    tags["tags"].append(tag)

df_users = pd.DataFrame(tags["tags"])
df_users.head()

This is the output I am looking for:


      status  time  obj   gap
VARS  4        3    1.0   0.15
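
One possible approach, offered as a minimal sketch rather than a definitive answer: in the XML above, status, time, obj, and gap are child elements of <result>, not attributes, so their values live in each child's .text rather than in .attrib. In addition, getchildren() was removed from ElementTree in Python 3.9, so iterating over the element directly is the safer choice. The sketch below inlines the sample XML so it can be run as-is; the helper name runs_to_frame is hypothetical, and it assumes every <run> carries one <controls>/<item> value and one <result> block.

```python
import pandas as pd
from xml.etree import ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<results>
   <run>
      <information>
         <logfile>s.log</logfile>
         <version>33</version>
         <mach>1</mach>
         <problemname>mm1</problemname>
         <timestamp>20201218.165122.053486</timestamp>
      </information>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>
"""


def runs_to_frame(root):
    """Hypothetical helper: build one dataframe row per <run> element."""
    rows, index = [], []
    for run in root.iter("run"):
        result = run.find("result")
        if result is None:
            continue  # skip runs that have no <result> block
        # Each child of <result> becomes a column: tag name -> text value.
        rows.append({child.tag: child.text for child in result})
        # Assumption: the row label is the text of <controls>/<item>.
        index.append(run.findtext("controls/item"))
    df = pd.DataFrame(rows, index=index)
    # Element text is always a string; convert each column to a numeric dtype.
    return df.apply(pd.to_numeric)


root = ET.fromstring(XML)  # for a file: ET.parse("test.xml").getroot()
df = runs_to_frame(root)
print(df)
```

Because the columns are taken from whatever tags appear under <result>, the same loop should carry over to files with several <run> elements, giving one row per run.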

