Hemant Vishwakarma: Deduplicate pandas dataset by index value without using `networkx`

Monday, 17 July 2023

Please note I have already reviewed this link

I would like to keep only one row per unique value of `code` for each value of `id`.

import pandas as pd

df = pd.DataFrame({'code': ['A', 'A', 'B', 'C', 'D', 'A']}, index=[1, 1, 1, 2, 3, 3])
df.index.name = 'id'

df:

   code
id
1     A
1     A
1     B
2     C
3     D
3     A

My desired output is:

   code
id
1     A
1     B
2     C
3     D
3     A

I managed to accomplish this as follows, but I don't love it.

i=df.index.name
df.reset_index().drop_duplicates().set_index(i)

Here's why:

What I want to say is:

df.groupby('id').drop_duplicates()

This is not currently supported: the groupby object has no drop_duplicates method.

Is there a more Pythonic way to do this?
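Two possible alternatives are sketched below. They are not an endorsed answer from the pandas documentation. The names `mask`, `deduped` and `via_groupby` are illustrative only. The first builds a boolean mask of repeated (id, code) pairs and applies it to the original frame, so the index never has to be restored. The second takes the unique codes per id and expands them back into one row each.

```python
import pandas as pd

df = pd.DataFrame({'code': ['A', 'A', 'B', 'C', 'D', 'A']},
                  index=[1, 1, 1, 2, 3, 3])
df.index.name = 'id'

# Option 1: mark repeated (id, code) pairs, then filter the original frame.
# reset_index() exposes 'id' as a column so duplicated() compares both fields;
# to_numpy() makes the mask positional, which is safe with a repeated index.
mask = df.reset_index().duplicated().to_numpy()
deduped = df[~mask]

# Option 2: collect the unique codes for each id, then explode them
# back into one row per (id, code) pair.
via_groupby = (
    df.groupby(level='id')['code']
      .unique()
      .explode()
      .to_frame()
)

print(deduped)
print(via_groupby)
```

Both keep the first occurrence of each pair and leave `id` as the index. Option 1 stays closest to the original one-liner, while Option 2 reads more like the `groupby` phrasing in the question.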
