Please note that I have already reviewed this question:
Pandas and python: deduplication of dataset by several fields
I would like to keep each distinct value of the field code only once per value of id.
import pandas as pd
df = pd.DataFrame({'code': ['A', 'A', 'B', 'C', 'D', 'A']}, index=[1, 1, 1, 2, 3, 3])
df.index.name = 'id'
df:
| id | code |
|---|---|
| 1 | A |
| 1 | A |
| 1 | B |
| 2 | C |
| 3 | D |
| 3 | A |
My desired output is:
| id | code |
|---|---|
| 1 | A |
| 1 | B |
| 2 | C |
| 3 | D |
| 3 | A |
I managed to accomplish this as follows, but I don't love it.
i=df.index.name
df.reset_index().drop_duplicates().set_index(i)
Here's why:
- This will fail if the index has no name
- I shouldn't need to re-set and set an index
- This is a fairly common operation, and there is way too much ink here.
What I want to say is:
df.groupby('id').drop_duplicates()
which is not currently supported.
Is there a more Pythonic way to do this?
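For reference, one possible direction is to compute a boolean mask of repeated (id, code) pairs and filter the original frame with it, so the index never has to be set back. This is only a sketch, not a confirmed best practice: the names `is_repeat` and `deduped` are made up for illustration, and it still calls `reset_index()`, though only to build the mask.

```python
import pandas as pd

df = pd.DataFrame({'code': ['A', 'A', 'B', 'C', 'D', 'A']},
                  index=[1, 1, 1, 2, 3, 3])
df.index.name = 'id'

# Mark rows whose (index, code) pair has already appeared earlier.
# reset_index() is used only to compute the mask; an unnamed index
# would simply become a column called "index", so no name is required.
is_repeat = df.reset_index().duplicated().to_numpy()

# Filter the original frame; its index (and index name) is left untouched.
deduped = df[~is_repeat]
print(deduped)
```

Because the mask is a plain NumPy array, it is applied positionally, which avoids alignment problems caused by the repeated index labels.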
from Deduplicate pandas dataset by index value without using `networkx`