I am trying to create a loop or a more efficient process that can count the amount of current values in a pandas df. At the moment I'm selecting the value I want to perform the function on.
So for the above df below I want to count how many values are left for Code ['A']. I select this via the following:
import pandas as pd
d = ({
'Code' : ['A','A','B','A','B','B','A','B','A','A'],
'Area' : ['Home','Home','Shops','Park','Cafe','Shops','Home','Cafe','Work','Park'],
})
df = pd.DataFrame(data=d)
#Select code
df1 = df[df.Code == 'A'].copy()
df1['u'] = df1[::-1].groupby('Area').Area.cumcount()
ids = [1]
seen = set([df1.iloc[0].Area])
dec = False
for val, u in zip(df1.Area[1:], df1.u[1:]):
ids.append(ids[-1] + (val not in seen) - dec)
seen.add(val)
dec = u == 0
df1['On'] = ids
df1 = df1.reindex(df.index).fillna(df)
So this returns ['u'], which is how many values are left. And ['On'], which is how many total values running.
The problem is if I want to perform the same function on other values in Code. For instance, if I want to do the same thing on B, I would have to do the same but change:
df2 = df1[df1.Code == 'B'].copy()
If I have numerous values in Code to run the script on it becomes very inefficient. e.g A,B,C,D,E etc. I need a loop where it finds all unique values in 'Code' and runs the script on them? Ideally, the script would look like:
df1 = df[df.Code == 'All unique values'].copy()
Intended Output:
Code Area u On
0 A Home 2.0 1.0
1 A Home 1.0 1.0
2 B Shops 1.0 1.0
3 A Park 1.0 2.0
4 B Cafe 1.0 2.0
5 B Shops 0.0 2.0
6 A Home 0.0 2.0
7 B Cafe 0.0 1.0
8 A Work 0.0 2.0
9 A Park 0.0 1.0
from Efficient loop that counts unique values in a pandas df
No comments:
Post a Comment