I have a script that assigns specific values in a pandas df in groups of 3. However, I'm hoping to refine this based off timestamps.
For the df below, Columns [Place] and [Area] get assigned to a specific [Person]. This is achieved by grouping the first 3 unique values in [Place] for each value in [Area].
If a Person has less than 3 unique values allocated to them within an hour I want to combine them so each person has 3 values where possible. For instance, if there are 8 unique values than 2 people should have 3 unique values and the 3rd person should have 2 unique values.
Below is what I have thus far:
d = ({
'Time' : ['8:00:00','8:30:00','9:00:00','9:30:00','10:00:00','10:30:00','11:00:00','11:30:00','12:00:00'],
'Place' : ['House 1','House 2','House 1','House 3','House 4','House 5','House 1','House 6','House 7'],
'Area' : ['X','X','Y','X','X','X','X','X','X'],
})
df = pd.DataFrame(data=d)
def g(gps):
s = gps['Place'].unique()
d = dict(zip(s, np.arange(len(s)) // 3 + 1))
gps['Person'] = gps['Place'].map(d)
return gps
df = df.groupby('Area', sort=False).apply(g)
s = df['Person'].astype(str) + df['Area']
df['Person'] = pd.Series(pd.factorize(s)[0] + 1).map(str).radd('Person ')
Output:
Time Place Area Person
0 8:00:00 House 1 X Person 1
1 8:30:00 House 2 X Person 1
2 9:00:00 House 1 Y Person 2
3 9:30:00 House 3 X Person 1
4 10:00:00 House 4 X Person 3
5 10:30:00 House 5 X Person 3
6 11:00:00 House 1 X Person 1
7 11:30:00 House 6 X Person 3
8 12:00:00 House 7 X Person 4
So,
Index 0-3 is good as the 3rd unique value for Person 1 starts within an hour of the 2nd unique value.
Index 4-5 need to be combine with Person 2 as they only have 1 unique value.
Index 6 is good as both those values have been repeated and the Person already has 3 unique values.
Index 7 is good because Person 3 should be created at Person 1 and Person 2 both have 3 unique values allocated.
Index 8 should be combined to Person 3, which will mean they have 2 unique values
Intended Output:
Time Place Area Assigned
0 8:00:00 House 1 X Person 1
1 8:30:00 House 2 X Person 1
2 9:00:00 House 1 Y Person 2
3 9:30:00 House 3 X Person 1
4 10:00:00 House 4 X Person 2
5 10:30:00 House 5 X Person 2
6 11:00:00 House 1 X Person 1
7 11:30:00 House 6 X Person 3
8 12:00:00 House 7 X Person 3
from Resort values in a pandas df
No comments:
Post a Comment