Monday, 24 September 2018

Resort values in a pandas df

I have a script that assigns specific values in a pandas df in groups of 3. However, I'm hoping to refine this based off timestamps.

For the df below, Columns [Place] and [Area] get assigned to a specific [Person]. This is achieved by grouping the first 3 unique values in [Place] for each value in [Area].

If a Person has less than 3 unique values allocated to them within an hour I want to combine them so each person has 3 values where possible. For instance, if there are 8 unique values than 2 people should have 3 unique values and the 3rd person should have 2 unique values.

Below is what I have thus far:

d = ({
   'Time' : ['8:00:00','8:30:00','9:00:00','9:30:00','10:00:00','10:30:00','11:00:00','11:30:00','12:00:00'],                 
   'Place' : ['House 1','House 2','House 1','House 3','House 4','House 5','House 1','House 6','House 7'],                 
   'Area' : ['X','X','Y','X','X','X','X','X','X'], 
    })

df = pd.DataFrame(data=d)

def g(gps):
    s = gps['Place'].unique()
    d = dict(zip(s, np.arange(len(s)) // 3 + 1))
    gps['Person'] = gps['Place'].map(d)
    return gps

df = df.groupby('Area', sort=False).apply(g)
s = df['Person'].astype(str) + df['Area']
df['Person'] = pd.Series(pd.factorize(s)[0] + 1).map(str).radd('Person ')

Output:

       Time    Place Area    Person
0   8:00:00  House 1    X  Person 1
1   8:30:00  House 2    X  Person 1
2   9:00:00  House 1    Y  Person 2
3   9:30:00  House 3    X  Person 1
4  10:00:00  House 4    X  Person 3
5  10:30:00  House 5    X  Person 3
6  11:00:00  House 1    X  Person 1
7  11:30:00  House 6    X  Person 3
8  12:00:00  House 7    X  Person 4

So,

Index 0-3 is good as the 3rd unique value for Person 1 starts within an hour of the 2nd unique value.

Index 4-5 need to be combine with Person 2 as they only have 1 unique value.

Index 6 is good as both those values have been repeated and the Person already has 3 unique values.

Index 7 is good because Person 3 should be created at Person 1 and Person 2 both have 3 unique values allocated.

Index 8 should be combined to Person 3, which will mean they have 2 unique values

Intended Output:

       Time    Place Area  Assigned
0   8:00:00  House 1    X  Person 1
1   8:30:00  House 2    X  Person 1
2   9:00:00  House 1    Y  Person 2
3   9:30:00  House 3    X  Person 1
4  10:00:00  House 4    X  Person 2
5  10:30:00  House 5    X  Person 2
6  11:00:00  House 1    X  Person 1
7  11:30:00  House 6    X  Person 3
8  12:00:00  House 7    X  Person 3



from Resort values in a pandas df

No comments:

Post a Comment