I am trying to reassign values in a pandas df column. The main goal is to assign individuals in [Person] with up to 3 unique values for [Area] and [Place].
I have a separate Column labelled [On] that displays how many current unique values are occurring. So this provides a guide on how many individuals I need. For example, if there are 1-3 unique values on, I need 1 individual. If there are 4-6 unique values on, I need 2 individuals etc.
How I want to allocate the unique values in [Area] and [Place] to individuals is causing me trouble. I initially want to group by identical values in [Area]. So assign all the same values in [Area] to an individual (up to 3). If there are less than 3 unique values in [Area] I combine them to make up to 3 total anyway possible.
I'll go through an example now and hopefully it will display this. For the df below there are 9 unique values.
import pandas as pd
import numpy as np
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','10:15:00','10:15:00','11:48:00','12:00:00','12:55:00','13:12:00'],
'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 3','House 2'],
'Area' : ['X','X','X','X','Y','V','V','V','Y'],
'On' : ['1','2','3','4','5','6','7','8','9'],
})
df = pd.DataFrame(data=d)
This is my attempt:
def reduce_df(df):
values = df['Area'] + df['Place']
df1 = df.loc[~values.duplicated(),:] # ignore duplicate values for this part..
person_count = df1.groupby('Person')['Person'].agg('count')
leftover_count = person_count[person_count < 3] # the 'leftovers'
# try merging pairs together
nleft = leftover_count.shape[0]
to_try = np.arange(nleft - 1)
to_merge = (leftover_count.values[to_try] +
leftover_count.values[to_try + 1]) <= 3
to_merge[1:] = to_merge[1:] & ~to_merge[:-1]
to_merge = to_try[to_merge]
merge_dict = dict(zip(leftover_count.index.values[to_merge+1],
leftover_count.index.values[to_merge]))
def change_person(p):
if p in merge_dict.keys():
return merge_dict[p]
return p
reduced_df = df.copy()
# update df with the merges you found
reduced_df['Person'] = reduced_df['Person'].apply(change_person)
return reduced_df
df1 = (reduce_df(reduce_df(df)))
Out:
Time Place Area Person On
0 8:03:00 House 1 X Person 1 1
1 8:17:00 House 2 X Person 1 2
2 8:20:00 House 3 X Person 1 3
3 10:15:00 House 4 X Person 2 4
4 10:15:00 House 1 Y Person 2 5
5 11:48:00 House 1 V Person 4 6
6 12:00:00 House 2 V Person 4 7
7 12:55:00 House 3 V Person 4 8
8 13:12:00 House 2 Y Person 2 9
Intended Output:
Time Place Area Person On
0 8:03:00 House 1 X Person 1 1
1 8:17:00 House 2 X Person 1 2
2 8:20:00 House 3 X Person 1 3
3 10:15:00 House 4 X Person 2 4
4 10:15:00 House 1 Y Person 2 5
5 11:48:00 House 1 V Person 3 6
6 12:00:00 House 2 V Person 3 7
7 12:55:00 House 3 V Person 3 8
8 13:12:00 House 2 Y Person 2 9
from Allocate different values in a pandas column
No comments:
Post a Comment