Monday, 29 October 2018

Allocate different values in a pandas column

I am trying to reassign values in a pandas df column. The main goal is to assign individuals in [Person] with up to 3 unique values for [Area] and [Place].

I have a separate Column labelled [On] that displays how many current unique values are occurring. So this provides a guide on how many individuals I need. For example, if there are 1-3 unique values on, I need 1 individual. If there are 4-6 unique values on, I need 2 individuals etc.

How I want to allocate the unique values in [Area] and [Place] to individuals is causing me trouble. I initially want to group by identical values in [Area]. So assign all the same values in [Area] to an individual (up to 3). If there are less than 3 unique values in [Area] I combine them to make up to 3 total anyway possible.

I'll go through an example now and hopefully it will display this. For the df below there are 9 unique values.

import pandas as pd
import numpy as np

d = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','10:15:00','10:15:00','11:48:00','12:00:00','12:55:00','13:12:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 3','House 2'],                 
    'Area' : ['X','X','X','X','Y','V','V','V','Y'],     
    'On' : ['1','2','3','4','5','6','7','8','9'],                      
     })

df = pd.DataFrame(data=d)

This is my attempt:

def reduce_df(df):
    values = df['Area'] + df['Place']
    df1 = df.loc[~values.duplicated(),:] # ignore duplicate values for this part..
    person_count = df1.groupby('Person')['Person'].agg('count')
    leftover_count = person_count[person_count < 3] # the 'leftovers'

    # try merging pairs together
    nleft = leftover_count.shape[0]
    to_try = np.arange(nleft - 1)
    to_merge = (leftover_count.values[to_try] + 
                leftover_count.values[to_try + 1]) <= 3
    to_merge[1:] = to_merge[1:] & ~to_merge[:-1]
    to_merge = to_try[to_merge]
    merge_dict = dict(zip(leftover_count.index.values[to_merge+1], 
                leftover_count.index.values[to_merge]))
    def change_person(p):
        if p in merge_dict.keys():
            return merge_dict[p]
        return p
    reduced_df = df.copy()
    # update df with the merges you found
    reduced_df['Person'] = reduced_df['Person'].apply(change_person)
    return reduced_df

df1 = (reduce_df(reduce_df(df)))

Out:

       Time    Place Area    Person On
0   8:03:00  House 1    X  Person 1  1
1   8:17:00  House 2    X  Person 1  2
2   8:20:00  House 3    X  Person 1  3
3  10:15:00  House 4    X  Person 2  4
4  10:15:00  House 1    Y  Person 2  5
5  11:48:00  House 1    V  Person 4  6
6  12:00:00  House 2    V  Person 4  7
7  12:55:00  House 3    V  Person 4  8
8  13:12:00  House 2    Y  Person 2  9

Intended Output:

       Time    Place Area    Person On
0   8:03:00  House 1    X  Person 1  1
1   8:17:00  House 2    X  Person 1  2
2   8:20:00  House 3    X  Person 1  3
3  10:15:00  House 4    X  Person 2  4
4  10:15:00  House 1    Y  Person 2  5
5  11:48:00  House 1    V  Person 3  6
6  12:00:00  House 2    V  Person 3  7
7  12:55:00  House 3    V  Person 3  8
8  13:12:00  House 2    Y  Person 2  9



from Allocate different values in a pandas column

No comments:

Post a Comment