I have a large dataframe (100.000 rows) with many columns. These are the relevant columns for my question:
id herd birth H_BY HYcount death H_DY HYcount2
1 1345 2005-01-09 134505 1 2010-01-09 134510 1
2 1345 2010-03-05 134510 2 2015-01-09 134515 2
3 1345 2010-05-10 134510 2 2015-01-09 134515 2
4 1345 2011-06-01 134511 1 2016-01-09 134516 1
5 1345 2012-09-01 134512 1 2017-01-09 134517 2
6 1345 2015-09-13 134515 4 2017-01-09 134517 2
7 1346 2015-10-01 134615 4 2019-01-09 134619 1
8 1346 2015-10-27 134615 4 2020-01-09 134620 2
9 1346 2015-11-10 134615 4 2020-01-09 134620 2
10 1346 2016-12-10 134616 1 2021-01-09 134621 1
I am creating Herd-year fixed effects. I have already combined the herd and birth/death columns into herd+birth year and herd+death year into separate columns and counted how many time each fixed effect appears in the dataframe. As can be seen above.
However, now I want to check my whole dataframe for HYcount and HYcount2 that are less then 3. So I don't want any that are either 1 or 2 in a HY group.
I would like to run though the dataframe and combine those HY groups that are 1 or 2 per group into other groups. Below or above.
EDIT
I also want to only combine HY groups WITHIN EACH HERD!
So I don't want to add a herd member of one herd to another herd with the Herd-year variable.
Here is what I've tried with the birth year fixed effect.
#Sort the df by the relevant value
df= df.sort_values(by=['H_BY'])
df.loc[
(df['HYcount'] < 3),
'H_BY'] = df['H_BY'].shift(-1)
#Count the values again
df['HC1_c'] = df.groupby('H_BY')['H_BY'].transform('count')
But this is a very feeble attempt. I have to run through this many many times to rid my dataframe of all values that are less then 3 and it does not work with record number 1. And I want to repeat this process over at least 4 other columns.
EDIT
And of course this code does not do anything about combining within one herd.
Any tips and trick or ideas how I can do this more efficiently?
from If value of a column is less then 3 then replace value in another column with value from previous row
No comments:
Post a Comment