I have this df:
CODE DATE PP
0 000130 1991-01-01 0.0
1 000130 1991-01-02 1.0
2 000130 1991-01-03 2.0
3 000130 1991-01-04 2.0
4 000130 1991-01-05 1.1
... ... ...
10861 000142 2020-12-27 2.1
10862 000142 2020-12-28 2.2
10863 000142 2020-12-29 2.1
10864 000142 2020-12-30 0.4
10865 000142 2020-12-31 1.1
For each df['CODE'], and within each df['DATE'].dt.year and df['DATE'].dt.month, I want df['PP'] to contain 5 NaNs in total, with at least 3 of them consecutive and the rest non-consecutive.
So I must convert random values of df['PP'] to NaN to reach those 3 consecutive NaNs and 5 NaNs overall. Expected result:
CODE DATE PP
0 000130 1991-01-01 0.0
1 000130 1991-01-02 NaN
2 000130 1991-01-03 NaN
3 000130 1991-01-04 NaN
4 000130 1991-01-05 1.1
5 000130 1991-01-06 2.1
6 000130 1991-01-07 NaN
7 000130 1991-01-08 2.1
8 000130 1991-01-09 0.4
9 000130 1991-01-10 NaN
... ... ... ...
Important: consecutive NaNs + non-consecutive NaNs = 5, so the 3 consecutive NaNs in a month count toward the 5. If a month already has n NaNs, I should only add the difference needed to reach 5. For example, if a month already has 2 NaNs, I should only add 3 consecutive NaNs. If a month already has 5 NaNs, the code should leave that month unchanged.
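To make the rule above concrete, here is a minimal sketch of one way it could be implemented. The helper name top_up_nans, the PPNEW column, the demo data and the seed are all hypothetical. It also assumes that when fewer than 3 NaNs are missing they are added as a shorter run, and that the scattered NaNs should avoid positions next to existing NaNs where possible; the question does not settle either point.

```python
import numpy as np
import pandas as pd


def top_up_nans(pp, rng, total=5, run=3):
    """Return a copy of one month's values with NaNs topped up to `total`."""
    out = pp.to_numpy(dtype=float, copy=True)
    need = total - int(np.isnan(out).sum())
    if need <= 0:
        return pd.Series(out, index=pp.index)  # month already has enough NaNs

    # 1) Place one consecutive run (shorter than `run` if fewer are needed).
    k = min(run, need)
    valid = ~np.isnan(out)
    starts = [i for i in range(len(out) - k + 1) if valid[i:i + k].all()]
    if starts:
        s = rng.choice(starts)
        out[s:s + k] = np.nan
        need -= k

    # 2) Scatter the remaining NaNs one at a time, preferring positions
    #    that are not adjacent to an existing NaN.
    for _ in range(need):
        isnan = np.isnan(out)
        near = isnan.copy()
        near[1:] |= isnan[:-1]
        near[:-1] |= isnan[1:]
        free = np.flatnonzero(~near)
        if free.size == 0:
            free = np.flatnonzero(~isnan)  # fall back to any non-NaN slot
        if free.size == 0:
            break
        out[rng.choice(free)] = np.nan
    return pd.Series(out, index=pp.index)


# Hypothetical demo data: two codes, two months of daily values.
rng = np.random.default_rng(0)
dates = pd.date_range("1991-01-01", "1991-02-28", freq="D")
df = pd.DataFrame({
    "CODE": np.repeat(["000130", "000142"], len(dates)),
    "DATE": np.tile(dates, 2),
    "PP": rng.random(2 * len(dates)).round(1),
})
df.loc[[0, 10], "PP"] = np.nan  # this month already has 2 NaNs

# Group by code, year and month, then top up each group independently.
keys = [
    df["CODE"],
    df["DATE"].dt.year.rename("YEAR"),
    df["DATE"].dt.month.rename("MONTH"),
]
df["PPNEW"] = df.groupby(keys)["PP"].transform(lambda s: top_up_nans(s, rng))
```

Grouping by year as well as month keeps, say, January 1991 and January 1992 separate, so each calendar month gets its own 5 NaNs.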
I tried this:
df['PPNEW']=df['PP'].groupby([df['CODE'],df['DATE'].dt.month]).sample(frac=0.984)
But I can't get an exact number of NaNs this way (only a percentage, and months vary in length, e.g. 30 or 31 days), and I can't get consecutive NaNs.
Would you mind helping me?
Thanks in advance.
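For the exact-count part only, a small sketch of what passing n instead of frac to groupby sample looks like. The demo data and the seed are hypothetical, and this does not produce consecutive NaNs or account for NaNs that are already present; it only shows that a fixed number of rows per group can be drawn. Note that it also groups by year, which the attempt above does not.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: one code, two months of daily values, no NaNs yet.
rng = np.random.default_rng(1)
dates = pd.date_range("1991-01-01", "1991-02-28", freq="D")
df = pd.DataFrame({
    "CODE": "000130",
    "DATE": dates,
    "PP": rng.random(len(dates)).round(1),
})

keys = [
    df["CODE"],
    df["DATE"].dt.year.rename("YEAR"),
    df["DATE"].dt.month.rename("MONTH"),
]

# Draw exactly 5 row labels per (code, year, month) group.
picked = df.groupby(keys)["PP"].sample(n=5, random_state=0).index

# Copy the column and blank out the sampled rows.
df["PPNEW"] = df["PP"]
df.loc[picked, "PPNEW"] = np.nan
```

With n the count per group is fixed regardless of how many days the month has, which frac cannot guarantee.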
from How to insert n random NaN consecutive and no consecutive data by month?