Thursday, 31 March 2022

How to insert n random consecutive and non-consecutive NaN values by month?

I have this df:

       CODE      DATE     PP
0      000130 1991-01-01  0.0
1      000130 1991-01-02  1.0
2      000130 1991-01-03  2.0
3      000130 1991-01-04  2.0
4      000130 1991-01-05  1.1
      ...        ...  ...
10861  000142 2020-12-27  2.1
10862  000142 2020-12-28  2.2
10863  000142 2020-12-29  2.1
10864  000142 2020-12-30  0.4
10865  000142 2020-12-31  1.1

I want df['PP'] to contain at least 3 consecutive NaNs and 5 non-consecutive NaNs for each df['CODE'], within each df['DATE'].dt.year and df['DATE'].dt.month, so I need to set random values of df['PP'] to NaN until each month reaches those counts. Expected result:

       CODE      DATE     PP
0      000130 1991-01-01  0.0
1      000130 1991-01-02  NaN
2      000130 1991-01-03  NaN
3      000130 1991-01-04  NaN
4      000130 1991-01-05  1.1
5      000130 1991-01-06  2.1
6      000130 1991-01-07  NaN
7      000130 1991-01-08  2.1
8      000130 1991-01-09  0.4
9      000130 1991-01-10  NaN
...    ...    ...         ...

Important: consecutive NaNs + non-consecutive NaNs = 5, so the 3 consecutive NaNs count toward the 5 NaNs per month. If a month already has n NaNs, I should only add the difference needed to reach 5. For example, if a month already has 2 NaNs, I should only add 3 consecutive NaNs. If a month already has 5 NaNs, the code should leave that month unchanged.
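The rules above could be sketched roughly as follows. This is only one possible reading of them, not a definitive solution: the names TARGET, RUN, add_nans and insert_nans are hypothetical, the demo frame is made up, and the sketch assumes that "non-consecutive" means a new isolated NaN is never placed next to another NaN. A month that already has some NaNs but owes fewer than 3 only receives isolated NaNs.

```python
import numpy as np
import pandas as pd

TARGET = 5  # total NaNs wanted per (CODE, year, month); taken from the question
RUN = 3     # length of the consecutive block; taken from the question


def add_nans(pp: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Return one month's PP values with NaNs added until TARGET is reached."""
    values = pp.to_numpy(dtype=float, copy=True)
    missing = TARGET - int(np.isnan(values).sum())
    if missing <= 0:
        return pp  # the month already has enough NaNs, leave it alone

    # Place one consecutive block first if at least RUN NaNs are still owed.
    if missing >= RUN:
        # Start positions whose RUN-long window contains no NaN yet.
        starts = [i for i in range(len(values) - RUN + 1)
                  if not np.isnan(values[i:i + RUN]).any()]
        if starts:
            start = rng.choice(starts)
            values[start:start + RUN] = np.nan
            missing -= RUN

    # Scatter the remainder on positions with no NaN neighbour (assumption).
    for _ in range(missing):
        nan = np.isnan(values)
        neighbour = np.zeros_like(nan)
        neighbour[1:] |= nan[:-1]   # left neighbour is NaN
        neighbour[:-1] |= nan[1:]   # right neighbour is NaN
        free = np.flatnonzero(~nan & ~neighbour)
        if free.size == 0:
            break  # no isolated slot left in this month
        values[rng.choice(free)] = np.nan

    return pd.Series(values, index=pp.index)


def insert_nans(df: pd.DataFrame, seed: int = 0) -> pd.Series:
    """Apply add_nans to every (CODE, year-month) group of df['PP']."""
    rng = np.random.default_rng(seed)
    month = df["DATE"].dt.to_period("M")  # year and month in a single key
    return df.groupby(["CODE", month])["PP"].transform(
        lambda s: add_nans(s, rng)
    )


# Made-up demo data: two months, January already contains 2 NaNs.
dates = pd.date_range("1991-01-01", "1991-02-28", freq="D")
demo = pd.DataFrame({"CODE": "000130", "DATE": dates, "PP": 1.0})
demo.loc[[3, 10], "PP"] = np.nan
demo["PPNEW"] = insert_nans(demo)
```

Grouping on dt.to_period("M") keeps year and month together, so January 1991 and January 1992 are handled as separate months. Because the block may land right beside an existing NaN, the longest run can exceed 3, which still satisfies "at least 3 consecutive".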

I tried this:

df['PPNEW']=df['PP'].groupby([df['CODE'],df['DATE'].dt.month]).sample(frac=0.984)

But this doesn't give me an exact number of NaNs (only a percentage, and months have different lengths, such as 30 or 31 days), and it doesn't give me consecutive NaNs.
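For the exact-count part only, a minimal sketch: GroupBy.sample also accepts n instead of frac, so a fixed number of rows per group can be picked and masked. The demo frame here is made up, and the sketch ignores NaNs already present and makes no attempt to produce consecutive NaNs, so it only illustrates the difference between a fraction and a fixed count.

```python
import numpy as np
import pandas as pd

# Made-up demo data: three months of different lengths (31, 28, 31 days).
dates = pd.date_range("1991-01-01", "1991-03-31", freq="D")
df = pd.DataFrame({"CODE": "000130", "DATE": dates, "PP": 1.0})

# Group by CODE and year-month, then pick exactly 5 rows per group.
month = df["DATE"].dt.to_period("M")
picked = df.groupby(["CODE", month])["PP"].sample(n=5, random_state=0).index

# Set the picked rows to NaN and keep every other value as it is.
df["PPNEW"] = df["PP"].mask(df.index.isin(picked))
```

With n=5 every month loses exactly 5 values regardless of its length, whereas frac=0.984 keeps a share of the rows that depends on the month's length.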

Could you help me?

Thanks in advance.



