Sunday 27 September 2020

Max and Min values within pandas (sub)Dataframe

I have the following dataframe, df:

                     crs         Band1 level
lat       lon                               
34.595694 32.929028  b''  4.000000e+00  1000
          32.937361  b''  1.200000e+01  950
          32.945694  b''  2.900000e+01  925
34.604028 32.929028  b''  7.000000e+00  1000
          32.937361  b''  1.300000e+01  950
                 ...           ...   ...
71.179028 25.679028  b''  6.000000e+01  750
71.187361 25.662361  b''  1.000000e+00  725
          25.670694  b''  6.000000e+01  1000
          25.679028  b''  4.000000e+01  800
71.529028 19.387361  b''  1.843913e-38  1000

[17671817 rows x 3 columns]

and two arrays:

import numpy as np

lon1 = np.arange(-11, 47, 0.25)
lat1 = np.arange(71.5, 34.5, -0.25)

These two arrays (lat1, lon1) define a grid of coordinate pairs spaced 0.25 deg apart.
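
To make the grid concrete, here is a minimal sketch of how the coordinate pairs could be built with np.meshgrid. The names lon_grid and lat_grid are only illustrative and are not part of my actual code.

```python
import numpy as np

lon1 = np.arange(-11, 47, 0.25)
lat1 = np.arange(71.5, 34.5, -0.25)

# lon_grid and lat_grid are illustrative names (an assumption, not existing code).
# Both have shape (len(lat1), len(lon1)); element [i, j] pairs lat1[i] with lon1[j].
lon_grid, lat_grid = np.meshgrid(lon1, lat1)
print(lon_grid.shape)  # (148, 232)
```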

The dataframe df contains points (lat, lon) that are densely spaced between the grid points defined by lon1 and lat1. What I want to do is:

  1. find (filter) all points from df within 0.125 deg of each grid point defined by lat1, lon1
  2. get the max and min value of level from this sub-dataframe and store them in separate arrays of the same size as the grid defined by lon1 and lat1.

What I have done so far is filter the dataframe:

for x1 in lon1:
    for y1 in lat1:
        df3=df[(df.index.get_level_values('lon')>x1-0.125) & (df.index.get_level_values('lon')<x1+0.125)]
        df3=df3[(df3.index.get_level_values('lat')>y1-0.125) & (df3.index.get_level_values('lat')<y1+0.125)]

But this is very slow, and I believe there is a faster way. I have also tagged scikit-learn, since this can probably be done with it, but I lack experience with this package. Any help is appreciated.
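
One direction that might avoid the double loop, sketched here under assumptions rather than tested on the real data: since the grid is regular, each point's nearest grid cell can be computed directly by rounding, and pandas groupby can then take the min and max of level per cell. The helper name level_min_max and the tiny demo frame are made up for illustration. Note one difference from the loop above: the loop uses strict inequalities, so a point lying exactly 0.125 deg from a grid point is excluded, whereas rounding assigns it to one of the two neighbouring cells.

```python
import numpy as np
import pandas as pd

lon1 = np.arange(-11, 47, 0.25)
lat1 = np.arange(71.5, 34.5, -0.25)

def level_min_max(df, lat1, lon1, step=0.25):
    """Hypothetical helper: min and max of df['level'] per grid cell."""
    lat = df.index.get_level_values('lat').to_numpy()
    lon = df.index.get_level_values('lon').to_numpy()

    # Index of the nearest grid point (lat1 is descending, lon1 ascending).
    ilat = np.rint((lat1[0] - lat) / step).astype(int)
    ilon = np.rint((lon - lon1[0]) / step).astype(int)

    # Keep only points that fall on the grid.
    ok = (ilat >= 0) & (ilat < len(lat1)) & (ilon >= 0) & (ilon < len(lon1))

    binned = pd.DataFrame({'ilat': ilat[ok], 'ilon': ilon[ok],
                           'level': df['level'].to_numpy()[ok]})
    agg = binned.groupby(['ilat', 'ilon'])['level'].agg(['min', 'max'])

    # Cells that receive no points stay NaN.
    shape = (len(lat1), len(lon1))
    level_min = np.full(shape, np.nan)
    level_max = np.full(shape, np.nan)
    rows = agg.index.get_level_values('ilat').to_numpy()
    cols = agg.index.get_level_values('ilon').to_numpy()
    level_min[rows, cols] = agg['min'].to_numpy()
    level_max[rows, cols] = agg['max'].to_numpy()
    return level_min, level_max

# Tiny made-up example (not the real data): two points near the first
# grid point (71.5, -11.0) and one near (71.25, -10.75).
idx = pd.MultiIndex.from_tuples(
    [(71.45, -11.05), (71.55, -10.95), (71.20, -10.80)],
    names=['lat', 'lon'])
demo = pd.DataFrame({'level': [1000, 925, 800]}, index=idx)
lo, hi = level_min_max(demo, lat1, lon1)
```

Here lo and hi have shape (len(lat1), len(lon1)), so lo[i, j] and hi[i, j] correspond to the grid point (lat1[i], lon1[j]). This makes a single pass over the rows instead of filtering the whole dataframe once per grid point.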
