Sunday, 17 October 2021

Fast conditional overlapping windowing (framing) of numpy array

I have a huge list of 1-D numpy arrays, which are time series for different events. Each point has a label, and I want to window the numpy arrays based on their labels. The labels I have are 0, 1, and 2. Each window has a fixed size M.

The label of each window will be the biggest label in the window. So if a window contains both 0- and 1-labeled data points, the label will be 1 for the whole window.
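Since a window's label is just the maximum over its slice of labels, the label of every possible window start can be computed in one vectorized pass. A minimal sketch, assuming NumPy >= 1.20 for `numpy.lib.stride_tricks.sliding_window_view`; `y` and `M` are toy values, not data from the question:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy per-sample labels (0, 1, 2) and window size; illustrative values only
y = np.array([0, 0, 1, 0, 2, 0, 0, 0])
M = 4

# Each row is a read-only view of one length-M window (no copy is made);
# the window's label is the largest label it contains.
labels_per_start = sliding_window_view(y, M).max(axis=1)
# labels_per_start[i] is the label of the window y[i:i + M]
print(labels_per_start.tolist())  # prints [1, 2, 2, 2, 2]
```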

The problem is that the windowing is not label-agnostic: because of class imbalance, I only want overlapping windows in case of labels 1 and 2.

So far I have written this code:

import numpy as np

# conditional framing
data = []
start_cursor = 0
while start_cursor < arr.size:
    end_cursor = start_cursor + window_size
    window_labels = y[start_cursor:end_cursor]
    data.append(
        {
            "frame": arr[start_cursor:end_cursor],
            "label": window_labels.max(),
        }
    )
    start_cursor = end_cursor
    # overlap the next window only if this window contains label 1 or 2
    if np.any(window_labels != 0):
        start_cursor -= overlap_size

But this is verbose and inefficient, especially because I will run this while loop on every array in my huge list.
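Because each window's start depends on the previous window's label, the walk itself is inherently sequential, but the expensive parts need not be. One possible speedup, sketched under assumptions: precompute every window's max label in one vectorized pass (`sliding_window_view`, NumPy >= 1.20), walk the start indices with a cheap integer loop, then extract all frames with a single fancy-indexing gather. Unlike the loop above, this sketch keeps only full-length windows (it drops a trailing partial window) and returns stacked arrays instead of a list of dicts; `frame_series` and `frame_all` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def frame_series(arr, y, window_size, overlap_size):
    """Conditionally framed windows of one 1-D series (arr and y same length).

    Returns (frames, labels): frames has shape (n_windows, window_size) and
    labels holds the max label of each window. Only full windows are kept.
    """
    if not 0 <= overlap_size < window_size:
        raise ValueError("overlap_size must be in [0, window_size)")
    if arr.size < window_size:
        return np.empty((0, window_size), dtype=arr.dtype), np.empty(0, dtype=y.dtype)

    # Max label of the window starting at every index, in one vectorized pass
    window_labels = sliding_window_view(y, window_size).max(axis=1)
    last_start = arr.size - window_size

    # Sequential walk over integer start positions only (cheap)
    starts = []
    start = 0
    while start <= last_start:
        starts.append(start)
        # no overlap after a 0-labelled window, overlap after a 1/2-labelled one
        step = window_size if window_labels[start] == 0 else window_size - overlap_size
        start += step
    starts = np.asarray(starts)

    # Gather all frames at once: row i is arr[starts[i]:starts[i] + window_size]
    frames = arr[starts[:, None] + np.arange(window_size)]
    return frames, window_labels[starts]


def frame_all(arrays, label_arrays, window_size, overlap_size):
    """Frame every series in a non-empty list and stack the results."""
    results = [frame_series(a, lab, window_size, overlap_size)
               for a, lab in zip(arrays, label_arrays)]
    frames = np.concatenate([f for f, _ in results])
    labels = np.concatenate([lab for _, lab in results])
    return frames, labels
```

The while loop now touches only integers, so its cost scales with the number of windows rather than with per-window slicing and dict construction; whether that is fast enough for the real data would need to be measured.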

EDIT: To explain the problem further: imagine you are windowing a signal with a fixed window length M. If a window contains only 0-labeled points, there is no overlap between it and the next window. But if it contains points labeled 1 or 2, the two adjacent windows overlap by p% of the window length.
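For a concrete reading of the p% rule, here is a toy worked example, assuming `overlap_size` is `p` percent of `M` rounded to the nearest integer (the question does not say how fractional overlaps are rounded). It only lists the start positions of full-length windows, using a plain loop for clarity; `window_starts` is a hypothetical name:

```python
import numpy as np


def window_starts(y, M, p):
    """Start indices of full-length windows under the conditional-overlap rule."""
    overlap_size = int(round(M * p / 100))  # assumption: p% of M, rounded
    if not 0 <= overlap_size < M:
        raise ValueError("p must give an overlap smaller than M")
    starts = []
    start = 0
    while start + M <= len(y):
        starts.append(start)
        label = y[start:start + M].max()
        # advance by M after a 0-labelled window, by M - overlap_size otherwise
        start += M if label == 0 else M - overlap_size
    return starts


# M = 10 samples, p = 50% -> a window with a non-zero label overlaps the next by 5
y = np.zeros(40, dtype=int)
y[12] = 2
print(window_starts(y, M=10, p=50))  # prints [0, 10, 15, 25]
```

Only the window `y[10:20]` contains a non-zero label, so only the step after it shrinks from 10 to 5.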



