I have a huge list of 1-dimensional numpy arrays, each a time series for a different event. Each point has a label, and I want to window the arrays based on those labels. The labels are 0, 1, and 2. Each window has a fixed size M.
The label of each window is the largest label present in that window. So if a window contains both 0- and 1-labeled data points, the whole window is labeled 1.
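As a point of reference, the labeling rule on its own (ignoring overlap) could be sketched like this. The function name `window_labels` is made up for illustration, and the sketch assumes any trailing samples that do not fill a whole window are simply dropped:

```python
import numpy as np


def window_labels(y, window_size):
    """Max label of each non-overlapping, full-length window (hypothetical helper)."""
    n_full = y.size // window_size  # number of complete windows
    # Trim the tail, view as (n_full, window_size), take the max per row.
    return y[: n_full * window_size].reshape(n_full, window_size).max(axis=1)


y = np.array([0, 0, 1, 0, 0, 0, 2, 1, 0])
print(window_labels(y, 3))  # [1 0 2]
```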
The problem is that the windowing is not label-agnostic: because of class imbalance, I want overlapping windows only where labels 1 and 2 occur.
So far I have written this code:
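    import numpy as np

    # conditional framing: arr holds the samples, y the per-sample labels
    data = []
    start_cursor = 0
    while start_cursor < arr.size:
        end_cursor = start_cursor + window_size
        window_labels = y[start_cursor:end_cursor]
        data.append(
            {
                "frame": arr[start_cursor:end_cursor],
                "label": window_labels.max(),
            }
        )
        # the next window starts where this one ends...
        start_cursor = end_cursor
        # ...unless this window held a non-zero label; then step back to overlap
        if np.any(window_labels != 0):
            start_cursor = start_cursor - overlap_size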
But this is clearly too verbose and just plain inefficient, especially because I will call this while loop on my huge list of separate arrays.
EDIT: To explain the problem further: imagine windowing a signal with a fixed window length M. If a window contains only 0-labeled points, it does not overlap with the next window. But if it contains any points labeled 1 or 2, the next window overlaps it by p%.
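One possible way to speed this up, offered as a sketch rather than a definitive answer: since each window's start depends on the previous window's label, the start positions still have to be found sequentially, but the per-window work can be reduced to a constant-time lookup, and the frames and labels can then be gathered in one vectorized step. The function name `conditional_frames` is hypothetical. The sketch assumes labels are non-negative, `overlap_size` is smaller than `window_size`, only full-length windows are wanted (the trailing partial window is dropped), and NumPy 1.20 or newer is available for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conditional_frames(arr, y, window_size, overlap_size):
    """Return (frames, labels); hypothetical helper, full-length windows only."""
    if not 0 <= overlap_size < window_size:
        raise ValueError("overlap_size must be in [0, window_size)")
    n_windows = arr.size - window_size + 1  # number of valid start positions
    if n_windows <= 0:
        return (np.empty((0, window_size), dtype=arr.dtype),
                np.empty(0, dtype=y.dtype))

    # Prefix sums of "label is non-zero": csum[b] - csum[a] counts the
    # non-zero labels in y[a:b], so each window check is O(1).
    csum = np.concatenate(([0], np.cumsum(y != 0)))

    # Light scalar loop that only decides where each window starts.
    starts = []
    start = 0
    while start < n_windows:
        starts.append(start)
        has_event = csum[start + window_size] - csum[start] > 0
        start += window_size - overlap_size if has_event else window_size
    starts = np.asarray(starts)

    # Gather the selected windows in one shot (fancy indexing copies them).
    frames = sliding_window_view(arr, window_size)[starts]
    labels = sliding_window_view(y, window_size)[starts].max(axis=1)
    return frames, labels


arr = np.arange(10)
y = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
frames, labels = conditional_frames(arr, y, window_size=4, overlap_size=2)
print(frames)  # rows start at samples 0, 4 and 6
print(labels)  # [0 1 0]
```

With an overlap given as a percentage p, `overlap_size` would be something like `int(window_size * p / 100)`. Whether this actually beats the plain loop depends on the window size and array lengths, so it is worth timing on real data.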
from Fast conditional overlapping windowing (framing) of numpy array