Saturday, 24 October 2020

Augmenting Time Series Data for Deep Learning

If I want to apply deep learning to the dataset from the sensors I currently possess, I would need quite a lot of data, or the model may overfit. Unfortunately, the sensors have only been active for a month, so the data requires augmentation. I currently have the data in the form of a dataframe, which can be seen below:

index   timestamp              cas_pre        fl_rat         ...
0       2017-04-06 11:25:00    687.982849     1627.040283    ...
1       2017-04-06 11:30:00    693.427673     1506.217285    ...
2       2017-04-06 11:35:00    692.686310     1537.114807    ...
....
101003  2017-04-06 11:35:00    692.686310     1537.114807    ...

Now I want to augment particular columns using the tsaug package. The augmentation can take the following form:

# augmenter classes as in tsaug 0.1 (later releases reorganised these)
from tsaug import RandomMagnify, RandomTimeWarp, RandomJitter, RandomTrend

my_aug = (
    RandomMagnify(max_zoom=1.2, min_zoom=0.8) * 2
    + RandomTimeWarp() * 2
    + RandomJitter(strength=0.1) @ 0.5
    + RandomTrend(min_anchor=-0.5, max_anchor=0.5) @ 0.5
)

The docs for the augmentation library then use the augmenter as follows:

X_aug, Y_aug = my_aug.run(X, Y)

Upon further investigation on this site, it seems that the augmentation operates on NumPy arrays. While the documentation states that it supports multivariate augmentation, I am not really sure how that works in practice.

I would like to apply the same augmentation consistently across the float columns, such as cas_pre and fl_rat, so that the augmented data does not diverge too much from the original data or from the relationships between the columns. I would not like to apply it to columns such as timestamp. I am not sure how to do this within Pandas.
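One approach I have been considering (a rough sketch, not tested against a specific tsaug version) is to pull the float columns out into a single NumPy array, hand that array to the augmenter, and rebuild a dataframe from the result while re-attaching the untouched timestamp column. The toy dataframe below is a made-up stand-in for my sensor data, the (1, n_timesteps, n_channels) reshape is my assumption about how a multivariate series should be laid out, and the run() call mirrors the docs quoted above (newer tsaug releases rename it to augment()):

import numpy as np
import pandas as pd

# Made-up stand-in for the sensor dataframe shown above.
df = pd.DataFrame({
    "timestamp": pd.date_range("2017-04-06 11:25:00", periods=1000, freq="5min"),
    "cas_pre": np.random.normal(690.0, 5.0, size=1000),
    "fl_rat": np.random.normal(1550.0, 50.0, size=1000),
})

# Augment only the float columns; leave timestamp alone.
float_cols = ["cas_pre", "fl_rat"]
X = df[float_cols].to_numpy()            # shape (n_timesteps, n_channels)

# Treat the whole frame as one multivariate series of shape
# (n_series, n_timesteps, n_channels). Whether run() accepts this 3D layout
# depends on the tsaug version; 0.2+ expects it via augment().
X = X.reshape(1, X.shape[0], X.shape[1])
X_aug = my_aug.run(X)                    # my_aug as defined above

# Rebuild a dataframe from the augmented values and re-attach timestamp.
df_aug = pd.DataFrame(X_aug[0], columns=float_cols)
df_aug["timestamp"] = df["timestamp"].to_numpy()

Even so, I am not sure whether this keeps the per-timestep relationship between cas_pre and fl_rat intact, which is the part I most want to preserve.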



from Augmenting Time Series Data for Deep Learning
