I'm building an input pipeline for a time-series LSTM model. I have two input feeds; let's call them series1 and series2.
I initialize the tf.data.Dataset object by calling from_tensor_slices:
ds = tf.data.Dataset.from_tensor_slices((series1, series2))
I then group the elements into windows of a fixed size, with a shift of 1 between consecutive windows:
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
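To make the problem concrete, here is a minimal sketch of what the windowed dataset contains at this point. It assumes eager execution, window_size = 2, and the example values of series1 and series2 used further down; the name windows is only for illustration. Each element is a pair of small datasets, not a pair of tensors:

```python
import tensorflow as tf

# Assumed example data and window size (chosen to match the example in the question).
series1 = [1, 2, 3, 4, 5]
series2 = [100, 200, 300, 400, 500]
window_size = 2

ds = tf.data.Dataset.from_tensor_slices((series1, series2))
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)

# Each element of ds is a tuple (w1, w2) of nested datasets, one per series.
# Materialize them as plain Python lists to inspect their contents.
windows = [
    ([int(x) for x in w1.as_numpy_iterator()],
     [int(x) for x in w2.as_numpy_iterator()])
    for w1, w2 in ds
]
print(windows)
```

Because w1 and w2 are datasets, they have to be converted to tensors before they can be sliced or concatenated.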
At this point I want to control how the two series are batched together. As an example, I want to produce output like the following:
series1 = [1, 2, 3, 4, 5]
series2 = [100, 200, 300, 400, 500]
batch 1: [1, 2, 100, 200]
batch 2: [2, 3, 200, 300]
batch 3: [3, 4, 300, 400]
So each batch should contain two elements of series1 followed by two elements of series2. This snippet, which batches each series separately, does not work:
ds = ds.map(lambda s1, s2: (s1.batch(window_size + 1), s2.batch(window_size + 1)))
It returns a tuple of two dataset objects rather than tensors. Since dataset objects are not subscriptable, this does not work either:
ds = ds.map(lambda s1, s2: (s1[:2], s2[:2]))
I'm sure the solution involves .apply with a custom lambda function. Any help is much appreciated.
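One possible approach, sketched here under the assumption that window_size = 2 and that the series hold the example values above, uses flat_map rather than .apply. Each nested window dataset is batched into a single tensor, the two are zipped back together, and a plain map then slices and concatenates them. The name batches is only for illustration:

```python
import tensorflow as tf

# Assumed example data and window size (from the example in the question).
series1 = tf.constant([1, 2, 3, 4, 5])
series2 = tf.constant([100, 200, 300, 400, 500])
window_size = 2

ds = tf.data.Dataset.from_tensor_slices((series1, series2))
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)

# Turn each pair of nested window datasets into a pair of tensors of
# length window_size + 1, and flatten the result into one dataset.
ds = ds.flat_map(
    lambda s1, s2: tf.data.Dataset.zip((
        s1.batch(window_size + 1, drop_remainder=True),
        s2.batch(window_size + 1, drop_remainder=True),
    ))
)

# The elements are now tensors, so slicing works: keep the first
# window_size values of each series and join them into one vector.
ds = ds.map(lambda s1, s2: tf.concat([s1[:window_size], s2[:window_size]], axis=0))

batches = [b.numpy().tolist() for b in ds]
print(batches)
```

The flat_map step is what removes the nesting: map alone keeps each window as a dataset, which is why slicing inside map fails.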
Edit
I also want to produce a label that is the next element of the series. For example, the batches would be:
batch 1: (tf.tensor([1, 2, 100, 200]), tf.tensor([3]))
batch 2: (tf.tensor([2, 3, 200, 300]), tf.tensor([4]))
batch 3: (tf.tensor([3, 4, 300, 400]), tf.tensor([5]))
Where [3], [4] and [5] represent the next elements of series1 to be predicted.
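A sketch of how the labeled version might be built, again assuming window_size = 2 and the example data: since each window already holds window_size + 1 values, the last value of the series1 window can serve as the label and the remaining values as features. The name pairs is only for illustration:

```python
import tensorflow as tf

# Assumed example data and window size (from the example in the question).
series1 = tf.constant([1, 2, 3, 4, 5])
series2 = tf.constant([100, 200, 300, 400, 500])
window_size = 2

ds = tf.data.Dataset.from_tensor_slices((series1, series2))
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)

# Convert each pair of nested window datasets into a pair of tensors.
ds = ds.flat_map(
    lambda s1, s2: tf.data.Dataset.zip((
        s1.batch(window_size + 1, drop_remainder=True),
        s2.batch(window_size + 1, drop_remainder=True),
    ))
)

# Features: all but the last value of each window, concatenated.
# Label: the last value of the series1 window, kept as a length-1 tensor.
ds = ds.map(
    lambda s1, s2: (tf.concat([s1[:-1], s2[:-1]], axis=0), s1[-1:])
)

pairs = [(x.numpy().tolist(), y.numpy().tolist()) for x, y in ds]
print(pairs)
```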
from Batching in tf.data.dataset in time-series analysis