Wednesday 6 December 2023

Tensorflow JS learning data too big to fit in memory at once, how to learn?

My dataset has become too large to fit in memory at once in TensorFlow.js. What are good ways to train on all the data entries? The data comes from a MongoDB instance and needs to be loaded asynchronously.

I tried playing with generator functions, but couldn't get async generators to work yet. I was also wondering whether fitting the model to the data in batches would be possible, along the lines of the sketch below.
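To make the batch idea concrete, this is roughly what I imagine, assuming Node with `@tensorflow/tfjs-node` and a hypothetical `samples` collection whose documents look like `{ features: number[], label: number }` (database, collection, and field names are placeholders), using `model.trainOnBatch` and the MongoDB driver's async cursor iteration:

    import * as tf from '@tensorflow/tfjs-node';
    import { MongoClient } from 'mongodb';

    async function trainInBatches(model: tf.LayersModel, batchSize = 64) {
        const client = await MongoClient.connect('mongodb://localhost:27017');
        // Placeholder database/collection names and document shape.
        const cursor = client.db('mydb')
            .collection<{ features: number[]; label: number }>('samples')
            .find({});

        let batch: { features: number[]; label: number }[] = [];
        for await (const doc of cursor) {
            batch.push(doc);
            if (batch.length === batchSize) {
                const xs = tf.tensor2d(batch.map(d => d.features));
                const ys = tf.tensor1d(batch.map(d => d.label));
                await model.trainOnBatch(xs, ys); // one gradient step per batch
                tf.dispose([xs, ys]);             // release tensors right away
                batch = [];
            }
        }
        await client.close();
    }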

It would be great if someone could provide a minimal example of how to fit on data that is loaded asynchronously, either in batches or through a database cursor.
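For the fitting side, I assume it would look roughly like the following, given a `tf.data.Dataset` that yields `{xs, ys}` batches; it is building that dataset from an async source that I can't figure out:

    import * as tf from '@tensorflow/tfjs-node';

    // `dataset` is assumed to yield elements of the form { xs: tf.Tensor, ys: tf.Tensor }.
    async function fitStreaming(
            model: tf.LayersModel, dataset: tf.data.Dataset<tf.TensorContainer>) {
        await model.fitDataset(dataset, {
            epochs: 5,
            // Optional: treat every 100 batches as one epoch when the total size is unknown.
            batchesPerEpoch: 100,
            callbacks: {
                onEpochEnd: async (epoch, logs) => console.log(epoch, logs),
            },
        });
    }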

For example, when I try to return promises from the generator, I get a TypeScript error:

    // Minimal repro: yielding a Promise from a synchronous generator function.
    const generate = function* () {
        yield new Promise(() => {});
    };

    tf.data.generator(generate);

Argument of type '() => Generator<Promise<unknown>, void, unknown>' is not assignable to parameter of type '() => Iterator<TensorContainer, any, undefined> | Promise<Iterator<TensorContainer, any, undefined>>'.


Using an async generator doesn't work either:

    tf.data.generator(async function* () {});

This throws: Argument of type '() => AsyncGenerator<any, void, unknown>' is not assignable to parameter of type '() => Iterator<TensorContainer, any, undefined> | Promise<Iterator<TensorContainer, any, undefined>>'.
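The only workaround I've found so far is to cast the async generator to the type `tf.data.generator` expects, on the assumption that the restriction is only in the type declarations and the runtime awaits whatever `next()` returns; I'm not sure whether that is actually safe across versions. A sketch, again with placeholder MongoDB names and document shapes:

    import * as tf from '@tensorflow/tfjs-node';
    import { MongoClient } from 'mongodb';

    // Placeholder collection of documents shaped like { features: number[], label: number }.
    async function* mongoBatches(batchSize = 64) {
        const client = await MongoClient.connect('mongodb://localhost:27017');
        const cursor = client.db('mydb')
            .collection<{ features: number[]; label: number }>('samples')
            .find({});

        let batch: { features: number[]; label: number }[] = [];
        for await (const doc of cursor) {
            batch.push(doc);
            if (batch.length === batchSize) {
                yield {
                    xs: tf.tensor2d(batch.map(d => d.features)),
                    ys: tf.tensor1d(batch.map(d => d.label)),
                };
                batch = [];
            }
        }
        await client.close();
    }

    // The cast only silences the compiler; whether the promises returned by
    // next() are actually awaited at runtime is exactly what I'd like confirmed.
    const dataset = tf.data.generator(
        mongoBatches as unknown as () => Iterator<tf.TensorContainer>);

    // await model.fitDataset(dataset, { epochs: 5 });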



from Tensorflow JS learning data too big to fit in memory at once, how to learn?
