Thursday, 17 December 2020

Map over a TensorFlow dataset and mutate a tf.train.Feature that is a list of byte strings

I have a feature that is a list of byte strings, e.g.

data = [b"lksjdflksdjfdlk", b"owiueroiewuroi.skjdf", b"oweiureoiwlkapq"]

Here's example code for creating the TFRecord, writing it out, then reading it back and parsing it.

>>> import tensorflow as tf
>>> data = [b"lksjdflksdjfdlk", b"owiueroiewuroi.skjdf", b"oweiureoiwlkapq"]
>>> feature = {'raws': tf.train.Feature(bytes_list=tf.train.BytesList(value=data))}
>>> feature
{'raws': bytes_list {
   value: "lksjdflksdjfdlk"
   value: "owiueroiewuroi.skjdf"
   value: "oweiureoiwlkapq"
 }}
>>> features = tf.train.Features(feature=feature)
>>> example = tf.train.Example(features=features).SerializeToString()
>>> with tf.io.TFRecordWriter("/tmp/out.tfrecord") as writer:
...     writer.write(example)
>>> # Now read it back in and parse the example
>>> # The feature holds three byte strings, so the parsed shape is [3]
>>> feature_desc = {'raws': tf.io.FixedLenFeature([3], tf.string)}
>>> def _parse(example):
...     return tf.io.parse_single_example(example, feature_desc)
>>> ds = tf.data.TFRecordDataset(["/tmp/out.tfrecord"])
>>> parsed = ds.map(_parse)
>>> @tf.function
... def upper(x):
...     x['raws'] = [s.upper() for s in x['raws']]
...     return x
>>> parsed.map(upper)

This leads to the following error:

OperatorNotAllowedInGraphError: in user code:

    <ipython-input-33-be19a774366f>:3 upper  *
        x['raws'] = [s.upper() for s in x['raws']]
    /data/jkyle/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:503 __iter__
        self._disallow_iteration()
    /data/jkyle/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:496 _disallow_iteration
        self._disallow_when_autograph_enabled("iterating over `tf.Tensor`")
    /data/jkyle/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:474 _disallow_when_autograph_enabled
        " indicate you are trying to use an unsupported feature.".format(task))

    OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
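
The traceback points at the Python-level loop: inside `Dataset.map` the function is traced as a graph, and the symbolic `tf.Tensor` cannot be iterated with a list comprehension. Below is a minimal sketch of two graph-friendly alternatives for the upper-casing case, assuming TensorFlow 2.x: `tf.strings.upper` applied to the whole tensor, and `tf.map_fn` for per-element work. The in-memory dataset and the names `upper_vectorized` and `upper_map_fn` are stand-ins for illustration and are not part of the code above.

```python
import tensorflow as tf

# Stand-in for one parsed example: a dict holding a 1-D string tensor.
ds = tf.data.Dataset.from_tensors(
    {"raws": tf.constant([b"abc", b"de.f", b"ghi"])}
)

def upper_vectorized(x):
    # tf.strings.upper works element-wise on the whole tensor,
    # so no Python loop over the tensor is needed.
    return {"raws": tf.strings.upper(x["raws"])}

def upper_map_fn(x):
    # tf.map_fn applies a function to each element along axis 0
    # and is usable inside a traced graph.
    return {"raws": tf.map_fn(tf.strings.upper, x["raws"])}

result = next(iter(ds.map(upper_vectorized)))["raws"].numpy().tolist()
result_map_fn = next(iter(ds.map(upper_map_fn)))["raws"].numpy().tolist()
print(result)
print(result_map_fn)
```

Both variants return a new dict rather than mutating a Python list in place, which is what `Dataset.map` expects from its function.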

For full context, the list holds byte strings in a raw image format that is not natively supported. Each raw image is a frame. I need to iterate over the list, convert each frame to JPEG, then stack the results into a three-dimensional array. The conversion has to be done by OpenCV, so the flow is raw -> JPEG -> NumPy matrix, e.g.

Input: [b'raw1', b'raw2', b'raw3']
Output: image array of shape (1920,1080,3)

But, of course, I can't do any of this until I figure out how to iterate over the list.
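
One possible direction, sketched under assumptions: `tf.py_function` runs its body eagerly, so the tensor can be turned into NumPy and looped over there. The helper `decode_raw_frame` is hypothetical and only reinterprets bytes as pixels; it stands in for the real OpenCV conversion, which is not shown. The tiny frame size and the choice to stack frames along a new leading axis are also assumptions made for the demo.

```python
import numpy as np
import tensorflow as tf

HEIGHT, WIDTH, CHANNELS = 4, 6, 3  # tiny hypothetical frame size for the demo

def decode_raw_frame(raw: bytes) -> np.ndarray:
    # Hypothetical stand-in for the OpenCV step: reinterpret the bytes
    # as uint8 pixels of a known shape.
    return np.frombuffer(raw, dtype=np.uint8).reshape(HEIGHT, WIDTH, CHANNELS)

def _decode_all(raws):
    # Executed eagerly by tf.py_function, so iterating is allowed here.
    frames = [decode_raw_frame(r) for r in raws.numpy()]
    return np.stack(frames)

def decode_frames(x):
    frames = tf.py_function(_decode_all, [x["raws"]], tf.uint8)
    # py_function drops static shape information; restore what is known.
    frames.set_shape([None, HEIGHT, WIDTH, CHANNELS])
    return frames

# Three fake raw frames filled with the byte values 1, 2 and 3.
raw_frames = [bytes([i + 1]) * (HEIGHT * WIDTH * CHANNELS) for i in range(3)]
ds = tf.data.Dataset.from_tensors({"raws": tf.constant(raw_frames)})
stacked = next(iter(ds.map(decode_frames)))
print(stacked.shape)
```

The trade-off is that code inside `tf.py_function` runs in the Python interpreter rather than in the graph, so it does not get the usual graph optimizations.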


