Sunday, 2 September 2018

Tensorflow 1.10 TFRecordDataset - recovering TFRecords

Notes:

  1. this question builds on a previous question of mine. In that question I asked about the best way to store some dummy data as Example and SequenceExample, seeking to know which is better for data similar to the dummy data provided. I gave explicit formulations of both the Example and SequenceExample construction as well as, in the answers, a programmatic way to do so.

  2. Because this is still a lot of code, I am providing a Colab (an interactive Jupyter notebook hosted by Google) where you can try the code out yourself. All the necessary code is there and it is generously commented.

I am trying to learn how to convert my data into TFRecords, as the claimed benefits are worthwhile for my data. However, the documentation leaves a lot to be desired, and the tutorials / blogs I have seen that try to go deeper really only scratch the surface or rehash the sparse docs that exist.

For the demo data considered in my previous question - as well as here - I have written a decent class that takes:

  • a sequence with n channels (in this example it is integer-based and of fixed length)
  • soft-labeled class probabilities (in this example there are n classes, float-based)
  • some metadata (in this example a string and two floats)

and can encode the data in 1 of 6 forms:

  1. Example, with sequence channels / classes separate in a numeric type (int64 in this case) with metadata tacked on
  2. Example, with sequence channels / classes separate as a byte string (via numpy.ndarray.tostring()) with metadata tacked on
  3. Example, with sequence / classes dumped as byte string with metadata tacked on
  4. SequenceExample, with sequence channels / classes separate in a numeric type and metadata as context
  5. SequenceExample, with sequence channels separate as a byte string and metadata as context
  6. SequenceExample, with sequence and classes dumped as byte string and metadata as context

This works fine.
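
To make the forms concrete, here is a minimal sketch of forms 1 and 4 for the dummy data. The feature names, shapes and values are my own assumptions for illustration; the Colab builds the protos programmatically with its own naming.

    import numpy as np
    import tensorflow as tf

    # Hypothetical dummy data: a fixed-length integer sequence with 2 channels,
    # soft class probabilities over 3 classes, and some metadata.
    sequence = np.arange(20).reshape(2, 10)
    class_probs = np.array([0.8, 0.1, 0.1])
    meta = {'id': b'demo-0', 'rate': 0.5, 'scale': 2.0}

    def _int64_feature(values):
        return tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(v) for v in values]))

    def _float_feature(values):
        return tf.train.Feature(
            float_list=tf.train.FloatList(value=[float(v) for v in values]))

    def _bytes_feature(values):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

    # Form 1: Example, channels / classes as separate numeric features,
    # metadata tacked on alongside them.
    example = tf.train.Example(features=tf.train.Features(feature={
        'channel_0': _int64_feature(sequence[0]),
        'channel_1': _int64_feature(sequence[1]),
        'class_probs': _float_feature(class_probs),
        'id': _bytes_feature([meta['id']]),
        'rate': _float_feature([meta['rate']]),
        'scale': _float_feature([meta['scale']]),
    }))

    # Form 4: SequenceExample, per-step numeric feature lists,
    # metadata stored in the context.
    sequence_example = tf.train.SequenceExample(
        context=tf.train.Features(feature={
            'id': _bytes_feature([meta['id']]),
            'rate': _float_feature([meta['rate']]),
            'scale': _float_feature([meta['scale']]),
        }),
        feature_lists=tf.train.FeatureLists(feature_list={
            'sequence': tf.train.FeatureList(
                feature=[_int64_feature(step) for step in sequence.T]),
            'class_probs': tf.train.FeatureList(
                feature=[_float_feature(class_probs)]),
        }))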

In the Colab I show how to write the dummy data all to a single file as well as to separate files.
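
For reference, writing boils down to serializing each proto and handing it to the TF 1.x record writer, tf.python_io.TFRecordWriter. A sketch, with file names that are assumptions rather than the ones used in the Colab:

    # `example` and `sequence_example` are the protos from the sketch above.
    examples = [example]
    sequence_examples = [sequence_example]

    # Everything in one file ...
    with tf.python_io.TFRecordWriter('dummy_data_all.tfrecord') as writer:
        for ex in examples + sequence_examples:
            writer.write(ex.SerializeToString())

    # ... or one file per record.
    for i, ex in enumerate(examples):
        with tf.python_io.TFRecordWriter('dummy_example_%03d.tfrecord' % i) as writer:
            writer.write(ex.SerializeToString())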

My question is: how can I recover this data?

I give 4 attempts at doing so in the linked Colab.
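
For orientation, the general recipe I am trying to follow for form 1 (Example, numeric features) with tf.data in 1.10 looks like the sketch below. The feature names and shapes are the assumptions from above, not necessarily what is in the Colab, and would have to match whatever was actually serialized:

    def parse_fn(serialized):
        # The feature spec must mirror how the Example was written (assumed names).
        features = {
            'channel_0': tf.FixedLenFeature([10], tf.int64),
            'channel_1': tf.FixedLenFeature([10], tf.int64),
            'class_probs': tf.FixedLenFeature([3], tf.float32),
            'id': tf.FixedLenFeature([], tf.string),
            'rate': tf.FixedLenFeature([], tf.float32),
            'scale': tf.FixedLenFeature([], tf.float32),
        }
        return tf.parse_single_example(serialized, features)

    # Assumes the file holds only form-1 Examples; SequenceExamples would need
    # tf.parse_single_sequence_example with context and sequence feature specs.
    dataset = tf.data.TFRecordDataset('dummy_example_000.tfrecord').map(parse_fn)
    next_record = dataset.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        print(sess.run(next_record))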

As a side question: why is the TFRecord reader under a different sub-package from the TFRecord writer?
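
For what it is worth, this is where the relevant classes sit in 1.10 as far as I can tell:

    tf.python_io.TFRecordWriter   # writer for .tfrecord files
    tf.data.TFRecordDataset       # tf.data-based reader
    tf.TFRecordReader             # older queue-based reader, at the top level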



