TFRecords: from_tensors vs from_tensor_slices

from_tensors vs. from_tensor_slices

From the manual

TFRecord files using tf.data

The tf.data module also provides tools for reading and writing data in TensorFlow.

Writing a TFRecord file

The easiest way to get the data into a dataset is to use the from_tensor_slices method.

Applied to an array, it returns a dataset of scalars:

tf.data.Dataset.from_tensor_slices(feature1)

<TensorSliceDataset shapes: (), types: tf.int64>

Applied to a tuple of arrays, it returns a dataset of tuples:

features_dataset = tf.data.Dataset.from_tensor_slices((feature0, feature1, feature2, feature3))
features_dataset

<TensorSliceDataset shapes: ((), (), (), ()), types: (tf.bool, tf.int64, tf.string, tf.float64)>
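For context, here is a minimal sketch of how feature0 … feature3 might be built before the calls above; the array contents and n_observations are assumptions in the spirit of the tutorial, not copied from it:

import numpy as np
import tensorflow as tf

# Hypothetical parallel "columns", one value per observation (row).
n_observations = 10000
feature0 = np.random.choice([False, True], n_observations)                      # tf.bool
feature1 = np.random.randint(0, 5, n_observations)                              # tf.int64
feature2 = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])[feature1]  # tf.string
feature3 = np.random.randn(n_observations)                                      # tf.float64

# Slicing along the first axis yields one (bool, int, string, float) tuple per row.
features_dataset = tf.data.Dataset.from_tensor_slices(
    (feature0, feature1, feature2, feature3))
print(features_dataset.element_spec)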

From stackoverflow

https://stackoverflow.com/questions/49579684/what-is-the-difference-between-dataset-from-tensors-and-dataset-from-tensor-slic

from_tensors combines the input and returns a dataset with a single element:

>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
 array([[1, 2],
        [3, 4]], dtype=int32)>]

from_tensor_slices creates a dataset with a separate element for each row of the input tensor:

>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]

I think the source of confusion (at least it was for me) is the name. Since from_tensor_slices creates slices from the original data, the ideal name would have been "to_tensor_slices", because you are taking your data and creating tensor slices out of it. Once I started thinking along those lines, all of the TF2 documentation became very clear to me! https://stackoverflow.com/users/6043669/hopeking

A key piece of info that was absent from the docs for me is that multiple tensors are passed to these methods as a tuple, e.g. from_tensors((t1, t2, t3)). With that knowledge, from_tensors makes a dataset where each input tensor is like a row of your dataset, and from_tensor_slices makes a dataset where each input tensor is a column of your data; so in the latter case all tensors must be the same length, and the elements (rows) of the resulting dataset are tuples with one element from each column. https://stackoverflow.com/users/1488777/user1488777
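A short sketch of that row/column intuition, with made-up values for t1, t2, t3:

import tensorflow as tf

# Three equal-length "columns".
t1 = tf.constant([1, 2, 3])
t2 = tf.constant([10, 20, 30])
t3 = tf.constant([100, 200, 300])

# from_tensors keeps the whole tuple as a single element.
ds_one = tf.data.Dataset.from_tensors((t1, t2, t3))
print(len(list(ds_one)))     # 1 element; each component has shape (3,)

# from_tensor_slices zips the columns into one (scalar, scalar, scalar) tuple per row.
ds_rows = tf.data.Dataset.from_tensor_slices((t1, t2, t3))
for a, b, c in ds_rows:
    print(a.numpy(), b.numpy(), c.numpy())   # 1 10 100 / 2 20 200 / 3 30 300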

Another reference to study

https://www.programmersought.com/article/45794515931/

Also, how to use allow_missing=True with a sequence feature:

'time': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
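A minimal sketch of where that entry fits, assuming each record stores 'time' as a variable-length list of floats and a hypothetical file name data.tfrecord:

import tensorflow as tf

# allow_missing=True lets records omit the feature (or have differing lengths).
feature_description = {
    'time': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
}

def _parse(serialized_example):
    # parsed['time'] comes back as a 1-D float32 tensor whose length varies per record.
    return tf.io.parse_single_example(serialized_example, feature_description)

# dataset = tf.data.TFRecordDataset('data.tfrecord').map(_parse)   # hypothetical file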

Also useful:

https://stackoverflow.com/questions/47939537/how-to-use-dataset-api-to-read-tfrecords-file-of-lists-of-variant-length/47949002#47949002

Raven

The manual decoding below would work, but tf.io.parse_single_example() is much easier:

# Pull the raw 'objects' feature out of a serialized Example by hand:
tf.train.Example.FromString(ex.numpy()).ListFields()[0][1].feature['objects']

# Decode the JSON stored in its bytes_list and take the first element of the result:
json.loads(tf.train.Example.FromString(ex.numpy()).ListFields()[0][1].feature['objects'].bytes_list.value[0].decode())[0]

# Same decoding via the .features accessor instead of ListFields():
json.loads(tf.train.Example.FromString(ex.numpy()).features.feature['objects'].bytes_list.value[0].decode())
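For comparison, a sketch of the parse_single_example route, assuming each record carries a bytes feature named 'objects' holding a JSON string (as the lines above imply):

import json
import tensorflow as tf

feature_description = {
    'objects': tf.io.FixedLenFeature([], tf.string),
}

def decode_objects(serialized_example):
    parsed = tf.io.parse_single_example(serialized_example, feature_description)
    # parsed['objects'] is a scalar tf.string tensor; turn it back into Python data.
    return json.loads(parsed['objects'].numpy().decode())

# objects = decode_objects(ex)   # 'ex' being one serialized record, as above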
