TFRecord files using tf.data
The tf.data module also provides tools for reading and writing data in TensorFlow.
The easiest way to get the data into a dataset is to use the from_tensor_slices method.
Applied to an array, it returns a dataset of scalars:
tf.data.Dataset.from_tensor_slices(feature1)
<TensorSliceDataset shapes: (), types: tf.int64>
Applied to a tuple of arrays, it returns a dataset of tuples:
features_dataset = tf.data.Dataset.from_tensor_slices((feature0, feature1, feature2, feature3))
features_dataset
<TensorSliceDataset shapes: ((), (), (), ()), types: (tf.bool, tf.int64, tf.string, tf.float64)>
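To make the tuple case concrete, here is a minimal sketch. The notes never define feature0 to feature3, so the arrays below are assumed stand-ins, chosen only to match the dtypes shown in the output above.

```python
import numpy as np
import tensorflow as tf

# Assumed stand-ins for feature0..feature3 (not defined in the notes);
# all four arrays share the same length along axis 0.
feature0 = np.array([True, False, True, False, True])               # -> tf.bool
feature1 = np.array([0, 1, 2, 3, 4], dtype=np.int64)                # -> tf.int64
feature2 = np.array([b"cat", b"dog", b"chicken", b"horse", b"goat"])  # -> tf.string
feature3 = np.linspace(0.0, 1.0, 5)                                 # -> tf.float64

features_dataset = tf.data.Dataset.from_tensor_slices(
    (feature0, feature1, feature2, feature3)
)

# Each dataset element is a 4-tuple of scalar tensors, one per input array.
f0, f1, f2, f3 = next(iter(features_dataset))
print(f0.numpy(), f1.numpy(), f2.numpy(), f3.numpy())
```

Iterating the dataset walks the four arrays in lockstep, so the first element pairs the first entry of each array.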
from_tensors combines the input and returns a dataset with a single element:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)>]
from_tensor_slices creates a dataset with a separate element for each row of the input tensor:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
<tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
I think the source of confusion (at least it was for me) is the name. Since from_tensor_slices creates slices from the original data, the ideal name would have been "to_tensor_slices", because you are taking your data and creating tensor slices out of it. Once you think along those lines, all the TF2 documentation becomes very clear!
https://stackoverflow.com/users/6043669/hopeking
A key piece of info for me that was absent from the docs was that multiple tensors are passed to these methods as a tuple, e.g. from_tensors((t1, t2, t3)). With that knowledge, from_tensors makes a dataset where each input tensor is like a row of your dataset, and from_tensor_slices makes a dataset where each input tensor is a column of your data; so in the latter case all tensors must be the same length, and the elements (rows) of the resulting dataset are tuples with one element from each column.
https://stackoverflow.com/users/1488777/user1488777
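A quick way to check the tuple behaviour described in the comment above is the sketch below. The tensors t1 and t2 are made-up illustration values. Note that with from_tensors the whole tuple ends up as one single dataset element, while from_tensor_slices yields one tuple per index.

```python
import tensorflow as tf

# Made-up example tensors of equal length.
t1 = tf.constant([1, 2, 3])
t2 = tf.constant([10, 20, 30])

# from_tensors: the entire tuple (t1, t2) becomes ONE dataset element.
ds_whole = tf.data.Dataset.from_tensors((t1, t2))
whole = list(ds_whole)

# from_tensor_slices: both tensors are sliced along axis 0, giving one
# (t1[i], t2[i]) tuple per index i.
ds_rows = tf.data.Dataset.from_tensor_slices((t1, t2))
rows = [(int(a.numpy()), int(b.numpy())) for a, b in ds_rows]

print(len(whole), rows)
```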
https://www.programmersought.com/article/45794515931/
'time': tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
The stuff here would work, but tf.io.parse_single_example() is much easier:
tf.train.Example.FromString(ex.numpy()).ListFields()[0][1].feature['objects']
json.loads(tf.train.Example.FromString(ex.numpy()).ListFields()[0][1].feature['objects'].bytes_list.value[0].decode())[0]
json.loads(tf.train.Example.FromString(ex.numpy()).features.feature['objects'].bytes_list.value[0].decode())
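For comparison with the manual protobuf access above, here is a minimal sketch of the tf.io.parse_single_example() route. The record contents are hypothetical: it assumes 'objects' holds a single JSON-encoded byte string and 'time' holds a variable-length float list, which is what the feature names in these notes suggest but do not confirm.

```python
import json
import tensorflow as tf

# Hypothetical record built only for illustration.
objects = [{"label": "car", "score": 0.9}]
example = tf.train.Example(features=tf.train.Features(feature={
    "objects": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[json.dumps(objects).encode()])),
    "time": tf.train.Feature(
        float_list=tf.train.FloatList(value=[0.0, 0.5, 1.0])),
}))
# Stands in for one serialized element read from a TFRecordDataset.
ex = tf.constant(example.SerializeToString())

# Describe each feature once; parsing then returns a dict of tensors.
feature_description = {
    "objects": tf.io.FixedLenFeature([], tf.string),
    "time": tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
}
parsed = tf.io.parse_single_example(ex, feature_description)

decoded = json.loads(parsed["objects"].numpy().decode())
times = parsed["time"].numpy().tolist()
print(decoded, times)
```

Because the feature description is a plain dict, the same parsing function can be passed to Dataset.map to decode every record in a TFRecord file.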