@ntakouris
Created July 8, 2020 13:44
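import os

import apache_beam as beam
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam

# Fragment: raw_data, RAW_DATA_METADATA, preprocessing_fn, working_dir and
# TRANSFORMED_TRAIN_DATA_FILE are defined elsewhere; run inside a tft_beam.Context.
raw_dataset = (raw_data, RAW_DATA_METADATA)
transformed_dataset, transform_fn = (
    raw_dataset | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
transformed_data, transformed_metadata = transformed_dataset

# Encode the transformed rows as tf.Example protos and write them as TFRecords.
transformed_data_coder = tft.coders.ExampleProtoCoder(
    transformed_metadata.schema)
_ = (
    transformed_data
    | 'EncodeTrainData' >> beam.Map(transformed_data_coder.encode)
    | 'WriteTrainData' >> beam.io.WriteToTFRecord(
        os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILE)))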