@akj009
Last active January 11, 2020 10:40
Writing an Avro PCollection to HDFS in Parquet format with a custom naming strategy
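import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;

// Writes each GenericRecord as Parquet under the configured output path.
genericRecordPCollection.apply(
    FileIO
        .<GenericRecord>write()
        .via(ParquetIO.sink(schema))
        .to(pipelineOptions.getOutputFilePath())
        .withNumShards(pipelineOptions.getOutputFileShardCount())
        // Include the shard index so each shard gets a distinct file name;
        // otherwise every shard would map to the same name when the shard count is > 1.
        .withNaming((window, pane, numShards, shardIndex, compression) -> String.format(
            "%s_%d_%d_%05d-of-%05d.%s", outputFilePrefix, startTime, endTime, shardIndex, numShards, "parq"))
);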
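
The name returned by the function passed to withNaming should differ per shard whenever more than one shard is written. A minimal, Beam-free sketch of such a format follows, so it can be tried without a pipeline. The class ShardNaming and the helper fileName are hypothetical names for illustration only, and the sketch assumes startTime and endTime are long values and that the prefix/start/end/"parq" layout matches the snippet above.

```java
import java.util.Locale;

public class ShardNaming {
    // Hypothetical helper (not part of Beam): builds one file name per shard
    // by appending a zero-padded "shardIndex-of-numShards" suffix.
    static String fileName(String prefix, long startTime, long endTime,
                           int numShards, int shardIndex) {
        return String.format(Locale.ROOT, "%s_%d_%d_%05d-of-%05d.%s",
                prefix, startTime, endTime, shardIndex, numShards, "parq");
    }

    public static void main(String[] args) {
        // Print the names that three shards would receive.
        for (int shard = 0; shard < 3; shard++) {
            System.out.println(fileName("events", 1000L, 2000L, 3, shard));
        }
    }
}
```

Inside the pipeline, the same format string could be used in the withNaming lambda with the shardIndex and numShards arguments that Beam passes in.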