This is a usage and design summary of the pulsar-io-bigquery sink.
The sink currently accepts the following configuration parameters:

| Parameter | Description |
|---|---|
| `credentials_file_path` | Path to the Google Cloud JSON key (credentials) file used to access BigQuery. |
| `project_id` | BigQuery project ID. |
| `topic_data_set` | Map from Pulsar topic to target BigQuery dataset, e.g. `"topic1:dataset1,topic2:dataset2"`. |
| `topic_table_set` | Map from Pulsar topic to target BigQuery table, e.g. `"topic1:table_tag1,topic2:table_tag2"`. |
| `add_insert_timestamp` | If `"true"`, adds a timestamp column to each inserted row. |
| `time_stamp_column_name` | Name of the timestamp column. Default: `"sink_timestamp"`. |
| `useMessageTimeDatePartitioning` | Enables time/date partitioning of the target table. |
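
Values in the sink's `configs` map typically arrive as strings, for example `add_insert_timestamp: "true"`. The Python sketch below shows one way to load the parameters and apply the documented `sink_timestamp` default. It is an illustration, not the connector's Java code. The `SinkConfig` class, the `load_config` helper, the `False` defaults for the two boolean flags, and the choice of required keys are all assumptions.

```python
from dataclasses import dataclass
from typing import Mapping


def _parse_bool(value: object) -> bool:
    # Accept real booleans or strings such as "true"/"false" (as quoted in YAML).
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


@dataclass
class SinkConfig:
    # Hypothetical container; field names mirror the parameter table.
    credentials_file_path: str
    project_id: str
    topic_data_set: str
    topic_table_set: str
    add_insert_timestamp: bool = False  # assumed default (not documented)
    time_stamp_column_name: str = "sink_timestamp"  # documented default
    use_message_time_date_partitioning: bool = False  # assumed default (not documented)


# Assumption: these four keys are treated as mandatory.
REQUIRED = ("credentials_file_path", "project_id", "topic_data_set", "topic_table_set")


def load_config(raw: Mapping[str, object]) -> SinkConfig:
    """Build a SinkConfig from the `configs:` map, applying defaults."""
    missing = [key for key in REQUIRED if key not in raw]
    if missing:
        raise ValueError(f"missing required config keys: {missing}")
    return SinkConfig(
        credentials_file_path=str(raw["credentials_file_path"]),
        project_id=str(raw["project_id"]),
        topic_data_set=str(raw["topic_data_set"]),
        topic_table_set=str(raw["topic_table_set"]),
        add_insert_timestamp=_parse_bool(raw.get("add_insert_timestamp", False)),
        time_stamp_column_name=str(raw.get("time_stamp_column_name", "sink_timestamp")),
        use_message_time_date_partitioning=_parse_bool(
            raw.get("useMessageTimeDatePartitioning", False)
        ),
    )
```

Failing fast on missing keys surfaces configuration mistakes when the sink starts rather than on the first insert.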
The sink requires a GCP JSON credentials file to initialize. It can also route messages to different datasets and tables based on the topic maps (`topic_data_set` and `topic_table_set`). For example, to run the sink locally:

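Topic-based routing can be sketched in Python as follows. This is a hypothetical illustration, not the sink's implementation. `parse_topic_map` reads the comma-separated `topic:target` format used by `topic_data_set` and `topic_table_set`. `resolve_table` builds a `project.dataset.table` identifier. Matching either the full topic name or its last path segment (so that `persistent://public/default/bigquery-data` matches `bigquery-data`) is an assumption about how the sink compares names.

```python
from typing import Dict


def parse_topic_map(spec: str) -> Dict[str, str]:
    """Parse "topic1:target1,topic2:target2" into {"topic1": "target1", ...}."""
    mapping: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing comma
        # Split on the LAST ':' so fully qualified names like
        # "persistent://public/default/t:ds" keep their scheme (assumption).
        topic, sep, target = entry.rpartition(":")
        topic, target = topic.strip(), target.strip()
        if not sep or not topic or not target:
            raise ValueError(f"malformed topic map entry: {entry!r}")
        mapping[topic] = target
    return mapping


def resolve_table(topic: str, datasets: Dict[str, str],
                  tables: Dict[str, str], project_id: str) -> str:
    """Return "project.dataset.table" for a topic, or raise KeyError if unmapped."""
    # Try the name as given, then its last path segment (assumed matching rule).
    for key in (topic, topic.rsplit("/", 1)[-1]):
        if key in datasets and key in tables:
            return f"{project_id}.{datasets[key]}.{tables[key]}"
    raise KeyError(f"no dataset/table mapping for topic {topic!r}")


if __name__ == "__main__":
    datasets = parse_topic_map("bigquery-data:test1")
    tables = parse_topic_map("bigquery-data:test_table1")
    # prints "sample-project-170720.test1.test_table1"
    print(resolve_table("persistent://public/default/bigquery-data",
                        datasets, tables, "sample-project-170720"))
```

A topic must appear in both maps. Otherwise the sketch raises `KeyError` instead of guessing a destination.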
```shell
bin/pulsar-admin sinks localrun \
  --archive ./pulsar-google-nar-0.0.1.nar \
  --tenant public \
  --namespace default \
  --name bigquery-sink \
  --inputs bigquery-data \
  --sink-config-file ~/bigquery-sink.yaml
```

The referenced `~/bigquery-sink.yaml` file contains:

```yaml
configs:
  credentials_file_path: "/tmp/kubernetes-34c5c20a8e3e.json"
  project_id: "sample-project-170720"
  topic_data_set: "bigquery-data:test1"
  topic_table_set: "bigquery-data:test_table1"
  add_insert_timestamp: "true"
  time_stamp_column_name: "inserted_timestamp"
```

The sink currently performs no schema validation and does not integrate with either the Pulsar or the BigQuery schema registry.

When `add_insert_timestamp` is enabled, the sink adds an extra column to each row before making the insert request. The column is named by `time_stamp_column_name` and holds a UTC timestamp generated in Java.
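
The following is a minimal Python sketch of the timestamp step. It assumes rows are dictionaries of column values and that the timestamp is written as an ISO-8601 UTC string. The real sink generates the timestamp in Java and may use a different representation. The `with_insert_timestamp` helper, and its refusal to overwrite an existing column, are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def with_insert_timestamp(row: Dict[str, Any],
                          column_name: str = "sink_timestamp",
                          now: Optional[datetime] = None) -> Dict[str, Any]:
    """Return a copy of `row` with a UTC insertion timestamp in `column_name`."""
    if column_name in row:
        # Assumption: refuse to silently overwrite a payload column.
        raise ValueError(f"row already has a column named {column_name!r}")
    ts = now if now is not None else datetime.now(timezone.utc)
    out = dict(row)  # leave the caller's row untouched
    # ISO-8601 string in UTC, e.g. "2024-01-02T03:04:05+00:00" (assumed format).
    out[column_name] = ts.astimezone(timezone.utc).isoformat()
    return out
```

Passing a fixed `now` makes the helper deterministic in tests. In normal use it stamps each row with the current UTC time immediately before the insert.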