Skip to content

Instantly share code, notes, and snippets.

@masutaka
Last active July 21, 2022 07:31
Show Gist options
  • Save masutaka/142ca20b802401d12012fe952f2ea1f3 to your computer and use it in GitHub Desktop.
Save masutaka/142ca20b802401d12012fe952f2ea1f3 to your computer and use it in GitHub Desktop.
Example for managing BigQuery schema which the data source is Google Sheets

1. Create schema.json

See Specifying a schema

nametype は必須。mode のデフォルト値は Nullable

2. Create define.json

$ bq mkdef --noautodetect --source_format=GOOGLE_SHEETS \
'https://docs.google.com/spreadsheets/d/1oGrz62teGmDK0OzvYNxswLd4y9Tz53d1uPJUwre84so/edit' \
schema.json > define.json

ある程度の define.json が生成される。

3. Update define.json

See BigQueryTable Schema

BigQuery コンソールにもあったスキーマの自動検出 autodetect、ヘッダ行分スキップする skipLeadingRows や、未知の値を許容するかの ignoreUnknownValues などはこのドキュメントに載っている。

でも、上記に maxBadRecords を加えたオプションしか使っていない。

4. Create Dataset

$ bq mk masterdata

5. Create Table

$ bq mk --external_table_definition=./define.json masterdata.gsheet

$ bq show masterdata.gsheet
Table project_name:masterdata.gsheet

   Last modified             Schema               Type     Total URIs   Expiration   Labels
 ----------------- --------------------------- ---------- ------------ ------------ --------
  21 Jul 15:02:46   |- id: string (required)    EXTERNAL   1
                    |- corporate_name: string
                    |- rate: float
                    |- size: integer
                    |- is_subscribed: boolean
{
"autodetect": false,
"googleSheetsOptions": {
"skipLeadingRows": 1
},
"ignoreUnknownValues": false,
"maxBadRecords": 0,
"schema": {
"fields": [
{
"mode": "Required",
"name": "id",
"type": "STRING"
},
{
"name": "corporate_name",
"type": "STRING"
},
{
"name": "rate",
"type": "FLOAT"
},
{
"name": "size",
"type": "INTEGER"
},
{
"name": "is_subscribed",
"type": "BOOL"
}
]
},
"sourceFormat": "GOOGLE_SHEETS",
"sourceUris": [
"https://docs.google.com/spreadsheets/d/1oGrz62teGmDK0OzvYNxswLd4y9Tz53d1uPJUwre84so/edit"
]
}
[
{
"name": "id",
"type": "STRING",
"mode": "Required"
},
{
"name": "corporate_name",
"type": "STRING"
},
{
"name": "rate",
"type": "FLOAT"
},
{
"name": "size",
"type": "INTEGER"
},
{
"name": "is_subscribed",
"type": "BOOL"
}
]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment