Usage:

- `~/bin/mysql_table_to_big_query.sh bucket_name schema_name table_name`
- `~/bin/mysql_schema_to_big_query.sh bucket_name [schema_name]`
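For example (the bucket, schema and table names below are placeholders, not values taken from the scripts):

```bash
# Export a single table to BigQuery via the bucket (names are placeholders).
~/bin/mysql_table_to_big_query.sh my-gcs-bucket shopdb customers

# Export every table in the schema; this calls the table script once per table.
~/bin/mysql_schema_to_big_query.sh my-gcs-bucket shopdb
```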
mysql_table_to_big_query.sh exports the table from MySQL to CSV and exports its schema to JSON and SQL files. The files are uploaded to a folder in an existing Cloud Storage bucket and then loaded into BigQuery. A BigQuery dataset named {SCHEMA_NAME}_{DATE} is created in the same project if it does not already exist.
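Under the hood the flow amounts to roughly the following steps. This is a simplified sketch, not the script itself: the real script also generates the JSON/SQL schema files, compresses the data and checks for errors, and all names below are placeholders.

```bash
#!/bin/bash
# Simplified sketch of the export -> upload -> load flow (not the actual script).
BUCKET=my-gcs-bucket           # placeholder bucket
SCHEMA=shopdb                  # placeholder MySQL schema
TABLE=customers                # placeholder table
DATASET="${SCHEMA}_$(date +%Y%m%d)"

# 1. Dump the table data (tab-separated here; the script produces proper CSV).
mysql -B -N -e "SELECT * FROM ${SCHEMA}.${TABLE}" > "${TABLE}.csv"

# 2. Upload the data file (and the generated JSON schema) to the bucket.
gsutil cp "${TABLE}.csv" "gs://${BUCKET}/${SCHEMA}/${TABLE}.csv"

# 3. Create the dataset if it does not already exist, then load the CSV.
bq mk --dataset "${DATASET}" 2>/dev/null || true
bq load --source_format=CSV \
    "${DATASET}.${TABLE}" \
    "gs://${BUCKET}/${SCHEMA}/${TABLE}.csv" \
    "./${TABLE}.json"
```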
If the table has a column of type DATE, the BigQuery table will be partitioned.
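For reference, a partitioned load with the bq CLI looks roughly like this; the dataset, table and partition field names are placeholders, and the exact flags the script uses may differ:

```bash
# Sketch of a day-partitioned load (names and the partition field are placeholders).
bq load --source_format=CSV \
    --time_partitioning_type=DAY \
    --time_partitioning_field=order_date \
    shopdb_20240101.orders \
    gs://my-gcs-bucket/shopdb/orders.csv \
    ./orders.json
```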
mysql_schema_to_big_query.sh extracts a list of all tables in the MySQL schema and calls mysql_table_to_big_query.sh for each one. It creates a log file per table in the local directory; these can be used to review import errors and then deleted.
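Conceptually it is just a loop over the table list; a minimal sketch (the real script adds argument checks and more detailed logging):

```bash
#!/bin/bash
# Sketch of exporting every table in a schema (not the original script).
BUCKET="$1"
SCHEMA="$2"

# List all tables in the schema, then export each one with a per-table log.
for TABLE in $(mysql -N -B -e "SHOW TABLES FROM ${SCHEMA}"); do
    ~/bin/mysql_table_to_big_query.sh "${BUCKET}" "${SCHEMA}" "${TABLE}" \
        > "${TABLE}.log" 2>&1
done
```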
The script is based on work by Shantanuo (shantanuo/mysql_to_big_query.sh). Main differences from the original:
- This version requires you to create a Google Cloud Storage bucket first. This gives some control over the storage class and also allows tables with underscores (which are not allowed in a bucket name).
- There are some tweaks to the logging.
- Added BigQuery partitioning from the fork https://gist.github.com/apenney/2d78fc1fb734bf3f58d5adbd1dfdd461
- bash, and MySQL with authenticated access to the specified schema.
- Google Cloud SDK installed, with the shell configured for a GCP project.
- A bucket belonging to the GCP project with write permissions. Select an appropriate storage class and region. If the bucket is not used for anything else, consider adding a lifecycle rule to delete the intermediary files after a couple of days (a sketch follows after this list); they can't be deleted immediately without affecting the bq load.
- The local drive must have adequate space to hold the table in uncompressed CSV format.
- mysql_schema_to_big_query.sh expects to find mysql_table_to_big_query.sh in ~/bin.
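One way to add such a lifecycle rule is shown below; the 3-day age and the bucket name are placeholders, so pick values that comfortably outlast your load jobs:

```bash
# Sketch: delete objects in the bucket a few days after creation.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 3}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-gcs-bucket
```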
- Use wget to download these scripts.
- wget https://gist.github.com/intotecho/173401b1ce1a2c18decc3ce22ffeb5a7/raw/mysql_table_to_big_query.sh
- wget https://gist.github.com/intotecho/173401b1ce1a2c18decc3ce22ffeb5a7/raw/mysql_schema_to_big_query.sh
- Copy both scripts to ~/bin and chmod to executable.
- Install gsutil and bq (or the Google Cloud SDK) and configure them for your GCP project. A consolidated sketch of these steps follows after this list.
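Put together, the installation steps above might look like this (the project id is a placeholder):

```bash
# Download the scripts into ~/bin and make them executable.
mkdir -p ~/bin && cd ~/bin
wget https://gist.github.com/intotecho/173401b1ce1a2c18decc3ce22ffeb5a7/raw/mysql_table_to_big_query.sh
wget https://gist.github.com/intotecho/173401b1ce1a2c18decc3ce22ffeb5a7/raw/mysql_schema_to_big_query.sh
chmod +x mysql_table_to_big_query.sh mysql_schema_to_big_query.sh

# Configure the Cloud SDK for your GCP project (project id is a placeholder).
gcloud init
gcloud config set project my-gcp-project
```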
Known limitations:
- Cannot import tables containing blobs.
- Does not delete the intermediate files in the storage bucket. These must remain until the BQ load job completes. To check the status of a load job, use `bq ls -j`.
- Tested on Ubuntu.
- Subject to all the limitations for importing CSV data into BigQuery.
- If your date or timestamp data is not in the correct format, import it as a string by modifying the select statement that creates the schema in json_query.txt.
- The dataset location is set by the default rules for the project. Check that the location of the dataset matches the bucket's.
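For example, to check on load jobs (the job id below is a placeholder):

```bash
# List recent BigQuery jobs for the configured project.
bq ls -j -n 20

# Show the state and any errors for one job (job id is a placeholder).
bq show -j bqjob_r1234567890abcdef_0000018c1234abcd_1
```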
Can you explain "mysql_schema_to_big_query.sh"? How do I achieve this?
It is asking for a table name parameter.