This script moves BigQuery data between different Google Cloud accounts.
#!/bin/bash

# Pass -v anywhere on the command line to enable verbose tracing.
if [[ "$*" == *-v* ]]; then
    set -x
fi

source_project_id="${SOURCE_PROJECT_ID}"
destination_project_id="${DESTINATION_PROJECT_ID}"
dataset_name="${DATASET_NAME}"

# Export the schema of a source table to <table_name>.json.
function export_schema_to_json() {
    local table_name=$1
    bq show --format=json "${source_project_id}:${dataset_name}.${table_name}" | jq .schema.fields >"${table_name}.json"
    echo "Schema exported to ${table_name}.json"
}
# Load the exported CSV files from the "-bkp-ax" transfer bucket into the destination table.
function load_data_from_csv() {
    local table_name=$1
    local bucket_name="${table_name}-bkp-ax"
    bq load --source_format=CSV --skip_leading_rows=1 "${destination_project_id}:${dataset_name}.${table_name}" "gs://${bucket_name}/*" "./${table_name}.json"
    echo "Data loaded to ${destination_project_id}:${dataset_name}.${table_name}"
}
# Copy all objects from the backup bucket into its "-ax" transfer bucket.
function move_between_buckets() {
    local bucket_name=$1
    gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
        -o "GSUtil:parallel_thread_count=10" \
        -o "GSUtil:check_hashes=if_fast_else_skip" \
        -h "Content-Encoding:gzip" \
        -o "GSUtil:sliced_object_download=true" \
        -o "GSUtil:sliced_object_upload=true" \
        -m cp -r "gs://${bucket_name}/*" "gs://${bucket_name}-ax"
}
# Create the bucket in the given project if it does not already exist.
function check_and_create_bucket() {
    local project_id=$1
    local bucket_name=$2
    if gsutil ls "gs://${bucket_name}" &>/dev/null; then
        echo "Bucket gs://${bucket_name} already exists."
    else
        gsutil mb -p "${project_id}" "gs://${bucket_name}"
    fi
}
# Extract a source table to its "-bkp" bucket and copy the data to the "-bkp-ax" transfer bucket.
function export_table_to_bucket() {
    local table_name=$1
    local bucket_name="${table_name}-bkp"
    # check_and_create_bucket "${source_project_id}" "${bucket_name}"
    check_and_create_bucket "${destination_project_id}" "${bucket_name}"
    check_and_create_bucket "${destination_project_id}" "${bucket_name}-ax"
    bq extract --compression GZIP "${source_project_id}:${dataset_name}.${table_name}" "gs://${bucket_name}/x-*"
    move_between_buckets "${bucket_name}"
    echo "Data migrated from ${source_project_id}:${dataset_name}.${table_name} to gs://${bucket_name}-ax"
}
file_path=$1
if [ ! -f "$file_path" ]; then
    echo "Error: File '$file_path' not found."
    exit 1
fi

while IFS= read -r table_name; do
    (
        echo "Table: $table_name"
        echo "Source: ${source_project_id}:${dataset_name}.${table_name}"
        echo "Destination: ${destination_project_id}:${dataset_name}.${table_name}"
        echo ""
    )
    export_table_to_bucket "$table_name"
    export_schema_to_json "$table_name"
    load_data_from_csv "$table_name"
done <"$file_path"
Data Migration Script
This Bash script migrates data from a Google BigQuery dataset in one project to the same dataset in another project, staging the data in Google Cloud Storage (GCS) buckets along the way. For each table, the script extracts the data from BigQuery, compresses it with GZIP, copies it to a transfer bucket for the destination project, and exports the table schema to a separate JSON file.
After exporting the data to GCS, the script reads the generated CSV files and loads them into the corresponding destination BigQuery tables, so the data is transferred from one project to another while the schema is preserved.
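For a single hypothetical table (here called trades, in a dataset named marketdata, with placeholder project IDs src-project and dst-project), the command sequence the script automates for each table looks roughly like this:

# 1. Extract the source table to a backup bucket as GZIP-compressed CSV shards.
bq extract --compression GZIP "src-project:marketdata.trades" "gs://trades-bkp/x-*"
# 2. Copy the shards into the transfer bucket carrying the -ax suffix.
gsutil -m cp -r "gs://trades-bkp/*" "gs://trades-bkp-ax"
# 3. Export the source table schema to a local JSON file.
bq show --format=json "src-project:marketdata.trades" | jq .schema.fields > trades.json
# 4. Load the shards into the destination table, using the exported schema.
bq load --source_format=CSV --skip_leading_rows=1 "dst-project:marketdata.trades" "gs://trades-bkp-ax/*" ./trades.json

The script creates the two buckets with gsutil mb if they do not already exist.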
How to Use
1. Ensure that you have the necessary permissions to access the source and destination BigQuery datasets and GCS buckets.
2. Before running the script, set the environment variables SOURCE_PROJECT_ID, DESTINATION_PROJECT_ID, and DATASET_NAME to their appropriate values. For example, you can set them in your shell or in a script that runs before the main script (see the example at the end of this section).
3. Create a text file listing the tables to migrate, one table name per line, and pass its path as the first argument to the script.

The script will process each table listed in the file and perform the following steps for each one:

- Export the table schema from the source dataset to a <table_name>.json file.
- Extract the table data from BigQuery to a GCS bucket as GZIP-compressed CSV files and copy it to the transfer bucket with the -ax suffix.
- Read the CSV data from the -ax bucket and load it into the corresponding destination BigQuery table using the exported schema.

Please make sure to review and adjust the environment variables and file paths before running the migration. Note that the script assumes the table names in the input file match the actual table names in the BigQuery dataset.
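As a concrete, hypothetical setup (project IDs, dataset name, script file name, and table-list file name are placeholders to adjust for your environment):

# Set the required environment variables.
export SOURCE_PROJECT_ID="my-source-project"
export DESTINATION_PROJECT_ID="my-destination-project"
export DATASET_NAME="marketdata"

# tables.txt contains one table name per line, for example:
#   trades
#   quotes

# Run the migration; append -v after the file path to enable verbose (set -x) tracing.
bash migrate.sh tables.txt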