Last active
February 1, 2021 01:24
-
-
Save romicofre/ba8d1286480bcb247aeb8582980d6b9f to your computer and use it in GitHub Desktop.
Cargar datos desde Google Colab a Bigquery
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "HeDxXFS8s5oq" | |
| }, | |
| "source": [ | |
| "# Cargando datos desde Google Colab a Bigquery\n", | |
| "\n", | |
| "Trabajar en Google Colab, puede ahorrarnos el tener que instalar Jupyter en nuestro computador, podemos compartir el trabajo facilmente y ahorramos recursos propios para usar lo que generosamente Google nos comparte.\n", | |
| "\n", | |
| "Pero cómo llevamos los datos a un entorno de producción como BigQuery, si tenemos que autenticarnos por ejemplo. Gracias (o no?) a que estamos usando todo el ecosistema de Google esto está muy fácil. \n", | |
| "\n", | |
| "Tenemos el siguiente ejemplo, que utiliza una base de datos de COVID-19 de la universidad de Johns Hopkins.\n", | |
| "\n", | |
| "https://github.com/CSSEGISandData/COVID-19\n", | |
| "\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "qUKe5xA_vJsz" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Utilizamos pandas para trabajar con los datos en CSV\n", | |
| "import pandas as pd" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "aAkj0w9h0-Oi" | |
| }, | |
| "source": [ | |
| "## Obtención de datos" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "MNO-L9dmvZ08" | |
| }, | |
| "source": [ | |
| "Podemos utilizar el menú de la izquierda para cargar archivos, van a funcionar de la misma manera. Incluso podemos ejecutar un código que abrirá una ventana emergente para que quien lo ejecute realice la carga desde su computadora. Esta vez usaremos el truco de utilizar comandos de terminal con **!**." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "HkEJT9ojxfdK", | |
| "outputId": "7ac5c0a0-9243-4a30-e171-9bd4fd3fd74c" | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Cloning into 'COVID-19'...\n", | |
| "remote: Enumerating objects: 6, done.\u001b[K\n", | |
| "remote: Counting objects: 100% (6/6), done.\u001b[K\n", | |
| "remote: Compressing objects: 100% (6/6), done.\u001b[K\n", | |
| "remote: Total 90072 (delta 0), reused 2 (delta 0), pack-reused 90066\u001b[K\n", | |
| "Receiving objects: 100% (90072/90072), 744.75 MiB | 26.76 MiB/s, done.\n", | |
| "Resolving deltas: 100% (61931/61931), done.\n", | |
| "Checking out files: 100% (898/898), done.\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# Estamos utilizando un comando como si fuera la terminal y descargando tooodo el repositorio\n", | |
| "!git clone https://github.com/CSSEGISandData/COVID-19.git" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "DnJpx2ncvait" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Podemos leer sólo los archivos que nos interesan\n", | |
| "confirmed = pd.read_csv('/content/COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')\n", | |
| "deaths = pd.read_csv('/content/COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')\n", | |
| "recovered = pd.read_csv('/content/COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "rYEFooxoye9B", | |
| "outputId": "ab031bd5-4ff6-4f0a-efc4-af8448ec8e14" | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Index(['Province/State', 'Country/Region', 'Lat', 'Long', '1/22/20', '1/23/20',\n", | |
| " '1/24/20', '1/25/20', '1/26/20', '1/27/20',\n", | |
| " ...\n", | |
| " '1/21/21', '1/22/21', '1/23/21', '1/24/21', '1/25/21', '1/26/21',\n", | |
| " '1/27/21', '1/28/21', '1/29/21', '1/30/21'],\n", | |
| " dtype='object', length=379)\n", | |
| "(273, 379)\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "print(confirmed.columns)\n", | |
| "print(confirmed.shape)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "tqPVY3J5ywyi", | |
| "outputId": "e7f8d1aa-ff43-41bd-cd2e-87640d9942d8" | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Index(['Province/State', 'Country/Region', 'Lat', 'Long', '1/22/20', '1/23/20',\n", | |
| " '1/24/20', '1/25/20', '1/26/20', '1/27/20',\n", | |
| " ...\n", | |
| " '1/21/21', '1/22/21', '1/23/21', '1/24/21', '1/25/21', '1/26/21',\n", | |
| " '1/27/21', '1/28/21', '1/29/21', '1/30/21'],\n", | |
| " dtype='object', length=379)\n", | |
| "(273, 379)\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "print(deaths.columns)\n", | |
| "print(deaths.shape)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "UUu8uj34y1FT", | |
| "outputId": "ac719ee1-0efc-42f9-bc31-da0ea7f5ae33" | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Index(['Province/State', 'Country/Region', 'Lat', 'Long', '1/22/20', '1/23/20',\n", | |
| " '1/24/20', '1/25/20', '1/26/20', '1/27/20',\n", | |
| " ...\n", | |
| " '1/21/21', '1/22/21', '1/23/21', '1/24/21', '1/25/21', '1/26/21',\n", | |
| " '1/27/21', '1/28/21', '1/29/21', '1/30/21'],\n", | |
| " dtype='object', length=379)\n", | |
| "(258, 379)\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "print(recovered.columns)\n", | |
| "print(recovered.shape)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "CSZDtQbE_UW9", | |
| "outputId": "e8d9c775-8e62-4056-9988-05998eea188c" | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<bound method NDFrame.head of Province/State Country/Region Lat ... 1/28/21 1/29/21 1/30/21\n", | |
| "0 NaN Afghanistan 33.939110 ... 54891 54939 55008\n", | |
| "1 NaN Albania 41.153300 ... 75454 76350 77251\n", | |
| "2 NaN Algeria 28.033900 ... 106610 106887 107122\n", | |
| "3 NaN Andorra 42.506300 ... 9779 9837 9885\n", | |
| "4 NaN Angola -11.202700 ... 19672 19723 19782\n", | |
| ".. ... ... ... ... ... ... ...\n", | |
| "268 NaN Vietnam 14.058324 ... 1651 1657 1767\n", | |
| "269 NaN West Bank and Gaza 31.952200 ... 157593 158168 158559\n", | |
| "270 NaN Yemen 15.552727 ... 2120 2120 2120\n", | |
| "271 NaN Zambia -13.133897 ... 50319 51624 53352\n", | |
| "272 NaN Zimbabwe -19.015438 ... 32646 32952 33273\n", | |
| "\n", | |
| "[273 rows x 379 columns]>" | |
| ] | |
| }, | |
| "execution_count": 7, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "confirmed.head" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "ejLLuVLp0tDa" | |
| }, | |
| "source": [ | |
| "# Preparación de los datos\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "dBfaj5ocwDEX" | |
| }, | |
| "source": [ | |
| "Tenemos archivos separados, en BigQuery por lo general vamos a querer tener la mayor información de un modelo en sólo una tabla, sin preocuparnos de la redundancia, o la cantidad de datos. Lo importante es que responda a lo que se quiere analizar, ya que su objetivo es analítico y basado en columnas.\n", | |
| "\n", | |
| "Para este caso, vamos a unir los datos que están separados en archivos, pero que se pueden asociar por fecha y lugar. Además de cambiar el tipo de algunas columnas para que podamos subirlos a BigQuery." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "WDFnMagHy5Dg" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Unpivot de las columnas fechas\n", | |
| "confirmed = pd.melt(confirmed, id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], var_name='date', value_name='qty_confirmed')\n", | |
| "deaths = pd.melt(deaths, id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], var_name='date', value_name='qty_deaths')\n", | |
| "recovered = pd.melt(recovered, id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], var_name='date', value_name='qty_recovered')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "asOfm1QZ8ojA", | |
| "outputId": "13e7c8db-c0ee-49bc-e835-c04ed0557f69" | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<bound method NDFrame.head of Province/State Country/Region ... date qty_confirmed\n", | |
| "0 NaN Afghanistan ... 1/22/20 0\n", | |
| "1 NaN Albania ... 1/22/20 0\n", | |
| "2 NaN Algeria ... 1/22/20 0\n", | |
| "3 NaN Andorra ... 1/22/20 0\n", | |
| "4 NaN Angola ... 1/22/20 0\n", | |
| "... ... ... ... ... ...\n", | |
| "102370 NaN Vietnam ... 1/30/21 1767\n", | |
| "102371 NaN West Bank and Gaza ... 1/30/21 158559\n", | |
| "102372 NaN Yemen ... 1/30/21 2120\n", | |
| "102373 NaN Zambia ... 1/30/21 53352\n", | |
| "102374 NaN Zimbabwe ... 1/30/21 33273\n", | |
| "\n", | |
| "[102375 rows x 6 columns]>" | |
| ] | |
| }, | |
| "execution_count": 9, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "confirmed.head" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "0szUqIzWzWW2" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Unir dataframes, para adjuntar columnas de cantidades\n", | |
| "new_df = confirmed.merge(deaths, how='outer', on=['Province/State', 'Country/Region', 'Lat', 'Long', 'date'])\n", | |
| "new_df = new_df.merge(recovered, how='outer', on=['Province/State', 'Country/Region', 'Lat', 'Long', 'date'])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "nVrp0Hue9uoq" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Renombrar columnas, por ejemplo, en BigQuery no podemos usar \"/\" en el nombre de una tabla\n", | |
| "new_df = new_df.rename(columns = {'Province/State': 'province_state', 'Country/Region': 'country_region',\n", | |
| " 'Lat': 'lat', 'Long': 'long'})" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "PDY_Qzvb-MHi", | |
| "outputId": "e6e26f0a-41fc-4eea-c300-298b790e2dcd" | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "<bound method NDFrame.sample of province_state country_region ... qty_deaths qty_recovered\n", | |
| "0 NaN Afghanistan ... 0.0 0.0\n", | |
| "1 NaN Albania ... 0.0 0.0\n", | |
| "2 NaN Algeria ... 0.0 0.0\n", | |
| "3 NaN Andorra ... 0.0 0.0\n", | |
| "4 NaN Angola ... 0.0 0.0\n", | |
| "... ... ... ... ... ...\n", | |
| "104620 Hebei China ... NaN 868.0\n", | |
| "104621 Henan China ... NaN 1278.0\n", | |
| "104622 NaN Mozambique ... NaN 23945.0\n", | |
| "104623 NaN Syria ... NaN 7481.0\n", | |
| "104624 NaN Timor-Leste ... NaN 55.0\n", | |
| "\n", | |
| "[104625 rows x 8 columns]>" | |
| ] | |
| }, | |
| "execution_count": 12, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "new_df.sample" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "TyPxSBI7Ghi6", | |
| "outputId": "a3d08d0c-0be1-4d3d-e525-de45c28e9192" | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "province_state object\n", | |
| "country_region object\n", | |
| "lat float64\n", | |
| "long float64\n", | |
| "date object\n", | |
| "qty_confirmed float64\n", | |
| "qty_deaths float64\n", | |
| "qty_recovered float64\n", | |
| "dtype: object" | |
| ] | |
| }, | |
| "execution_count": 13, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "new_df.dtypes" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "tvEivVDtapmY" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Formatear fecha, en BigQuery sólo es capaz de entender formatos de fecha separadas por \"-\"\n", | |
| "new_df['date'] = pd.to_datetime(new_df['date'], format='%m/%d/%y')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "g3KpcBErb5oD", | |
| "outputId": "09f99bf6-9482-4cae-a30d-ab3ddf1da396" | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "0 2020-01-22\n", | |
| "1 2020-01-22\n", | |
| "2 2020-01-22\n", | |
| "3 2020-01-22\n", | |
| "4 2020-01-22\n", | |
| " ... \n", | |
| "104620 2021-01-30\n", | |
| "104621 2021-01-30\n", | |
| "104622 2021-01-30\n", | |
| "104623 2021-01-30\n", | |
| "104624 2021-01-30\n", | |
| "Name: date, Length: 104625, dtype: datetime64[ns]" | |
| ] | |
| }, | |
| "execution_count": 45, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "new_df['date']" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "NVqO8WNb1OdA" | |
| }, | |
| "source": [ | |
| "# Carga a BigQuery" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "id": "Al5nbfMd6cdv" | |
| }, | |
| "source": [ | |
| "Este entorno tiene todas las librerías necesarias para poder autentificarse con los servicios de Google. A pesar de ello, la versión de google.auth dio algunos problemas e instalaremos la versión 1.7.2, esto es necesario solo si tienen error en la librería." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "HEUpfuKQDLtY", | |
| "outputId": "4c595112-142e-477f-eea8-4ea0d164b36a" | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Collecting google.auth==1.7.2\n", | |
| "\u001b[?25l Downloading https://files.pythonhosted.org/packages/ec/11/1d90cbfa72a084b08498e8cea1fee199bc965cdac391d241f5ae6257073e/google_auth-1.7.2-py2.py3-none-any.whl (74kB)\n", | |
| "\r", | |
| "\u001b[K |████▍ | 10kB 16.2MB/s eta 0:00:01\r", | |
| "\u001b[K |████████▊ | 20kB 12.7MB/s eta 0:00:01\r", | |
| "\u001b[K |█████████████▏ | 30kB 10.0MB/s eta 0:00:01\r", | |
| "\u001b[K |█████████████████▌ | 40kB 7.6MB/s eta 0:00:01\r", | |
| "\u001b[K |█████████████████████▉ | 51kB 4.4MB/s eta 0:00:01\r", | |
| "\u001b[K |██████████████████████████▎ | 61kB 4.9MB/s eta 0:00:01\r", | |
| "\u001b[K |██████████████████████████████▋ | 71kB 5.0MB/s eta 0:00:01\r", | |
| "\u001b[K |████████████████████████████████| 81kB 3.7MB/s \n", | |
| "\u001b[?25hRequirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.6/dist-packages (from google.auth==1.7.2) (0.2.8)\n", | |
| "Collecting rsa<4.1,>=3.1.4\n", | |
| " Downloading https://files.pythonhosted.org/packages/02/e5/38518af393f7c214357079ce67a317307936896e961e35450b70fad2a9cf/rsa-4.0-py2.py3-none-any.whl\n", | |
| "Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.6/dist-packages (from google.auth==1.7.2) (1.15.0)\n", | |
| "Requirement already satisfied: setuptools>=40.3.0 in /usr/local/lib/python3.6/dist-packages (from google.auth==1.7.2) (51.3.3)\n", | |
| "Collecting cachetools<3.2,>=2.0.0\n", | |
| " Downloading https://files.pythonhosted.org/packages/2f/a6/30b0a0bef12283e83e58c1d6e7b5aabc7acfc4110df81a4471655d33e704/cachetools-3.1.1-py2.py3-none-any.whl\n", | |
| "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.6/dist-packages (from pyasn1-modules>=0.2.1->google.auth==1.7.2) (0.4.8)\n", | |
| "\u001b[31mERROR: google-colab 1.0.0 has requirement google-auth~=1.17.2, but you'll have google-auth 1.7.2 which is incompatible.\u001b[0m\n", | |
| "Installing collected packages: rsa, cachetools, google.auth\n", | |
| " Found existing installation: rsa 4.7\n", | |
| " Uninstalling rsa-4.7:\n", | |
| " Successfully uninstalled rsa-4.7\n", | |
| " Found existing installation: cachetools 4.2.1\n", | |
| " Uninstalling cachetools-4.2.1:\n", | |
| " Successfully uninstalled cachetools-4.2.1\n", | |
| "Successfully installed cachetools-3.1.1 google.auth rsa-4.0\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!pip install google.auth==1.7.2" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "id": "QgbR1pnf_Gc8" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "from google.colab import auth" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "9geIg2lwDHeQ" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "auth.authenticate_user()\n", | |
| "print('Authenticated')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "Em7iEz6I1Tpy" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "from google.cloud import bigquery" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "fqcgVA950Y7a" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Construir el cliente de BigQuery.\n", | |
| "project_id = 'tareas-303400' # cambiar\n", | |
| "client = bigquery.Client(project = project_id)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "jPs8IRkLDz1x" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Nombre tabla, utiliza un dataset que esté creado en tu proyecto.\n", | |
| "table_id = \"tareas-303400.dataset_test.covid_19\"\n", | |
| "\n", | |
| "# Esquema tabla\n", | |
| "table_schema = [\n", | |
| " bigquery.SchemaField(\"province_state\", \"STRING\"),\n", | |
| " bigquery.SchemaField(\"country_region\", \"STRING\"),\n", | |
| " bigquery.SchemaField(\"lat\", \"FLOAT\"),\n", | |
| " bigquery.SchemaField(\"long\", \"FLOAT\"),\n", | |
| " bigquery.SchemaField(\"date\", \"DATE\"),\n", | |
| " bigquery.SchemaField(\"qty_confirmed\", \"FLOAT\"),\n", | |
| " bigquery.SchemaField(\"qty_deaths\", \"FLOAT\"),\n", | |
| " bigquery.SchemaField(\"qty_recovered\", \"FLOAT\"),\n", | |
| " ]\n", | |
| "\n", | |
| "# Trabajo de carga\n", | |
| "job_config = bigquery.LoadJobConfig(\n", | |
| " schema=table_schema,\n", | |
| " write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE, # Reemplazara la tabla\n", | |
| ")\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "4YDBlXu_FmDw" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Carga\n", | |
| "\n", | |
| "load_job = client.load_table_from_dataframe(\n", | |
| " new_df, table_id, job_config=job_config\n", | |
| ") \n", | |
| "\n", | |
| "load_job.result() # Espera que la carga termine\n", | |
| "destination_table = client.get_table(table_id) # Revisando si la tabla existe\n", | |
| "print(\"Loaded {} rows.\".format(destination_table.num_rows))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "id": "fpRx7ShTZAy7" | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# If errors :\n", | |
| "# load_job.errors" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "colab": { | |
| "collapsed_sections": [], | |
| "name": "Carga_datos_bigquery_Colab.ipynb", | |
| "provenance": [] | |
| }, | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.8.5" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 1 | |
| } |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment