Created July 5, 2018 15:18
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Oracle Autonomous Data Warehouse (ADW) access via Python" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The following shows how to access the Oracle Autonomous Data Warehouse using Python and how to load the data using the DBMS_CLOUD package via the cx_Oracle module. This is obviously simpler via the Graphical front end or SQL Developer but using Python provdes a simple scriptable model whilst hiding some of the complexities of using native REST APIs.\n", | |
"\n", | |
"This simple example assumes that you've got an Oracle Cloud account and that you've created or got access an ADW database. You'll also have to download the credentials file to provide SQL\\*Net access. You can find the details on how to do that [here](https://docs.oracle.com/en/cloud/paas/autonomous-data-warehouse-cloud/user/connect-download-wallet.html#GUID-B06202D2-0597-41AA-9481-3B174F75D4B1). We'll be using Python in this short example but most of what we're doing could be achieved using the GUI and/or REST Calls.\n", | |
"\n", | |
"#### Connecting to ADW Instance\n", | |
"\n", | |
"To start with we'll make sure we can connect to the ADW Instance we've previously created. To do that we need to import the required libraries. If you dodn't have these I reccommend using PIP (and virtualenv)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import cx_Oracle\n", | |
"import keyring\n", | |
"import os" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We need to use an environment variable to reflect the location of the downloaded credentials files to be used by SQL\\*Net." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"os.environ['TNS_ADMIN'] = '/Users/dgiles/Downloads/wallet_DOMSDB'" | |
] | |
}, | |
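A wrong `TNS_ADMIN` path only surfaces later as an obscure TNS error, so a quick sanity check on the wallet directory can save time. This is just a sketch; the list of required files is an assumption based on a typical ADW wallet zip:

```python
import os

def wallet_ok(wallet_dir, required=('tnsnames.ora', 'sqlnet.ora', 'cwallet.sso')):
    """Return True if wallet_dir exists and holds the files SQL*Net needs."""
    if not os.path.isdir(wallet_dir):
        return False
    present = set(os.listdir(wallet_dir))
    return all(f in present for f in required)
```

Calling `wallet_ok(os.environ['TNS_ADMIN'])` before attempting to connect gives an early warning if the wallet was unzipped somewhere else.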
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This is equivlent to ```bash\n", | |
"export TNS_ADMIN=/Users/dgiles/Downloads/wallet_DOMSDB``` and points to the unzipped directory containing the tnsnames.ora, sqlnet.ora etc. **NOTE:** you'll need to update the sqlnet.ora to ensure the wallet points to the same directory specified in the TNS_ADMIN environment variable. i.e.\n", | |
"```bash\n", | |
"WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY=\"/Users/dgiles/Downloads/wallet_DOMSDB\")))\n", | |
"SSL_SERVER_DN_MATCH=yes\n", | |
"```\n", | |
"In the example above I've changed DIRECTORY to the location where I downloaded and unzipped the credentials file.\n", | |
"\n", | |
"The next steps are to connect to the Oracle ADW instance. In the example below I've store my password using the Python Module \"keyring\". I'm also using the jupyter notebook magic sql functionality. We'll test the connection using the admin user and the connect string \"domsdb_medium\" which is one of the services included in the tnsnames.ora file." | |
] | |
}, | |
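Editing sqlnet.ora by hand works fine, but the DIRECTORY swap can also be scripted. A minimal sketch, assuming the file uses the `DIRECTORY="..."` form shown above:

```python
import re

def point_wallet_at(sqlnet_text, wallet_dir):
    """Rewrite the DIRECTORY entry in sqlnet.ora text to point at wallet_dir."""
    return re.sub(r'DIRECTORY="[^"]*"', 'DIRECTORY="%s"' % wallet_dir, sqlnet_text)
```

You'd read the file, pass its contents through this function and write it back; on Windows, backslashes in the path would need extra care in the replacement string.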
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%load_ext sql" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 257, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'Connected: admin@None'" | |
] | |
}, | |
"execution_count": 257, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"password = keyring.get_password('adw','admin')\n", | |
"%sql oracle+cx_oracle://admin:$password@domsdb_medium" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 258, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>1</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[(1,)]" | |
] | |
}, | |
"execution_count": 258, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%%sql admin@domsdb_medium\n", | |
"select 1 from dual" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Generating test data\n", | |
"\n", | |
"We've connected to the oracle database and can now start uploading data to the instance. In this example we'll use [datagenerator](http://www.dominicgiles.com) to generate the data into flat files and then place these on Oracle Object Store and load them from there.\n", | |
"\n", | |
"The first step is to install datagenerator. You can find details on how to do that [here](http://www.dominicgiles.com/datageneratorinstall.html). We can now simply generate data for the \"SH\" benchmark." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 259, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"/Users/dgiles/datagenerator/bin/datagenerator -c /Users/dgiles/datagenerator/bin/sh.xml -scale 100 -cl -f -d /Users/dgiles/Downloads/generated_data -tc 8\n", | |
"Started Datagenerator, Version 0.4.0.1083\n", | |
"\n", | |
"============================================\n", | |
"| Datagenerator Run Stats |\n", | |
"============================================\n", | |
"Connection Time 0:00:00.000\n", | |
"Data Generation Time 0:00:26.274\n", | |
"DDL Creation Time 0:00:00.000\n", | |
"Total Run Time 0:00:26.284\n", | |
"Rows Inserted per sec 5,367\n", | |
"Data Generated (MB) per sec 0.5\n", | |
"Actual Rows Generated 137,000\n", | |
"Commits Completed 0\n", | |
"Batch Updates Completed 0\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"import subprocess\n", | |
"\n", | |
"# Change the 2 following parameters to reflect your environment\n", | |
"generated_data_dir = '/Users/dgiles/Downloads/generated_data'\n", | |
"datagenerator_home = '/Users/dgiles/datagenerator'\n", | |
"\n", | |
"# Change the following paramters relating to the way datagenerator will create the data\n", | |
"scale = 100\n", | |
"parallel = 8\n", | |
"\n", | |
"dg_command = '{dg}/bin/datagenerator -c {dg}/bin/sh.xml -scale {s} -cl -f -d {gdd} -tc {p}'.format(\n", | |
" dg = datagenerator_home,\n", | |
" s = scale,\n", | |
" gdd = generated_data_dir,\n", | |
" p = parallel\n", | |
")\n", | |
"# Typically we'd use a command similiar to the one below but since we're in a notebook it's easier to use the default functionality\n", | |
"# p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)\n", | |
"# (output, err) = p.communicate()\n", | |
"\n", | |
"print(dg_command)\n", | |
"!{dg_command}\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We should now have a series of files in the \"generated_data_dir\" directory. These will be a mix of csv files, create table scripts, loader scripts etc." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 255, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CHANNELS.csv SALES.csv\n", | |
"CHANNELS.ctl SALES.ctl\n", | |
"COUNTRIES.csv SUPPLEMENTARY_DEMOGRAPHICS.csv\n", | |
"COUNTRIES.ctl SUPPLEMENTARY_DEMOGRAPHICS.ctl\n", | |
"CUSTOMERS.csv constraints.sql\n", | |
"CUSTOMERS.ctl createindexes.sql\n", | |
"PRODUCTS.csv createsequences.sql\n", | |
"PRODUCTS.ctl createtables.sql\n", | |
"PROMOTIONS.csv droptables.sql\n", | |
"PROMOTIONS.ctl\n" | |
] | |
} | |
], | |
"source": [ | |
"!ls {generated_data_dir}" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Uploading the data to the Oracle Object Store\n", | |
"\n", | |
"We're really only interested in the \"csv\" files so we'll upload just those. But before we do this we'll need to establish a connection to the Oracle Object Store. I give some detail behind how to do this in [this notebook](https://gist.github.com/domgiles/fa7e28b80d15d6b4ee185395eccdf327). I'll be using the object storage out of the Frankfurt Region." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 256, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import oci\n", | |
"import ast\n", | |
"\n", | |
"my_config = ast.literal_eval(keyring.get_password('oci_opj','doms'))\n", | |
"my_config['region'] = 'eu-frankfurt-1'\n", | |
"object_storage_client = oci.object_storage.ObjectStorageClient(my_config)\n", | |
"namespace = object_storage_client.get_namespace().data" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We've now got a handle to the Oracle Object Store Client so we can now create a bucket which we'll call and upload the \"CSV\" Files." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import os, io\n", | |
"\n", | |
"bucket_name = 'Sales_Data'\n", | |
"\n", | |
"files_to_process = [file for file in os.listdir(generated_data_dir) if file.endswith('csv')]\n", | |
"\n", | |
"try:\n", | |
" create_bucket_response = object_storage_client.create_bucket(\n", | |
" namespace,\n", | |
" oci.object_storage.models.CreateBucketDetails(\n", | |
" name=bucket_name,\n", | |
" compartment_id=my_config['tenancy']\n", | |
" )\n", | |
" )\n", | |
"except Exception as e:\n", | |
" print(e.message)\n", | |
"\n", | |
"\n", | |
"for upload_file in files_to_process:\n", | |
" print('Uploading file {}'.format(upload_file))\n", | |
" object_storage_client.put_object(namespace, bucket_name, upload_file, io.open(os.path.join(generated_data_dir,upload_file),'r')) " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We need to create an authentication token that can be used by the ADW instance to access our Object storage. To do this we need to create an identity client." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"indentity_client = oci.identity.IdentityClient(my_config)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"token = indentity_client.create_auth_token(\n", | |
" oci.identity.models.CreateAuthTokenDetails(\n", | |
" description = \"Token used to provide access to newly loaded files\"\n", | |
" ),\n", | |
" user_id = my_config['user']\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Creating users and tables in the ADW Instance\n", | |
"\n", | |
"The following steps will feel very familiar to any DBA/developer of and Oracle database. We need to create a schema and assocated tables to load the data into.\n", | |
"\n", | |
"First we'll need to create a user/schema and grant it the appropriate roles" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%sql create user mysh identified by ReallyLongPassw0rd default tablespace data" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Grant the \"mysh\" user the DWROLE" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%sql grant DWROLE to mysh" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%sql oracle+cx_oracle://mysh:ReallyLongPassw0rd@domsdb_medium" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can now create the tables we'll use to load the data into." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"CREATE TABLE COUNTRIES (\n", | |
" COUNTRY_ID NUMBER NOT NULL,\n", | |
" COUNTRY_ISO_CODE CHAR(2) NOT NULL,\n", | |
" COUNTRY_NAME VARCHAR2(40) NOT NULL,\n", | |
" COUNTRY_SUBREGION VARCHAR2(30) NOT NULL,\n", | |
" COUNTRY_SUBREGION_ID NUMBER NOT NULL,\n", | |
" COUNTRY_REGION VARCHAR2(20) NOT NULL,\n", | |
" COUNTRY_REGION_ID NUMBER NOT NULL,\n", | |
" COUNTRY_TOTAL NUMBER(9) NOT NULL,\n", | |
" COUNTRY_TOTAL_ID NUMBER NOT NULL,\n", | |
" COUNTRY_NAME_HIST VARCHAR2(40)\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"CREATE TABLE SALES (\n", | |
" PROD_ID NUMBER NOT NULL,\n", | |
" CUST_ID NUMBER NOT NULL,\n", | |
" TIME_ID DATE NOT NULL,\n", | |
" CHANNEL_ID NUMBER NOT NULL,\n", | |
" PROMO_ID NUMBER NOT NULL,\n", | |
" QUANTITY_SOLD NUMBER(10) NOT NULL,\n", | |
" AMOUNT_SOLD NUMBER(10) NOT NULL\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"CREATE TABLE SUPPLEMENTARY_DEMOGRAPHICS (\n", | |
" CUST_ID NUMBER NOT NULL,\n", | |
" EDUCATION VARCHAR2(21),\n", | |
" OCCUPATION VARCHAR2(21),\n", | |
" HOUSEHOLD_SIZE VARCHAR2(21),\n", | |
" YRS_RESIDENCE NUMBER,\n", | |
" AFFINITY_CARD NUMBER(10),\n", | |
" BULK_PACK_DISKETTES NUMBER(10),\n", | |
" FLAT_PANEL_MONITOR NUMBER(10),\n", | |
" HOME_THEATER_PACKAGE NUMBER(10),\n", | |
" BOOKKEEPING_APPLICATION NUMBER(10),\n", | |
" PRINTER_SUPPLIES NUMBER(10),\n", | |
" Y_BOX_GAMES NUMBER(10),\n", | |
" OS_DOC_SET_KANJI NUMBER(10),\n", | |
" COMMENTS VARCHAR2(4000)\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"CREATE TABLE CUSTOMERS (\n", | |
" CUST_ID NUMBER NOT NULL,\n", | |
" CUST_FIRST_NAME VARCHAR2(20) NOT NULL,\n", | |
" CUST_LAST_NAME VARCHAR2(40) NOT NULL,\n", | |
" CUST_GENDER CHAR(1) NOT NULL,\n", | |
" CUST_YEAR_OF_BIRTH NUMBER(4) NOT NULL,\n", | |
" CUST_MARITAL_STATUS VARCHAR2(20),\n", | |
" CUST_STREET_ADDRESS VARCHAR2(40) NOT NULL,\n", | |
" CUST_POSTAL_CODE VARCHAR2(10) NOT NULL,\n", | |
" CUST_CITY VARCHAR2(30) NOT NULL,\n", | |
" CUST_CITY_ID NUMBER NOT NULL,\n", | |
" CUST_STATE_PROVINCE VARCHAR2(40) NOT NULL,\n", | |
" CUST_STATE_PROVINCE_ID NUMBER NOT NULL,\n", | |
" COUNTRY_ID NUMBER NOT NULL,\n", | |
" CUST_MAIN_PHONE_NUMBER VARCHAR2(25) NOT NULL,\n", | |
" CUST_INCOME_LEVEL VARCHAR2(30),\n", | |
" CUST_CREDIT_LIMIT NUMBER,\n", | |
" CUST_EMAIL VARCHAR2(40),\n", | |
" CUST_TOTAL VARCHAR2(14) NOT NULL,\n", | |
" CUST_TOTAL_ID NUMBER NOT NULL,\n", | |
" CUST_SRC_ID NUMBER,\n", | |
" CUST_EFF_FROM DATE,\n", | |
" CUST_EFF_TO DATE,\n", | |
" CUST_VALID VARCHAR2(1)\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"CREATE TABLE CHANNELS (\n", | |
" CHANNEL_ID NUMBER NOT NULL,\n", | |
" CHANNEL_DESC VARCHAR2(20) NOT NULL,\n", | |
" CHANNEL_CLASS VARCHAR2(20) NOT NULL,\n", | |
" CHANNEL_CLASS_ID NUMBER NOT NULL,\n", | |
" CHANNEL_TOTAL VARCHAR2(13) NOT NULL,\n", | |
" CHANNEL_TOTAL_ID NUMBER NOT NULL\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"CREATE TABLE PRODUCTS (\n", | |
" PROD_ID NUMBER(6) NOT NULL,\n", | |
" PROD_NAME VARCHAR2(50) NOT NULL,\n", | |
" PROD_DESC VARCHAR2(4000) NOT NULL,\n", | |
" PROD_SUBCATEGORY VARCHAR2(50) NOT NULL,\n", | |
" PROD_SUBCATEGORY_ID NUMBER NOT NULL,\n", | |
" PROD_SUBCATEGORY_DESC VARCHAR2(2000) NOT NULL,\n", | |
" PROD_CATEGORY VARCHAR2(50) NOT NULL,\n", | |
" PROD_CATEGORY_ID NUMBER NOT NULL,\n", | |
" PROD_CATEGORY_DESC VARCHAR2(2000) NOT NULL,\n", | |
" PROD_WEIGHT_CLASS NUMBER(3) NOT NULL,\n", | |
" PROD_UNIT_OF_MEASURE VARCHAR2(20),\n", | |
" PROD_PACK_SIZE VARCHAR2(30) NOT NULL,\n", | |
" SUPPLIER_ID NUMBER(6) NOT NULL,\n", | |
" PROD_STATUS VARCHAR2(20) NOT NULL,\n", | |
" PROD_LIST_PRICE NUMBER(8) NOT NULL,\n", | |
" PROD_MIN_PRICE NUMBER(8) NOT NULL,\n", | |
" PROD_TOTAL VARCHAR2(13) NOT NULL,\n", | |
" PROD_TOTAL_ID NUMBER NOT NULL,\n", | |
" PROD_SRC_ID NUMBER,\n", | |
" PROD_EFF_FROM DATE,\n", | |
" PROD_EFF_TO DATE,\n", | |
" PROD_VALID VARCHAR2(1)\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"CREATE TABLE PROMOTIONS (\n", | |
" PROMO_ID NUMBER(6) NOT NULL,\n", | |
" PROMO_NAME VARCHAR2(30) NOT NULL,\n", | |
" PROMO_SUBCATEGORY VARCHAR2(30) NOT NULL,\n", | |
" PROMO_SUBCATEGORY_ID NUMBER NOT NULL,\n", | |
" PROMO_CATEGORY VARCHAR2(30) NOT NULL,\n", | |
" PROMO_CATEGORY_ID NUMBER NOT NULL,\n", | |
" PROMO_COST NUMBER(10) NOT NULL,\n", | |
" PROMO_BEGIN_DATE DATE NOT NULL,\n", | |
" PROMO_END_DATE DATE NOT NULL,\n", | |
" PROMO_TOTAL VARCHAR2(15) NOT NULL,\n", | |
" PROMO_TOTAL_ID NUMBER NOT NULL\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 245, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"7305 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"[]" | |
] | |
}, | |
"execution_count": 245, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"CREATE TABLE times AS\n", | |
"SELECT udate time_id,\n", | |
" TO_CHAR(udate,'Day') day_name,\n", | |
" TO_CHAR(udate,'DD') day_number_in_month,\n", | |
" TO_CHAR(udate,'DDD') day_number_in_year,\n", | |
" TO_CHAR(udate,'YYYY' ) calendar_year,\n", | |
" TO_CHAR(udate,'Q' ) calendar_quarter_number,\n", | |
" TO_CHAR(udate,'MM' ) calendar_month_number,\n", | |
" TO_CHAR(udate,'WW' ) calendar_week_number,\n", | |
" TO_CHAR(udate,'YYYY-MM') calendar_month_desc,\n", | |
" TO_CHAR(udate,'YYYY-Q') calendar_quarter_desc\n", | |
"FROM\n", | |
" (SELECT to_date('31/12/1994','DD/MM/YYYY')+rownum udate\n", | |
" FROM all_objects\n", | |
" WHERE to_date('31/12/1994','DD/MM/YYYY')+rownum <= to_date( '31/12/2014','DD/MM/YYYY')\n", | |
" )" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 248, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>tname</th>\n", | |
" <th>tabtype</th>\n", | |
" <th>clusterid</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>COUNTRIES</td>\n", | |
" <td>TABLE</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>SALES</td>\n", | |
" <td>TABLE</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>SUPPLEMENTARY_DEMOGRAPHICS</td>\n", | |
" <td>TABLE</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>CUSTOMERS</td>\n", | |
" <td>TABLE</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>CHANNELS</td>\n", | |
" <td>TABLE</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>PRODUCTS</td>\n", | |
" <td>TABLE</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>PROMOTIONS</td>\n", | |
" <td>TABLE</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>TIMES</td>\n", | |
" <td>TABLE</td>\n", | |
" <td>None</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('COUNTRIES', 'TABLE', None),\n", | |
" ('SALES', 'TABLE', None),\n", | |
" ('SUPPLEMENTARY_DEMOGRAPHICS', 'TABLE', None),\n", | |
" ('CUSTOMERS', 'TABLE', None),\n", | |
" ('CHANNELS', 'TABLE', None),\n", | |
" ('PRODUCTS', 'TABLE', None),\n", | |
" ('PROMOTIONS', 'TABLE', None),\n", | |
" ('TIMES', 'TABLE', None)]" | |
] | |
}, | |
"execution_count": 248, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"select * from tab" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Copying the data from the object store\n", | |
"We need to add the authorisation token to the newly created schema to allow it to access the object stores files. We can't do this using the sql magic syntax we've been using till this point so we'll do it using standard cx_Oracle calls." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"connection = cx_Oracle.connect('mysh', 'ReallyLongPassw0rd', 'domsdb_medium')\n", | |
"cursor = connection.cursor();" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"cursor.callproc('DBMS_CLOUD.create_credential', keywordParameters = {'credential_name':'SALES_DATA_AUTH',\n", | |
" 'username':'[email protected]',\n", | |
" 'password':token.data.token})" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can access the object storage using a url of the the format\n", | |
"```\n", | |
"https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenancy>/<bucket name>/<object name>\n", | |
"```\n", | |
"We can use this to dynamically generate a url for each of the objects inside of the bucket we've just created and use the DBMS_CLOUD package to copy the data into the ADW instance. The code below gets all of the names of the tables we've just created and the loops through each table copying the associated csv file into the ADW instance." | |
] | |
}, | |
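The URL template can be exercised stand-alone to check the shape of the generated URLs; the tenancy name here is a placeholder, not a real one:

```python
# Same template as used in the load step below; only the values are hypothetical
file_location = ('https://swiftobjectstorage.{region}.oraclecloud.com'
                 '/v1/{tenancy}/{bucket_name}/{table_name}.csv')

url = file_location.format(region='eu-frankfurt-1', tenancy='mytenancy',
                           bucket_name='Sales_Data', table_name='SALES')
print(url)
# → https://swiftobjectstorage.eu-frankfurt-1.oraclecloud.com/v1/mytenancy/Sales_Data/SALES.csv
```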
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from tqdm import tqdm\n", | |
"\n", | |
"format = '''{\"delimiter\" : \",\",\n", | |
" \"skipheaders\" : 1,\n", | |
" \"ignoremissingcolumns\" : \"true\",\n", | |
" \"removequotes\" : \"true\",\n", | |
" \"dateformat\" : \"DD-MON-YYYY HH24:MI:SS\",\n", | |
" \"blankasnull\" : \"true\"}'''\n", | |
"\n", | |
"file_location = '''https://swiftobjectstorage.{region}.oraclecloud.com/v1/{tenancy}/{bucket_name}/{table_name}.csv'''\n", | |
"region = my_config['region']\n", | |
"tenancy= 'oracleonpremjava'\n", | |
"\n", | |
"\n", | |
"rs = cursor.execute(\"select table_name from user_tables where table_name not like 'COPY%'\")\n", | |
"rows = rs.fetchall()\n", | |
"for row in tqdm(rows):\n", | |
" url = file_location.format(region=region, tenancy=tenancy, bucket_name=bucket_name, table_name=row[0])\n", | |
" cursor.callproc('DBMS_CLOUD.copy_data', keywordParameters= {'table_name':row[0],\n", | |
" 'credential_name':'SALES_DATA_AUTH',\n", | |
" 'file_uri_list':url, \n", | |
" 'format': format\n", | |
" })\n", | |
" " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can now take a look and see how many rows we've loaded into the tables" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 261, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"COUNTRIES 22\n", | |
"SALES 97,799\n", | |
"SUPPLEMENTARY_DEMOGRAPHICS 19,599\n", | |
"CUSTOMERS 19,599\n", | |
"CHANNELS 4\n", | |
"PRODUCTS 71\n", | |
"PROMOTIONS 502\n", | |
"TIMES 7,305\n" | |
] | |
} | |
], | |
"source": [ | |
"rs = cursor.execute(\"select table_name from user_tables where table_name not like 'COPY%'\")\n", | |
"rows = rs.fetchall()\n", | |
"\n", | |
"for row in rows:\n", | |
" rs2 = cursor.execute(\"select count(*) from {}\".format(row[0]))\n", | |
" rows2 = rs2.fetchone()\n", | |
" print('{tn: <35}{rc:>10,}'.format(tn=row[0],rc=rows2[0]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"It's now possible to run standard queries against the newly loaded data. No need to create anything else (indexes etc.)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 249, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0 rows affected.\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table>\n", | |
" <tr>\n", | |
" <th>channel_desc</th>\n", | |
" <th>country_iso_code</th>\n", | |
" <th>sales$</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Internet</td>\n", | |
" <td>DE</td>\n", | |
" <td> 3,130</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Internet</td>\n", | |
" <td>DK</td>\n", | |
" <td> 1,676</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Internet</td>\n", | |
" <td>FR</td>\n", | |
" <td> 2,545</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Internet</td>\n", | |
" <td>GB</td>\n", | |
" <td> 2,320</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Internet</td>\n", | |
" <td>None</td>\n", | |
" <td> 9,671</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Tele Sales</td>\n", | |
" <td>DE</td>\n", | |
" <td> 3,077</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Tele Sales</td>\n", | |
" <td>DK</td>\n", | |
" <td> 3,116</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Tele Sales</td>\n", | |
" <td>FR</td>\n", | |
" <td> 3,184</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Tele Sales</td>\n", | |
" <td>GB</td>\n", | |
" <td> 2,386</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>Tele Sales</td>\n", | |
" <td>None</td>\n", | |
" <td> 11,763</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td>None</td>\n", | |
" <td>None</td>\n", | |
" <td> 21,434</td>\n", | |
" </tr>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"[('Internet', 'DE', ' 3,130'),\n", | |
" ('Internet', 'DK', ' 1,676'),\n", | |
" ('Internet', 'FR', ' 2,545'),\n", | |
" ('Internet', 'GB', ' 2,320'),\n", | |
" ('Internet', None, ' 9,671'),\n", | |
" ('Tele Sales', 'DE', ' 3,077'),\n", | |
" ('Tele Sales', 'DK', ' 3,116'),\n", | |
" ('Tele Sales', 'FR', ' 3,184'),\n", | |
" ('Tele Sales', 'GB', ' 2,386'),\n", | |
" ('Tele Sales', None, ' 11,763'),\n", | |
" (None, None, ' 21,434')]" | |
] | |
}, | |
"execution_count": 249, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%%sql mysh@domsdb_medium\n", | |
"SELECT channels.channel_desc, \n", | |
" countries.country_iso_code,\n", | |
" TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$\n", | |
"FROM sales, customers, times, channels, countries\n", | |
"WHERE sales.time_id=times.time_id \n", | |
" AND sales.cust_id=customers.cust_id \n", | |
" AND customers.country_id = countries.country_id\n", | |
" AND sales.channel_id = channels.channel_id \n", | |
" AND channels.channel_desc IN ('Tele Sales','Internet') \n", | |
" AND times.calendar_year = '2006' \n", | |
" AND countries.country_iso_code IN ('GB','DE','FR','DK')\n", | |
"GROUP BY \n", | |
" ROLLUP(channels.channel_desc,countries.country_iso_code)\n", | |
"ORDER BY 1" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Tidying up the object store and database\n", | |
"You can run the following steps if you want to remove all of the tables from the schema and purge the object store of files. Lets start by removing the tables. **NOTE : The code below will remove all of the tables from the schema. Make sure you've not got anything in the schema that you want to keep before running it.**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from tqdm import tqdm\n", | |
"\n", | |
"rs = cursor.execute(\"select table_name from user_tables\")\n", | |
"rows = rs.fetchall()\n", | |
"\n", | |
"for row in tqdm(rows):\n", | |
" rs2 = cursor.execute(\"drop table {} purge\".format(row[0]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"And then removing the object store files and bucket" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"object_list = object_storage_client.list_objects(namespace, bucket_name)\n", | |
"\n", | |
"for o in object_list.data.objects:\n", | |
" print('Deleting object {}'.format(o.name))\n", | |
" object_storage_client.delete_object(namespace, bucket_name, o.name)\n", | |
"\n", | |
"print('Deleting bucket') \n", | |
"response = object_storage_client.delete_bucket(namespace, bucket_name)" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.4" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |