* All data can be found at s3://databrickssql/ * Create the following databases: * {UNIQUE_IDENTIFIER}_retail * {UNIQUE_IDENTIFIER}_nyse * Number of orders placed every month. We have to consider only COMPLETE as well as CLOSED orders. Make sure output is sorted by month. * Database: retail * Table: orders * Number of orders placed in the months of 2013 August, 2013 September and 2013 October. We have to consider all PENDING orders (PENDING, PENDING_PAYMENT) * Database: retail * Table: orders * Get revenue for each order (using order_items). Output should contain order id and revenue, revenue should be rounded off to 2 digits. Also output should be saved to a new table called order_revenue (file format parquet). * Source Database: retail * Source Table: order_items * Target Database: {UNIQUE_IDENTIFIER}_retail * Target Table: order_revenue * Reference for [Creating Tables using Spark SQL](https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html) * Create tables for products, categories and departments, then load the data into all the tables. Use the below data model to come up with column names. * Base Directory for all data sets **/data/retail_db** * File Format: TEXTFILE * Delimiter: Comma "," * Create a new table for NYSE Data. * Database Name: **{UNIQUE_IDENTIFIER}_nyse** * Table Name: **nyse_eod** * Type: **EXTERNAL** * File Type: **TEXTFILE** (default) * Location: **s3://databrickssql/nyse_data/** * Fields: **stockticker, tradedate, openprice, highprice, lowprice, closeprice, volume** * Data Types: **Prices are of type float, stockticker is of type string, tradedate is of type int and volume is of type bigint.** * Field Delimiter: **Comma (,)** * Run `SELECT * FROM ` the table name to ensure data is loaded properly. Also run the count to make sure that all the data is loaded into the table. * Compute Daily Product Revenue * Source Database: **retail** * Consider only COMPLETE and CLOSED orders * Output should contain - order_date, product_name, revenue based on order_item_subtotal * Round off revenue to 2 digits * Save output to daily_product_revenue (sort the data by order_date in ascending and revenue in descending). * Target Database: **{UNIQUE_IDENTIFIER}_retail** * Target File Format: TEXTFILE * Get the details of orders where there are no records in order_items. * Source Database: retail * Tables: orders and order_items * Data Validation: Validate if order_item_subtotal is equal to product of order_item_product_price and order_item_quantity. * Source Database: retail * Source Table: order_items * Get the order count for each day and status sorted by order_date in ascending order and then by order_status where data is sorted COMPLETE, CLOSED and rest in ascending order. * Source Database: retail * Source Table: orders * Create FACT Table(s) for the following reports: * Daily Product Revenue * Daily Category Revenue * Daily Department Revenue * Roll up to Weekly, Monthly, Yearly * Support To Date Reports