Skip to content

Instantly share code, notes, and snippets.

@zeroflag
Created June 12, 2020 09:54
Show Gist options
  • Select an option

  • Save zeroflag/2bf9eddcbfc79307899506c69c871560 to your computer and use it in GitHub Desktop.

Select an option

Save zeroflag/2bf9eddcbfc79307899506c69c871560 to your computer and use it in GitHub Desktop.
SELECT count(distinct ws_order_number) as order_count,
sum(ws_ext_ship_cost) as total_shipping_cost,
sum(ws_net_profit) as total_net_profit
FROM web_sales ws1
JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
FROM web_sales ws2 JOIN web_sales ws3
ON (ws2.ws_order_number = ws3.ws_order_number)
WHERE ws2.ws_warehouse_sk <> ws3.ws_warehouse_sk
-- and ws2.ws_sold_date between '1999-5-01' and '1999-7-01'
-- and ws3.ws_sold_date between '1999-5-01' and '1999-7-01'
) ws_wh1
ON (ws1.ws_order_number = ws_wh1.ws_order_number)
LEFT OUTER JOIN web_returns wr1 ON (ws1.ws_order_number = wr1.wr_order_number)
WHERE d.d_date between '1999-5-01' and '1999-7-01' and
-- ws_sold_date between '1999-5-01' and '1999-7-01' and
ca.ca_state = 'TX' and
s.web_company_name = 'pri' and
wr1.wr_order_number is null
limit 100;
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Tez |
| DagId: hive_20200612095237_6327b0cf-8b68-4390-b2b7-5f58b9be2ae3:41 |
| Edges: |
| Map 1 <- Map 10 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) |
| Map 11 <- Reducer 7 (BROADCAST_EDGE) |
| Map 13 <- Reducer 7 (BROADCAST_EDGE) |
| Map 14 <- Reducer 6 (BROADCAST_EDGE) |
| Reducer 12 <- Map 11 (CUSTOM_SIMPLE_EDGE), Map 13 (CUSTOM_SIMPLE_EDGE) |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Reducer 12 (CUSTOM_SIMPLE_EDGE) |
| Reducer 3 <- Map 14 (CUSTOM_SIMPLE_EDGE), Reducer 2 (CUSTOM_SIMPLE_EDGE) |
| Reducer 4 <- Reducer 3 (SIMPLE_EDGE) |
| Reducer 5 <- Reducer 4 (CUSTOM_SIMPLE_EDGE) |
| Reducer 6 <- Reducer 2 (CUSTOM_SIMPLE_EDGE) |
| Reducer 7 <- Map 1 (CUSTOM_SIMPLE_EDGE) |
| DagName: hive_20200612095237_6327b0cf-8b68-4390-b2b7-5f58b9be2ae3:41 |
| Vertices: |
| Map 1 |
| Map Operator Tree: |
| TableScan |
| alias: ws1 |
| filterExpr: (ws_ship_addr_sk is not null and ws_web_site_sk is not null and ws_ship_date_sk is not null) (type: boolean) |
| Statistics: Num rows: 21600036511 Data size: 5528875272680 Basic stats: COMPLETE Column stats: COMPLETE |
| Filter Operator |
| predicate: (ws_ship_addr_sk is not null and ws_web_site_sk is not null and ws_ship_date_sk is not null) (type: boolean) |
| Statistics: Num rows: 21583844790 Data size: 5524730742376 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: ws_ship_date_sk (type: bigint), ws_ship_addr_sk (type: bigint), ws_web_site_sk (type: bigint), ws_order_number (type: bigint), ws_ext_ship_cost (type: decimal(7,2)), ws_net_profit (type: decimal(7,2)) |
| outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 |
| Statistics: Num rows: 21583844790 Data size: 5524730742376 Basic stats: COMPLETE Column stats: COMPLETE |
| Map Join Operator |
| condition map: |
| Inner Join 0 to 1 |
| keys: |
| 0 _col1 (type: bigint) |
| 1 _col0 (type: bigint) |
| outputColumnNames: _col0, _col2, _col3, _col4, _col5 |
| input vertices: |
| 1 Map 8 |
| Statistics: Num rows: 407242361 Data size: 100305764080 Basic stats: COMPLETE Column stats: COMPLETE |
| Map Join Operator |
| condition map: |
| Inner Join 0 to 1 |
| keys: |
| 0 _col2 (type: bigint) |
| 1 _col0 (type: bigint) |
| outputColumnNames: _col0, _col3, _col4, _col5 |
| input vertices: |
| 1 Map 9 |
| Statistics: Num rows: 58177483 Data size: 13315405840 Basic stats: COMPLETE Column stats: COMPLETE |
| Map Join Operator |
| condition map: |
| Inner Join 0 to 1 |
| keys: |
| 0 _col0 (type: bigint) |
| 1 _col0 (type: bigint) |
| outputColumnNames: _col3, _col4, _col5 |
| input vertices: |
| 1 Map 10 |
| Statistics: Num rows: 6463723 Data size: 895528872 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col3 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col3 (type: bigint) |
| Statistics: Num rows: 6463723 Data size: 895528872 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col4 (type: decimal(7,2)), _col5 (type: decimal(7,2)) |
| Select Operator |
| expressions: _col3 (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 6463723 Data size: 103419568 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| aggregations: min(_col0), max(_col0), bloom_filter(_col0, expectedEntries=1000000) |
| minReductionHashAggr: 0.99 |
| mode: hash |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| null sort order: |
| sort order: |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: binary) |
| Execution mode: vectorized, llap |
| LLAP IO: may be used (ACID table) |
| Map 10 |
| Map Operator Tree: |
| TableScan |
| alias: d |
| filterExpr: d_date BETWEEN DATE'1999-05-01' AND DATE'1999-07-01' (type: boolean) |
| Statistics: Num rows: 73049 Data size: 4675136 Basic stats: COMPLETE Column stats: COMPLETE |
| Filter Operator |
| predicate: d_date BETWEEN DATE'1999-05-01' AND DATE'1999-07-01' (type: boolean) |
| Statistics: Num rows: 8116 Data size: 519424 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: d_date_sk (type: bigint) |
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| outputColumnNames: _col0 |
| Statistics: Num rows: 8116 Data size: 519424 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 8116 Data size: 519424 Basic stats: COMPLETE Column stats: COMPLETE |
| Execution mode: vectorized, llap |
| LLAP IO: may be used (ACID table) |
| Map 11 |
| Map Operator Tree: |
| TableScan |
| alias: ws2 |
| filterExpr: (ws_order_number BETWEEN DynamicValue(RS_34_ws1_ws_order_number_min) AND DynamicValue(RS_34_ws1_ws_order_number_max) and in_bloom_filter(ws_order_number, DynamicValue(RS_34_ws1_ws_order_number_bloom_filter))) (type: boolean) |
| Statistics: Num rows: 21600036511 Data size: 345557373256 Basic stats: COMPLETE Column stats: COMPLETE |
| Filter Operator |
| predicate: (ws_order_number BETWEEN DynamicValue(RS_34_ws1_ws_order_number_min) AND DynamicValue(RS_34_ws1_ws_order_number_max) and in_bloom_filter(ws_order_number, DynamicValue(RS_34_ws1_ws_order_number_bloom_filter))) (type: boolean) |
| Statistics: Num rows: 21600036511 Data size: 345557373256 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 21600036511 Data size: 345557373256 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col1 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col1 (type: bigint) |
| Statistics: Num rows: 21600036511 Data size: 345557373256 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint) |
| Execution mode: vectorized, llap |
| LLAP IO: may be used (ACID table) |
| Map 13 |
| Map Operator Tree: |
| TableScan |
| alias: ws3 |
| filterExpr: (ws_order_number BETWEEN DynamicValue(RS_34_ws1_ws_order_number_min) AND DynamicValue(RS_34_ws1_ws_order_number_max) and in_bloom_filter(ws_order_number, DynamicValue(RS_34_ws1_ws_order_number_bloom_filter))) (type: boolean) |
| Statistics: Num rows: 21600036511 Data size: 345557373256 Basic stats: COMPLETE Column stats: COMPLETE |
| Filter Operator |
| predicate: (ws_order_number BETWEEN DynamicValue(RS_34_ws1_ws_order_number_min) AND DynamicValue(RS_34_ws1_ws_order_number_max) and in_bloom_filter(ws_order_number, DynamicValue(RS_34_ws1_ws_order_number_bloom_filter))) (type: boolean) |
| Statistics: Num rows: 21600036511 Data size: 345557373256 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 21600036511 Data size: 345557373256 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col1 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col1 (type: bigint) |
| Statistics: Num rows: 21600036511 Data size: 345557373256 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint) |
| Execution mode: vectorized, llap |
| LLAP IO: may be used (ACID table) |
| Map 14 |
| Map Operator Tree: |
| TableScan |
| alias: wr1 |
| Statistics: Num rows: 2160007345 Data size: 17280058760 Basic stats: COMPLETE Column stats: COMPLETE |
| Filter Operator |
| predicate: (wr_order_number BETWEEN DynamicValue(RS_37_ws1_ws_order_number_min) AND DynamicValue(RS_37_ws1_ws_order_number_max) and in_bloom_filter(wr_order_number, DynamicValue(RS_37_ws1_ws_order_number_bloom_filter))) (type: boolean) |
| Statistics: Num rows: 2160007345 Data size: 17280058760 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: wr_order_number (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 2160007345 Data size: 17280058760 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 2160007345 Data size: 17280058760 Basic stats: COMPLETE Column stats: COMPLETE |
| Execution mode: vectorized, llap |
| LLAP IO: may be used (ACID table) |
| Map 8 |
| Map Operator Tree: |
| TableScan |
| alias: ca |
| filterExpr: (ca_state = 'TX') (type: boolean) |
| Statistics: Num rows: 40000000 Data size: 3760000000 Basic stats: COMPLETE Column stats: COMPLETE |
| Filter Operator |
| predicate: (ca_state = 'TX') (type: boolean) |
| Statistics: Num rows: 754717 Data size: 70943398 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: ca_address_sk (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 754717 Data size: 70943398 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 754717 Data size: 70943398 Basic stats: COMPLETE Column stats: COMPLETE |
| Execution mode: vectorized, llap |
| LLAP IO: may be used (ACID table) |
| Map 9 |
| Map Operator Tree: |
| TableScan |
| alias: s |
| filterExpr: (web_company_name = 'pri ') (type: boolean) |
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Statistics: Num rows: 84 Data size: 8064 Basic stats: COMPLETE Column stats: COMPLETE |
| Filter Operator |
| predicate: (web_company_name = 'pri ') (type: boolean) |
| Statistics: Num rows: 12 Data size: 1152 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: web_site_sk (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 12 Data size: 1704 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 12 Data size: 1704 Basic stats: COMPLETE Column stats: COMPLETE |
| Execution mode: vectorized, llap |
| LLAP IO: may be used (ACID table) |
| Reducer 12 |
| Execution mode: vectorized, llap |
| Reduce Operator Tree: |
| Map Join Operator |
| condition map: |
| Inner Join 0 to 1 |
| keys: |
| 0 KEY.reducesinkkey0 (type: bigint) |
| 1 KEY.reducesinkkey0 (type: bigint) |
| outputColumnNames: _col0, _col1, _col2 |
| input vertices: |
| 1 Map 13 |
| Statistics: Num rows: 258749294920 Data size: 6209896656240 Basic stats: COMPLETE Column stats: COMPLETE |
| DynamicPartitionHashJoin: true |
| Filter Operator |
| predicate: (_col0 <> _col2) (type: boolean) |
| Statistics: Num rows: 258749294920 Data size: 6209896656240 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: _col1 (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 258749294920 Data size: 2069994359360 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| keys: _col0 (type: bigint) |
| minReductionHashAggr: 0.99 |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 21600036511 Data size: 172800292088 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 21600036511 Data size: 172800292088 Basic stats: COMPLETE Column stats: COMPLETE |
| Reducer 2 |
| Execution mode: vectorized, llap |
| Reduce Operator Tree: |
| Map Join Operator |
| condition map: |
| Left Semi Join 0 to 1 |
| keys: |
| 0 KEY.reducesinkkey0 (type: bigint) |
| 1 KEY.reducesinkkey0 (type: bigint) |
| outputColumnNames: _col3, _col4, _col5 |
| input vertices: |
| 1 Reducer 12 |
| Statistics: Num rows: 6463723 Data size: 895528872 Basic stats: COMPLETE Column stats: COMPLETE |
| DynamicPartitionHashJoin: true |
| Reduce Output Operator |
| key expressions: _col3 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col3 (type: bigint) |
| Statistics: Num rows: 6463723 Data size: 895528872 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col4 (type: decimal(7,2)), _col5 (type: decimal(7,2)) |
| Select Operator |
| expressions: _col3 (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 6463723 Data size: 103419568 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| aggregations: min(_col0), max(_col0), bloom_filter(_col0, expectedEntries=1000000) |
| minReductionHashAggr: 0.99 |
| mode: hash |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| null sort order: |
| sort order: |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: binary) |
| Reducer 3 |
| Execution mode: vectorized, llap |
| Reduce Operator Tree: |
| Map Join Operator |
| condition map: |
| Left Outer Join 0 to 1 |
| keys: |
| 0 KEY.reducesinkkey0 (type: bigint) |
| 1 KEY.reducesinkkey0 (type: bigint) |
| outputColumnNames: _col3, _col4, _col5, _col12 |
| input vertices: |
| 1 Map 14 |
| Statistics: Num rows: 17063916 Data size: 3439575200 Basic stats: COMPLETE Column stats: COMPLETE |
| DynamicPartitionHashJoin: true |
| Filter Operator |
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| predicate: _col12 is null (type: boolean) |
| Statistics: Num rows: 6463723 Data size: 1302893376 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: _col3 (type: bigint), _col4 (type: decimal(7,2)), _col5 (type: decimal(7,2)) |
| outputColumnNames: _col3, _col4, _col5 |
| Statistics: Num rows: 6463723 Data size: 1302893376 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| aggregations: sum(_col4), sum(_col5) |
| keys: _col3 (type: bigint) |
| minReductionHashAggr: 0.91645867 |
| mode: hash |
| outputColumnNames: _col0, _col2, _col3 |
| Statistics: Num rows: 3231861 Data size: 749791752 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col0 (type: bigint) |
| null sort order: z |
| sort order: + |
| Map-reduce partition columns: _col0 (type: bigint) |
| Statistics: Num rows: 3231861 Data size: 749791752 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col2 (type: decimal(17,2)), _col3 (type: decimal(17,2)) |
| Reducer 4 |
| Execution mode: vectorized, llap |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: sum(VALUE._col0), sum(VALUE._col1) |
| keys: KEY._col0 (type: bigint) |
| mode: partial2 |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 3231861 Data size: 749791752 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| aggregations: count(_col0), sum(_col1), sum(_col2) |
| mode: partial2 |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 232 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| null sort order: |
| sort order: |
| Statistics: Num rows: 1 Data size: 232 Basic stats: COMPLETE Column stats: COMPLETE |
| TopN Hash Memory Usage: 0.04 |
| value expressions: _col0 (type: bigint), _col1 (type: decimal(17,2)), _col2 (type: decimal(17,2)) |
| Reducer 5 |
| Execution mode: vectorized, llap |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: count(VALUE._col0), sum(VALUE._col1), sum(VALUE._col2) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 232 Basic stats: COMPLETE Column stats: COMPLETE |
| Limit |
| Number of rows: 100 |
| Statistics: Num rows: 1 Data size: 232 Basic stats: COMPLETE Column stats: COMPLETE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 232 Basic stats: COMPLETE Column stats: COMPLETE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| Reducer 6 |
| Execution mode: vectorized, llap |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: min(VALUE._col0), max(VALUE._col1), bloom_filter(VALUE._col2, expectedEntries=1000000) |
| mode: final |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| null sort order: |
| sort order: |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: binary) |
| Reducer 7 |
| Execution mode: vectorized, llap |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: min(VALUE._col0), max(VALUE._col1), bloom_filter(VALUE._col2, expectedEntries=1000000) |
| mode: final |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| null sort order: |
| sort order: |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: binary) |
| Reduce Output Operator |
| null sort order: |
| sort order: |
| Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: binary) |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: 100 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment