thanoojgithub · May 29, 2018 08:02 · thanoojgithub · Jan 28, 2016
diff --git a/ORDER BY and SORT BY b/ORDER BY and SORT BY
 ORDER BY and SORT BY
 It performs a total ordering of the query result set.
 This means that all the data is passed through a single reducer, which may take an unacceptably long time to execute for larger data sets.
 Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby performing a local ordering, where each reducer’s output will be sorted. Better performance is traded for total ordering.
 In both cases, the syntax differs only by the use of the ORDER or SORT keyword. You can specify any columns you wish and specify whether or not the columns are ascending using the ASC keyword (the default) or descending using the DESC keyword.

 Here is an example using ORDER BY:
 hive> SELECT s.ymd, s.symbol, s.price_close
 	>FROM stocks s
 	>ORDER BY s.ymd ASC, s.symbol DESC;

 Here is the same example using SORT BY instead:
 hive>SELECT s.ymd, s.symbol, s.price_close
 	>FROM stocks s
 	>SORT BY s.ymd ASC, s.symbol DESC;

 The two queries look almost identical, but if more than one reducer is invoked, the output will be sorted differently. While each reducer’s output files will be sorted, the data will probably overlap with the output of other reducers.

 DISTRIBUTE BY with SORT BY
 DISTRIBUTE BY controls how map output is divided among reducers.

 All data that flows through a MapReduce job is organized into key-value pairs. Hive must use this feature internally when it converts your queries to MapReduce jobs.
 By default, MapReduce computes a hash on the keys output by mappers and tries to evenly distribute the key-value pairs among the available reducers using the hash values.
 hive> SELECT s.ymd, s.symbol, s.price_close
 	> FROM stocks s
 	> DISTRIBUTE BY s.symbol
 	> SORT BY s.symbol ASC, s.ymd ASC;
 	
 DISTRIBUTE BY works similar to GROUP BY in the sense that it controls how reducers receive rows for processing, while SORT BY controls the sorting of data inside the reducer.

 Note that Hive requires that the DISTRIBUTE BY clause come before the SORT BY clause.
	ORDER BY and SORT BY
	It performs a total ordering of the query result set.
	This means that all the data is passed through a single reducer, which may take an unacceptably long time to execute for larger data sets.
	Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby performing a local ordering, where each reducer’s output will be sorted. Better performance is traded for total ordering.
	In both cases, the syntax differs only by the use of the ORDER or SORT keyword. You can specify any columns you wish and specify whether or not the columns are ascending using the ASC keyword (the default) or descending using the DESC keyword.

	Here is an example using ORDER BY:
	hive> SELECT s.ymd, s.symbol, s.price_close
	>FROM stocks s
	>ORDER BY s.ymd ASC, s.symbol DESC;

	Here is the same example using SORT BY instead:
	hive>SELECT s.ymd, s.symbol, s.price_close
	>FROM stocks s
	>SORT BY s.ymd ASC, s.symbol DESC;

	The two queries look almost identical, but if more than one reducer is invoked, the output will be sorted differently. While each reducer’s output files will be sorted, the data will probably overlap with the output of other reducers.

	DISTRIBUTE BY with SORT BY
	DISTRIBUTE BY controls how map output is divided among reducers.

	All data that flows through a MapReduce job is organized into key-value pairs. Hive must use this feature internally when it converts your queries to MapReduce jobs.
	By default, MapReduce computes a hash on the keys output by mappers and tries to evenly distribute the key-value pairs among the available reducers using the hash values.
	hive> SELECT s.ymd, s.symbol, s.price_close
	> FROM stocks s
	> DISTRIBUTE BY s.symbol
	> SORT BY s.symbol ASC, s.ymd ASC;

	DISTRIBUTE BY works similar to GROUP BY in the sense that it controls how reducers receive rows for processing, while SORT BY controls the sorting of data inside the reducer.

	Note that Hive requires that the DISTRIBUTE BY clause come before the SORT BY clause.
No results found