Created
December 4, 2020 18:04
Why Spark SQL Over Hive QL
| By default, Hive runs on the MapReduce (MR) engine, but it can be set to use Tez or even the Spark engine (in-memory computation). | |
| But, | |
| Hive offers the SQL-like HiveQL (HQL), which is most useful when you are a SQL developer. | |
| Even though Hive has UDFs, they leave little room for core/complex business logic. | |
| Spark has Spark SQL, and we can move from DF to RDD and from RDD to DF to perform core/complex business logic. | |
| No resume capability | |
| Hive cannot drop encrypted databases. | |
| Spark SQL has Hive support as well (connecting to the Hive Metastore and its database tables) | |
| and can run Hive queries on the Spark engine. | |
| Hive supports batch processing, whereas Spark SQL can be used in stream processing as well. | |
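
The DF to RDD and RDD to DF round trip mentioned above can be sketched roughly as follows in PySpark. This is only an illustration under stated assumptions: the discount rule, the function names `apply_loyalty_discount` and `run_with_spark`, the column names, and the sample rows are all hypothetical and not taken from these notes. For tables registered in a Hive Metastore, one would typically add `.enableHiveSupport()` to the session builder, which is omitted here so the sketch can run locally.

```python
from typing import Any, Dict


def apply_loyalty_discount(order: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical business rule: 10% off orders of 100.0 or more."""
    amount = order["amount"]
    discount = round(amount * 0.10, 2) if amount >= 100.0 else 0.0
    return {**order, "discount": discount, "payable": round(amount - discount, 2)}


def run_with_spark() -> None:
    # Imported lazily so the rule above can be exercised without a Spark install.
    from pyspark.sql import Row, SparkSession

    spark = (
        SparkSession.builder
        .appName("df-rdd-roundtrip")  # hypothetical app name
        .master("local[*]")           # assumption: local run, no cluster
        .getOrCreate()
    )

    # Start in the DataFrame / SQL world (sample data is made up).
    orders_df = spark.createDataFrame(
        [("o-1", 250.0), ("o-2", 40.0)], ["order_id", "amount"]
    )

    # DF -> RDD: each element is a Row; asDict() hands plain Python
    # to the business rule, which returns an enriched dict.
    discounted_rdd = orders_df.rdd.map(
        lambda row: Row(**apply_loyalty_discount(row.asDict()))
    )

    # RDD -> DF: back to Spark SQL for further querying.
    discounted_df = spark.createDataFrame(discounted_rdd)
    discounted_df.createOrReplaceTempView("discounted_orders")
    spark.sql(
        "SELECT order_id, payable FROM discounted_orders WHERE discount > 0"
    ).show()

    spark.stop()
```

Calling `run_with_spark()` on a machine with PySpark installed runs the whole round trip. Keeping the rule in a plain Python function is a design choice for this sketch: it can be unit tested on its own, and the same function could later be wrapped differently if the logic moved elsewhere.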