@thanoojgithub
Created December 4, 2020 18:04
Why Spark SQL Over Hive QL
By default, Hive uses the MapReduce (MR) engine, but it can be configured to use Tez or even Spark (in-memory computation) as the execution engine.
But,
Hive offers HiveQL (HQL), a SQL-like language, so it is mostly useful when you are a SQL developer.
Even though Hive supports UDFs, there is no extra room outside SQL to implement core/complex business logic.
Spark, by contrast, has Spark SQL, and we can convert a DataFrame (DF) to an RDD and an RDD back to a DF to perform core/complex business logic.
Hive has no resume capability.
Hive cannot drop encrypted databases.
Spark SQL also has Hive support (connecting to the Hive Metastore and its database tables) and can run Hive queries on the Spark engine.
Hive supports batch processing, whereas Spark SQL can be used for stream processing as well.