@razhangwei
Last active February 5, 2021 21:19
Hive / Spark SQL

Useful UDFs and tips:

  • Use Daiquery 'interactive spark' to debug the query first.
  • HiveInsertOperatorWithSchema does not support CTEs in the select query; put them in the preselect instead.
  • empty map with an explicit type: FB_CAST(NULL, 'MAP<INT, ARRAY<DOUBLE>>')
  • array: FB_ARRAY_APPLY, FB_ARRAY_AGGREGATE, FB_ARRAY_GET, FB_ARRAY_SORT
  • FB_PREV
  • aggregate: FB_COLLECT
  • sample:

DISTRIBUTE BY ASC

LAMBDA(x TYPE) SOME_EXPR(x)

  • different names for primitive types: FLOAT, STRING
  • type composition: e.g., MAP<INT, FLOAT>, ARRAY<FLOAT>
  • type conversion: FB_CAST(a, 'MAP<INT, ARRAY<BIGINT>>')
  • dynamic partition inserts:
INSERT OVERWRITE TABLE <OUTPUT_TBL>
PARTITION (ds = '<DATEID>', pipeline = '<PIPELINE>', version, type)
...
SELECT
    ...,
    version,
    type
Reference:

  1. Language manual: https://fburl.com/wiki/vr669g3h