@aialenti
Last active September 13, 2020 21:32
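from pyspark.sql import SparkSession

# Create (or reuse) a Spark session; not needed if `spark` already exists,
# e.g. in the pyspark shell or a notebook
spark = SparkSession.builder.getOrCreate()

# Read the source tables in Parquet format
sales_table = spark.read.parquet("./data/sales_parquet")
sellers_table = spark.read.parquet("./data/sellers_parquet")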
# SQL equivalent of the left join below (kept as an unused string for reference)
'''
SELECT a.*,
       b.*
FROM sales_table a
LEFT JOIN sellers_table b
    ON a.seller_id = b.seller_id
'''
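# Note: each join only builds a lazy DataFrame; nothing runs until an action is called.

# Left join: every sale is kept; seller columns are null when no seller matches
left_join_execution_plan = sales_table.join(
    sellers_table,
    on=sales_table["seller_id"] == sellers_table["seller_id"],
    how="left",
)

# Inner join: only sales with a matching seller are kept
inner_join_execution_plan = sales_table.join(
    sellers_table,
    on=sales_table["seller_id"] == sellers_table["seller_id"],
    how="inner",
)

# Right join: every seller is kept; sales columns are null when no sale matches
right_join_execution_plan = sales_table.join(
    sellers_table,
    on=sales_table["seller_id"] == sellers_table["seller_id"],
    how="right",
)

# Full outer join: all rows from both tables are kept, with nulls on the unmatched side
full_outer_join_execution_plan = sales_table.join(
    sellers_table,
    on=sales_table["seller_id"] == sellers_table["seller_id"],
    how="full_outer",
)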
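
To make the difference between the four `how` values concrete, here is a minimal plain-Python sketch of which rows each join type keeps. It is not Spark code and does not reflect how Spark executes joins; the sample rows, the `seller_name` and `order_id` fields, and the `join_rows` helper are all hypothetical, made up purely for illustration.

```python
# Plain-Python illustration of join semantics (NOT Spark code).
# The sample rows and the join_rows helper are hypothetical.
sales = [
    {"order_id": 1, "seller_id": 10},
    {"order_id": 2, "seller_id": 20},
    {"order_id": 3, "seller_id": 99},  # no matching seller
]
sellers = [
    {"seller_id": 10, "seller_name": "A"},
    {"seller_id": 20, "seller_name": "B"},
    {"seller_id": 30, "seller_name": "C"},  # no matching sale
]


def join_rows(left, right, key, how):
    """Return (left_row, right_row) pairs; None marks the missing side."""
    pairs = []
    matched_right = set()
    for left_row in left:
        hits = [i for i, r in enumerate(right) if r[key] == left_row[key]]
        for i in hits:
            pairs.append((left_row, right[i]))
            matched_right.add(i)
        # Unmatched left rows survive only in left and full outer joins
        if not hits and how in ("left", "full_outer"):
            pairs.append((left_row, None))
    # Unmatched right rows survive only in right and full outer joins
    if how in ("right", "full_outer"):
        for i, right_row in enumerate(right):
            if i not in matched_right:
                pairs.append((None, right_row))
    return pairs


row_counts = {
    how: len(join_rows(sales, sellers, "seller_id", how))
    for how in ("left", "inner", "right", "full_outer")
}
print(row_counts)
# → {'left': 3, 'inner': 2, 'right': 3, 'full_outer': 4}
```

With this toy data, the sale for seller 99 appears only in the left and full outer results, and seller 30 appears only in the right and full outer results.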