@aialenti
Created September 13, 2020 14:57
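from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Reuse the active Spark session if one exists, otherwise create one
spark = SparkSession.builder.getOrCreate()

# Read the source table in Parquet format
sales_table = spark.read.parquet("./data/sales_parquet")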
'''
SELECT DISTINCT seller_id,
                date
FROM sales_table
'''
sales_table_execution_plan = sales_table.select(
    col("seller_id"), col("date")
).distinct()
# Display the distinct (seller_id, date) rows; show() prints the first 20 by default
sales_table_execution_plan.show()