@aialenti
Last active September 13, 2020 15:19
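from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

# Reuse the active Spark session, or create one if none exists
spark = SparkSession.builder.getOrCreate()

# Read the source table in Parquet format
sales_table = spark.read.parquet("./data/sales_parquet")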
'''
SELECT seller_id,
       CASE WHEN num_pieces_sold < 30 THEN 'Lower than 30'
            WHEN num_pieces_sold < 60 THEN 'Between 31 and 60'
            WHEN num_pieces_sold < 90 THEN 'Between 61 and 90'
            ELSE 'More than 91'
       END AS sales_bucket
FROM sales_table
'''
# DataFrame API equivalent of the CASE WHEN query above
sales_table_execution_plan = sales_table.select(
    col("seller_id"),
    when(col("num_pieces_sold") < 30, "Lower than 30")
    .when(col("num_pieces_sold") < 60, "Between 31 and 60")
    .when(col("num_pieces_sold") < 90, "Between 61 and 90")
    .otherwise("More than 91")
    .alias("sales_bucket"),
)
sales_table_execution_plan.show()
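
The when/otherwise chain above is evaluated top to bottom, and the first matching condition wins. As a rough illustration, the same logic can be mirrored in plain Python without Spark. The helper name `sales_bucket` is hypothetical and not part of the original snippet. It also shows how the boundary values fall: because the comparisons are strict, a value of exactly 30 lands in the bucket labelled "Between 31 and 60".

```python
def sales_bucket(num_pieces_sold):
    """Plain-Python mirror of the when/otherwise chain (hypothetical helper)."""
    # A NULL input makes every comparison non-true in Spark,
    # so the row falls through to the otherwise() branch.
    if num_pieces_sold is None:
        return "More than 91"
    if num_pieces_sold < 30:
        return "Lower than 30"
    if num_pieces_sold < 60:
        return "Between 31 and 60"
    if num_pieces_sold < 90:
        return "Between 61 and 90"
    return "More than 91"


# Check the bucket assigned at each boundary value
for pieces in (29, 30, 59, 60, 89, 90):
    print(pieces, "->", sales_bucket(pieces))
```

If the labels are meant to be read literally, the thresholds would need to become `<= 30`, `<= 60`, and `<= 90`; whether that is the intent is not stated in the snippet.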