@aialenti
Last active September 13, 2020 15:36
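```python
# Imports needed to run this snippet standalone.
# Note: importing sum/max/min from pyspark.sql.functions shadows the Python built-ins.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum, avg, max, min, count

# Reuse the active Spark session if one exists (e.g. in a notebook), else create one
spark = SparkSession.builder.getOrCreate()

# Read the source table in Parquet format
sales_table = spark.read.parquet("./data/sales_parquet")

'''
SQL equivalent of the DataFrame code below:

SELECT product_id,
       SUM(num_pieces_sold) AS total_pieces_sold,
       AVG(num_pieces_sold) AS average_pieces_sold,
       MAX(num_pieces_sold) AS max_pieces_sold_of_product_in_orders,
       MIN(num_pieces_sold) AS min_pieces_sold_of_product_in_orders,
       COUNT(num_pieces_sold) AS num_times_product_sold
FROM sales_table
GROUP BY product_id
'''
# Group rows by product and compute summary statistics of num_pieces_sold per product
sales_table_execution_plan = sales_table.groupBy(
    col("product_id")
).agg(
    sum("num_pieces_sold").alias("total_pieces_sold"),
    avg("num_pieces_sold").alias("average_pieces_sold"),
    max("num_pieces_sold").alias("max_pieces_sold_of_product_in_orders"),
    min("num_pieces_sold").alias("min_pieces_sold_of_product_in_orders"),
    count("num_pieces_sold").alias("num_times_product_sold")
)
# groupBy/agg are lazy transformations; the job only runs when an action such as show() is called
sales_table_execution_plan.show()
```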
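To make the per-product semantics of the aggregation concrete, here is a minimal pure-Python sketch of what the query computes, assuming each row is a `(product_id, num_pieces_sold)` tuple. The function name `aggregate_sales` and the sample rows are hypothetical illustrations, not part of the original gist. It only mirrors the result, not how Spark executes it: Spark typically computes partial aggregates per partition, shuffles on `product_id`, then merges them. As in SQL, `None` (NULL) values are skipped by every aggregate, so `COUNT(num_pieces_sold)` counts only rows with a non-null value.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def aggregate_sales(rows: Iterable[Tuple[str, Optional[int]]]) -> Dict[str, dict]:
    """Hypothetical helper mimicking GROUP BY product_id with SUM/AVG/MAX/MIN/COUNT.

    NULL (None) values are ignored, like the SQL aggregates; a product whose
    values are all None gets None for sum/avg/max/min and 0 for count.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for product_id, num_pieces_sold in rows:
        bucket = groups[product_id]  # register the group even if the value is None
        if num_pieces_sold is not None:
            bucket.append(num_pieces_sold)

    result = {}
    for product_id, values in groups.items():
        n = len(values)
        result[product_id] = {
            "total_pieces_sold": sum(values) if n else None,
            "average_pieces_sold": sum(values) / n if n else None,
            "max_pieces_sold_of_product_in_orders": max(values) if n else None,
            "min_pieces_sold_of_product_in_orders": min(values) if n else None,
            "num_times_product_sold": n,
        }
    return result


# Usage with made-up sample rows
rows = [("p1", 10), ("p1", 4), ("p2", 7), ("p1", None)]
stats = aggregate_sales(rows)
print(stats["p1"])  # total 14, average 7.0, max 10, min 4, count 2 (the None row is skipped)
```

Note that the row `("p1", None)` still belongs to the `p1` group but does not contribute to any statistic, which is why `num_times_product_sold` is 2 rather than 3.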