@wicaksana
Last active October 19, 2022 06:45
from pyspark.sql import SparkSession

# Create (or reuse) the SparkSession for this job.
spark = SparkSession \
    .builder \
    .appName("test") \
    .getOrCreate()
# data is taken from https://github.com/GoogleCloudPlatform/serverless-spark-workshop/blob/main/cell-tower-anomaly-detection/cell-tower-anomaly-detection/01-datasets/telecom_customer_churn_data.csv
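# Read the CSV, treating the first row as a header and letting Spark infer column types.
customer_churn = spark.read.format('csv') \
    .option('header', 'true') \
    .option('inferSchema', 'true') \
    .load('gs://marifw-datalake-demo/cell-tower-anomaly-detection/telecom_customer_churn/telecom_customer_churn_data.csv')

# Register the DataFrame as a temporary view so it can be queried with SQL.
customer_churn.createOrReplaceTempView('customer_churn')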
# Average handset price and total calls for each distinct value of the phones column.
res = spark.sql('''
    SELECT
        phones,
        AVG(hnd_price) AS avg_hnd_price,
        AVG(totcalls) AS avg_total_calls
    FROM customer_churn
    GROUP BY phones
''')
res.show(truncate=False)