Skip to content

Instantly share code, notes, and snippets.

@j-thepac
Created July 20, 2024 13:32
Show Gist options
  • Save j-thepac/497e4caa448aa0eb2f43883a492a66d5 to your computer and use it in GitHub Desktop.
Save j-thepac/497e4caa448aa0eb2f43883a492a66d5 to your computer and use it in GitHub Desktop.
'''
find max_duration, avg_duration for each CPID using Duration , round numbers to 2 deciaml places
ChargingEvent CPID StartDate StartTime EndDate EndTime Energy PluginDuration
16673806 AN11719 31/12/17 14:46:00 31/12/17 18:00:00 2.4 3.2333333333333334
16670986 AN01706 31/12/17 11:25:00 31/12/17 13:14:00 6.1 1.8166666666666667
3174961 AN18584 31/12/17 11:26:11 01/01/18 12:54:11 24 25.466666666666665
16674334 AN00812 31/12/17 15:18:00 01/01/18 14:06:00 6.7 22.8
3176831 AN24139 31/12/17 18:25:18 01/01/18 13:09:18 6.1 18.733333333333334
16673920 AN03984 31/12/17 14:54:00 31/12/17 19:19:00 5.6 4.416666666666667
'''
basePath="/Users/deepakjayaprakash/Downloads/personal_doc/DigitalRoute/"
inputPath=f"{basePath}input/electric-chargepoints-2017.csv"
resultPath=f"{basePath}result"
df=spark.read.option("header","true").csv(path)
df=df.withColumn("PluginDuration",df.PluginDuration.cast(FloatType() ))
df2=df.groupby("CPID")\
.agg(round(max("PluginDuration"),2).alias("max_duration")
,round(avg("PluginDuration"),2).alias("avg_duration")
).withColumnRenamed("CPID","chargepoint_id")
df3=df2.withColumn("max_duration",df2.max_duration.cast(StringType()) )\
.withColumn("avg_duration",df2.avg_duration.cast(StringType()) )
df4= df3\
.withColumn("max_duration",df3["max_duration"][0:6])\
.withColumn("avg_duration",df3["avg_duration"][0:6])
df5=df4\
.withColumn("max_duration",df4.max_duration.cast(FloatType() ))\
.withColumn("avg_duration",df4.avg_duration.cast(FloatType() ))
AN03155
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment