@dipanjanS
Created April 9, 2019 18:20
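# Assumes `logs_df` (parsed logs with a `time` timestamp column) and
# `host_day_distinct_df` (assumed: one row per distinct host per day) already exist.
from pyspark.sql import functions as F
from pyspark.sql.functions import col

# Number of distinct hosts seen on each day
daily_hosts_df = (host_day_distinct_df
                  .groupBy('day')
                  .count()
                  .select(col("day"),
                          col("count").alias("total_hosts")))

# Total number of requests on each day
total_daily_reqests_df = (logs_df
                          .select(F.dayofmonth("time")
                                  .alias("day"))
                          .groupBy("day")
                          .count()
                          .select(col("day"),
                                  col("count").alias("total_reqs")))

# Join the two per-day tables and compute average requests per host
avg_daily_reqests_per_host_df = total_daily_reqests_df.join(daily_hosts_df, 'day')
avg_daily_reqests_per_host_df = (avg_daily_reqests_per_host_df
                                 .withColumn('avg_reqs', col('total_reqs') / col('total_hosts'))
                                 .sort("day"))

# Collect the small per-day result to pandas for display
avg_daily_reqests_per_host_df = avg_daily_reqests_per_host_df.toPandas()
avg_daily_reqests_per_host_df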