host_day_df = logs_df.select(logs_df.host,
                             F.dayofmonth('time').alias('day'))
host_day_df.show(5, truncate=False)
unique_host_count = (logs_df
                     .select('host')
                     .distinct()
                     .count())
unique_host_count
not200_df = (logs_df
             .filter(logs_df['status'] != 200))

error_endpoints_freq_df = (not200_df
                           .groupBy('endpoint')
                           .count()
                           .sort('count', ascending=False)
                           .limit(10))
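
The filter-then-count step above can be sketched in plain Python to show what the result looks like on a small input. This is only an illustration: the `sample_logs` rows are hypothetical, not taken from the real log data, and `collections.Counter` stands in for Spark's `groupBy('endpoint').count()`.

```python
from collections import Counter

# Hypothetical (endpoint, status) pairs, for illustration only.
sample_logs = [
    ('/index.html', 200),
    ('/missing.gif', 404),
    ('/missing.gif', 404),
    ('/old/page.html', 302),
    ('/index.html', 304),
    ('/missing.gif', 200),
]

# Keep only requests whose status is not 200, as the filter above does.
not200 = [endpoint for endpoint, status in sample_logs if status != 200]

# Count per endpoint and keep the 10 most frequent, highest count first.
error_endpoints_freq = Counter(not200).most_common(10)
print(error_endpoints_freq)
```

Note that, as in the Spark version, any status other than 200 is counted here, including redirects such as 302 and 304, so the result is broader than client or server errors.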
paths_df = (logs_df
            .groupBy('endpoint')
            .count()
            .sort('count', ascending=False)
            .limit(20))

paths_pd_df = paths_df.toPandas()
paths_pd_df
host_sum_df = (logs_df
               .groupBy('host')
               .count()
               .sort('count', ascending=False)
               .limit(10))

host_sum_df.show(truncate=False)
log_freq_pd_df = (log_freq_df
                  .toPandas()
                  .sort_values(by=['log(count)'],
                               ascending=False))

sns.catplot(x='status', y='log(count)', data=log_freq_pd_df,
            kind='bar', order=status_freq_pd_df['status'])
log_freq_df = status_freq_df.withColumn('log(count)',
                                        F.log(status_freq_df['count']))
log_freq_df.show()
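
To see what the `log(count)` column holds, here is a minimal plain-Python sketch on made-up data. With a single argument, `F.log` computes the natural logarithm, which `math.log` mirrors here; the `statuses` list is a hypothetical sample, not the real distribution.

```python
import math
from collections import Counter

# Hypothetical status codes, for illustration only.
statuses = [200, 200, 200, 200, 304, 304, 404]

# Count of requests per status code.
status_freq = Counter(statuses)

# Natural log of each count, mirroring the 'log(count)' column.
log_freq = {status: math.log(count) for status, count in status_freq.items()}

for status in sorted(log_freq):
    print(status, status_freq[status], round(log_freq[status], 3))
```

The transform compresses large differences between counts, which is why a bar chart of `log(count)` keeps rare status codes visible. A status seen exactly once maps to 0, so its bar has zero height.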
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
%matplotlib inline

sns.catplot(x='status', y='count', data=status_freq_pd_df,
            kind='bar', order=status_freq_pd_df['status'])
status_freq_pd_df = (status_freq_df
                     .toPandas()
                     .sort_values(by=['count'],
                                  ascending=False))
status_freq_pd_df
status_freq_df = (logs_df
                  .groupBy('status')
                  .count()
                  .sort('status')
                  .cache())

print('Total distinct HTTP Status Codes:', status_freq_df.count())