@georgf
Created April 25, 2016 16:10
Missing child payloads
# coding: utf-8
# ### Missing child payloads
# In[2]:
import ujson as json
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import plotly.plotly as py
from plotly.graph_objs import *
from moztelemetry import get_pings, get_pings_properties, get_one_ping_per_client, get_clients_history, get_records
get_ipython().magic(u'pylab inline')
# ### Get pings
# In[44]:
submission_dates = ("20160410", "20160420")
fraction = 0.1
channel = "aurora"
ss_pings = get_pings(sc,
                     app="Firefox",
                     channel=channel,
                     doc_type="saved_session",
                     schema="v4",
                     submission_date=submission_dates,
                     fraction=fraction)
main_pings = get_pings(sc,
                       app="Firefox",
                       channel=channel,
                       doc_type="main",
                       schema="v4",
                       submission_date=submission_dates,
                       fraction=fraction)
# In[45]:
properties = ["clientId",
              "environment/settings/e10sEnabled",
              "meta/reason",
              "payload/childPayloads",
              "payload/keyedHistograms/SUBPROCESS_ABNORMAL_ABORT/content",
              "payload/info/sessionLength"]
ss_props = get_pings_properties(ss_pings, properties)
main_props = get_pings_properties(main_pings, properties)
(ss_props.count(), main_props.count())
# We only want to look at pings with e10s enabled.
# In[46]:
def e10s_enabled(ping):
    return ping.get("environment/settings/e10sEnabled", False)
# In[47]:
ss = ss_props.filter(e10s_enabled)
main = (main_props.filter(e10s_enabled)
                  .filter(lambda p: p.get("meta/reason") == "shutdown"))
counts = (ss.count(), main.count())
counts
# Now look for pings with missing child payloads.
# In[48]:
def missing_childPayloads(ping):
    # True when the ping carries no child payloads at all (key absent or value None).
    return ping.get("payload/childPayloads") is None
ss_missing = ss.filter(missing_childPayloads)
main_missing = main.filter(missing_childPayloads)
missing_counts = (ss_missing.count(), main_missing.count())
missing_counts
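# The filter above only matches pings where the property is None or absent.
# The sketch below uses hypothetical, hand-written dicts (not real pings) to show how that
# differs from a looser check that also counts an empty list as missing. The sample data
# and the name `missing_or_empty_child_payloads` are assumptions made for illustration.

```python
# Hypothetical flattened pings, keyed by property path like the dicts filtered above.
sample_pings = [
    {"clientId": "a", "payload/childPayloads": [{"histograms": {}}]},
    {"clientId": "b", "payload/childPayloads": None},  # value is None
    {"clientId": "c", "payload/childPayloads": []},    # present but empty
    {"clientId": "d"},                                  # key not in the dict
]


def missing_child_payloads(ping):
    # Strict check: only None or an absent key counts as missing.
    return ping.get("payload/childPayloads") is None


def missing_or_empty_child_payloads(ping):
    # Looser check: an empty list is treated as missing too.
    return not ping.get("payload/childPayloads")


strict = [p["clientId"] for p in sample_pings if missing_child_payloads(p)]
loose = [p["clientId"] for p in sample_pings if missing_or_empty_child_payloads(p)]
print(strict)  # ['b', 'd']
print(loose)   # ['b', 'c', 'd']
```

# Whether empty lists occur in real pings is not checked in this notebook; comparing the
# two counts would be one way to find out.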
# In[49]:
def relative(sub, full):
    # Element-wise ratio of two (saved-session, main) count tuples, rounded to 4 digits.
    return (round(float(sub[0]) / full[0], 4), round(float(sub[1]) / full[1], 4))
relative(missing_counts, counts)
# ### Does it correlate with crashes?
# Maybe missing child payloads correlate with content crashes?
# In[50]:
def has_content_crash(p):
    # Treat a missing or None histogram value as zero crashes.
    return (p.get("payload/keyedHistograms/SUBPROCESS_ABNORMAL_ABORT/content") or 0) > 0
ss_missing_crash = ss_missing.filter(has_content_crash)
main_missing_crash = main_missing.filter(has_content_crash)
missing_crash_counts = (ss_missing_crash.count(), main_missing_crash.count())
relative(missing_crash_counts, missing_counts)
# So, yes, the bulk of the missing child payloads can be blamed on content crashes.
# If the content process crashes, it won't send its Telemetry data to the parent process on shutdown.
# This still leaves more than 30% of the pings with missing child payloads unexplained, though.
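# A minimal sketch of the same split in plain Python, so it runs without Spark.
# It buckets pings that lack child payloads by whether they recorded a content crash and
# reports each bucket's share. The sample pings and the names `crash_bucket` and `CRASH_KEY`
# are hypothetical; the numbers are made up and are not results from this analysis.

```python
from collections import Counter

CRASH_KEY = "payload/keyedHistograms/SUBPROCESS_ABNORMAL_ABORT/content"


def crash_bucket(ping):
    # A missing or None crash count is treated as zero crashes.
    crashes = ping.get(CRASH_KEY) or 0
    return "content_crash" if crashes > 0 else "no_content_crash"


# Hypothetical pings that are already known to lack child payloads.
sample_missing = [
    {CRASH_KEY: 2},
    {CRASH_KEY: 1},
    {CRASH_KEY: None},
    {},
]

buckets = Counter(crash_bucket(p) for p in sample_missing)
shares = {name: round(float(n) / len(sample_missing), 4)
          for name, n in buckets.items()}
print(shares)
```

# The "no_content_crash" share is the part that the following cells try to explain.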
# ### Correlation to session lengths?
# Another reason for missing child payloads could simply be short sessions.
# If a session is too short, we might not have spawned any child processes, or the child process might not have finished its Telemetry initialization.
# Let's check that.
# In[51]:
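def mapper(p):
    # Extract the session length; -1 marks pings where it is missing or None.
    length = p.get("payload/info/sessionLength")
    return -1 if length is None else length

ss_other = (ss_missing.filter(lambda p: not has_content_crash(p))
                      .map(mapper))
main_other = (main_missing.filter(lambda p: not has_content_crash(p))
                          .map(mapper))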
# In[52]:
ss_series = pd.Series(ss_other.collect())
main_series = pd.Series(main_other.collect())
# In[53]:
ss_series.describe(percentiles=[.25, .5, .75, .95, .99])
# In[54]:
main_series.describe(percentiles=[.25, .5, .75, .95, .99])
# So most of the remaining missing child payloads correlate with short session lengths.
# We can investigate that in more detail to improve data quality, but it doesn't seem to be a big problem right now.
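# One way to put a number on "short" is the share of sessions below a cutoff.
# The sketch below is plain Python with hypothetical values; the 60 cutoff, the sample
# lengths, and the helper name `share_below` are assumptions, and it assumes session
# lengths are in seconds. It drops the -1 sentinel used above for missing lengths so
# those entries do not count as short sessions.

```python
def share_below(lengths, threshold):
    # Ignore missing values and the -1 sentinel for unknown session length.
    valid = [x for x in lengths if x is not None and x >= 0]
    if not valid:
        return None
    short = sum(1 for x in valid if x < threshold)
    return float(short) / len(valid)


# Hypothetical session lengths; -1 stands for a ping without a session length.
sample_lengths = [3, 12, 45, 50, 600, 7200, -1]
print(share_below(sample_lengths, 60))
```

# Applied to the collected values, this could be called as share_below(ss_series.tolist(), 60).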
# In[55]:
ss_series.plot(kind='hist', bins=10, title="Histogram of session length distributions in saved-session")
# In[56]:
main_series.plot(kind='hist', bins=10, title="Histogram of session length distributions in shutdown")