Metrics help us understand how people are using our products which in turn helps us improve them in ways that increase the value folks get out of them. We aren't interested in using metrics to turn data about our users into a product that we can sell, we just want to provide the best possible user experience.
Engagement and retention are our bread and butter metrics. By improving these measurements we can be confident that we're improving the user experience.
Of all the people who used Sync in the last (28) days, what percentage of them used it on a given day?
Having a high ratio of daily users in relation to monthly users indicates that a majority of users are using Sync frequently. Tracking the engagement ratio allows us to monitor how changes to the user experience affect how frequently users engage.
Since the engagement ratio is a primary measure for our success it can be useful to compare ratios of different subsets of the user base. Segmenting by count of connected device and storage size are two examples. By identifying segments with higher engagement, we can focus efforts into making it easy for users to become members of those segments.
TODO add graph screenshot
How loyal are our users?
Knowing how long people continue using Sync is our second primary measure. By improving retention we know that we're maintaining users' interest.
Short term retention can help us understand:
- the quality of our user acquisition strategy
- the quality of our initial user experience
- how well we inform users about what Sync does
Long term retention can help us understand:
- what features lead to long term use
- long term satisfaction of individual features
Retention timeframes are TBD.
High level service health helps us correlate service degradation or disruption with other metrics. Lower level stats for debugging problems are handled by other systems.
TODO add graph screenshot
Logs from the servers are loaded into Redshift for analysis. Each day a new log table is created. These tables are dropped after approximately 90 days.
Here's the schema for the log tables:
create table sync0715
(
uid CHAR(32) NOT NULL encode lzo, -- a sha256-hashed Firefox Account (FxA) user id
s_uid CHAR(32) encode lzo, -- a surrogate user id (generated by token server)
dev CHAR(32) NOT NULL encode lzo, -- a sha256-hashed FxA device id
s_dev CHAR(32) encode lzo, -- a surrogate device id (generated by token server)
ts TIMESTAMP NOT NULL encode lzo, -- timestamp of the request
method VARCHAR(32) encode lzo, -- request method (GET, POST, etc)
code SMALLINT encode lzo, -- http status code of the response
bucket VARCHAR(255) encode bytedict, -- sync bucket name (bookmarks, history, etc)
t INTEGER encode bytedict, -- request time in milliseconds
ua_browser VARCHAR(255) encode lzo, -- request User Agent browser
ua_version INTEGER encode lzo, -- request User Agent browser version
ua_os VARCHAR(255) encode lzo, -- request User Agent OS
host VARCHAR(255) encode lzo -- server hostname that handled the request
)
distkey(uid)
interleaved sortkey(uid, dev);
After a daily table has been loaded, a few sql commands do rollups of the data.
This table summarizes a day of request activity per device id.
create table device_activity
(
uid char(32) not null encode lzo,
dev char(32) not null encode lzo,
ua_os varchar(8) encode bytedict,
ua_browser varchar(16) encode lzo,
ua_version integer encode lzo,
day date not null encode delta,
min_t integer encode bytedict,
max_t integer encode lzo,
posts integer encode lzo,
gets integer encode lzo,
puts integer encode lzo,
dels integer encode lzo,
aoks integer encode lzo,
oops integer encode lzo,
fups integer encode lzo,
r_clients integer encode lzo,
r_crypto integer encode lzo,
r_forms integer encode lzo,
r_history integer encode lzo,
r_keys integer encode lzo,
r_meta integer encode lzo,
r_bookmarks integer encode lzo,
r_prefs integer encode lzo,
r_tabs integer encode lzo,
r_passwords integer encode lzo,
r_addons integer encode lzo,
w_clients integer encode lzo,
w_crypto integer encode lzo,
w_forms integer encode lzo,
w_history integer encode lzo,
w_keys integer encode lzo,
w_meta integer encode lzo,
w_bookmarks integer encode lzo,
w_prefs integer encode lzo,
w_tabs integer encode lzo,
w_passwords integer encode lzo,
w_addons integer encode lzo
)
distkey(uid)
compound sortkey(day, uid);
And the query that inserts today's data.
insert into device_activity (
select
uid,
dev,
max(ua_os) as ua_os,
max(ua_browser) as ua_browser,
max(ua_version) as ua_version,
trunc(max(ts)) as day,
min(t) as min_t,
max(t) as max_t,
sum(posts) as posts,
sum(gets) as gets,
sum(puts) as puts,
sum(dels) as dels,
sum(aoks) as aoks,
sum(oops) as oops,
sum(fups) as fups,
sum(r_clients) as r_clients,
sum(r_crypto) as r_crypto,
sum(r_forms) as r_forms,
sum(r_history) as r_history,
sum(r_keys) as r_keys,
sum(r_meta) as r_meta,
sum(r_bookmarks) as r_bookmarks,
sum(r_prefs) as r_prefs,
sum(r_tabs) as r_tabs,
sum(r_passwords) as r_passwords,
sum(r_addons) as r_addons,
sum(w_clients) as w_clients,
sum(w_crypto) as w_crypto,
sum(w_forms) as w_forms,
sum(w_history) as w_history,
sum(w_keys) as w_keys,
sum(w_meta) as w_meta,
sum(w_bookmarks) as w_bookmarks,
sum(w_prefs) as w_prefs,
sum(w_tabs) as w_tabs,
sum(w_passwords) as w_passwords,
sum(w_addons) as w_addons
from
(select
uid,
dev,
ts,
t,
decode(substring(ua_os,0,8), 'iPad', 'ios', 'iPod', 'ios', 'iPhone', 'ios', 'Android', 'android', 'Windows', 'windows', 'Macinto', 'mac', 'Linux', 'linux', null, 'unknown', 'other') as ua_os,
ua_browser,
ua_version,
case when bucket = 'clients' and method = 'GET' then 1 end as r_clients,
case when bucket = 'crypto' and method = 'GET' then 1 end as r_crypto,
case when bucket = 'forms' and method = 'GET' then 1 end as r_forms,
case when bucket = 'history' and method = 'GET' then 1 end as r_history,
case when bucket = 'keys' and method = 'GET' then 1 end as r_keys,
case when bucket = 'meta' and method = 'GET' then 1 end as r_meta,
case when bucket = 'bookmarks' and method = 'GET' then 1 end as r_bookmarks,
case when bucket = 'prefs' and method = 'GET' then 1 end as r_prefs,
case when bucket = 'tabs' and method = 'GET' then 1 end as r_tabs,
case when bucket = 'passwords' and method = 'GET' then 1 end as r_passwords,
case when bucket = 'addons' and method = 'GET' then 1 end as r_addons,
case when bucket = 'clients' and method = 'POST' then 1 end as w_clients,
case when bucket = 'crypto' and method = 'POST' then 1 end as w_crypto,
case when bucket = 'forms' and method = 'POST' then 1 end as w_forms,
case when bucket = 'history' and method = 'POST' then 1 end as w_history,
case when bucket = 'keys' and method = 'POST' then 1 end as w_keys,
case when bucket = 'meta' and method = 'POST' then 1 end as w_meta,
case when bucket = 'bookmarks' and method = 'POST' then 1 end as w_bookmarks,
case when bucket = 'prefs' and method = 'POST' then 1 end as w_prefs,
case when bucket = 'tabs' and method = 'POST' then 1 end as w_tabs,
case when bucket = 'passwords' and method = 'POST' then 1 end as w_passwords,
case when bucket = 'addons' and method = 'POST' then 1 end as w_addons r_history,
from sync0715)
group by uid, dev);
This table summarizes how many active devices each user had on a given day, where an active device is one that has had any activity in the last 7 days.
Schema:
create table device_counts
(
uid CHAR(32) NOT NULL encode lzo,
day DATE encode delta,
devs SMALLINT encode lzo
)
distkey(uid)
compound sortkey(day, uid);
Query:
select
uid,
max(day) as day,
count(distinct dev) as devs
from
(select
uid,
dev,
day
from device_activity
where uid in
(select distinct(uid) from device_activity where day = '2016-07-14')
and day > '2016-07-14' - 7
and day < '2016-07-14' + 1)
group by uid;