Helpful Resources
- Dashboards and Query code snippets in the re:dash demo
- Telemetry meets SQL by rvitillo
- Custom Dashboards with re:dash
- Longitudinal Examples
- Longitudinal Dataset Tutorial in Python
You can use Google Spreadshets as a query resource which allows you to do additional calculations and other processing of data which is then collected into re:dash to build your dashboards. The disadvantage of Google Spreadsheets is that unlike other queries you will need to update the spread data manually instead of having re:dash re-run the query on a repeating schedule.
NOTE: The first row of your spreadsheet should be the column names
Here is how to query your spreadsheet:
- Share your spreadsheet with [email protected] (edit privs?)
- (agree to the warning about this being an external to the Mozilla account)
- In the re:dash query use the ID of the spreadsheet and the (zero based) index of the sheet
ID|SHEET_INDEX
- The ID can be found in the URL: https://docs.google.com/spreadsheets/d/1-DaBLABLAmu_lVGunk8I/edit has an ID of
1-DaBLABLAmu_lVGunk8I
- The sheet index can be omitted and the query will use the first (left most) sheet, to use the second sheet you would write
1-DaBLABLAmu_lVGunk8I|1
. The sheet can be named anything, the sheet index is just like an array index of all the sheets, if you rearrange the order of your sheets you will need to update the index values you use. - Run your query
Presto docs | Presto Functions
Use this snippet to see all the tables available.
show tables
You can download this data as a CSV / XSL file so you always have it handy.
This is a sample (1%) of all users (all channels) over time (180 days). If you're unsure which db to use, this is the one you're looking for.
The longitudinal dataset is logically organized as a table where rows represent profiles and columns represent the various metrics (e.g. startup time). Each field of the table contains a list of values, one per Telemetry submission received for that profile.
Here's why you want to use this table
- Easier reporting of profile level metrics by grouping all data for a client-id in the same row
- Samples to 1% of all recent profiles, which will reduce query computation time and save resources (compared to main_summary)
- Profiles across all channels with up to 180 days of data
Get Started Examples
desc longitudinal
This will list all the available columns in the table.
SELECT *
FROM longitudinal
LIMIT 1000
This will query a sample of the table, by adding the LIMIT 1000
you can run test queries much faster and remove the limit when you're ready to run the query for real.
SELECT *
FROM longitudinal
-- LIMIT 1000
SQL uses the --
at the start of a line to comment out the line, remember?
System contains a number of elements related to the Firefox operating system.
The following example queries to group systems by memory for systems running in WoW64 mode.
SELECT count(client_id) as clients, system[1].memory_mb as memory_mb
FROM longitudinal
WHERE system[1].is_wow64 = TRUE -- 32-bit Firefox on 64-bit OS
GROUP BY system[1].memory_mb
Build contains a number of elements related to the build of Firefox running on the system.
The following example queries to group all systems by architecture.
SELECT count(client_id) as clients, build[1].architecture as architecture
FROM longitudinal
GROUP BY build[1].architecture -- 'x86-64' equals 64-bit Firefox (on 64-bit OS)
System GFX contains a number of elements related to the graphics system of Firefox.
The following example queries to group all systems by the number of monitors connected.
SELECT count(client_id) as clients, cardinality(system_gfx[1].monitors) as monitors
FROM longitudinal
GROUP BY cardinality(system_gfx[1].monitors)
HAVING cardinality(system_gfx[1].monitors) > 0 -- removing odd headless
Client count is a subset of the main_summary database for querying counts like, How many users with e10s are on each channel?
desc client_count
SELECT * FROM client_count LIMIT 100
Limit the activity to a range that makes sense for your query. Start with when the telemetry probe went live, like '2016-02-00'
in the example. Then cut off the last two weeks so your graph doesn't trail off, looking like total doom to come. from_iso8601_date(activity_date) < current_date - interval '14' day
WHERE activity_date > '2016-02-00' AND from_iso8601_date(activity_date) < current_date - interval '14' day
Do Not Want | WANT |
---|---|
![]() |
![]() |
This is the full database, you probably don't want to use this unless you like waiting 30+ min for your queries to finish. On the plus side this db returns the most normal SQL compared to others which return array values.
desc main_summary
SELECT * FROM main_summary LIMIT 100
@clarkbw, here a more readable line for clamping on dates: