Last active
June 26, 2019 12:24
-
-
Save tjmonsi/01511617cbaea6954d62256e920a1d26 to your computer and use it in GitHub Desktop.
Counts the sites that delivers hosted fonts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#standardSQL | |
-- counts the host_url of a given font | |
WITH | |
summary_pages AS ( | |
SELECT * FROM `httparchive.sample_data.summary_pages_desktop_1k` | |
UNION ALL | |
SELECT * FROM `httparchive.sample_data.summary_pages_mobile_1k` | |
), | |
summary_requests AS ( | |
SELECT * FROM `httparchive.sample_data.summary_requests_desktop_1k` | |
UNION ALL | |
SELECT * FROM `httparchive.sample_data.summary_requests_mobile_1k` | |
), | |
merge_table AS ( | |
SELECT * FROM summary_requests JOIN ( | |
SELECT pageid, url AS page_url FROM summary_pages | |
) USING (pageid) | |
), | |
processedTable AS ( | |
SELECT (NET.HOST(url) != NET.HOST(page_url)) as hosted, url, page_url | |
FROM merge_table WHERE type = 'font' | |
) | |
SELECT value AS host, count | |
FROM ( | |
SELECT APPROX_TOP_COUNT(NET.HOST(url), 500) AS font_host | |
FROM processedTable | |
WHERE hosted = true | |
), UNNEST(font_host) |
@rviscomi I am checking the payload right now...
would it be ok to use Sec-Fetch-Site
header to check if it is hosted somewhere else then get the host url for counting?
@rviscomi I just checked this and it returns all requests, including the sum of all requests from one origin. Is this count correct?
Here's an example of the approach I'd take:
SELECT value AS host, count
FROM (
SELECT APPROX_TOP_COUNT(NET.HOST(url), 25) AS font_host
FROM `httparchive.sample_data.summary_requests_desktop_1k`
WHERE type = 'font'
), UNNEST(font_host)
It finds the 25 most common hosts of font resources.
You can modify this query to handle other metrics, like find the top 25 font formats by querying the ext
field:
SELECT value AS ext, count
FROM (
SELECT APPROX_TOP_COUNT(LOWER(ext), 25) AS font_extension
FROM `httparchive.sample_data.summary_requests_desktop_1k`
WHERE type = 'font'
), UNNEST(font_extension)
Wow, i made a lot more hehe... But thanks for this :)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Since these are JSON payloads, I'd like to see the JSON_EXTRACT function used to improve the resiliency.