Query based on similar query about sizes
and srcset
created by Colin Bendell for the 2019 Web Almanac
This takes about 850GB to run for 2019 this produces the following results:
Row | client | hasHeight | hasWidth | hasPicture | total | pctHeight | pctWidth | pctPicture |
---|---|---|---|---|---|---|---|---|
1 | mobile | 3415295 | 3548809 | 108589 | 5447942 | 62.69% | 65.14% | 1.99% |
2 | desktop | 2794913 | 2900974 | 94251 | 4474762 | 62.46% | 64.83% | 2.11% |
SQL:
SELECT
client,
COUNTIF(hasHeight) AS hasHeight,
COUNTIF(hasWidth) AS hasWidth,
COUNTIF(hasPicture) AS hasPicture,
COUNT(0) AS total,
ROUND(COUNTIF(hasHeight) * 100 / COUNT(0), 2) AS pctHeight,
ROUND(COUNTIF(hasWidth) * 100 / COUNT(0), 2) AS pctWidth,
ROUND(COUNTIF(hasPicture) * 100 / COUNT(0), 2) AS pctPicture
FROM (
SELECT
client,
REGEXP_CONTAINS(body, r'(?is)<(?:img)[^>]*height=[\'"]?([^\'"]*)') AS hasHeight,
REGEXP_CONTAINS(body, r'(?is)<(?:img)[^>]*width=[\'"]?([^\'"]*)') AS hasWidth,
REGEXP_CONTAINS(body, r'(?si)<picture.*?<img.*?/picture>') AS hasPicture
FROM
`httparchive.almanac.summary_response_bodies`
WHERE
firstHtml
)
GROUP BY
client
ORDER BY
client DESC