@foolip
Last active December 20, 2019 14:39

@othermaciej asked:

Dear lazyweb, is anyone aware of websites with any webp-only image content (i.e. not available in other formats)? Even if it's just a portion of their content? (Please provide URL if so.)

HTTPArchive query

This query looks for pages whose main document contains an <img src="something.webp"> but no <picture> element. Note that the regular expression only matches tags where src is the first attribute.

SELECT * FROM (
  SELECT
    url,
    REGEXP_EXTRACT(body, r'<img src="[^"]*\.webp"') as imgtag
  FROM `httparchive.response_bodies.2019_11_01_desktop`
  WHERE page = url
    AND STRPOS(body, '<picture') = 0
) WHERE imgtag IS NOT NULL
    AND STRPOS(imgtag, '//') = 0 # exclude other hosts, typically https://example-cdn.com
    AND STRPOS(imgtag, 'pagespeed') = 0

Results: ~1600 pages in bigquery-webp.csv
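
To try the same filter on a locally saved HTML file, the query's conditions can be ported to JavaScript roughly as follows. This is only a sketch: extractWebpImgTag is a hypothetical helper name, not part of HTTP Archive or BigQuery, and it assumes the page's main document is available as a string.

```javascript
// Rough JavaScript port of the query's filter, for trying it on local HTML.
// `extractWebpImgTag` is a hypothetical helper, not part of HTTP Archive.
function extractWebpImgTag(body) {
  // Mirrors STRPOS(body, '<picture') = 0: skip documents that use <picture>.
  if (body.includes('<picture')) return null;

  // Mirrors REGEXP_EXTRACT: first match only, and only when src is the
  // first attribute of the tag.
  const match = body.match(/<img src="[^"]*\.webp"/);
  if (!match) return null;

  const imgtag = match[0];
  // Mirrors STRPOS(imgtag, '//') = 0: drop absolute and protocol-relative URLs.
  if (imgtag.includes('//')) return null;
  // Mirrors STRPOS(imgtag, 'pagespeed') = 0.
  if (imgtag.includes('pagespeed')) return null;

  return imgtag;
}
```

Like the query, this only inspects the first matching tag, so a page whose first WebP image is on another host is dropped even if a later one is same-origin.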

Sample pages were loaded in Chrome, Firefox and Safari, and the WebP images were sometimes located using:

Array.from(document.querySelectorAll('img')).filter(img => img.src.includes('webp'));
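
The snippet above lists every image whose URL mentions WebP, whether or not it rendered. A possible refinement, sketched here under the assumption that the images have already finished loading, narrows the list to the ones that actually failed. findBrokenWebpImages is a hypothetical name; it relies on the standard complete and naturalWidth properties of img elements.

```javascript
// Sketch: report WebP-looking images that finished loading but have no
// decoded pixel data. Accepts any iterable of HTMLImageElement-like objects.
function findBrokenWebpImages(images) {
  return Array.from(images).filter(img =>
    img.src.includes('webp') &&
    img.complete &&           // loading has finished (successfully or not)
    img.naturalWidth === 0    // nothing was decoded: the image is broken
  );
}

// In a browser console:
// findBrokenWebpImages(document.querySelectorAll('img'));
```

Images that are lazy-loaded and not yet requested report complete as false, so they would be missed until they scroll into view.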

Broken pages

In the first 100 pages it was easy to find pages with broken images in Safari:

https://artdveri.com.ua/ has a broken image in the sidebar carousel:


http://www.gicaingenieros.com/ has no images:


https://www.indastro.com/ has a rotating (!) broken image:


https://www.internet.am/ has logo and other key images broken:


https://www.onnetflix.co.uk/ has a broken poster for one series:


http://supremesolar.in/ has its main logo and one carousel entry broken:


https://www.universmini.com/ is missing an image carousel:


https://viperos.gitlab.io/ is missing its logo and an important background image:

Commentary

These appear to be cases where the web developer has simply uploaded an image, found that it worked in some browser, and moved on. This sort of thing is bound to happen when browsers are not interoperable, even if there are ways for web developers to detect support, which they "should" use.
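
One of the ways developers "should" handle this is the <picture> element, which is why the query above excludes pages that use it. As a minimal sketch, assuming a fallback image in another format exists alongside the WebP file, the markup could be generated like this. pictureMarkup and escapeAttr are hypothetical helper names, and the file paths in the usage comment are made up.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build <picture> markup: browsers that support WebP pick the <source>,
// all others fall back to the <img>. Hypothetical helper for illustration.
function pictureMarkup(webpUrl, fallbackUrl, alt = '') {
  return [
    '<picture>',
    `  <source type="image/webp" srcset="${escapeAttr(webpUrl)}">`,
    `  <img src="${escapeAttr(fallbackUrl)}" alt="${escapeAttr(alt)}">`,
    '</picture>',
  ].join('\n');
}

// Example with made-up paths:
// pictureMarkup('/logo.webp', '/logo.png', 'Logo');
```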

Appendix: query approach that didn't work

I first looked for sites that don't use <picture>, have a lot of JPEG images and few WebP images, on the hypothesis that those few WebP images wouldn't be handled correctly.

SELECT url, reqWebp, reqJpg FROM
  `httparchive.summary_pages.2019_11_01_desktop` AS summary
  JOIN
  `httparchive.pages.2019_11_01_desktop` AS pages
  USING (url)
WHERE bytesWebp > 0 AND reqWebp > 0 AND reqWebp < 10 AND reqJpg >= 10
  AND JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.Picture') IS NULL

About 87k pages were found. However, checking some at random revealed that most were serving different images to different browsers. They didn't always use the Vary: Accept header, though, so the criteria suggested by @yoavweiss didn't seem tractable.
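
For context, serving different images to different browsers is typically done by inspecting the request's Accept header, and Vary: Accept is how a response signals that to caches. The following is a simplified sketch of that idea, not a description of how any of the sampled sites work. negotiateImage and the file names are hypothetical, and q-values are ignored for brevity.

```javascript
// Sketch of Accept-based negotiation. `negotiateImage` and the file names
// are hypothetical; real servers and CDNs each have their own configuration.
function negotiateImage(acceptHeader, basename) {
  // Split e.g. "image/webp,image/*,*/*;q=0.8" into bare media types.
  // Simplification: q-values (including q=0) are ignored.
  const types = (acceptHeader || '')
    .split(',')
    .map(part => part.split(';')[0].trim().toLowerCase());

  const webp = types.includes('image/webp');
  return {
    file: `${basename}.${webp ? 'webp' : 'jpg'}`,
    headers: {
      'Content-Type': webp ? 'image/webp' : 'image/jpeg',
      // Tells caches that the response depends on the request's Accept header.
      'Vary': 'Accept',
    },
  };
}

// Illustrative header values, not copied from any specific browser:
// negotiateImage('image/webp,image/*,*/*;q=0.8', 'hero').file  // 'hero.webp'
// negotiateImage('image/png,image/*;q=0.8', 'hero').file       // 'hero.jpg'
```

A site that negotiates this way but omits the Vary header looks, from the outside, much like one that serves WebP unconditionally, which is part of why header-based criteria were hard to apply.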

Instead I started to look for the simplest possible cases, as above.
