@othermaciej asked:
Dear lazyweb, is anyone aware of websites with any webp-only image content (i.e. not available in other formats)? Even if it's just a portion of their content? (Please provide URL if so.)
This query looks for pages that in their main document have an <img src="something.webp"> but don't use <picture>.
SELECT * FROM (
  SELECT
    url,
    REGEXP_EXTRACT(body, r'<img src="[^"]*\.webp"') AS imgtag
  FROM `httparchive.response_bodies.2019_11_01_desktop`
  WHERE page = url
    AND STRPOS(body, '<picture') = 0
) WHERE imgtag IS NOT NULL
  AND STRPOS(imgtag, '//') = 0 # exclude other hosts, typically https://example-cdn.com
  AND STRPOS(imgtag, 'pagespeed') = 0
Results: ~1600 pages in bigquery-webp.csv
Samples were loaded in Chrome, Firefox and Safari, and the WebP images were sometimes located using:
Array.from(document.querySelectorAll('img')).filter(img => img.src.includes('webp'));
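The filter above only finds candidate images by URL. To narrow it down to images that actually failed to load or decode, one could also check `complete` and `naturalWidth`. This is only a sketch: `isBrokenWebp` is a hypothetical helper name and was not part of the original investigation.

```javascript
// Hypothetical helper: true for an <img>-like object whose URL mentions
// "webp" and which has finished loading without producing any pixels.
function isBrokenWebp(img) {
  return img.src.includes('webp') && img.complete && img.naturalWidth === 0;
}

// In a browser console:
//   Array.from(document.querySelectorAll('img')).filter(isBrokenWebp);
```

Images that are still loading (for example lazy-loaded ones) report `complete === false` and are skipped, so the page should be fully loaded before running it.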
In the first 100 pages it was easy to find pages with broken images in Safari:
https://artdveri.com.ua/ has a broken image in the sidebar carousel.
http://www.gicaingenieros.com/ has no images.
https://www.indastro.com/ has a rotating (!) broken image.
https://www.internet.am/ has logo and other key images broken.
https://www.onnetflix.co.uk/ has a broken poster for one series.
http://supremesolar.in/ has its main logo and one carousel entry broken.
https://www.universmini.com/ is missing an image carousel.
https://viperos.gitlab.io/ is missing logo and important background image.
These appear to be cases where the web developer has simply uploaded an image, found that it worked in some browser, and moved on. This sort of thing is bound to happen when browsers are not interoperable, even if there are ways for web developers to detect support which they "should" use.
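One such detection mechanism is server-side negotiation on the request's Accept header. A minimal sketch follows, assuming the server keeps both a WebP and a JPEG copy of each image. `chooseImageVariant` is a hypothetical name, and q-values in the header are ignored for brevity.

```javascript
// Hypothetical helper: pick an image variant from the request's Accept header.
// Returns the file extension to serve plus the response headers to send.
// Assumption: q-values (e.g. "image/webp;q=0") are ignored in this sketch.
function chooseImageVariant(acceptHeader) {
  const accepted = (acceptHeader || '')
    .split(',')
    .map(part => part.split(';')[0].trim().toLowerCase());
  const useWebp = accepted.includes('image/webp');
  return {
    extension: useWebp ? 'webp' : 'jpg',
    headers: {
      'Content-Type': useWebp ? 'image/webp' : 'image/jpeg',
      // Tell caches that the response depends on the Accept header.
      'Vary': 'Accept',
    },
  };
}
```

A browser that does not list `image/webp` in its Accept header gets the JPEG copy, so a single `<img>` URL works everywhere.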
I first looked for sites that don't use <picture>, have a lot of JPEG images and few WebP images, on the hypothesis that those few WebP images wouldn't be handled correctly.
SELECT url, reqWebp, reqJpg
FROM `httparchive.summary_pages.2019_11_01_desktop` AS summary
JOIN `httparchive.pages.2019_11_01_desktop` AS pages
USING (url)
WHERE bytesWebp > 0 AND reqWebp > 0 AND reqWebp < 10 AND reqJpg >= 10
  AND JSON_EXTRACT(payload, '$._blinkFeatureFirstUsed.Features.Picture') IS NULL
About 87k pages were found. However, checking some at random revealed that most were serving different images to different browsers. They didn't always send the Vary: Accept header, though, so the criteria suggested by @yoavweiss didn't seem tractable.
Instead I started to look for the simplest possible cases, as above.
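To illustrate why a header-based criterion falls short, here is a sketch of what checking a response for Vary: Accept might look like. `variesOnAccept` is a hypothetical helper, and the exact criteria that were suggested are not reproduced here. A check like this can only identify sites that declare their negotiation, and would miss those that serve different images per browser without sending the header.

```javascript
// Hypothetical helper: does a Vary header value list Accept?
// Header field names are case-insensitive, so compare in lower case.
function variesOnAccept(varyValue) {
  return (varyValue || '')
    .split(',')
    .map(name => name.trim().toLowerCase())
    .includes('accept');
}
```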