ETags: a pretty sweet feature of HTTP 1.1

HTTP caching review

HTTP lets servers control client-side caching of page components along two axes:

how freshness is determined: by a date, or by a token whose meaning is app-specific
whether or not the client needs to confirm with the server that its cached version is up to date before using it

This breaks down as follows:

Cache locally and don't check before using.
- This avoids a network request completely.
- Expires header - asks the browser to use the local copy until some date
- Cache-Control:max-age header - asks the browser to use the local copy until a number of seconds after download
Cache locally, but check before using.
- This requires a network request to contact the server, but avoids bandwidth costs of re-downloading.
- Last-Modified header - time-based validation - asks the browser to confirm the server hasn't updated the component since the Last-Modified date
- ETag header - token-based validation - asks the browser to confirm the component on the server has the same ETag as the cached copy.
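As a rough illustration of the two strategies above, here is a minimal Python sketch that builds the response headers for each. The function names are hypothetical, and pairing the validators with `Cache-Control: no-cache` (which tells the browser to revalidate before every reuse) is an assumption of this sketch, not something the list above prescribes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def no_check_headers(max_age_seconds, now=None):
    """Headers for 'cache locally and don't check before using'."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # Relative lifetime, counted in seconds from the download.
        "Cache-Control": f"max-age={max_age_seconds}",
        # Absolute expiry date, formatted as an HTTP date in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }


def check_first_headers(last_modified, etag):
    """Headers for 'cache locally, but check before using'."""
    return {
        # Assumption: force a revalidation request on every reuse.
        "Cache-Control": "no-cache",
        # Time-based validator.
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        # Token-based validator; the token is always quoted.
        "ETag": f'"{etag}"',
    }
```

Sending both Expires and Cache-Control:max-age is redundant for HTTP 1.1 clients, which prefer max-age; the sketch sets both only to show the two forms side by side.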

Caching static assets is easy, because static URLs can often change transparently

The golden rule of front-end performance is to minimize HTTP requests, so we use far-future Expires or Cache-Control:max-age headers aggressively. If we are willing to break URLs, we can set the expiry date far in the future; this is exactly what we did with the connect-cachify tool we saw last week:

when a static asset is downloaded, it's given a far-future caching header (Expires or Cache-Control:max-age)
if the asset changes, its URL is changed, breaking the cache & causing the browser to re-download the new file
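The URL-breaking idea can be sketched in a few lines of Python. This is not connect-cachify's actual URL scheme; `versioned_url` and the `name.<digest>.ext` layout are hypothetical choices made for illustration. The point is only that the URL is derived from the content, so a changed file gets a new URL automatically.

```python
import hashlib
import posixpath


def versioned_url(path, content, digest_chars=10):
    """Return a URL for `path` that embeds a short hash of `content`.

    The same content always maps to the same URL, so it can be served
    with a far-future caching header; any change to the content yields
    a new URL, which the browser has never cached.
    """
    digest = hashlib.md5(content).hexdigest()[:digest_chars]
    directory, filename = posixpath.split(path)
    stem, ext = posixpath.splitext(filename)
    return posixpath.join(directory, f"{stem}.{digest}{ext}")
```

For example, `versioned_url("/js/main.js", source_bytes)` returns something of the form `/js/main.<10 hex chars>.js`; the page template would emit that URL, and the server would map it back to the real file.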

Caching dynamic content is harder, because the rate of change is unpredictable and URLs must be stable

For dynamic content with persistent URLs, like templated web pages, we can't use the same approach:

we can't set far-future caching headers, unless we know exactly when the page will be updated in the future
we can't break the URL without hurting page discoverability, breaking bookmarks, and generally breaking the web

Although the browser has to check with the server that its cached copy is fresh, we can at least save the bandwidth and time required to redownload the file.

Because dynamic content might change at any time, we can't use the Last-Modified header; we have to use ETags to make dynamic content cacheable at all.

ETags allow dynamic content to be cached using an app-specific "opaque token"

An ETag, or entity tag, is an opaque token that identifies a version of the component served by a particular URL. The token can be anything enclosed in quotes; often it's an MD5 hash of the content, or the content's VCS revision number. If you're dealing with internationalized templates, the ETag should be different for each localized version. In general, ETag implementations should respect the variations in content that are usually signaled with Vary headers:

Vary:Accept-Language is used to signal to browsers that different representations exist, and should be cached separately, depending on the value of the Accept-Language request header.
Vary:Cookie is used to signal that the same page, though it might be seen by anonymous and logged-in users, should be cached separately--otherwise, logged-in users would see the anonymous version until they force-refreshed their browsers.
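One way to make the token respect those variations is to mix the varying inputs into the hash. The sketch below is a minimal example under that assumption; `make_etag` and its parameters are hypothetical names, and a real app would pick whichever request properties actually change its output.

```python
import hashlib


def make_etag(body: bytes, language: str = "", logged_in: bool = False) -> str:
    """Build a quoted ETag that differs per representation of a page.

    The language and login state are hashed together with the body, so
    two representations never share a tag, even if the tag is later
    computed from something coarser than the full body (for example a
    VCS revision number).
    """
    h = hashlib.md5()
    h.update(language.encode("utf-8"))
    h.update(b"\x00")  # separator so adjacent fields cannot run together
    h.update(b"user" if logged_in else b"anon")
    h.update(b"\x00")
    h.update(body)
    return '"' + h.hexdigest() + '"'
```

The response would still carry the matching Vary headers; the ETag only guarantees that a cached anonymous or English copy can never validate against a logged-in or French one.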

ETags and If-None-Match in action

I moved the ETags/If-None-Match "dialogue" to a separate gist

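You can use either ETag or Last-Modified headers, or both, or neither. The original HTTP 1.1 RFC (RFC 2616) actually recommends that servers send both, in which case the server would only return a 304 if both the If-None-Match token and the If-Modified-Since date matched the current version. The later revision of the spec (RFC 7232) gives If-None-Match precedence: when a request carries both headers, the server evaluates the ETag and ignores If-Modified-Since.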
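For readers without the separate gist at hand, the server side of an ETag/If-None-Match exchange can be sketched as follows. This is not the dialogue from that gist, just a minimal stand-in; `conditional_get` and `strong_etag` are hypothetical names, and the ETag is assumed to be an MD5 hash of the body.

```python
import hashlib


def strong_etag(body: bytes) -> str:
    """Quoted MD5 hash of the response body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def conditional_get(request_headers: dict, body: bytes):
    """Return (status, headers, body) for a GET that may be conditional."""
    etag = strong_etag(body)
    candidates = []
    # If-None-Match may list several comma-separated tags.
    for tag in request_headers.get("If-None-Match", "").split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]  # If-None-Match compares tags weakly
        candidates.append(tag)
    if etag in candidates or "*" in candidates:
        # The client's copy is current: no body is sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

On the first request the client has no tag and gets a 200 with the ETag; on later requests it echoes that tag in If-None-Match and gets an empty 304 until the body changes.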

ETags require some configuration to be helpful; otherwise, they can cause caching problems.

The original YSlow rules, and the book High Performance Web Sites, suggest disabling ETags unless you take the time to properly configure them. This is because Apache and IIS both have terrible default values for ETags, using server-specific inode info or server-specific timestamps, so that the ETag set on a component is different for each node in a server farm. Since ETags provide comparatively little performance benefit in general (conditional GETs still require an HTTP request), it's often an improvement just to disable them.

ETags have other, really cool applications

ETags are more than just a caching header; they identify a version of a representation served at a URL. This leads to some cool applications we'll just mention in passing:

optimistic concurrency: if 2 authors try to update a shared document, or 2 nodes try to update the same RESTful endpoint, they can avoid clobbering others' edits by passing the last-seen ETag as an If-Match header in a conditional PUT or conditional DELETE. If the version on the server differs from the version the client has edited, then the client's edits shouldn't be allowed.
sub-second updates: if a firehose API endpoint or auction webpage changes multiple times per second, and gets lots of traffic, ETags save clients and server a ton of bandwidth, and allow clients to sync with the server continuously. Without ETags, the server would have to either disable caching and let clients re-download unchanged content, or use HTTP date-based caching, whose one-second resolution forces clients to a latency of at least 1 full second between updates.
304s in xhr responses: returning 304s from CRUD-style model endpoints can make short-polling real-time apps more efficient: although the app needs to be built to handle empty 304 responses, doing so can avoid the network and CPU cost (and potential UI lag) of re-downloading, re-JSONifying, and re-processing unchanged server data.
range requests: strong ETags can be sent in an If-Range header so that a client resuming a download gets just the requested byte range only if the component is unchanged; weak ETags can't be used for byte ranges. This is weird, wild stuff; if you need 206 Partial Content responses or the like, have fun digging into the RFC :-)
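The optimistic concurrency case from the list above can be sketched with an in-memory resource. `Document` and its `put` method are hypothetical, and a real endpoint would parse the If-Match header rather than take the tag as an argument; the status codes are the standard ones (412 Precondition Failed when the tag no longer matches).

```python
import hashlib


class Document:
    """A shared resource guarded by ETag-based optimistic concurrency."""

    def __init__(self, body: bytes):
        self.body = body

    @property
    def etag(self) -> str:
        # Quoted MD5 hash of the current version.
        return '"' + hashlib.md5(self.body).hexdigest() + '"'

    def put(self, new_body: bytes, if_match: str) -> int:
        """Apply a conditional PUT and return the HTTP status code."""
        if if_match != self.etag:
            # Someone else changed the document since this client read it.
            return 412
        self.body = new_body
        return 204


doc = Document(b"draft 1")
seen_by_alice = doc.etag  # both authors read the same version
seen_by_bob = doc.etag

alice_status = doc.put(b"draft 2 (alice)", if_match=seen_by_alice)
bob_status = doc.put(b"draft 2 (bob)", if_match=seen_by_bob)
```

Alice's write succeeds and changes the ETag, so Bob's write is rejected instead of silently overwriting hers; Bob has to re-fetch, merge, and retry with the new tag.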

Because dynamic content might change at any time, we can't use the Last-Modified header

Of course we can ;)

The only difficulty is calculating the proper value, which is, very simply, the most recent modification date among all the dynamic elements in the page. Typically, in a blog, that covers:

the modification date of the article
the modification date of all assets with "breakable" URLs in the page (if an asset is updated, URL changes, and therefore contents change)
the modification date of widgets?

All this can often be calculated easily, and can always be cached if necessary. The important point is that an ETag is calculated from the content, which means your server has to generate that content first, and that can be quite costly. Last-Modified, on the other hand, can often be calculated much more easily than the actual content (assuming you've cached the assets' last-modified dates, a single query will generally do the trick), so you can return early, before generating the content, and save server resources.

The advice should then be:

If you're able to calculate the Last-Modified header quickly and efficiently, you should primarily use it
In every case, once the content has been generated, hashing it is simple and fast, so you should use ETag too
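A minimal sketch of that early return, assuming the modification dates are already available as timezone-aware datetimes and that `render` is the expensive page generator (both names, and `get_with_last_modified` itself, are hypothetical):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def page_last_modified(modification_dates):
    """Most recent modification date among the page's dynamic elements."""
    # HTTP dates have one-second resolution, so drop the microseconds.
    return max(modification_dates).replace(microsecond=0)


def get_with_last_modified(request_headers, modification_dates, render):
    """Return (status, headers, body), skipping render() when possible."""
    last_modified = page_last_modified(modification_dates)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            if last_modified <= parsedate_to_datetime(since):
                # Early return: the page content is never generated.
                return 304, headers, b""
        except (TypeError, ValueError):
            pass  # unparseable or zone-less date: fall through to a full response
    return 200, headers, render()
```

The 304 branch runs before `render()` is called, which is the saving described above; adding an ETag on the 200 path costs only one extra hash of the body that was generated anyway.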

jaredhirsch/gist:4971859

naholyr commented Feb 20, 2013
jaredhirsch commented Feb 21, 2013
rmongia commented Jun 18, 2014
cmawhorter commented Sep 17, 2014
Armalon commented Aug 16, 2016 (edited)
hashbender commented Oct 4, 2017
sp00m commented Apr 9, 2018
kirtiso commented Jun 22, 2018
