Website performance notes

Google user-centric performance metrics

  1. Is it happening? -> First paint, First contentful paint
  2. Is it useful? (Can users engage with the content?) -> First meaningful paint
  3. Is it usable? (Can users interact with the page?) -> Time to interactive, DOMContentLoaded
  4. Is it delightful? (Are the interactions smooth and natural?) -> Cumulative layout shift

Time to first byte

This is the time from when the request is initiated until the browser receives some data from the server.

It depends on:

  • Latency
  • Connection speed
  • Server timing

Network delay

Each of these steps adds delay:

  1. The user makes the initial request, e.g. by typing in a URL or clicking a link
  2. The browser resolves the domain
  3. The browser establishes a connection to the server
  4. The browser and server negotiate a secure connection
  5. The server issues potential redirects
  6. The browser receives the HTML response from the server

Round trip time (RTT) to the server is a huge factor, which is why we use CDNs.

Ilya Grigorik wrote an article about this: Latency: The New Web Performance Bottleneck

As an example, if a request has to cross the Atlantic Ocean, a round trip to the server will take on the order of 100 milliseconds at best. The RTT can be more than a second.

The speed of the network also matters; mobile networks are a lot slower.

Additional resources

Every 3rd party request to a new domain is a new connection process.

CSS and scripts in the head tag block following content from rendering.

Largest contentful paint

Largest contentful paint (LCP) is the render time of the largest content element visible in the viewport.

It's a rough indicator of when the user can meaningfully engage with the website.

The following resources can all cause delays between the first paint and LCP, because the browser has to make additional round trips to the server to process them:

  • Images (additional round trips)
  • Custom fonts
  • Javascript

Images are particularly problematic because after they load, text beneath them will reflow, which is very disruptive.

Images that use data URIs are blocking: since all the data is already in the HTML, the browser starts rendering the image immediately, before following content is rendered.

By default, fetching external scripts blocks rendering as well.

If a server fails to respond, browsers can wait ages for a response, which leaves the website in a broken state. Chrome waits 30s; iOS waits 75s.

3rd party scripts can request other 3rd party scripts. Request Map is a useful tool for visualising 3rd party scripts. For example, Optimizely JS makes 3 more requests to other Optimizely URLs.

3rd party scripts can inject content haphazardly

Time to interactive

Median time to interactive is 9.3 seconds on mobile. CNN is 10s on desktop!

You should aim for < 5s on a slow 3G connection on a median mobile device.

When using frameworks like React, there is a pattern of rendering a version of the page on the server and then rehydrating on the client. But this forces the device to render the content twice!

The React docs state:

If you intentionally need to render something different on the server and the client, you can do a two-pass rendering. Components that render something different on the client can read a state variable like this.state.isClient, which you can set to true in componentDidMount(). This way the initial render pass will render the same content as the server, avoiding mismatches, but an additional pass will happen synchronously right after hydration. Note that this approach will make your components slower because they have to render twice, so use it with caution.

But this comes with the following warning:

Remember to be mindful of user experience on slow connections. The JavaScript code may load significantly later than the initial HTML render, so if you render something different in the client-only pass, the transition can be jarring. However, if executed well, it may be beneficial to render a “shell” of the application on the server, and only show some of the extra widgets on the client. To learn how to do this without getting the markup mismatch issues, refer to the explanation in the previous paragraph.

Rendering on the web describes different rendering models including this one.

The primary downside of SSR with rehydration is that it can have a significant negative impact on Time To Interactive, even if it improves First Paint. SSR’d pages often look deceptively loaded and interactive, but can’t actually respond to input until the client-side JS is executed and event handlers have been attached. This can take seconds or even minutes on mobile.

Perhaps you’ve experienced this yourself - for a period of time after it looks like a page has loaded, clicking or tapping does nothing. This quickly becomes frustrating... “Why is nothing happening? Why can’t I scroll?”

Performance metrics collected from real websites using SSR rehydration indicate its use should be heavily discouraged. Ultimately, the reason comes down to User Experience: it's extremely easy to end up leaving users in an “uncanny valley”.

There’s hope for SSR with rehydration, though. In the short term, only using SSR for highly cacheable content can reduce the TTFB delay, producing similar results to prerendering. Rehydrating incrementally, progressively, or partially may be the key to making this technique more viable in the future.

Less perceivable metrics

The total kilobytes metric is important as data costs money. But it's not something a user would notice directly until they saw their bill.

What does my site cost?

Metrics for javascript heavy sites:

  • Input delay: how long does it take to respond to user interaction
  • Custom metrics for your site (for example, Twitter used the time to open the tweet box)

Identifying performance problems

What's causing the metric to be slower than we expect?

Set goals and budgets -> avoid regressions

WebpageTest

Waterfall chart

Focus first on everything to the left of the "Start render" line, because all of those resources block rendering.

The browser main thread row shows how busy the main thread is.

How to read a WebPageTest Waterfall View chart by Matt Hobbs

WebPageTest power users

Summary

Simple testing

  • This is the best place to start

Advanced testing

  • Script tab can be used for automation
  • Block tab can filter out URLs, so you can see how dependent your site is on particular URLs
  • SPOF tab (single point of failure) mimics real life failures - you can emulate what happens if a dependency is hanging
  • Can use blackhole.webpagetest.com directly in source code as well

Google PageSpeed insights

This simulates devices rather than using real devices. Gives a high level score, web vitals metrics, and recommendations.

Browser dev tools

Cmd+Option+I

Firefox/Chrome:

  • Performance tab tells you why a page might feel laggy after loading
  • Network tab shows waterfall

Chrome audits tab creates Lighthouse reports locally

Profiling real devices

Can connect with USB

Goals & Budgets

  • A budget is the worst performance that's acceptable
  • A goal is where you want to be

Goals

Start with competitors

See where competitors are doing better than you

Or use statistics from the HTTP Archive

E.g. 53% of visits to mobile sites are abandoned if loading takes longer than 3 seconds

You can mark these benchmarks on presentations

Budgets

Codifying and monitoring

  • Lighthousebot integrates with GitHub, but doesn't catch everything happening in production
  • Speedcurve - dashboard and alerts
  • Calibre - good for communicating cost of 3rd party trackers and ads

Getting familiar with the tools

  • Audit sites you've worked on
  • Identify the 3 most critical performance problems

Media

Cloudinary is a service that handles serving appropriately sized images and videos. You can use it for media management.

The downside is dependency on 3rd party URLs:

  • point of failure
  • additional network costs

Images

  • ImageOptim - a mac desktop app (jpg and png)
  • Optimage - a mac desktop app (freemium). Supports other formats like webp
  • WebPonize - converts to WebP
  • Imagemin - compresses images

WebP is often lighter than JPEG and PNG. Browser support is pretty good, but older browsers don't support it

One way to do it is to configure the server to rewrite jpg/png requests to webp.

AVIF is even newer https://jakearchibald.com/2020/avif-has-landed/

Video

  • HandBrake
  • MiroVideoConverter

MPEG-4 works everywhere - but big file sizes

WebM - better compression, less supported

Recommend serving both and letting the browser negotiate
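A minimal sketch of that negotiation (file names are placeholders); the browser plays the first source type it supports:

    <video controls>
      <!-- the browser picks the first source it can play -->
      <source src="clip.webm" type="video/webm">
      <source src="clip.mp4" type="video/mp4">
    </video>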

Fonts

  • "subsetting" removes characters that aren't used. e.g. removing languages that won't be used.
  • glyphhanger crawls a site and tells you what should be supported
  • WOFF 2.0 is the most modern format (but isn't supported by IE)
  • Can use WOFF 1 as a fallback for IE11.
  • Can package multiple variants of a font in one file
  • Font foundries offer fonts as a service, e.g. Google Web Fonts, but this comes with the cost of a 3rd party request. Hosting your own is recommended.
  • There may be licensing restrictions on hosting fonts on your own site

Text files

  • Minifying (e.g. removing unnecessary whitespace)
  • CSS minifier is a tool for one-off minifying
  • cssmin is a build tool

SVG

  • svgo is the standard tool. Lots of options and good defaults.

Javascript

  • uglify-js

Webpack can do "Tree Shaking" (dead-code elimination in javascript)

CSS

  • UnCSS removes unused rules
  • UnCSS online can be used to test it out

Compression

  • deflate is the main algorithm used (gzip)
  • in network panel in dev tools you should see a difference between transfer size and actual size
  • Brotli makes smaller files than gzip (it's a different algorithm, not deflate). The browser tells the server it can accept it via the Accept-Encoding header.
  • Should degrade to gzip for older browsers
  • Real-World Effectiveness of Brotli: Brotli FCP improvement vs. Gzip: 3.462% decrease

Factors affecting time to first byte

Latency

  • physical distance
  • connection speed

Upfront connections (DNS/TLS/etc)

Static files are faster than dynamically generated files

Use CDNs

Use modern software/configuration

Is TLS fast yet

High performance browser networking

Minimise redirects

e.g. trailing slashes: serve both and use a canonical tag. A redirect is no longer needed for SEO

Page rendering models

Dynamic server side rendering (rails, wordpress)

Can use caching plugins or CDNs to serve static versions and mimic the static server situation

Static server side rendering

e.g. Eleventy (11ty), Jekyll

SSR with Rehydration (used to be called isomorphic javascript)

e.g. React

good for time to first byte

Client-side with pre-rendering

e.g. initially deliver a skeleton with grey boxes

not very useful for users

Client side rendering

The user gets nothing until they get everything

Improving time to first paint

The browser blocks rendering while fetching standard link/script tags in the head of the page. "blocking file latency"

Javascript

If they're not needed for rendering, you can delay scripts with defer or async attributes. This is a common pattern for JS.

  • Async - less common. Executes whenever it arrives
  • Defer - loads just before DOMContentLoaded, respects order in the HTML
  • type="module" scripts are deferred by default.
  • You can use a script to generate a script tag and make that async
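A sketch of the declarative variants (file names here are just placeholders):

    <!-- executes whenever it arrives, order not guaranteed -->
    <script src="analytics.js" async></script>

    <!-- executes in document order, just before DOMContentLoaded -->
    <script src="app.js" defer></script>

    <!-- module scripts are deferred by default -->
    <script type="module" src="widgets.js"></script>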

Deferring javascript is the biggest, easiest improvement for time to first paint

Moving scripts to the end of the page also makes them load later.

You may want to load some critical JS early:

  • Feature tests
  • Polyfills
  • File loaders
  • Conditional logic to bootstrap the page

The goal is to deliver one smooth rendering, so you want to "enhance optimistically"

With this pattern there is a small amount of JS in the head that adds a class to the page saying "this is being enhanced". You can style based on the presence of that class even while waiting for the rest of the JS, and if the script fails to load after a reasonable amount of time you can remove the class again to go back to the non-enhanced version of the page.
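A rough sketch of that pattern; the js-enhanced class, the enhancementsLoaded flag, and the 5 second cut-off are all arbitrary choices here, not part of the original notes:

    <script>
      // mark the page as "being enhanced" straight away
      document.documentElement.classList.add('js-enhanced');

      // if the main bundle hasn't signalled success after a while,
      // fall back to the non-enhanced version of the page
      setTimeout(function () {
        if (!window.enhancementsLoaded) {
          document.documentElement.classList.remove('js-enhanced');
        }
      }, 5000);
    </script>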

CSS

There's no way to add async/defer to CSS. But print stylesheets always download in the background.

This leads to a hack for loading CSS asynchronously: <link rel="stylesheet" href="site.css" media="print" onload="this.media='all'">

You generally don't want to load CSS async because of the flash of unstyled content, but there is a pattern that pairs it with inlined "critical CSS"

Server push

This leaves the HTML the same but the server/edge has to do stuff.

Cloudflare supports it via a header https://www.cloudflare.com/en-gb/website-optimization/http2/serverpush/

But it was removed from Edge and Chrome.

Preload (experimental)

Indicates resources that will be required for rendering later on so you can load late-discovered resources early.

Preloading content

This differs from rel="prefetch" which is low priority and intended for the next navigation.

Example use cases:

  • Resources that are pointed to from inside CSS, like fonts or images.
  • Resources that JavaScript can request, like JSON, imported scripts, or web workers.
  • Larger images and video files.
<link rel="preload" href="font.woff2" as="font" type="font/woff2" crossorigin>

Inlining CSS

If external CSS resources are small, you can insert them directly into the HTML. This is bad for caching though. Tip: identify and inline the CSS necessary for rendering "above the fold" content

Inlining critical and deferring non-critical CSS

grunt-criticalcss is one of the tools that extracts critical css. This can be combined with the async trick to load the rest of it.
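Putting the two together might look something like this (file names assumed, and the extracted rules would come from whichever tool you use):

    <style>
      /* critical, above-the-fold rules extracted by the build tool go here */
    </style>
    <link rel="stylesheet" href="rest-of-site.css" media="print" onload="this.media='all'">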

Splitting CSS files

  • You can have shared vs template CSS files to make use of cached files across pages.
  • You can use media queries to load CSS for different screens separately (see the sketch below). If a stylesheet doesn't match, it loads asynchronously, like print stylesheets.
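A sketch of the media-query split (file names and breakpoint assumed):

    <link rel="stylesheet" href="base.css">
    <!-- doesn't block rendering on narrow viewports -->
    <link rel="stylesheet" href="wide.css" media="(min-width: 60em)">
    <link rel="stylesheet" href="print.css" media="print">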

Improving time to Largest Contentful Paint

This is about the focal points of the page. The metric measures the largest content element visible in the viewport

Analogy: if the first paint sets the stage, LCP brings the characters into the scene

Prefetching important assets early

rel="preconnect" allows the browser to connect to a domain in advance. Use it when fetching a 3rd party script later down the page.

dns-prefetch is similar but just does the DNS resolution. It's not as useful.
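For example (the third-party origin is a placeholder):

    <!-- open the connection (DNS + TCP + TLS) ahead of time -->
    <link rel="preconnect" href="https://scripts.example-cdn.com" crossorigin>
    <!-- weaker fallback: resolve the domain only -->
    <link rel="dns-prefetch" href="https://scripts.example-cdn.com">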

Preload is a good tool for shuffling priorities if the architecture is messy

Example: preloading a/b test code that is dynamically included later

This can slow things down if you preload low priority things, so always test first.

Image sizing/targeting

width/height attributes

If you use width/height attributes in the HTML, browsers use them as a hint for the aspect ratio and can reserve a blank box of the right shape before the image loads. CSS is still used for the actual size.
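For example (file name and dimensions assumed):

    <img src="hero.jpg" width="1200" height="600" alt="A hero image">

    <style>
      /* CSS still controls the displayed size; height: auto keeps the aspect ratio */
      img { max-width: 100%; height: auto; }
    </style>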

img srcset and sizes

  • srcset is supported in most browsers. You specify width for each image source and browser decides which one to load.
  • sizes specifies up front a desired size when the image is rendered in the layout.

For example: (max-width: 500px) 100vw, 50vw means 100% of the viewport width for viewports up to 500px, then 50%

Here vw = viewport width.
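Put together, a responsive img might look like this (file names and widths are placeholders):

    <img src="photo-800.jpg"
         srcset="photo-400.jpg 400w, photo-800.jpg 800w, photo-1600.jpg 1600w"
         sizes="(max-width: 500px) 100vw, 50vw"
         alt="A responsive photo">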

You can use sizes to implement a zoom feature!

The picture element is good for when you have different crops at different viewport sizes (e.g. art-directed imagery).

Another common pattern is providing type fallbacks; e.g. provide webp but fall back to other type.
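A sketch of the type-fallback pattern with the picture element (file names assumed):

    <picture>
      <!-- served only if the browser understands WebP -->
      <source type="image/webp" srcset="photo.webp">
      <img src="photo.jpg" alt="A photo with a WebP fallback">
    </picture>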

srcset alone can request very large images if the device is high resolution, so it's useful to constrain srcsets by providing a max-width. Then you can provide a fallback for the largest size.

preload can be paired with picture or img elements with matching constraints. It's probably overkill for most images, though.

Video can be responsive too in a similar way to picture.

Lazy loading

The BBC use the lazysizes project for lazy loading images when they come into the viewport.

Now there is a native attribute loading="lazy"
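For example (file name assumed):

    <img src="comments-avatar.jpg" loading="lazy" alt="A commenter's avatar">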

Progressive font loading

By default browsers hide text while a custom font loads (AKA the flash of invisible text).

font-display: swap in CSS renders text in the fallback font and swaps in the custom one when the preferred font loads.
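For example, in the @font-face rule (font name and paths assumed):

    @font-face {
      font-family: "Example Sans";
      src: url("/fonts/example-sans.woff2") format("woff2"),
           url("/fonts/example-sans.woff") format("woff");
      /* show the fallback font immediately, swap when the custom font arrives */
      font-display: swap;
    }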

It's dangerous to use icon fonts because if they are blocked you see ridiculous things (see the TripAdvisor rating of four fax machines and a laptop)

If you have a lot of separate fonts, they could come in at different times and cause a lot of repaints. In this situation you could use JS to load fonts. Another advantage of JS approaches is you can inspect navigator.connection to better target the enhancement.
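A rough sketch of the JS approach using the FontFace API plus the Network Information API; the font name, URL, connection check, and fonts-loaded class are all assumptions:

    // only bother with the custom font on reasonably fast connections
    const connection = navigator.connection;
    if (!connection || connection.effectiveType === '4g') {
      const font = new FontFace('Example Sans', 'url(/fonts/example-sans.woff2)');
      font.load().then((loadedFont) => {
        document.fonts.add(loadedFont);
        // let CSS opt in to the custom font once it's ready
        document.documentElement.classList.add('fonts-loaded');
      });
    }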

See A comprehensive guide to font loading strategies by Zach Leatherman.

3rd party JS shouldn't block content

Remove unnecessary scripts (the Telegraph's approach: remove it and see if anyone notices)

Some options if you can't do that:

  • don't vary content for first-time visits
  • load it async and only mess with stuff far down the page
  • preconnect the scripts
  • move content variation to server side

You can also use Cloudflare Workers to push browser functionality into the middle tier. This is great for A/B tests and personalisation.
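A minimal Cloudflare Worker sketch of picking an A/B variant at the edge; the cookie name, variant origins, and routing scheme are all assumptions, not anything from the original notes:

    addEventListener('fetch', (event) => {
      event.respondWith(handle(event.request));
    });

    async function handle(request) {
      // choose a variant based on a cookie so the HTML arrives already personalised
      const cookie = request.headers.get('Cookie') || '';
      const origin = cookie.includes('ab_test=b')
        ? 'https://variant-b.example.com'
        : 'https://variant-a.example.com';
      const url = new URL(request.url);
      return fetch(origin + url.pathname + url.search);
    }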

Improving time to interactive

This should be less problematic if the other metrics are good.

Server side rendering with rehydration has a drawback that it can look good but still take a long time to be interactive.

We can measure total blocking time.

Use vanilla JS where possible

The Vanilla Javascript Toolkit

Lean harder on native HTML and CSS

You don't need JS to style selects, and diverging from browser native behaviour requires a lot of thought (e.g. about accessibility).

Tree shaking (dead code elimination)

rollup.js is a great tool for that

Webpack can also do code splitting to split into different bundles. You can use this to load only what you need when you need it, and defer loading of features you don't need.
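With webpack or Rollup, a dynamic import() becomes its own chunk that is only fetched on demand. A sketch (the module path and element ID are placeholders):

    // the comments module isn't downloaded until the user asks for it
    document.querySelector('#show-comments').addEventListener('click', async () => {
      const { renderComments } = await import('./comments.js');
      renderComments();
    });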

CPU profiles

In chrome dev tools you can view the functions that take up the most CPU time.

Be sure to throttle the network and CPU.

First input delay is the time to respond to user interaction.

Useful javascript features

window.requestIdleCallback is a more sophisticated setTimeout which can help avoid interactivity delays. You can set a timeout so the work still runs within a given time frame.
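For example (the task queue here is hypothetical):

    const tasks = [/* non-urgent work, e.g. functions that send analytics beacons */];

    requestIdleCallback((deadline) => {
      // do as much as fits in the current idle period
      while (deadline.timeRemaining() > 0 && tasks.length > 0) {
        tasks.shift()();
      }
    }, { timeout: 2000 }); // run within 2s even if the browser never goes idle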

window.requestAnimationFrame is a similar idea for painting to the screen.

Intersection Observer lets us observe elements as they come in and out of the viewport. If you are listening to scroll events and checking if stuff is visible, this is a much more performant way to do that.
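A sketch of the observer version of that scroll check (the selector and class name are placeholders):

    const observer = new IntersectionObserver((entries) => {
      entries.forEach((entry) => {
        if (entry.isIntersecting) {
          entry.target.classList.add('is-visible');
          observer.unobserve(entry.target); // only needs to fire once per element
        }
      });
    });

    document.querySelectorAll('.watch-visibility').forEach((el) => observer.observe(el));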

Optimizing for return visits

Native browser caching

Expires header is the old way of doing this.

Cache-Control is more flexible.

If the asset changes often

Cache-Control: no-cache

The browser will cache the file, but will revalidate it with the server every time before using it

If the asset won't change soon/ever

Cache-Control: public, max-age=2628000

Can be cached by the browser and anything in the middle for a long time.

If the user explicitly refreshes the page, the browser will revalidate.

Cache-Control: public, max-age=2628000, immutable will avoid this.

If you change your mind you can bust caches by varying the filename. This is why it's useful to version asset filenames.

If you think you know what resources are very likely to be requested

Prefetch asks the browser to download and cache a resource in the background. It's treated as low priority (unlike preload).

Prerender loads a URL and recursively fetches its resources.

You can also use service workers to manage requests and responses. The service worker will load and install after the initial page has loaded.

You can then cache a bunch of specific files for completely offline use. You can design for offline-first, then network, rather than the other way round.
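A minimal service worker sketch along those lines; the cache name and file list are assumptions, and the page would register it with navigator.serviceWorker.register('/sw.js'):

    // sw.js
    const CACHE = 'static-v1';
    const ASSETS = ['/', '/css/site.css', '/js/site.js', '/offline.html'];

    self.addEventListener('install', (event) => {
      // cache the core files up front
      event.waitUntil(caches.open(CACHE).then((cache) => cache.addAll(ASSETS)));
    });

    self.addEventListener('fetch', (event) => {
      // offline-first: try the cache, fall back to the network
      event.respondWith(
        caches.match(event.request).then((cached) => cached || fetch(event.request))
      );
    });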

Distinguishing new visitors without cookies

Tuning Performance for new and "Old" friends

You can use service workers to set a custom header with the versions of files in the cache. The server can then use this to understand client state.
