This is the English text version translated by OpenAI (image links are broken; I'll fix it later...)

https://mizchi-20241123-jsconfjp.pages.dev/


All Happy Projects Look Alike, But Each Unhappy Project is Unhappy in Its Own Way

@mizchi at JSConfJP 2024


About



What I’ll Cover / Won’t Cover

  • What I’ll Cover

    • Performance budgeting as a mindset
    • Measurements with Lighthouse
    • Binary searching source code to locate issues
    • Handling identified problems effectively
  • What I Won’t Cover

    • Specific problem solutions (research case-by-case)

Presentation Context


Performance Budgets

Building a fast website is one thing, but maintaining its performance is surprisingly difficult.
Over time, various development decisions degrade performance: new features, third-party tracking scripts, or even unintentionally large image uploads.

Google Developers Japan: Introduction to Performance Budgets - Managing Web Performance Budgets


Recent Cases (Examples)


Common Causes of Performance Issues

  • No Measurement in Place

    • Lack of awareness that problems can be measured/solved
    • Misconception that performance inversely correlates with code size (it doesn’t)
  • Adopting Anti-Patterns, Leading to Chaos

    • Ignoring official recommendations becoming the norm
    • Wrong goals or methods lead to wrong solutions
  • Culture of Tolerating Poor DX/UX

    • Is complaining about non-functional requirements seen as unprofessional?
    • (Laziness, Impatience, and Hubris, the "three programmer virtues," are harder to justify these days…)

My Conclusion: Happy Projects = Greenfield Projects


Fast but Empty Projects

  • Adopting theoretically optimal boilerplates

    • Confirming a perfect Lighthouse score at the start
    • Going beyond that requires CDN caching
  • Developers add features by "spending" speed/complexity budgets

    • Feature development = budget management
    • Feature additions continue until the budget runs out
  • Healthy service delivery requires recognizing lost budgets and proactively reclaiming them


Unhappy Projects = Projects that Have Exhausted Their Budgets

"All happy families are alike, but each unhappy family is unhappy in its own way."

  • Anna Karenina (Leo Tolstoy)

“Anything that can go wrong, will go wrong.” | Murphy’s Law

Given modern software complexity, anticipating every problem is impossible.

If you can’t anticipate it, you must measure it.


Rob Pike's 5 Rules of Programming

Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

...

https://users.ece.utexas.edu/~adnan/pike.html


What is Frontend Measurement?

Synthetic monitoring - MDN Web Docs

Synthetic monitoring observes page performance in a controlled "laboratory" environment, typically with automated tools.
With consistent baselines, synthetic monitoring is suitable for measuring the impact of code changes on performance. However, it does not always reflect user experiences.

Frontend measurement with Lighthouse ≈ measuring the application as a whole (end to end)


Metrics and Tools

  • Web Vitals
    • FCP: Time until the first content is painted
    • LCP: Time at which the largest element in the initial view finishes rendering
    • TBT: Total time the main thread is blocked
    • CLS: Cumulative layout shift caused by CSS or JS changes
  • Lighthouse: Tool for measuring Web Vitals
  • Chrome DevTools: Align with Lighthouse recommendations for measurement

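
For reference, the same metrics can be observed directly in the browser with the standard PerformanceObserver API; a minimal sketch that logs LCP candidates and accumulates CLS:

// Log each LCP candidate; the last entry before user input is the final LCP.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log('LCP candidate:', entry.startTime, entry.element);
  }
}).observe({ type: 'largest-contentful-paint', buffered: true });

// Accumulate layout shifts not caused by recent user input (this sum is CLS).
let cls = 0;
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (!entry.hadRecentInput) cls += entry.value;
  }
  console.log('CLS so far:', cls);
}).observe({ type: 'layout-shift', buffered: true });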


My Approach to Tuning

  • Focus on Measurement First

    • Don't guess based on the semantics of the source code. Prior knowledge can bias measurement
    • (Possible due to my freelance perspective)
  • Iterate with DevTools and Source Code Analysis

    • Use DevTools for trends, modify source code, and validate
    • Always measure in a production-like environment multiple times (3+ iterations)
  • Tailor Metrics to the Application

    • Web Vitals are just reference points but are well-designed (and impact SEO)

Lighthouse + Chrome DevTools


Interpreting Lighthouse


DevTools > Lighthouse > Analyze page load

  • FCP → Initial response latency issues
  • LCP → RTT problems
  • TBT → CPU/JS bundle processing issues
  • CLS → Critical path CSS
  • SI → Aggregate result (often unnecessary)
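
DevTools is enough for interactive checks, but the same audit can also be scripted for repeatable runs; a rough sketch assuming the lighthouse and chrome-launcher npm packages (the URL is a placeholder):

import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
const result = await lighthouse('https://example.com/', {
  port: chrome.port,
  onlyCategories: ['performance'],
});

// Performance score (0-1) and individual metric values in milliseconds.
console.log('score:', result.lhr.categories.performance.score);
console.log('LCP:', result.lhr.audits['largest-contentful-paint'].numericValue);
console.log('TBT:', result.lhr.audits['total-blocking-time'].numericValue);

await chrome.kill();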

Lighthouse: Focus on LCP


  • Target: 2.5s (Lighthouse: 100)
  • What element determines LCP?
  • The path to LCP is the focus of tuning

DevTools > Performance


Read Vertically

  • Identical processes should have the same trigger
  • The number of vertical groups indicates the RTT count

Read Dense Areas

  • MainThread: CPU load → TBT
  • Network: Request queue count

What happens just before LCP?


DevTools > Network


Sort by Size or Time

  • Size: Assets with high transfer volumes
  • Time: Slow requests
  • Waterfall: Visually confirm request sequence

DevTools > Network Blocking


  • Right-click a request > Block request URL
  • Reload and check side effects
  • Measure Lighthouse in this state

"What’s the score impact of removing GTM?"
"What happens without WebFonts?"


DevTools > Sources > Overrides


  • Modify responses for validation
    • e.g., Empty an array
  • (Enabling this can be tricky)
    • Overrides > Select folders to Override
      ☑ Enable Local Overrides

How to Learn Chrome DevTools

Learn by using!

Chrome for Developers is the best resource, but it’s limited.

Test every "More tools" option for the best insights.



Source Code Analysis and Measurement


Prerequisites for Source Code Analysis

  • Focus must already be narrowed down
    • CPU / Network / RTT
  • Modify source code to find the minimal reproducible case
    • Write code as dirty as it needs to be!
    • Create a separate branch for explanation purposes
  • Once the bottleneck is identified, evaluate the potential gain
    • Mock or skip processing (see the sketch after this list)
    • Determine how many points can be gained by resolving the issue
    • Re-measure with Lighthouse if the bottleneck shifts
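
"Mock or skip processing" can be as crude as monkey-patching fetch while measuring locally; a throwaway sketch where /api/d is a hypothetical slow endpoint:

// Temporarily stub a suspected-slow endpoint to estimate the potential gain.
const originalFetch = window.fetch;
window.fetch = (input, init) => {
  const url = typeof input === 'string' ? input : input.url;
  if (url.startsWith('/api/d')) {
    // Pretend the request is instant and returns an empty payload.
    return Promise.resolve(new Response('[]', { status: 200 }));
  }
  return originalFetch(input, init);
};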

Measurement Methods: JavaScript

  • Print Debugging

    • console.log(): The classic
    • console.time(); console.timeEnd(): Measure time intervals (in milliseconds)
    • PerformanceObserver
  • Timestamps

    • Date.now(): Unix time (millisecond precision; too coarse for measuring CPU-bound work)
    • performance.now(): Microsecond precision from navigation start
  • Debugger

    • debugger;: Pause in DevTools Debugger for inspection
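
Alongside console.time, the User Timing API (performance.mark / performance.measure) produces named entries that also show up in the DevTools Performance panel; a small sketch assuming an async context and a hypothetical /api/items endpoint:

performance.mark('fetch:start');
const res = await fetch('/api/items'); // hypothetical endpoint
performance.mark('fetch:end');

// Creates a PerformanceMeasure entry, visible under "Timings" in the Performance panel.
performance.measure('fetch:items', 'fetch:start', 'fetch:end');
const [measure] = performance.getEntriesByName('fetch:items');
console.log('fetch:items took', measure.duration, 'ms');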

My Measurement Approach

  • Primarily use print debugging
    • Independent of the target environment
    • Advanced tools tend to depend on the environment
  • This is a personal preference
import { useEffect } from 'react';

// Timestamp as early as possible in the bundle
const started = performance.now();
console.log("js:started", started);

function MyApp(props) {
  useEffect(() => {
    console.log('useEffect with', props);
    console.time('req:xxx');
    fetch('/xxx').then((res) => {
      console.timeEnd('req:xxx');
      debugger; // pause in the DevTools debugger to inspect res
    });
  }, []);
  return <div>...</div>;
}

Narrowing Down with Binary Search (1)

Suppose we have the following code:

import a from './a';
import b from './b';
import c from './c';
import d from './d';

async function run() {
  await a();
  await b();
  await c();
  await d();
}

Binary Search (2) - Measure the First Half

  • Comment out the latter half
  • Insert measurement code
async function run() {
  console.time('a');
  await a(); // 200-300
  console.timeEnd('a');
  console.time('b');
  await b(); // 30
  console.timeEnd('b');
  // await c();
  // await d();
}

Binary Search (3) - Measure the Second Half and Check Dependencies

async function run() {
  // await a(); // 200-300
  // await b(); // 30

  console.time('c');
  await c(); // 0
  console.timeEnd('c');
  console.time('d');
  await d(); // 1000-1800
  console.timeEnd('d');
}
  • Perform the same steps for the second half
  • If there are dependencies in logic:
    • Use mocks if simple
    • Leave dependencies if mocking is complex
  • Record measurement results

Binary Search (4) - Recursive Narrowing

async function run() {
  await a(); // 200-300
  // await b();
  // await c();
  await d(); // 1000-1800
}
// Recursive measurement
export default async function d() {
  d1(); // 0
  await d2(); // 1000-1700 <- This is it!
  await d3(); // 100
}
  • Leave only the necessary execution paths
  • Recursively measure the heaviest part

Binary Search (5) - Identify Specific Code

export default async function d() {
  d1(); // 0
  await d2(); // 1000-1700
  // await d3(); // 100
}
async function d2() {
  console.time(`d2:fetch`);
  let cnt = 0;
  while (cnt < 10) {
    const ret = await fetch(`/api/d/${cnt}`);
    if (!ret.ok) break;
    cnt++;
  }
  console.timeEnd(`d2:fetch`);
  return cnt;
}
  • Pinpoint the actual bottleneck
  • (In practice, this is often in library APIs or native code)

Remove Bottlenecks and Re-measure

async function d2() {
  let cnt = 0;
  // while(cnt < 10) {
  //   const ret = await fetch(`/api/d/${cnt}`);
  //   if (!ret.ok) break;
  //   cnt++;
  // }
  // console.timeEnd(`d2:fetch`);
  return 0;
}
  • Create a new branch from origin/main
  • Remove the issue with minimal changes (diff)
  • Measure Lighthouse improvements in this state
  • The score difference indicates the potential improvement

Finally: "Measure First, Then Tune"

async function d2() {
  let cnt = 0;
  while (cnt < 10) {
    console.time(`d/${cnt}`); // 200-300
    const ret = await fetch(`/api/d/${cnt}`);
    console.timeEnd(`d/${cnt}`);
    if (!ret.ok) break;
    cnt++;
  }
  console.log("end cnt", cnt); // 6
  return cnt;
}
// Can we optimize like this?
async function d2_parallel() {
  return Promise.all([...Array(10).keys()].map(idx => {
    return fetch(`/api/d/${idx}`).catch(console.error);
  }));
}
// Or refactor the server implementation
async function d2_once() {
  return await fetch(`/api/d`);
}
  • What is the purpose of this code?
  • Can the API itself be improved?
  • Is the issue fundamentally solvable?

Decision-Making for Performance Improvements


Triaging Identified Issues

  • Examples:

    • LCP: /api/xxx called serially, 300ms × 3 RTT
      • Difficulty: Medium, +20 or more points
    • FCP: Initial response takes 1800ms
      • Difficulty: High, +10 or more points
    • TBT: An 800 kB library included in the bundle
      • Difficulty: Low, +10 points
    • CLS: Image sizes change after a third-party script loads
      • Difficulty: Low, +5 points, requires a specification change
  • Start with low-difficulty, high-gain, no-spec-change issues

  • Assign responsibilities (frontend, server, specifications)


Measuring Interaction Between Issues

  • Combine major issues for measurements

    • Measure combinations like (A & B), (A & C), (B & C)
  • Root issues interact!

    • -10 (CPU) -10 (CPU) => Total 80 (TBT -20)
    • -15 (Network) -10 (CPU) => Total 85 (LCP -15)
  • Application vs. Third-Party

    • Third-party scripts (e.g., GTM) may disrupt the initialization request queue
    • If third-party issues are extreme, review their usage and operations

Preventing Recurrence

  • TBT: Use tools like @next/bundle-analyzer or vite-bundle-visualizer
  • CI: Automate measurement with lighthouse-ci
  • Culture: Raise awareness of measurement methods (this presentation itself is a preventive effort for the industry)
  • Organization: Ensure resources are allocated for performance budgets
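
For the CI item, lighthouse-ci is configured through a lighthouserc file; a minimal sketch where the URL and thresholds are placeholders:

// lighthouserc.js - run with `npx lhci autorun`
module.exports = {
  ci: {
    collect: {
      url: ['https://staging.example.com/'], // placeholder URL
      numberOfRuns: 3, // matches the "measure 3+ times" rule above
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'largest-contentful-paint': ['warn', { maxNumericValue: 2500 }],
      },
    },
    upload: {
      target: 'temporary-public-storage',
    },
  },
};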

Ultimately, It’s a Specification Decision

  • There’s a limit to what can be mechanically fixed

  • Without regular KPI tracking, it's hard to remove unnecessary features

    • Does your company measure whether added features are actually being used?
  • Does the value delivered justify the lost performance budget?

    • Developers should propose solutions that minimize "implementation pain."

In Conclusion: Summary

  • While "Don't Guess, Measure" holds, ultimately, "Experience and Intuition" matter

    • Facing problems and reviewing numerous cases are crucial
    • DevTools proficiency only comes with use
  • Cost-effective fixes are unevenly distributed

    • Problems A and B may each be worth -10, but the cost to fix them doesn't scale with the score
    • Failing to recognize cumulative issues early often makes them irreversible later
      • Often, these stem from "quick hacks"
  • Specifications Matter Most

    • Communication between Dev and Biz for mutual proposals is key
    • The speed ratios among Server:Frontend:GTM reflect organizational power dynamics

The End

We can also conduct workshops on measurement methodologies.

Looking forward to discussing your project needs
