jdm · December 5, 2017 17:48
 Importance of missing features:
 * disable Gecko DOM APIs that Servo doesn't support and check the performance/visual effect on TP6 sites
 * disable Gecko layout features that Servo doesn't support and check the performance/visual effect on TP6 sites

 JS/DOM architecture:
 * measure time spent in the cycle collector (which doesn't exist in Servo)
 * measure the memory usage difference between eager and lazy reflector strategies
  - single-page HTML spec
  - start with HTML parser nodes (nsHtml5TreeOperation::Append)
  - use save & measure, then measure & diff, in about:memory; look at explicit/window-objects in the content process
    - http://demo.borland.com/testsite/stadyn_largepagewithimages.html (duplicated ten times) (200 MB total):
      - ~10 MB in js-compartment usage
      - ~5 MB in cycle-collector accounting
      - ~15 MB in DOM-side JS runtime root tracking
    - http://html.spec.whatwg.org/ (345 MB total):
      - ~16 MB in js-compartment usage
      - ~8 MB in cycle-collector accounting
      - ~17 MB in DOM-side JS runtime root tracking
  - consider: huge number of CSSOM objects in Gmail
  - evaluate: difference in memory usage after load
 * measure the performance difference when creating reflectors (actual time)
  - check whether it is possible to remove the branches on maybe-wrapping?
 * measure the difference in GC times between eager and lazy reflectors
 * measure GC times in Servo
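
The about:memory figures above can be turned into a rough share of each page's total. This minimal sketch adds the three categories listed for each page and divides by the page total; treating those categories as disjoint, and as the part a lazy reflector strategy would affect, is an assumption rather than something the measurements establish. `MEASUREMENTS` and `reflector_overhead` are illustrative names.

```python
# Approximate about:memory figures (MB) copied from the notes above.
MEASUREMENTS = {
    "stadyn_largepagewithimages.html (x10)": {
        "total": 200,
        "js-compartment": 10,
        "cycle-collector": 5,
        "dom-js-root-tracking": 15,
    },
    "html.spec.whatwg.org": {
        "total": 345,
        "js-compartment": 16,
        "cycle-collector": 8,
        "dom-js-root-tracking": 17,
    },
}


def reflector_overhead(entry):
    """Return (overhead_mb, overhead_fraction) for one page.

    Assumption: the non-"total" categories are disjoint, so they can be summed.
    """
    overhead_mb = sum(mb for name, mb in entry.items() if name != "total")
    return overhead_mb, overhead_mb / entry["total"]


if __name__ == "__main__":
    for page, entry in MEASUREMENTS.items():
        mb, fraction = reflector_overhead(entry)
        print(f"{page}: ~{mb} MB of {entry['total']} MB ({fraction:.1%})")
```

With these figures the categories add up to ~30 MB of 200 MB (15.0%) for the duplicated Borland page and ~41 MB of 345 MB (11.9%) for the HTML spec.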

 Per-origin threads (lower bound; at least 14% of tasks were not attributed):
 * Loading lifehacker.com, scrolling down the page so multiple ads appear, then quitting the page (~40s):
  - lifehacker.com: 8176.16 ms; 19.7671% of total time
    * Task distribution (ms)
    * <0.01: 5199
    * <0.02: 934
    * <0.05: 1545
    * <0.1: 1298
    * <0.2: 1226
    * <0.5: 1530
    * <1.0: 797
    * <1.5: 399
    * <2.0: 257
    * <5.0: 511
    * <10.0: 114
    * >=10.0: 77
    
  - doubleclick.net: 458.709 ms; 1.109% of total time
    * Task distribution (ms)
    * <0.01: 456
    * <0.02: 79
    * <0.05: 201
    * <0.1: 293
    * <0.2: 385
    * <0.5: 239
    * <1.0: 55
    * <1.5: 17
    * <2.0: 6
    * <5.0: 16
    * <10.0: 4
    * >=10.0: 7
    
  - googlesyndication.com: 4447.69 ms; 10.7529% of total time
    * Task distribution (ms)
    * <0.01: 1433
    * <0.02: 537
    * <0.05: 1077
    * <0.1: 751
    * <0.2: 993
    * <0.5: 1327
    * <1.0: 603
    * <1.5: 216
    * <2.0: 116
    * <5.0: 225
    * <10.0: 70
    * >=10.0: 40

  - imrworldwide.com: 22.496 ms; 0.0543874% of total time
    * Task distribution (ms)
    * <0.01: 14
    * <0.02: 119
    * <0.05: 132
    * <0.1: 26
    * <0.2: 11
    * <0.5: 8
    * <1.0: 3
    * <1.5: 0
    * <2.0: 0
    * <5.0: 2
    * <10.0: 0
    * >=10.0: 0
    
  - 2mdn.net: 636.673 ms; 1.53925% of total time
    * Task distribution (ms)
    * <0.01: 715
    * <0.02: 73
    * <0.05: 400
    * <0.1: 361
    * <0.2: 240
    * <0.5: 132
    * <1.0: 50
    * <1.5: 6
    * <2.0: 8
    * <5.0: 16
    * <10.0: 10
    * >=10.0: 14
    
  - casalemedia.com: 39.1671 ms; 0.0946922% of total time
    * Task distribution (ms)
    * <0.01: 442
    * <0.02: 253
    * <0.05: 167
    * <0.1: 66
    * <0.2: 52
    * <0.5: 27
    * <1.0: 4
    * <1.5: 1
    * <2.0: 1
    * <5.0: 1
    * <10.0: 0
    * >=10.0: 0
    
  - krxd.net: 215.884 ms; 0.52193% of total time
    * Task distribution (ms)
    * <0.01: 111
    * <0.02: 47
    * <0.05: 167
    * <0.1: 167
    * <0.2: 83
    * <0.5: 52
    * <1.0: 14
    * <1.5: 3
    * <2.0: 2
    * <5.0: 6
    * <10.0: 6
    * >=10.0: 4

 * consider diffing tracking protection to estimate
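
The per-origin histograms above can be produced by bucketing each task's duration with the same upper edges. This is a minimal sketch, assuming a hypothetical trace format of `(origin, duration_ms)` pairs, with `None` as the origin of unattributed tasks, and a known total trace length; `bucket_index` and `summarize` are illustrative names, not Gecko code.

```python
import bisect
from collections import defaultdict

# Upper edges (ms) of the task-duration buckets used in the notes. A bucket
# labelled "<e" holds durations in [previous edge, e); anything at or above
# the last edge lands in ">=10.0".
EDGES_MS = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 1.5, 2.0, 5.0, 10.0]
LABELS = [f"<{e}" for e in EDGES_MS] + [f">={EDGES_MS[-1]}"]


def bucket_index(duration_ms):
    """Index into LABELS for a task of the given duration."""
    return bisect.bisect_right(EDGES_MS, duration_ms)


def summarize(tasks, total_ms):
    """Per-origin busy time, share of total time (%), and duration histogram.

    tasks: iterable of (origin, duration_ms); origin may be None for tasks
    that could not be attributed. total_ms: wall-clock length of the trace.
    """
    histograms = defaultdict(lambda: [0] * len(LABELS))
    busy_ms = defaultdict(float)
    for origin, duration in tasks:
        histograms[origin][bucket_index(duration)] += 1
        busy_ms[origin] += duration
    return {
        origin: (busy_ms[origin], 100.0 * busy_ms[origin] / total_ms,
                 dict(zip(LABELS, counts)))
        for origin, counts in histograms.items()
    }


if __name__ == "__main__":
    # Hypothetical trace, not measured data.
    trace = [("lifehacker.com", 0.004), ("lifehacker.com", 12.0),
             ("doubleclick.net", 3.0), (None, 1.2)]
    for origin, (busy, share, hist) in summarize(trace, 40_000.0).items():
        nonzero = {k: v for k, v in hist.items() if v}
        print(origin, f"{busy:.3f} ms", f"{share:.4f}%", nonzero)
```

As a consistency check on the recorded numbers, dividing each origin's time by its share gives the same total for every origin, about 41.4 s (e.g. 8176.16 ms / 0.197671 ≈ 41,362 ms), which matches the ~40 s session.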

 Layout architecture:
 * determine the proportion of main-thread layout work in Gecko that would happen in the background in Servo (after display list construction)
  - calculate the time spent painting under nsLayoutUtils::PaintFrame before the DOM load event is dispatched
 * compare these numbers against a metric like TTI (time to interactive)
  - removing main-thread painting would make document load event numbers smaller
  - removing main-thread painting would not affect TTI-like numbers
 * numbers on Fennec?
  
 * TP6 measurements (main-thread paint time / time until event):
  - tp6-amazon.html:
    * 164.02ms/1480ms (load)
    * 246ms/2255.42ms (quiescent display list)
    * Paint distributions (ms)
      * <0.1: 0
      * <0.25: 0
      * <0.50: 0
      * <1.00: 1
      * <2.50: 24
      * <5.00: 2
      * <10.0: 0
      * <12.5: 0
      * <15.0: 0
      * >=15.0: 5

  - tp6-facebook.html:
    * 49.9434ms/911ms (load)
    * 63.7832ms/1205.93ms (quiescent display list)
    * Paint distributions (ms)
      * <0.1: 0
      * <0.25: 0
      * <0.50: 0
      * <1.00: 10
      * <2.50: 5
      * <5.00: 2
      * <10.0: 0
      * <12.5: 0
      * <15.0: 0
      * >=15.0: 1

  - tp6-google.html:
    * 87.5909ms/512ms (load)
    * 100.608ms/886.117ms (quiescent display list)
    * Paint distributions (ms)
      * <0.1: 0
      * <0.25: 0
      * <0.50: 4
      * <1.00: 31
      * <2.50: 2
      * <5.00: 1
      * <10.0: 2
      * <12.5: 0
      * <15.0: 0
      * >=15.0: 1

  - tp6-youtube.html:
    * 66.8185ms/1419ms (load)
    * 71.4241ms/1475.76ms (quiescent display list)
    * Paint distributions (ms)
      * <0.1: 0
      * <0.25: 0
      * <0.50: 6
      * <1.00: 77
      * <2.50: 21
      * <5.00: 4
      * <10.0: 2
      * <12.5: 0
      * <15.0: 0
      * >=15.0: 4

  - concern: this is not as clear-cut as "5-15% time reduction without main-thread painting", since it does not account for network fetches?
    * load times in particular are susceptible to network interference
    * would it be better to measure average input latency, or the time until input latency is consistently below some threshold?
  - evaluate: use histograms of paint times to determine the minimum possible time until the last display list is painted
  
 * take measurements for initial page load as well as after initial load

 * measure how long layout takes on tp6
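
The TP6 ratios and paint histograms can be checked with a few lines of Python. This sketch assumes the first number in each pair is main-thread paint time and the second is the time until the event; `paint_share` and `paint_time_bounds` are hypothetical helper names. The histogram bound assumes every paint sat at its bucket's lower edge, so it is a lower bound on total paint time over whatever window the histogram was recorded for.

```python
# (main-thread paint ms, ms until event), copied from the TP6 notes above.
TP6 = {
    "amazon":   {"load": (164.02, 1480.0),  "quiescent": (246.0, 2255.42)},
    "facebook": {"load": (49.9434, 911.0),  "quiescent": (63.7832, 1205.93)},
    "google":   {"load": (87.5909, 512.0),  "quiescent": (100.608, 886.117)},
    "youtube":  {"load": (66.8185, 1419.0), "quiescent": (71.4241, 1475.76)},
}

# Upper edges (ms) of the paint histogram buckets; the final bucket is ">=15.0".
PAINT_EDGES_MS = [0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 12.5, 15.0]


def paint_share(paint_ms, total_ms):
    """Fraction of the time-to-event spent in main-thread painting."""
    return paint_ms / total_ms


def paint_time_bounds(counts):
    """Bound total paint time (ms) from per-bucket counts.

    counts[i] is the number of paints in bucket i. The upper bound is None
    when the open ">=15.0" bucket is non-empty, since it has no finite edge.
    """
    if len(counts) != len(PAINT_EDGES_MS) + 1:
        raise ValueError("expected one count per bucket")
    lower_edges = [0.0] + PAINT_EDGES_MS
    lower = sum(c * lo for c, lo in zip(counts, lower_edges))
    if counts[-1]:
        upper = None
    else:
        upper = sum(c * hi for c, hi in zip(counts, PAINT_EDGES_MS))
    return lower, upper


if __name__ == "__main__":
    for site, events in TP6.items():
        for event, (paint, total) in events.items():
            print(f"{site:<9} {event:<9} {paint_share(paint, total):6.1%}")
    # YouTube paint histogram from the notes.
    print(paint_time_bounds([0, 0, 6, 77, 21, 4, 2, 0, 0, 4]))
```

With the recorded figures the shares range from about 4.7% (YouTube, load) to 17.1% (Google, load). For YouTube the histogram lower bound alone is 141 ms, more than the 71.4241 ms quiescent total, so the histograms probably cover a longer window than the load/quiescent numbers.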

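Both latency alternatives raised in the TP6 concerns can be computed from a list of input-latency samples. How those samples would be collected from Gecko is not specified in the notes, so the `(timestamp_ms, latency_ms)` input format, the 50 ms threshold in the example, and the helper names `mean_latency` and `settle_time` are all assumptions for illustration.

```python
from statistics import fmean


def mean_latency(samples):
    """Average latency (ms) over (timestamp_ms, latency_ms) samples."""
    return fmean(latency for _, latency in samples)


def settle_time(samples, threshold_ms):
    """Earliest timestamp after which every sample's latency is below threshold_ms.

    samples must be sorted by timestamp. Returns None if there are no samples
    or the last sample is still at or above the threshold.
    """
    settled_at = None
    for timestamp, latency in samples:
        if latency >= threshold_ms:
            settled_at = None  # a slow input resets the candidate
        elif settled_at is None:
            settled_at = timestamp  # start of a new run of fast inputs
    return settled_at


if __name__ == "__main__":
    # Hypothetical samples, not measured data.
    samples = [(0, 120), (100, 80), (200, 30), (300, 60), (400, 20), (500, 10)]
    print(mean_latency(samples))     # about 53.3
    print(settle_time(samples, 50))  # 400
```

Unlike a plain average, `settle_time` is reset by any late slow input, so it reports when a page actually stays responsive rather than how responsive it is on average.
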
 Graphics architecture:
 * what proportion of time is spent rendering on TP6 sites?