
@addyosmani
Last active May 28, 2022 22:40
Thoughts on precompiling JS bytecode for delivery through a server/CDN

Some quick thoughts on https://twitter.com/dan_abramov/status/884892244817346560. It's not ignorant at all to ask how browser vendors approach performance. On the V8 side we've discussed bytecode precompilation challenges a few times this year. Here's my recollection of where we stand on the idea:

JavaScript engines like V8 have to work on multiple architectures, and every version of V8 is different. A precompiled bytecode solution would require a system (e.g. the server or a CDN) to generate bytecode builds for every target architecture, every supported version of V8, and every version of the JavaScript libraries or bundles being compiled. This is because we would need to make sure every user accessing a page via that bytecode can still get the final JS executed successfully.
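
To make the combinatorics concrete, here's a minimal sketch of the build matrix a CDN would have to maintain. The artifact naming scheme is invented for illustration; no such V8 tooling exists:

```js
// Hypothetical sketch: every (architecture × engine version × library
// version) combination needs its own precompiled bytecode artifact.
const architectures = ['ia32', 'x64', 'arm', 'arm64'];
const v8Versions = ['5.9', '6.0', '6.1']; // every supported engine release
const libraries = [
  { name: 'react', versions: ['15.5.4', '15.6.1'] },
  { name: 'lodash', versions: ['4.17.4'] },
];

const artifacts = [];
for (const arch of architectures) {
  for (const v8 of v8Versions) {
    for (const lib of libraries) {
      for (const version of lib.versions) {
        // Invented artifact name - just to show the multiplication.
        artifacts.push(`${lib.name}-${version}.v8-${v8}.${arch}.bytecode`);
      }
    }
  }
}
console.log(`${artifacts.length} distinct bytecode builds to generate and host`);
```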

Consider that if a cross-browser solution to this problem were desired, the above would need to be applied to JavaScriptCore, SpiderMonkey and Chakra as well. The system would need to carefully deliver the right bytecode per target, or risk wasting bandwidth going back and forth between the client and server until a compatible version was found.

In addition, a bytecode solution would need to go through security and validation phases before an engine could accept something prebuilt coming down the wire. A CDN would need to support fallbacks for such bytecode not being interpretable by the target (e.g. imagine a browser that doesn't support this bytecode accessing the service - it would need to be served a normal JS bundle as the fallback).
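
A hedged sketch of that fallback plumbing, assuming a hypothetical x-bytecode-target request header and .bytecode artifacts (neither exists today):

```js
// Hypothetical sketch only: serve precompiled bytecode to clients that
// advertise support for this exact engine build, fall back to plain JS
// for everyone else. The header name and artifact format are invented.
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  const target = req.headers['x-bytecode-target']; // e.g. "v8-6.1-x64" (invented)
  const precompiled = `./react.min.js.${target}.bytecode`;
  if (target && fs.existsSync(precompiled)) {
    res.setHeader('Content-Type', 'application/x-bytecode'); // invented MIME type
    fs.createReadStream(precompiled).pipe(res);
  } else {
    // The normal JS bundle is the fallback for every other client.
    res.setHeader('Content-Type', 'application/javascript');
    fs.createReadStream('./react.min.js').pipe(res);
  }
}).listen(8080);
```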

Practically speaking, given how different every JS engine is, we would likely need to craft something higher level than Ignition in V8 to be able to explore such an idea. Having discussed bytecode precompilation with the V8 team multiple times this year, there's a risk that unless exact V8 Ignition bytecode was shipped, we would probably still have to do a lot of the expensive work we are already doing (which makes the idea of standardizing on an intermediate representation a little trickier).

The Ignition bytecode isn't a stable binary format, nor is it intended to be (we want to be able to change it with every V8 release, in order to allow optimizations for new language features or to pass additional operands in the bytecode for use as type feedback by the optimizing compilers). We would also need to verify any bytecode which came over the wire to ensure it is valid and doesn't contain security exploits (e.g. accessing the stack out-of-bounds).
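
To make the verification problem concrete, here's a toy sketch of the kind of property an engine would have to prove before trusting bytecode off the wire. The three-op bytecode is invented and nothing like real Ignition opcodes:

```js
// Net stack effect per invented opcode. A real verifier would also check
// per-op preconditions (e.g. 'add' needs two operands), jump targets, etc.
const STACK_EFFECT = { push: +1, pop: -1, add: -1 };

// Reject any program that could read or write outside its operand stack.
function verify(ops, maxStack) {
  let depth = 0;
  for (const op of ops) {
    depth += STACK_EFFECT[op];
    if (depth < 0 || depth > maxStack) return false; // out-of-bounds access
  }
  return true;
}

console.log(verify(['push', 'push', 'add'], 8)); // true
console.log(verify(['pop'], 8));                 // false: stack underflow
```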

The idea itself has a lot of nuances to it. E.g. imagine if a CDN or server shipped code allowing us to avoid parsing inner functions: it would need to provide all the information our optimisations rely on (e.g. tracking whether variables are assigned after initialization), which is a moving target, and newer language features would keep appearing that the intermediate representation would have to keep up with. There's also some skepticism about how much a bytecode solution would actually save (e.g. I don't think you would see anything near a 50% saving at the point where we just parse and compile a single function at a time).
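
For a feel of how parsing heuristics already leak into shipped code today: V8 lazily parses most function bodies, and a leading parenthesis is a long-standing hint that a function will be invoked immediately, so it gets parsed eagerly in one pass. A small illustration (engine-specific behaviour, subject to change across versions):

```js
// Parsed lazily: the body is skipped now and fully parsed again when the
// function is first called - cheap if it never runs, double work if it does.
function initLazy() { /* ... */ }

// The "(" hint: this looks like an IIFE, so the engine parses the body
// eagerly once instead of skipping and re-parsing it.
var initEager = (function () {
  return function () { /* ... */ };
})();
```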

None of this is to say that the idea can't or shouldn't be explored; it's just a more complex problem space than it might initially seem. An intermediate bytecode representation a CDN could use would need to be something cross-browser - perhaps structurally similar to WebAssembly bytecode (but with JS-level semantics instead of machine semantics).

@addyosmani (Author) commented Jul 12, 2017

Sure. According to the maestro @paulirish, CDN cache hit rates are far lower than we might have once thought. He suggests locally bundling and serving library code instead.

@sophiebits

I may have misunderstood, but I thought Dan was suggesting a local bytecode cache so that bytecode can be reused between different websites visited by the same client if they use the same libraries – along with work to encourage people to use those shared builds of libraries. That mitigates many of the standardization and versioning concerns you mentioned (while also lessening the potential benefits).

@addyosmani (Author)

> a local bytecode cache so that bytecode can be reused between different websites visited by the same client if they use the same libraries – along with work to encourage people to use those shared builds of libraries.

I may be the one misunderstanding :) When I hear "local bytecode cache" I interpret that as bytecode that still needs to be precompiled by a tool and be understandable by multiple JavaScript engines, even if the bytecode lives on the CDN. Could you or Dan perhaps expand on the idea a little more?

@lourd commented Jul 12, 2017

I think the idea is somewhere along the lines of: what if libraries could have some standard, official form of inclusion in the browser or hosting, which the browser could use to know when a parse cache applies. The browser says, “Oh, it’s this module, I know that module. We saw that when you went to Facebook. I can use the parsing output we got earlier because I know it’s the exact same content.”

@addyosmani (Author) commented Jul 12, 2017

I think I understand what you're after now. This is effectively possible today using the existing bytecode cache, but due to our current heuristics (scripts need to be hit a few times within 72 hours) it happens far less than one might hope. If V8/JS engines were more explicit about how they allowed you to bytecode-cache (e.g. we gave you an API without these heuristics, or with less conservative ones), you'd be able to do this for both the CDN use-case and the more general one pretty much right away, from the second visit onwards.

When the browser saw a request for a module that was previously used (and had a bytecode cache entry), it would just use the parsing output from earlier. There wouldn't be that requirement of multiple hits (which I imagine Facebook doesn't meet because of its deployment frequency). If my understanding of the above is correct, let me know and I can see what I can do to chase this up with Chrome.
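
Purely as a thought experiment, an explicit API of the kind mentioned above might look something like this. To be clear: navigator.compileCache is invented here - no such API exists in V8 or the web platform:

```js
// HYPOTHETICAL API - invented purely to illustrate what bypassing the
// visit-count heuristic could look like from a page's perspective.
if (navigator.compileCache) { // invented feature-detect
  navigator.compileCache.prime(
    'https://cdnjs.cloudflare.com/ajax/libs/react/15.6.1/react.min.js',
    { eager: true } // compile and cache bytecode on the first fetch
  );
}
```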

@SindreSvendby commented Jul 12, 2017

Did I understand the last comment correctly: will two sites that use the same library reuse the same bytecode even when they are on two different URLs (if it is hit a few times within 72 hours)?

Not sure when that kicks in, but it would be nice if we could use the web as if it were one common immutable CDN / webpack bundle.
Say you didn't need to download and parse jQuery, Moment and React multiple times - if it were possible to somehow tag a module with a name and version and make it secure. Right now we as developers have npm, but we haven't been able to give the same benefits to users.
Any thoughts on this?

If we were able to tag a URL or snippet of code with enough info, the browser should be able to skip downloading and parsing it the second time (even on another URL) and use the bytecode at once.

@gaganjakhotiya commented Jul 12, 2017

@addyosmani can you elaborate a bit more on how one can make the most out of bytecode caching in today's browsers?

@addyosmani (Author)

> Will two sites that use the same library reuse the same bytecode even when they are on two different URLs (if it is hit a few times within 72 hours)?

The bytecode cache is not currently able to match and reuse bytecode for the same resource from different URLs. It's also unclear to me why it would need to, based on the original proposal - if a CDN is hosting React (let's take Cloudflare):

https://cdnjs.cloudflare.com/ajax/libs/react/15.6.1/react.min.js

You're caching this asset for the cloudflare.com origin. My understanding of the current implementation is that if site A and site B both reference the above URL, they would both be able to take advantage of the bytecode cache for react.min.js.

> If we were able to tag a URL or snippet of code with enough info, the browser should be able to skip downloading and parsing it the second time (even on another URL) and use the bytecode at once.

The best way to do this currently is to reference a CDN URL that is likely to already have been code-cached (something that should be true if a number of sites referencing it have been visited). We can chat about what a tagging/annotation proposal could look like, but simply doing it based on URLs is a pretty low-friction way to take advantage of the bytecode cache.
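
As a minimal sketch of that advice: the key is that every site references the exact same URL, since any variation (query string, different CDN, different version) produces a distinct cache entry.

```js
// Every site loading React from this one canonical URL shares the same
// HTTP cache entry, which is what lets the bytecode cache heuristic fire.
const REACT_CDN =
  'https://cdnjs.cloudflare.com/ajax/libs/react/15.6.1/react.min.js';

const s = document.createElement('script');
s.src = REACT_CDN; // byte-identical resource + identical URL across sites
document.head.appendChild(s);
```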

@mbrevda commented Jul 15, 2017

It seems there is still a place for non-CDN hosts to say "I can offer you library x, with hash y", so the browser can reply "I've already got x with hash y cached". This would take caching past unique URLs, and allow code hosted by one domain to be shared with a different domain.
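
For what it's worth, Subresource Integrity already gives pages a way to pin "library x with hash y", though browsers currently only use the hash to verify bytes, not to satisfy the request from a cache entry fetched under a different URL. A sketch of how that idea maps onto it (the host and hash below are placeholders):

```js
// SRI is real; the hash-addressed cross-origin cache described above is not.
const script = document.createElement('script');
script.src = 'https://some-non-cdn-host.example/react.min.js'; // hypothetical host
script.integrity = 'sha384-<hash-of-react.min.js>'; // placeholder digest
script.crossOrigin = 'anonymous'; // required for SRI on cross-origin loads
// A hash-addressed cache could, in principle, serve any previously seen
// resource with this digest regardless of which URL it was fetched from.
document.head.appendChild(script);
```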
