@jeff-hykin
Last active August 16, 2023 13:52
Deno Dependencies Discussion on Dups, Disappearance, and Dev Experience

I appreciate the discussion, and I'll be eagerly awaiting Ryan's talk. @kevinwhinnery I'm really glad my response wasn't taken too negatively. I want these conversations to be productive, and, when changes don't negatively impact the ecosystem, I generally try to avoid complaining about them (like Deno KV). And feel free to bluntly correct/argue/criticize any of my points.

without some kind of semver aware central package manager, I don’t know how you solve duplicate and disappearing dependencies

Oh, that seems like the easy part to me, which does make me wish I had the opportunity to present/discuss my ideas before the team did all this work on a centralized package manager. But I would like to start off with a bit more feedback before I get to that.

1st reaction; PMs don't--can't--help Disappearing deps

I like the Deno team, and I would trust my best friend with my life. But if my best friend told me "Trust me bro, GitHub servers might go down, but MY server is reliable, I've got this fancy SLA that says so. So just make everything depend on my server and only my server; that'll solve your reliability problem", I'd laugh him out of the room. I don't want to distance myself from the team, so please try to imagine standing alongside me, looking at this argument as a 3rd party: "URLs (like deno.land/x, which makes up the majority of Deno dependencies) sometimes disappear. Therefore we are strongly encouraging everyone to use deno.land/r, and are going to show favoritism towards that specific URL". How does that make any sense? Why does a package manager make deno.land/r more reliable than deno.land/x? Why not just say "we're going to give deno.land/x an SLA, so publish your modules there"? I mean, deno vendor partially addresses disappearing URLs, deno bundle used to partially address the issue, source maps and patching (<- my favorite part of the talk) partially address the issue, but centralization and a package manager? How do they even relate?

(is this just me or did you feel similarly on this^ @arsh, @AapoAlas, @ anyone ?)

2nd reaction; DX and naming don't need hard-coded syntax

If Deno announced deno.land/r as a curated registry alongside the un-curated deno.land/x (instead of saying it would replace deno.land/x), I would be excited. And not just 'excited', I would be like "dang, they did it again 🎉 I better start writing my draft to all the other packaging systems telling them they need to follow suit".

What makes that^ so different from the actual announcement is: I instantly know which half of my modules I would put on /r (like deno-tree-sitter) and which half STILL belong on /x (like binaryify). Even just recently, @arsh's es-codec was something I said should be a PR on std to make it "more official", and he argued it didn't really belong in std; the real solution is that es-codec belongs on a curated list like /r. My python packages, my VS Code extensions, my node modules, and my ruby gems all nicely fit into either an official/maintained group or a small-project/experimental/unofficial group.

And again I'm curious if that's just me or if you feel the same way @arsh, @AapoAlas, @ anyone.

In terms of DX, deno.land is already the de-facto way to find Deno modules. I love that deno.land search results have a stamp for official modules/std. If deno.land also had a big "Curated" stamp (or the opposite; a big warning stamp on non-curated modules), IMO that would 100% deliver on the experience of "hey, go install X" => one official X to install. And again, no hard-coded urls in the runtime, no need to tell projects to have a deno.jsonc. Only modules published in the deno registry would need a deno.jsonc as a metadata file for licensing, semver, version ranges, etc.

3rd reaction; DX is important... but there's not going to be a dpm command?

For all the talk about a better DX around "hey, go install X", like making it feel easy and de-facto, it seems like the best part was missed. "hey, go install X" => npm install X, pip install X, gem/cargo/poetry/yarn/hex install X. I mean, do a Discord poll; "would you like an official Deno module that installs a cli tool for helping update/install/patch http modules?" I think the overwhelming answer would be yes. I don't think it's bad that dpm is missing, but if the team truly wants a better, more de-facto-feeling DX, take the centralized hardcoded stuff out of the runtime and put it into dpm install X. The whole reason the announcement is problematic is that the change is in the RUNTIME.

Also, why call it a package manager if dpm isn't going to be a thing? Isn't that a package index/repository, not a manager? Just seems weird.

Last reaction: Concerns about version ranges

While I have faith in the team to succeed at re-inventing something that almost all other teams have failed at, I am concerned about version ranges (and very concerned by phrases like "similar semver"). Pinned versions, and them being so heavily encouraged, are THE reason I use Deno. If python exclusively used Deno-style imports, I wouldn't even be touching Deno. What I love about Deno 1.0 is that IMO it does TheRightThing™ even when it hurts; even when it means one script needs 110 versions of lodash that only differ by patch number, Deno doesn't cut corners, it does its duty to be as faithful as possible and gets all 110 versions of lodash. Module authors saying "well my module Should™ work with v2.x" doesn't mean anything. I have no doubt in my mind that Deno modules JustWork™ precisely because the author of [email protected] used dep [email protected], and 99.9% of users of [email protected] also used it with [email protected]; one module == one experience. Version ranges for python, npm, and ruby have caused me nothing short of months of pain. Version ranges == "thanks for the GH issue, but I can't reproduce; what versions are you using for numpy and matplotlib?". If Deno version ranges are ONLY used to enhance patching/source maps/updates, then I'm onboard and excited; it would be nice to know the team's expectation.

But if Deno 2.0 is going to, by default, try to save storage space by cutting corners just because "[email protected]" Should™ work for Y, then I just lost my #1 reason for loving the Deno ecosystem. Losing that on top of secondary reasons like decentralization, EverythingIncluded™ (e.g. deno bundle), and Minimalism (e.g. the addition of Deno KV) is a really heavy blow to my favorite language/runtime.

Solution Discussion; We can do better! Even with Deno 1.0!

And by better I mean

  • better uptime
  • even less duplication
  • better security against changing http endpoints
  • faster downloads
  • faster runtimes
  • all while preserving a good DX, version ranges, etc.

All we need is merkle trees and some basic tools, and I'm happy to put in a lot of work explaining/applying these tools to Deno.

I'm going to start a thread since there's a lot to address with the solution

Better De-dup

Let's start with #2 because, as the team knows, solving disappearing endpoints (#1) requires dup-detection. (@ others: if one site claims to be a mirror of another, we need dup-checking to know whether it's actually providing a duplicate or providing something malicious instead.)

To start off, consider adding a file hash to URLs (or, as @crowlKats mentioned to me, leveraging the lock file). Not only does it solve the basic dup-detection problem (same hash = duplicate), but we can already do it with Deno 1.0 by adding anchors to the existing URLs, e.g. import "https://deno.land/std/log/mod.ts#hash=a2b6d4f9", or even a jsx-like annotation import "https://deno.land/std/log/mod.ts" /* @hash=a2b6d4f9 */, and common Deno dependency tools like udd and trex could do this automatically without developers needing to do anything (no-workflow-change-required is pretty fantastic DX). I would love to see Deno 2.0 add a warning, similar to the one for un-pinned version imports, when a content hash wasn't provided, and even better, throw an error if the content hash is different from what was actually provided by the url (e.g. the content changed).
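Here's a minimal sketch of the check I'm describing; the function name, SHA-256, and the truncated-hex comparison are just assumptions for illustration, and only Deno's built-in fetch and crypto.subtle are real APIs:

// Hypothetical sketch: verify the content behind an import URL
// still matches the hash embedded in its #hash= anchor.
async function verifyImport(urlWithAnchor: string): Promise<boolean> {
  const url = new URL(urlWithAnchor);
  const expected = new URLSearchParams(url.hash.slice(1)).get("hash");
  if (!expected) return true; // no hash pinned; this is where a warning could go

  url.hash = ""; // the anchor never reaches the server, so strip it before fetching
  const source = await (await fetch(url)).arrayBuffer();
  const digest = await crypto.subtle.digest("SHA-256", source);
  const actual = [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");

  // the example hash above is truncated, so compare against a prefix of the digest
  return actual.startsWith(expected);
}

Tools like udd/trex could run the same digest when rewriting imports, so the anchors stay correct as versions get bumped.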

That's just the tip of the iceberg; eszip and merkle trees can get way better de-dup, and even faster runtimes. We can do a recursive hash similar to how eszip does a recursive retrieval of sources; every source file is then broken up into chunks (e.g. leaf nodes). For example, IPFS already does this using fixed byte-length chunks, but since this is exclusively hashing JS/TS we can do WAY better: 1 chunk per top-level statement (based on an initial AST pass, like the one already performed by eszip). From there de-dup is trivial; store the chunks in any content-addressed storage (CAS) and all duplication is eliminated. An efficient CAS can be anything from an insanely large-scale cloud object store (like Amazon S3) to something as tiny as a single in-memory hashmap. And that (the hashmap) is where de-dup helps the runtime. When Deno is loading dependencies, it wouldn't need to read all the imported files from deno.land/x/[email protected] and the imported files from esm.sh/[email protected], because, by the time it finishes reading deno.land/x/lodash, all the lodash chunks for esm.sh/lodash are already in the hashmap! I could even paste lodash files into a repo, load that repo, and it would still automatically de-dup everything my repo had in common with esm.sh/lodash. Runtime isn't the only thing either; eszip should be able to compress binaries even further.
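To make the chunk idea concrete, here's a rough sketch assuming an AST-based chunker exists (faked below with a naive blank-line splitter) and using a plain in-memory Map as the CAS:

// Rough sketch: a content-addressed store keyed by chunk hash.
// chunkTopLevel() stands in for a real per-top-level-statement AST pass
// (like the one eszip already performs); the blank-line split is a placeholder.
const cas = new Map<string, string>(); // hash -> chunk source

function chunkTopLevel(source: string): string[] {
  return source.split(/\n\s*\n/);
}

async function hashChunk(chunk: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(chunk));
  return [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

// Loading deno.land/x/lodash fills the CAS; loading esm.sh/lodash afterwards
// finds every chunk already present, so nothing gets stored twice.
async function store(source: string): Promise<number> {
  let newChunks = 0;
  for (const chunk of chunkTopLevel(source)) {
    const key = await hashChunk(chunk);
    if (!cas.has(key)) {
      cas.set(key, chunk);
      newChunks++;
    }
  }
  return newChunks; // 0 => the file was a complete duplicate
}

Swap the Map for S3 (or any key-value store) and the exact same logic scales from a single process up to a registry.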

Merkle-tree-based de-dup compared to semver-based de-dup is like comparing Git version management to SVN.

Disappearing URLs

Now that we've established a killer de-dup system, solving #1 without a centralized package manager is easy. I think we can all agree decentralized systems (either federated like email/git or peer-to-peer like torrents) are the undisputed heavyweight champs of resiliency. When powerful governments WANT a url/content to disappear, the content isn't on some website with a measly SLA; it's the first magnet link on Google.

Once URLs include a hash of the content, automated, safe, fast peer-to-peer sharing becomes an option. While having it as a backup option effectively eliminates the disappearing-URL problem as best as theoretically possible, it can also be used as a download accelerator. Consider: how many times is one or more of the dependencies we need already on a coworker's PC (sitting on the same local network)? Especially for beefy dev machines, an opt-in system for local network transfers could both be a better DX and reduce the load on a central server (e.g. lower cost, more sharding/redundancy, an SLA that's less expensive to maintain). The announcement could be that the team makes all files from deno.land/x available on IPFS. I'd be so excited by Deno 2.0 checking file hashes and falling back on IPFS that I'd update the day the beta was available (instead of my usual wait-1-year-after-a-stable-release before installing).
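For clarity, here's the fallback flow I'm imagining, sketched with an assumed public IPFS gateway and an assumed url-to-CID mapping (neither is an existing Deno feature):

// Sketch: try the original URL first, and if it has disappeared,
// fetch the same content-addressed bytes from an IPFS gateway instead.
// The gateway and the CID are illustrative assumptions.
async function fetchWithFallback(url: string, cid: string): Promise<Uint8Array> {
  try {
    const res = await fetch(url);
    if (res.ok) return new Uint8Array(await res.arrayBuffer());
  } catch (_err) {
    // the original host is down, or the URL disappeared entirely
  }
  // Content addressing is what makes this safe: any gateway (or a coworker's
  // machine on the LAN) can serve the bytes, because the hash proves they're identical.
  const mirror = await fetch(`https://ipfs.io/ipfs/${cid}`);
  return new Uint8Array(await mirror.arrayBuffer());
}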

Semver, Version ranges, and Patching

I wish version ranges and more of the semver plans were mentioned in the talk, since they have big implications (e.g. it means the runtime needs to be aware of semver, which isn't the case for the other goals). If there are other unmentioned goals, well, I can't propose useful solutions if I don't know the goals/constraints.

Anyways, there's a slew of options, and truly any solution is good to me if:

  1. Pinned and exact-match deps keep being the default (e.g. heavily encouraged)
  2. No hardcoded domains in the Deno runtime

There's tons of branching options, but here's a kind of base solution where:

  • backwards compat is fully maintained (old http imports still work)
  • it's easy for devs to shallowly and deeply substitute versions of a module by name (similar to package.json)
  • the deno runtime can get: a semver number, license, patch info, and any version range information for a module

Let's start with how the Deno runtime would identify semver imports. Deno 1.0 already runs a regex on imports, so it seems natural that Deno 2.0 would extend it. And to make it safe (no false positives), we can require a prefix or postfix indicator:

// one possible regex pattern, first match is used
// this is where the runtime gets a module name and semver version
/([a-zA-Z_][a-zA-Z_0-9]+)@(\d+\.\d+\.\d+)/

// postfix example using URL anchors
import "https://x.nest.land/[email protected]/#?pkg"
import "https://x.nest.land/[email protected]/#?pkg&hash=309424"

// prefix example
import "pkg:https://x.nest.land/[email protected]/"

// postfix using JSX-like directives
import "https://x.nest.land/[email protected]/"; /*@pkg*/
import "https://x.nest.land/[email protected]/"; /*@pkg, @hash=309424 */

Once an import is confirmed to be a semver import, Deno looks for a deno.jsonc endpoint immediately after the semver match, e.g. the runtime fetches https://x.nest.land/[email protected]/deno.jsonc. If it doesn't exist, or if it is malformed, it's a parsing/import error. The deno.jsonc provides the license, version ranges, patches, and source map. For version ranges there are so many viable http-generic formats; I'm just going to pull one off the top of my head:

{
    "ranges": {
       "https://deno.land/std@": "0.198._",
       "https://x.nest.land/eggs@": "1.1._",
       "https://x.nest.land/eggs@": "1.1.2<->1.1.10", 
       "https://x.nest.land/lodash@": ["1.1.2", "1.1.4", "1.1.6"], 
     }
}

This provides the runtime with any information needed for overriding ranges.
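Since the "_" wildcard and "<->" notation above are my own inventions, here's an equally made-up sketch of how a runtime (or a tool) could evaluate them when deciding whether two pinned imports can be collapsed into one (the array-of-exact-versions form is left out for brevity):

// Sketch: does a pinned version fall inside a declared range?
// ("_" wildcard and "lo<->hi" forms come from the example above)
function inRange(version: string, range: string): boolean {
  if (range.includes("<->")) {
    const [lo, hi] = range.split("<->");
    return compare(lo, version) <= 0 && compare(version, hi) <= 0;
  }
  const want = range.split(".");
  const have = version.split(".");
  return want.every((part, i) => part === "_" || part === have[i]);
}

function compare(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// inRange("0.198.3", "0.198._")        => true
// inRange("1.1.7",   "1.1.2<->1.1.10") => true
// inRange("1.2.0",   "1.1._")          => false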

Substitution is still easy. Let's say the dev wants to deeply make all mysql imports point to 4.1.1:

{
    "deepOverride": {
        "mysql": "https://deno.land/x/[email protected]",
    },
}

So long as the recursive mysql deps are from a single source (e.g. deno.land/x/mysql), the runtime has no issue. However, if there is a static semver import of mysql from a different source (e.g. nest.x.land), then during parsing Deno can halt and require the top-level deno.jsonc to be more specific, e.g. have it require:

{
    "deepOverride": {
        "deno.land/mysql": "https://deno.land/x/[email protected]",
        "nest.x.land/mysql": "https://deno.land/x/[email protected]",
    },
}
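And just to show the resolution rule I have in mind, here's a tiny sketch of applying a deepOverride map while resolving an import specifier; the key format and the substring matching rule are assumptions of mine, not anything from the talk:

// Sketch: collapse every recursive import of an overridden module onto one URL.
type DeepOverride = Record<string, string>; // "host/module" or "module" -> replacement URL

function resolveImport(specifier: string, overrides: DeepOverride): string {
  for (const [key, replacement] of Object.entries(overrides)) {
    const [host, name] = key.includes("/") ? key.split("/") : ["", key];
    // match "<name>@" in the path, optionally restricted to a specific host
    if (specifier.includes(`/${name}@`) && specifier.includes(host)) {
      return replacement;
    }
  }
  return specifier; // no override applies
}

// resolveImport("https://nest.x.land/mysql@3.0.0/mod.ts", { "mysql": "https://deno.land/x/mysql@4.1.1" })
//   => "https://deno.land/x/mysql@4.1.1"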
@jeff-hykin (Author) commented Aug 16, 2023

what are some passionate observations

Yeah 😅. I mean hey, I hope the passion comes as a compliment too. You won't see me resisting a feature on some trash like C++ even if it was actually meant as an April fools joke.

My "you were supposed to destroy PM's" feeling was mostly that; a dissapointment (not critisim). And in terms of exploring the dark side; I will say I was pretty impressed when I tried my newly found npm + package.json support on fkill-cli(6.8K stars, fairly large/complex dependency tree) and it mostly "just worked". I did have to use my own bundler to get it to fully work and to enhance the original implementation (so deno bundle missing still hurts) but still impressive.

And at the end of the day, I can remind myself:

  1. There's still the possibility that hopefully, one day, maybe Deno 7.0, node: and npm: will get deprecated from lack of use and Deno can go back to being a more purist runtime.
  2. Rolling your own JavaScript runtime is still always an option

(to anyone encountering this gist)

I'd bet $20 the only people reading this are people from the team, which I think is good. In the future I may just DM the team the gist, since I don't think anyone really needs my, uh, passionate criticism detracting from their perception of Deno.

would avoid jumping to that many conclusions about how this will all work

Great 😁 I'm just going to assume version ranges are used well until proven otherwise


availability of servers

Regarding the relationship between a registry and the availability of servers: any server can go down, but we can have more control over a central registry (like we do today with deno.land/x which we do generally trust). What we've observed in practice is that dependencies that live on infrastructure that isn't as actively monitored are much less reliable. We've seen this negatively impact the DX and reliability of dependency management in Deno today.

Idk if this is a cultural thing, as other Deno team responses seem to similarly fail-to-justify/argue-a-point, but come on,

  • it doesn't answer any of the challenges that were presented, like "Why does a package manager [or index] make deno.land/r more reliable than deno.land/x?"
  • and it follows that up with 3 disjointed facts that nobody disagrees with
    • central package managers have more control? Sure, Obviously
    • infrastructure that isn't as actively monitored => unreliable? Again, okay? yeah?
    • this negatively impacts the DX? Yes, of course unreliability feels bad. And?

My speech and debate teacher would be disappointed: "Where is the Therefore??"

If we must justify something that people don't like, we should say it clearly so that it only needs to be said once.

It's not just me; I was on this Deno thread right before checking my notifications:

[screenshot of the Deno thread, taken 2023-08-15]

Here's the kind of response I would hope for (please correct these answers if you think they're inaccurate)

Sure I'll address some of your concerns
What can deno.land/r theoretically do to increase reliability (that cannot be done on deno.land/x)?

  • Technically speaking, nothing; deno.land/r was created for curation/semver reasons, not reliability reasons.

How will Deno.land/r increase the reliability of packages hosted on currently-not-as-reliable sites?

  • It won't. I mean maybe automatic mirroring would be possible, but we don't have any plans to do that.

How will Deno.land/r reduce the future quantity/frequency of packages published on not-as-reliable sites?

  • This is where we think we can improve. We want to heavily discourage people from publishing on/importing from unreliable sites, as it degrades the average Deno dev experience. One of the ways we can do that is by baking syntax shortcuts right into the runtime that map to deno.land/r, followed by giving deno.land/r an SLA to validate its reliability. In turn, this favoritism should effectively divert would-be-unreliable packages to deno.land/r, which improves the average Deno experience.

Still kind of a weak argument; I'd probably drop it from the "big 3" in favor of the curation argument, but it's better justification than what I initially gleaned from the talk.

And, if those answers are accurate, then I'll assume my merkle tree solution probably wasn't worth writing up.


Curation

We wouldn't want to actively curate and approve modules in most cases

Ah, I think that's kinda unfortunate, but I did assume that was the case during the talk.

explicitly retain the ability to intervene

I do think the explicitness is good, and I appreciate it. Technically there's nothing stopping deno.land/x from being curated, but it's really nice to see the team honoring the kind of implicit promise of "we won't delete your module just 'cause it hasn't been used".


Dpm

a standalone product (like a "dpm") - a central design goal with Deno is to provide critical tooling together in a single package

(unless it's bundle -- I jest) This is actually big news! I did not realize AT ALL from the talk that the deno binary itself would be the dpm command-equivalent.


De-dup

Especially since it wasn't mentioned, and there are so many viable solutions, I'd probably remove this from the justifying arguments as well. Maybe replace it with the licensing or semver argument.
