I appreciate the discussion, and I'll be eagerly awaiting Ryan's talk. @kevinwhinnery I'm really glad my response wasn't taken too negatively. I want these conversations to be productive, and when changes don't negatively impact the ecosystem, I generally try to avoid complaining about them (like Deno KV). And feel free to bluntly correct/argue/criticize any of my points.
> without some kind of semver aware central package manager, I don’t know how you solve duplicate and disappearing dependencies
Oh, that seems like the easy part to me, which does make me wish I had the opportunity to present/discuss them before the team did all this work on a centralized package manager. I would like to start off with a bit more feedback before I get to that.
I like the Deno team, and I would trust my best friend with my life. But if my best friend told me "Trust me bro, GitHub's servers might go down, but MY server is reliable, I've got this fancy SLA that says so. So just make everything depend on my server and only my server; that'll solve your reliability problem", I'd laugh him out of the room. I don't want to distance myself from the team, so please try to imagine standing alongside me and looking at this argument as a 3rd party: "URLs (like deno.land/x, which makes up the majority of Deno dependencies) sometimes disappear. Therefore we are strongly encouraging everyone to use deno.land/r, and are going to show favoritism towards that specific URL." How does that make any sense? Why does a package manager make deno.land/r more reliable than deno.land/x? Why not just say "we're going to give deno.land/x an SLA, so publish your modules there"? I mean, deno vendor partially addresses disappearing URLs, deno bundle used to partially address the issue, source maps and patching (<- my favorite part of the talk) partially address the issue, but centralization and a package manager? How do they even relate?
(is this just me or did you feel similarly on this^ @arsh, @AapoAlas, @ anyone ?)
If Deno announced deno.land/r as a curated version alongside the un-curated deno.land/x (instead of saying it would replace deno.land/x) I would be excited. And not just 'excited': I would be like "dang, they did it again 🎉 I better start writing my draft to all the other packaging systems telling them they need to follow suit".
What makes that^ so different from the actual announcement is: I instantly know which half of my modules I would put on /r (like deno-tree-sitter) and which half STILL belong on /x (like binaryify). Even just recently, @arsh's es-codec was something I said should be a PR on std to make it "more official", and he argued it didn't really belong in std; the real solution is that es-codec belongs on a curated list like /r. My Python packages, my VS Code extensions, my node modules, and my Ruby gems all nicely fit into either an official/maintained group or a small-project/experimental/unofficial group.
And again I'm curious if that's just me or if you feel the same way @arsh, @AapoAlas, @ anyone.
In terms of DX, deno.land is already the de-facto way to find Deno modules. I love that deno.land search results have a stamp for official modules/std. If deno.land also had a big "Curated" stamp (or the opposite; a big warning stamp on non-curated modules), IMO that would 100% deliver on the experience of "hey, go install X" => one official X to install. And again, no hard-coded URLs in the runtime, no need to tell projects to have a deno.jsonc. Only modules published in the deno-registry would need a deno.jsonc as a metadata file for licensing, semver, version ranges, etc.
As for a better DX around "hey, go install X", making it feel easy and de-facto, it seems like the best part was missed: "hey, go install X" => `npm install X`, `pip install X`, `gem/cargo/poetry/yarn/hex install X`.
I mean, do a Discord poll: "would you like an official Deno module that installs a CLI tool to help update/install/patch http modules?" I think the overwhelming answer would be yes. I don't think it's bad that `dpm` is missing, but if the team truly wants a better, more de-facto-feeling DX, take the centralized hardcoded stuff out of the runtime and put it into `dpm install X`. The whole reason the change is problematic is that it's in the RUNTIME.

Also, why call it a package manager if `dpm` isn't going to be a thing? Isn't that a package index/repository rather than a manager? It just seems weird.
While I have faith in the team to succeed at re-inventing something that almost all other teams have failed at, I am concerned about version ranges (and very concerned by phrases like "similar semver"). Pinned versions, and them being so heavily encouraged, are THE reason I use Deno. If Python exclusively used Deno-style imports, I wouldn't even be touching Deno. What I love about Deno 1.0 is that IMO it does TheRightThing™ even when it hurts; even when it means one script needs 110 versions of lodash that only differ by patch number, Deno doesn't cut corners, it does its duty to be as faithful as possible and gets all 110 versions of lodash. Module authors saying "well, my module Should™ work with v2.x" doesn't mean anything. I have no doubt in my mind that Deno modules JustWork™ precisely because the author of [email protected] used dep [email protected], and 99.9% of users of [email protected] also used it with [email protected]; one module == one experience. Version ranges for Python, npm, and Ruby have caused me nothing short of months of pain. One module version == "thanks for the GH issue, but can't reproduce, what version are you using for numpy and matplotlib?". If Deno version ranges are ONLY used to enhance patching/source maps/updates, then I'm onboard and excited; it would just be nice to know the authors' expectations.
But if Deno 2.0 is going to, by default, try to save storage space by cutting corners just because "[email protected]" Should™ work for Y, then I just lost my #1 reason for loving the Deno ecosystem. Losing that on top of secondary reasons like decentralization, EverythingIncluded™ (e.g. deno bundle), and Minimalism (e.g. the addition of Deno KV) is a really heavy blow to my favorite language/runtime.
And by a better solution I mean:
- better uptime
- even less duplication
- better security against changing http endpoints
- faster downloads
- faster runtimes
- all while preserving a good DX, version ranges, etc. All we need is Merkle trees and some basic tools, and I'm happy to put in a lot of work explaining/applying these tools to Deno.
I'm going to start a thread since there's a lot to address with the solution
Let's start with #2, because as the team knows, solving disappearing endpoints (#1) requires dup-detection. (@ others: if one URL claims to be a mirror of another, we need dup-checking to know whether it's actually providing a duplicate or providing something malicious instead.)
To start off, consider adding a file hash to URLs (or, as @crowlKats mentioned to me, leveraging the lock file). Not only does it solve the basic dup-detection problem (same hash = duplicate), but we can already do it with Deno 1.0 by adding anchors to the existing URLs, e.g. `import "https://deno.land/std/log/mod.ts#hash=a2b6d4f9"`, or even a JSX-like annotation: `import "https://deno.land/std/log/mod.ts" /* @hash=a2b6d4f9 */`. Better yet, common Deno dependency tools like `udd` and `trex` could add these automatically without developers needing to do anything (no workflow change required is pretty fantastic DX). I would love to see Deno 2.0 add a warning, similar to the one for un-pinned version imports, when a content hash wasn't provided, and even better, throw an error if the content hash differs from what the URL actually served (i.e. the content changed).
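As a rough illustration of what that check could look like (a minimal sketch, not Deno's actual loader; the `fetchWithHashCheck` name and the `#hash=` anchor format are just the convention from the example above):

```ts
// Sketch: verify a "#hash=..." anchor against the fetched content.
// Assumes the anchor holds a (possibly truncated) hex SHA-256 of the file.
async function fetchWithHashCheck(specifier: string): Promise<string> {
  const url = new URL(specifier);
  const expected = new URLSearchParams(url.hash.slice(1)).get("hash"); // "#hash=a2b6d4f9" -> "a2b6d4f9"
  url.hash = ""; // the anchor never reaches the server anyway

  const source = await (await fetch(url)).text();
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(source),
  );
  const actual = [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");

  if (expected === null) {
    console.warn(`no content hash pinned for ${url}`); // analogous to the un-pinned version warning
  } else if (!actual.startsWith(expected)) {
    throw new Error(`content of ${url} changed (expected ${expected}, got ${actual})`);
  }
  return source;
}
```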
That's just the tip of the iceberg; Eszip and Merkle trees can get way better de-dup, and even faster runtimes. We can do a recursive hash similar to how eszip does a recursive retrieval of sources, and every source file is then broken up into chunks (e.g. leaf nodes). IPFS already does this using fixed byte-length chunks, but since this is exclusively hashing JS/TS we can do WAY better: 1 chunk per top-level statement (based on an initial AST pass, like the one already performed by eszip). From there de-dup is trivial: store the chunks in any content-addressed storage (CAS) and all duplication is eliminated. An efficient CAS can be anything from insanely large-scale cloud object storage (like Amazon S3) to something as tiny as a single in-memory hashmap. And that (the hashmap) is where de-dup helps the runtime. When Deno is loading dependencies, it wouldn't need to read all the imported files from deno.land/x/[email protected] and the imported files from esm.sh/[email protected], because, by the time it finishes reading deno.land/x/lodash, all the lodash chunks for esm.sh/lodash are already in the hashmap! I could even paste lodash files into a repo, load that repo, and it would still automatically de-dup everything my repo had in common with esm.sh/lodash. Runtime isn't the only thing either; eszip should be able to compress binaries even further.
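To make the CAS idea concrete, here's a toy sketch (my own names; the "one chunk per top-level statement" split is faked with a crude regex rather than a real AST pass):

```ts
// Toy content-addressed store: each chunk is stored once, keyed by its hash.
// A real implementation would reuse eszip's AST pass to split on statement
// boundaries; the regex below is only a stand-in for the demo.
const cas = new Map<string, string>();

async function hashChunk(chunk: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(chunk));
  return [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

// Returns the list of chunk hashes that represent one source file.
async function store(source: string): Promise<string[]> {
  const chunks = source.split(/\n(?=\S)/); // crude "top-level statement" split
  const keys: string[] = [];
  for (const chunk of chunks) {
    const key = await hashChunk(chunk);
    if (!cas.has(key)) cas.set(key, chunk); // identical chunks are only stored once
    keys.push(key);
  }
  return keys;
}

// Two "copies" of the same module from different mirrors collapse into shared chunks:
const a = await store(`export const VERSION = "1.0.0";\nexport function chunk() {}`);
const b = await store(`export const VERSION = "1.0.0";\nexport function chunk() {}\nexport function extra() {}`);
console.log(cas.size, "unique chunks instead of", a.length + b.length); // 3 instead of 5
```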
Merkle-tree-based de-dup compared to semver-based de-dup is like comparing Git version management to SVN.
Now that we've established a killer de-dup system, solving #1 without a centralized package manager is easy. I think we can all agree decentralized systems (either federated like email/git or peer-to-peer like torrents) are the undisputed heavyweight champs of resiliency. When powerful governments WANT a URL/content to disappear, the content isn't on some website with a measly SLA; it's the first magnet link on Google.
Once URLs include a hash of the content, automated, safe, fast, peer-to-peer sharing becomes an option. While having it as a backup option effectively eliminates the disappearing-URL problem about as well as is theoretically possible, it can also be used as a download accelerator. Consider: how many times is one or more of the dependencies we need already sitting on a coworker's PC on the same local network? Especially for beefy dev machines, an opt-in system for local network transfers could be both a better DX and reduce the load on a central server (e.g. lower cost, more sharding/redundancy, an SLA that's less expensive to maintain). The announcement could be that the team makes all files from deno.land/x available on IPFS. I'd be so excited by Deno 2.0 checking file hashes and falling back on IPFS that I'd update the day the beta was available (instead of my usual wait of 1 year after a stable release before installing).
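For example, the fallback could look roughly like this (purely a sketch under my own assumptions: that the lock file records an IPFS CID next to each pinned URL, and that any public gateway can serve it; none of this exists in Deno today):

```ts
// Hypothetical fallback loader: if the primary URL is gone or its content no
// longer matches the pinned hash, fetch the same bytes from IPFS instead.
interface PinnedDep {
  url: string;      // e.g. "https://deno.land/x/some_module/mod.ts"
  sha256: string;   // content hash, as recorded in the lock file
  ipfsCid?: string; // optional mirror CID, e.g. published by the registry
}

const GATEWAY = "https://ipfs.io/ipfs/"; // any gateway (or a local node) would do

async function sha256Hex(text: string): Promise<string> {
  const d = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(text));
  return [...new Uint8Array(d)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

async function loadPinned(dep: PinnedDep): Promise<string> {
  try {
    const body = await (await fetch(dep.url)).text();
    if (await sha256Hex(body) === dep.sha256) return body;
    console.warn(`${dep.url} no longer matches its pinned hash, falling back`);
  } catch {
    console.warn(`${dep.url} is unreachable, falling back`);
  }
  if (!dep.ipfsCid) throw new Error(`no mirror available for ${dep.url}`);
  const mirrored = await (await fetch(GATEWAY + dep.ipfsCid)).text();
  if (await sha256Hex(mirrored) !== dep.sha256) {
    throw new Error(`mirror for ${dep.url} returned tampered content`);
  }
  return mirrored; // same bytes, different transport
}
```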
I wish version ranges and more of the semver stuff had been mentioned in the talk, as they have big implications (e.g. they mean the runtime needs to be aware of semver, which isn't the case for the other goals). If there are other not-yet-mentioned goals, well: I can't propose useful solutions if I don't know the goals/constraints.
Anyways, there's a slew of options, and truly any solution is good to me if:
- Pinned and exact-match deps keep being the default (e.g. heavily encouraged)
- No hardcoded domains in the Deno runtime
There's tons of branching options, but here's a kind of base solution where:
- backwards compat is fully maintained (old http imports still work)
- it's easy for devs to shallowly and deeply substitute versions of a module by name (similar to package.json)
- the deno runtime can get: a semver number, license, patch info, and any version range information for a module
Let's start with how the Deno runtime would identify semver imports. Deno 1.0 already runs a regex on imports, so it seems natural that Deno 2.0 would extend it. And to make it safe (no false positives) we can require a prefix or postfix indicator:
```js
// one possible regex pattern, first match is used
// this is where the runtime gets a module name and semver version
/([a-zA-Z_][a-zA-Z_0-9]+)@(\d+\.\d+\.\d+)/

// postfix example using URL anchors
import "https://x.nest.land/[email protected]/#?pkg"
import "https://x.nest.land/[email protected]/#?pkg&hash=309424"

// prefix example
import "pkg:https://x.nest.land/[email protected]/"

// postfix using JSX-like directives
import "https://x.nest.land/[email protected]/"; /*@pkg*/
import "https://x.nest.land/[email protected]/"; /*@pkg, @hash=309424 */
```
Once an import is confirmed to be a semver import, Deno looks for a `deno.jsonc` endpoint immediately after the semver match, e.g. the runtime fetches `https://x.nest.land/[email protected]/deno.jsonc`. If it doesn't exist or is malformed, it's a parsing/import error. The deno.jsonc provides the license, version ranges, patches, and source map. For version ranges there are so many viable http-generic formats; I'm just going to pull one off the top of my head:
```jsonc
{
    "ranges": {
        "https://deno.land/std@": "0.198._",
        "https://x.nest.land/eggs@": "1.1._",
        "https://x.nest.land/eggs@": "1.1.2<->1.1.10",
        "https://x.nest.land/lodash@": ["1.1.2", "1.1.4", "1.1.6"],
    }
}
```
This provides the runtime with any information needed for overriding ranges.
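Just to make the resolution step concrete, here's how a runtime might check a concrete pinned version against one of those entries (my own helper names; I'm reading `_` as a wildcard, `a<->b` as an inclusive span, and an array as an explicit allow-list, per the example above):

```ts
// Sketch of matching a pinned version against the hypothetical "ranges" formats.
type Range = string | string[];

function cmp(a: string, b: string): number {
  const [a1, a2, a3] = a.split(".").map(Number);
  const [b1, b2, b3] = b.split(".").map(Number);
  return (a1 - b1) || (a2 - b2) || (a3 - b3);
}

function satisfies(version: string, range: Range): boolean {
  if (Array.isArray(range)) return range.includes(version);  // ["1.1.2", "1.1.4", "1.1.6"]
  if (range.includes("<->")) {                                // "1.1.2<->1.1.10"
    const [lo, hi] = range.split("<->");
    return cmp(lo, version) <= 0 && cmp(version, hi) <= 0;
  }
  const pattern = range.split(".");                           // "1.1._" (wildcard patch)
  return version.split(".").every((part, i) => pattern[i] === "_" || pattern[i] === part);
}

console.log(satisfies("1.1.7", "1.1._"));            // true
console.log(satisfies("1.1.7", "1.1.2<->1.1.10"));   // true
console.log(satisfies("1.1.7", ["1.1.2", "1.1.4"])); // false
```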
Substitution is still easy. Let's say the dev wants to deeply override every mysql import to point at 4.1.1:
```jsonc
{
    "deepOverride": {
        "mysql": "https://deno.land/x/[email protected]",
    },
}
```
So long as the recursive mysql deps are from a single source (e.g. deno.land/x/mysql), the runtime has no issue. However, if there is a static semver import like `x.nest.land/[email protected]`, then during parsing Deno can halt and require the top-level deno.jsonc to be more specific, e.g. have it require:
```jsonc
{
    "deepOverride": {
        "deno.land/mysql": "https://deno.land/x/[email protected]",
        "x.nest.land/mysql": "https://deno.land/x/[email protected]",
    },
}
```
Hey there - thanks again for taking the time to share your thoughts. Here are a few high-level reactions:
If there was a `deno:mysql` module that pointed to an unmaintained or misleading module, we'd want to prevent that. Or if there was a `deno:stripe` (for example) module that wasn't an official client for the Stripe API, that might negatively impact developer experience as well.