This note is a response to the points raised in Jack Franklin's blog post.
> Once you check your `node_modules` in, there's no need to run an install step before you can get up and running on the codebase

- git performs poorly with a large number of files in a repository. Search for "git performance many files" and you will find plenty of information about this. For example: "Just as git does not scale well with large files, it can also become painful to work with when you have a large number of files."
- Some packages are platform dependent, for example development tools such as `dart-sass`.
- If you commit `node_modules`, any developer can easily change any dependency in place (this is called "monkey patching"), and that will inevitably lead to a problem: when you later update the changed dependency, the old changes will be lost, and you will have to resolve that. You can never be sure that a dependency of a certain version still has the same code you originally downloaded.
> This isn't just useful for developers locally, but a big boost for any bots you might have running on a Continuous Integration platform (e.g. CircleCI, GitHub Actions, and so on). That's now a step that the bots can miss out entirely.
Usually, CI is configured to cache dependencies so it doesn't have to download all of them on every run. You can search for something like "ci node_modules cache".
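For instance, a minimal caching step in a GitHub Actions workflow might look like this (a sketch; the path assumes npm's default cache location):

```yaml
# Restore/save the package manager's download cache between CI runs,
# keyed on the lock file so the cache invalidates when dependencies change.
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
```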
> Having your `node_modules` checked in guarantees that two developers running the code are running the exact same code with the exact same set of dependencies
This is the job of a lock file: a file you do commit, in which your package manager (npm/pnpm/Yarn) records everything it needs to know about each downloaded dependency to guarantee a reproducible build.
If you open `yarn.lock` you can see something like this:

```yaml
"@apideck/better-ajv-errors@^0.2.4":
  version "0.2.5"
  resolved "https://registry.yarnpkg.com/@apideck/better-ajv-errors/-/better-ajv-errors-0.2.5.tgz#b9c0092b7f7f23c356a0a31600334f7b8958458b"
  integrity sha512-Pm1fAqCT8OEfBVLddU3fWZ/URWpGGhkvlsBIgn9Y2jJlcNumo0gNzPsQswDJTiA8HcKpCjOhWQOgkA9kXR4Ghg==
  dependencies:
    json-schema "^0.3.0"
    jsonpointer "^4.1.0"
    leven "^3.1.0"
```
Yarn carefully recorded that the package `@apideck/better-ajv-errors`:

- was downloaded at version `0.2.5`
- from the address in `resolved` (a direct link to the `.tgz`)
- with the hashsum `sha512-Pm1fAqCT8OE...`
- and had 3 dependencies of its own
And so on for each dependency in the `node_modules` folder. The next time `yarn install` runs in the project directory, all dependencies will be downloaded using the information in `yarn.lock`, not `package.json`. Therefore the whole development team and CI, regardless of platform (Linux/macOS/Windows), get the same files, the same code, with the same hashsums.
> Yes, this can be managed by a package-lock.json file, or other tools, but I've seen all of them slip up rarely or allow a slight variation in a minor version number that causes issues.
This mistake is often made when, deploying a project, the developer runs `npm install`, which resolves packages against the ranges in `package.json` and may update `package-lock.json` along the way. To install packages exactly as recorded in the lock file (and fail if the two files are out of sync), you need to run `npm ci`.
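To illustrate why a range in `package.json` alone is not enough, here's a toy sketch; `caretSatisfies` is a deliberately simplified stand-in for the real resolution logic (npm actually uses the `semver` package, which handles far more cases):

```javascript
// package.json declares a *range*; the lock file records an *exact* version.
// For 0.x versions, "^0.2.4" means >=0.2.4 <0.3.0, so a fresh resolution may
// legally pick a newer patch release than a teammate got last month.

// Deliberately simplified caret check (handles only simple x.y.z versions).
function caretSatisfies(range, version) {
  const [rMaj, rMin, rPat] = range.slice(1).split(".").map(Number);
  const [vMaj, vMin, vPat] = version.split(".").map(Number);
  if (rMaj === 0) {
    // Left-most non-zero digit (the minor) must stay the same.
    return vMaj === 0 && vMin === rMin && vPat >= rPat;
  }
  return vMaj === rMaj && (vMin > rMin || (vMin === rMin && vPat >= rPat));
}

console.log(caretSatisfies("^0.2.4", "0.2.5")); // → true: a fresh install may pick this
console.log(caretSatisfies("^0.2.4", "0.2.9")); // → true: ...or this, silently
console.log(caretSatisfies("^0.2.4", "0.3.0")); // → false: outside the range
```

The lock file removes this ambiguity by pinning `0.2.5` together with its hashsum, and that pinned version is what `npm ci` installs.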
> I've been surprised at how more aware I am of adding dependencies when the git diff shows me the entirety of the code that is being added to the project. This has led us to make contributions to tools to help reduce their file size on disk and have a better awareness of the impact a dependency will have on our bundle size.
When choosing dependencies, you can use special tools instead of just reading miles of code.

- One will show how much the dependency weighs, what its size is with GZIP, how long it will take to download over a slow 3G or average 4G connection, the percentage breakdown of its sub-dependencies, what the dependency exports (if it is written in ES Modules), and what alternative or neighboring packages exist. Here's an example.
- Another will show exactly how many kilobytes of code will be added by an import like `import { map } from "nanostores"`.
- Another will show all dependencies as a 2D or 3D graph. Look at the Vue 3 example.
> I mentioned earlier that people see the noise in a git diff as a downside to adding dependencies to version control, and I do acknowledge that it can be a downside to this approach, but I've found that noise to often be a useful signal. Adding that one extra dependency because I don't want to write a few lines of code myself is something I used to do frequently - but now I'm much more considered because I can see the code that's being added and can reflect on if it's worth it.
You can read the code before it is added as a dependency to your project, in its GitHub repository, for example. I strongly advise you to at least briefly look at your dependencies: the adequacy of the code, the number of open issues, and the date of the last commit.
> Note: this doesn't mean that we don't have dependencies! There are times where it is worth it to add a dependency - but seeing the code in version control has made me more considered about doing it - the cost is no longer invisible.
It was never invisible.
> There is no shying away from the fact that if a developer works on a change that adds a new dependency, there could be a lot of noise in the diff. One of our dependencies that we check in is TypeScript, and every time we update that, the git diff is huge and frankly not worth looking at (beyond the CHANGELOG). We've come up with a rule that helps us here: a change that updates node_modules may not touch any other code in the codebase. So if I update node_modules/typescript with its latest version, I will be warned by our tooling if any other folder outside of node_modules is changed.
You have arrived at a workaround.
> This rule serves us well the majority of the time, because any work that relies on a new or updated dependency can be split into two changes:
>
> - Update the dependency
> - Use the dependency in the code
>
> There are times where this doesn't work; updating TypeScript may require us to update some code to fix errors that the new version of TypeScript is now detecting. In that case we have the ability to override the rule.
And here are the consequences of using that workaround.
> The now infamous left_pad incident, where a popular npm package was removed from the repository all of a sudden, causing builds everywhere to break, would not have impacted a team who checked all their dependencies into git. They would still have to deal with the long term impact of "what do we do with this now unsupported dependency", but in the short term their builds wouldn't break and they wouldn't be blocked on shipping new features.
I remember the day `left_pad` was removed from npm. I was working at a digital agency on websites, and, of course, `left_pad` was a sub-dependency in every project I was responsible for. We solved the problem in about half an hour, after CI returned a 404 while trying to download the package. I don't remember exactly what we did, but such a task should not be a challenge, nor a reason to build workarounds.
Finally, to protect your projects against exactly this kind of problem, you can run your own proxy registry, for example with Verdaccio. It will keep copies of all downloaded packages.
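A sketch of a Verdaccio `config.yaml` that proxies the public registry and keeps a local copy of every tarball it has served (the storage path and access rules here are illustrative assumptions):

```yaml
storage: ./storage          # every downloaded tarball is kept here
uplinks:
  npmjs:
    url: https://registry.npmjs.org/
packages:
  '**':
    access: $all
    proxy: npmjs            # fall through to npmjs, caching the result
```

Point your clients at it with `npm config set registry http://localhost:4873/` (4873 is Verdaccio's default port); if a package later disappears upstream, your builds keep installing the cached copy.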
I copied this note to https://dev.to/grawl/why-you-dont-have-to-commit-nodemodules-folder-33nm