@Grawl
Last active May 6, 2022 05:24
Why you don't have to commit the node_modules folder

This note is a response to the points made in Jack Franklin's blog post.

No need for npm installs

Once you check your node_modules in, there's no need to run an install step before you can get up and running on the codebase

  1. Git performs poorly when a repository contains a very large number of files. Searching for “git performance many files” turns up plenty of reports, for example: “Just as git does not scale well with large files, it can also become painful to work with when you have a large number of files.”

  2. Some packages are platform-dependent, for example development tools such as dart-sass. A node_modules folder committed from one operating system may not work on another.

  3. Committing node_modules means any developer can easily edit any dependency in place (this is called “monkey patching”), and that will definitely cause a problem: when you update the patched dependency, the local changes are lost and you have to deal with that. You can never be sure that a dependency at a given version still contains the code you originally downloaded.

This isn't just useful for developers locally, but a big boost for any bots you might have running on a Continuous Integration platform (e.g. CircleCI, GitHub Actions, and so on). That's now a step that the bots can miss out entirely.

CI is usually configured to cache dependencies so that they are not downloaded from scratch on every run. Search for something like “ci node_modules cache” to find instructions for your platform.
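A common way to set up such a cache is to key it on the contents of the lock file, so the cache is reused until the dependencies actually change. Here is a minimal sketch of that idea in Node.js. The function name cacheKey and the "deps-" prefix are made up for illustration; real CI platforms have their own syntax for this.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: builds a cache key that changes only when the
// lock file contents or the platform change.
function cacheKey(platform, lockFileContents) {
  const digest = crypto
    .createHash("sha256")
    .update(lockFileContents)
    .digest("hex");
  return `deps-${platform}-${digest}`;
}

// Example usage with the lock file from the current directory:
// const fs = require("node:fs");
// console.log(cacheKey(process.platform, fs.readFileSync("yarn.lock")));
```

Including the platform in the key keeps platform-dependent packages from being restored on the wrong operating system.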

Guaranteed replicated builds

Having your node_modules checked in guarantees that two developers running the code are running the exact same code with the exact same set of dependencies

This is the job of the lock file: a file you must commit, in which your package manager (npm, pnpm, or Yarn) records everything it needs about each downloaded dependency to guarantee a reproducible build.

If you open yarn.lock you can see something like this:

"@apideck/better-ajv-errors@^0.2.4":
  version "0.2.5"
  resolved "https://registry.yarnpkg.com/@apideck/better-ajv-errors/-/better-ajv-errors-0.2.5.tgz#b9c0092b7f7f23c356a0a31600334f7b8958458b"
  integrity sha512-Pm1fAqCT8OEfBVLddU3fWZ/URWpGGhkvlsBIgn9Y2jJlcNumo0gNzPsQswDJTiA8HcKpCjOhWQOgkA9kXR4Ghg==
  dependencies:
    json-schema "^0.3.0"
    jsonpointer "^4.1.0"
    leven "^3.1.0"

Yarn carefully recorded that the package @apideck/better-ajv-errors was downloaded with:

  • version 0.2.5
  • from the address in resolved (a direct link to the .tgz)
  • the checksum sha512-Pm1fAqCT8OE...
  • three dependencies

The same is recorded for every dependency in the node_modules folder. The next time yarn install runs in the project directory, every dependency is downloaded using the information in yarn.lock rather than the version ranges in package.json. As a result, the whole development team and CI, regardless of platform (Linux/macOS/Windows), get the same files and the same code, with the same checksums.
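To show what the integrity field protects against, here is a rough sketch of the check in Node.js. It assumes the field holds a single "<algorithm>-<base64 digest>" value, as in the entry above. The function name matchesIntegrity and the sample data are made up for illustration, and this is not how Yarn itself implements the check.

```javascript
const crypto = require("node:crypto");

// Returns true when `data` (a Buffer or string) hashes to the value stored
// in an integrity field of the form "<algorithm>-<base64 digest>".
function matchesIntegrity(data, integrity) {
  const separator = integrity.indexOf("-");
  const algorithm = integrity.slice(0, separator); // e.g. "sha512"
  const expected = integrity.slice(separator + 1);
  const actual = crypto.createHash(algorithm).update(data).digest("base64");
  return actual === expected;
}

// Hypothetical stand-in for a downloaded .tgz file
const tarball = Buffer.from("pretend this is a .tgz file");
const integrity =
  "sha512-" + crypto.createHash("sha512").update(tarball).digest("base64");

console.log(matchesIntegrity(tarball, integrity)); // true
console.log(matchesIntegrity(Buffer.from("tampered"), integrity)); // false
```

If even one byte of the downloaded archive differs from what was recorded, the digest no longer matches and the install can be rejected.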

Yes, this can be managed by a package-lock.json file, or other tools, but I've seen all of them slip up rarely or allow a slight variation in a minor version number that causes issues.

This usually happens when a developer deploys a project with npm install: if package.json and package-lock.json disagree, npm install resolves versions from package.json and rewrites the lock file. To install exactly what the lock file records, run npm ci, which fails with an error instead of changing the lock file.
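The other package managers have their own equivalent of npm ci. Below is a small sketch in Node.js that picks the strict install command from the lock file found in a project. The names LOCKED_INSTALL and lockedInstallCommand are made up for illustration, and the Yarn line assumes Yarn 1 (classic), which is the format of the yarn.lock shown earlier.

```javascript
// Maps each lock file to a command that installs from it without rewriting it.
// Assumption: yarn.lock comes from Yarn 1 (classic); newer Yarn releases use
// `yarn install --immutable` instead.
const LOCKED_INSTALL = [
  ["package-lock.json", "npm ci"],
  ["yarn.lock", "yarn install --frozen-lockfile"],
  ["pnpm-lock.yaml", "pnpm install --frozen-lockfile"],
];

// `files` is a list of file names in the project root.
function lockedInstallCommand(files) {
  for (const [lockFile, command] of LOCKED_INSTALL) {
    if (files.includes(lockFile)) return command;
  }
  return null; // no lock file: the install cannot be reproduced exactly
}

// require("node:fs").readdirSync(".") gives the list for the current directory.
console.log(lockedInstallCommand(["package.json", "package-lock.json"])); // npm ci
```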

Better awareness of the code you're shipping

I've been surprised at how more aware I am of adding dependencies when the git diff shows me the entirety of the code that is being added to the project. This has lead us to make contributions to tools to help reduce their file size on disk and have a better awareness of the impact a dependency will have on our bundle size.

When choosing dependencies, you can use dedicated tools instead of reading miles of code.

  • Bundlephobia

    It shows how much the dependency weighs, how much it weighs after gzip, how long it takes to download over slow 3G and average 4G connections, what share of the size each sub-dependency contributes, what the dependency exports (if it is written as ES modules), and which alternative or similar packages exist. Here's an example.

  • bundlejs.com

    It will show exactly how many kilobytes of code will be added when importing like

    import { map } from "nanostores"

    Look at the nanostores example.

  • npm.anvaka.com

    It draws all dependencies as a 2D or 3D graph. Look at the Vue 3 example.

More consideration to adding a dependency because it's not invisible

I mentioned earlier that people see the noise in a git diff as a downside to adding dependencies to version control, and I do acknowledge that it can be a downside to this approach, but I've found that noise to often be a useful signal. Adding that one extra dependency because I don't want to write a few lines of code myself is something I used to do frequently - but now I'm much more considered because I can see the code that's being added and can reflect on if it's worth it.

You can read the code before it is added as a dependency to your project, for example in its GitHub repository. I strongly advise you to at least briefly look at its dependencies, the quality of the code, the number of open issues, and the date of the last commit.

Note: this doesn't mean that we don't have dependencies! There are times where it is worth it to add a dependency - but seeing the code in version control has made me more considered about doing it - the cost is no longer invisible.

It has never been invisible.

You can manage the large diffs

There is no shying away from the fact that if a developer works on a change that adds a new dependency, there could be a lot of noise in the diff. One of our dependencies that we check in is TypeScript, and every time we update that, the git diff is huge and frankly not worth looking at (beyond the CHANGELOG). We've come up with a rule that helps us here: a change that updates node_modules may not touch any other code in the codebase. So if I update node_modules/typescript with its latest version, I will be warned by our tooling if any other folder outside of node_modules is changed.

This leads you to a workaround.

This rule serves us well the majority of the time, because any work that relies on a new or updated dependency can be split into two changes:

  1. Update the dependency
  2. Use the dependency in the code

There are times where this doesn't work; updating TypeScript may require us to update some code to fix errors that the new version of TypeScript is now detecting. In that case we have the ability to override the rule.

And here are the consequences of using that workaround.

Protection from another left pad

The now infamous left_pad incident, where a popular npm package was removed from the repository all of a sudden, causing builds everywhere to break, would not have impacted a team who checked all their dependencies into git. They would still have to deal with the long term impact of "what do we do with this now unsupported dependency", but in the short term their builds wouldn't break and they wouldn't be blocked on shipping new features.

I remember the day left_pad was removed from npm. I was working at a digital agency building websites and, of course, left_pad was a sub-dependency in every project I was responsible for. We solved the problem in about half an hour, once CI showed a 404 when trying to download the package. I don't remember exactly what we did, but a task like that should not be a challenge, nor a reason to adopt workarounds.

Finally, to protect your projects against exactly this kind of problem, you can run your own proxy registry, for example with Verdaccio. It keeps a copy of every package downloaded through it.
