@djcsdy
Last active July 18, 2024 20:26

npm scripts security flaw

I reported a security flaw to npm on 13 April 2024. The flaw itself is not particularly serious, and as far as I know it has never been exploited. However, the underlying problem shows up quite often in the wild as extremely unexpected behaviour when developers install packages using recent versions of npm (v10.4.0 and later), or release packages using any npm-compatible tools.

When I reported this, npm didn't provide a particularly satisfactory response or pay me a bounty, and I think three months is plenty of time for them to have fixed the problem, so I'm documenting it here. Since this problem does come up in the wild fairly often, I want to be able to point developers to a page that explains what's going on.

I haven't checked whether npm have done anything to fix or mitigate this problem, but according to reports from other developers it appears that they have not. The npm repository itself is affected, and any tools that consume packages from the npm repository are potentially affected too.

Summary

The npm repository stores two sets of metadata for every package in the repository, and it is possible for the two to be inconsistent. The first set of metadata is the package.json file that is stored in the package tarball itself. The second set of metadata is stored in the npm database. Package metadata stored in the npm database is supposed to be a strict subset of the metadata stored in package.json, including the scripts field, but it is possible for the two to differ. There is no agreement on which set of metadata is the source of truth.

In particular, the npm command line tool is internally inconsistent about which set of metadata is considered the source of truth. For the scripts field, npm v10.3.x and earlier treat package.json as the source of truth, but v10.4.0 and later treat the npm database as the source of truth. This means that when a developer installs the same package using two different versions of npm, it is possible that each installation runs different scripts as part of the installation process.
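
The version-dependent behaviour described above can be modelled in a few lines. This is only a sketch of the claim as stated in the text, not npm's actual code. The function names are hypothetical, and the version parser assumes plain "major.minor.patch" strings with no pre-release tags.

```python
from typing import Dict, Tuple

Scripts = Dict[str, str]


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '10.4.0' or 'v10.4.0' into (10, 4, 0). Pre-release tags are not handled."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def scripts_source_of_truth(npm_version: str) -> str:
    """Which metadata set the text says npm trusts for the scripts field."""
    # Per the text: v10.3.x and earlier trust the tarball's package.json,
    # v10.4.0 and later trust the npm database.
    return "database" if parse_version(npm_version) >= (10, 4, 0) else "tarball"


def scripts_npm_will_use(
    npm_version: str, database_scripts: Scripts, tarball_scripts: Scripts
) -> Scripts:
    """Pick the scripts that a given npm version would act on (hypothetical model)."""
    if scripts_source_of_truth(npm_version) == "database":
        return database_scripts
    return tarball_scripts
```

Given the same package, scripts_npm_will_use("10.3.0", db, tarball) and scripts_npm_will_use("10.4.0", db, tarball) return different dictionaries whenever the two sets of metadata disagree, which is the inconsistency at issue.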

Detail

Tools publish packages to the npm repository by sending a PUT request to npmjs.com. This PUT request contains package metadata in the form of a JSON document and, separately, the package tarball itself.

When the npm repository receives a valid PUT request, it writes the provided package metadata into the database, and stores the package tarball in the repository as-is. Crucially, the npm repository does not check that the scripts field in the package metadata matches the scripts field in package.json inside the provided tarball. (It may or may not check some of the other fields; I have not tested.)

Obviously, it is possible to craft a PUT request that intentionally creates a package with metadata that differs between the database and the package tarball. However, a much more common scenario occurs when developers accidentally engineer this situation using common tools that publish to npm.

Simplified, the publishing process using most tools (including npm and yarn) works like this:

  1. Tool reads package.json to gather package metadata.
  2. Tool runs the prepack script if present.
  3. Tool generates a tarball for the package (including package.json, which is read from the disk again during this step).
  4. Tool makes a PUT request to npmjs.com containing:
    • The package metadata gathered in step 1.
    • The package tarball created in step 3.
  5. Tool runs the postpack script if present.

The package metadata from step 1 is what goes in the npm database, and the package tarball from step 3 is what tools will download when a developer installs the package.

Crucially, if the prepack script (step 2) modifies package.json, then the package metadata gathered in step 1 will differ from the metadata that actually goes into the tarball in step 3. It is quite common for developers to do this intentionally to prepare package.json for publishing, not realising that this inconsistency will result. For example, the tool pinst is widely used and is specifically intended to do this.
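
To make the ordering problem concrete, here is a minimal simulation of publishing steps 1 to 4. It is a sketch under stated assumptions, not how npm or yarn are implemented: an in-memory dictionary stands in for package.json on disk, and disable_postinstall is a hypothetical stand-in for a prepack script that renames postinstall to _postinstall.

```python
import copy
from typing import Any, Callable, Dict, Optional

Manifest = Dict[str, Any]


def simulate_publish(
    package_json_on_disk: Manifest,
    prepack: Optional[Callable[[Manifest], None]] = None,
) -> Dict[str, Manifest]:
    """Return the two metadata sets that end up in the repository."""
    # Step 1: the tool reads package.json to gather package metadata.
    metadata = copy.deepcopy(package_json_on_disk)
    # Step 2: the prepack script runs and may rewrite package.json "on disk".
    if prepack is not None:
        prepack(package_json_on_disk)
    # Step 3: package.json is read again while the tarball is built.
    tarball_package_json = copy.deepcopy(package_json_on_disk)
    # Step 4: the PUT request carries the step 1 metadata and the step 3 tarball.
    return {"database": metadata, "tarball": tarball_package_json}


def disable_postinstall(manifest: Manifest) -> None:
    """Hypothetical prepack script: rename postinstall to _postinstall."""
    scripts = manifest.get("scripts", {})
    if "postinstall" in scripts:
        scripts["_postinstall"] = scripts.pop("postinstall")


result = simulate_publish(
    {"name": "demo", "scripts": {"postinstall": "husky install"}},
    disable_postinstall,
)
print(result["database"]["scripts"])  # still contains postinstall
print(result["tarball"]["scripts"])   # contains _postinstall instead
```

The database copy keeps the postinstall script because it was captured before prepack ran, while the tarball copy reflects the edited file.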

The most common outcome of this problem that I have personally seen is that some packages unintentionally contain a postinstall script in their metadata on the npm database. This happens when the package source contains a postinstall script that is only intended for use during development, and the developer runs a tool like pinst in their prepack script intending for the postinstall script to be removed before publishing. In this case, when another developer installs the package, if they are using npm 10.4.0 or later, the postinstall script will be run even though it is not contained in the package tarball, causing unexpected behaviour.

You can query the npm database for a particular package's scripts from the command line using npm view. For example, pinst itself suffers from this problem as of v3.0.0:

npm view pinst@3.0.0 scripts
{
  test: 'jest',
  lint: 'eslint .',
  fix: 'npm run lint -- --fix',
  postinstall: 'husky install',
  preversion: 'npm test && npm run lint',
  postversion: 'git push && git push --tags && npm publish',
  prepack: 'node bin --disable',
  postpack: 'node bin --enable'
}

Notice the postinstall script. That shouldn't be there, and if you download the corresponding pinst tarball, you will find it has been renamed to _postinstall as intended.

The npm view command has existed in npm for a long time and it will always query the database, and never the contents of the package tarball. So for npm v10.4.0 and later, npm view {pkg} scripts will tell you what scripts npm will actually run. But for npm v10.3.x and earlier, it will just tell you what's in the database, which differs from what those versions of npm will actually run when installing the package.
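
Since npm view only shows the database side, the tarball side has to be read separately. The sketch below pulls the scripts field out of a package tarball held in memory. It assumes the tarball is gzip-compressed and that package.json sits at package/package.json, which is the usual layout but not guaranteed for every published tarball. The function name is hypothetical.

```python
import io
import json
import tarfile
from typing import Dict


def scripts_from_tarball(tarball_bytes: bytes) -> Dict[str, str]:
    """Return the scripts field of package/package.json inside an npm tarball."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as archive:
        # Assumption: the manifest lives under the conventional "package/" prefix.
        # extractfile raises KeyError if the member does not exist.
        member = archive.extractfile("package/package.json")
        if member is None:
            raise ValueError("package/package.json is not a regular file")
        manifest = json.load(member)
    return manifest.get("scripts", {})
```

One way to obtain the tarball is to run npm pack with a package name and version, which should save the .tgz to the current directory; the file's bytes can then be passed to scripts_from_tarball and the result compared with the output of npm view.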

Exploit

An attacker can exploit this flaw to obfuscate a malicious script that will be run when installing their package, for example a malicious postinstall script. The attacker must control an npm package that will be installed by third parties.

The attacker assumes that developers are less likely to notice a malicious postinstall script if it is stored in the npm database but not contained within the package tarball. The attacker also accepts that the attack will only affect the subset of developers who use npm 10.4.0 or later (this may be considered an advantage, since an attack that affects a small subset of developers is also less likely to be noticed).

The attacker prepares a tarball for their package that either does not contain a postinstall script, or that contains an innocent postinstall script. The attacker then crafts a PUT request to npmjs.com that publishes the innocent package tarball, but contains a malicious postinstall script within the metadata.

Developers and security researchers who download and inspect the package tarball will find nothing malicious and may conclude that it is safe. Only those who inspect the npm database itself will find evidence of a malicious script, but relatively few developers or security researchers will know to do this.

When a developer installs the package using npm 10.4.0 or later, npm will run the malicious postinstall script. The script may then perform arbitrary actions (install malicious software, exfiltrate private data, etc.) with the permissions of the developer.
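
A reviewer who has both sets of scripts in hand could flag this kind of discrepancy mechanically. The following is a minimal sketch of such a check, not an existing tool. The function name is hypothetical, and it only looks at the install-time lifecycle scripts (preinstall, install, postinstall).

```python
from typing import Dict, List

# Lifecycle scripts that npm runs when a package is installed.
INSTALL_SCRIPTS = ("preinstall", "install", "postinstall")


def install_script_mismatches(
    database_scripts: Dict[str, str], tarball_scripts: Dict[str, str]
) -> List[str]:
    """List install-time scripts that differ between the database and the tarball."""
    findings = []
    for name in INSTALL_SCRIPTS:
        in_database = database_scripts.get(name)
        in_tarball = tarball_scripts.get(name)
        if in_database != in_tarball:
            findings.append(
                f"{name}: database={in_database!r} tarball={in_tarball!r}"
            )
    return findings


# Accidental case from the text: postinstall survives only in the database.
print(install_script_mismatches(
    {"postinstall": "husky install"},
    {"_postinstall": "husky install"},
))
```

An empty list means the install-time scripts agree. Any entry means the database and the tarball disagree about what runs at install time, whether by accident or by design.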

@Dragas

Dragas commented Jul 18, 2024

Isn't this publishing a different manifest with extra steps?

https://blog.vlt.sh/blog/the-massive-hole-in-the-npm-ecosystem

Not trying to downplay what you found, but this feels eerily familiar.

@djcsdy
Author

djcsdy commented Jul 18, 2024

@Dragas Yes, it is. I wasn't aware that this was public knowledge. Certainly it isn't sufficiently widely known, considering there are widely used packages with metadata in the npm database that unintentionally differs from the metadata in the tarball.

The only new thing here is that, since npm v10.4.0, npm treats the npm database as the source of truth for scripts, whereas previously it treated the tarball as the source of truth for scripts. This is probably a result of npm trying to work towards closing the manifest confusion bug, but IMHO it just makes things worse. The npm repository still doesn't validate that the tarball matches the uploaded manifest, and nor (to my knowledge) do any tools that consume packages from the npm repository. I believe that most developers would expect the tarball to be the source of truth, and so IMHO this change just introduces a new attack vector for an already troublesome flaw.

Thank you for the link. It adds a lot of colour to what I found. If I had known about this I would have linked to it, shortened my own explanation substantially, and made what I found public a lot sooner.
