I reported a security flaw to npm on 13 April 2024. The security flaw itself is not particularly serious, and as far as I know has never been exploited, but the underlying problem does manifest quite often in the wild as extremely unexpected behaviour when developers install packages using very recent versions of npm, or release packages using any npm-compatible tools.
When I reported this, npm didn't provide a particularly satisfactory response or pay me a bounty, and I think three months is plenty of time for them to have fixed the problem, so I'm documenting it here. Since this problem does come up in the wild fairly often, I want to be able to point developers to a page that explains what's going on.
I haven't checked whether npm have done anything to fix or mitigate this problem, but reports from other developers suggest that they have not. The npm repository itself is affected, and potentially any tools that consume packages from the npm repository are also affected.
The npm repository stores two sets of metadata for every package in the repository, and it is possible for the two to be inconsistent. The first set of metadata is the package.json file that is stored in the package tarball itself. The second set of metadata is stored in the npm database. Package metadata stored in the npm database is supposed to be a strict subset of the metadata stored in package.json, including the scripts field, but it is possible for the two to differ. There is no agreement on which set of metadata is the source of truth.
In particular, the npm command line tool is internally inconsistent about which set of metadata is considered the source of truth. For the scripts field, npm v10.3.x and earlier treat package.json as the source of truth, but v10.4.0 and later treat the npm database as the source of truth. This means that when a developer installs the same package using two different versions of npm, it is possible that each installation runs different scripts as part of the installation process.
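The version split described above can be modelled in a few lines. This is only a sketch of the behaviour as described in this post, not npm's actual code; the function names `parse_version` and `scripts_npm_would_use` are hypothetical, and pre-release version strings are not handled.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain version string such as '10.4.0' into (10, 4, 0)."""
    return tuple(int(part) for part in version.split("."))


def scripts_npm_would_use(npm_version: str, database_scripts: dict, tarball_scripts: dict) -> dict:
    # Simplified model of the behaviour described in the post (an assumption,
    # not npm's real implementation): v10.4.0 and later trust the registry
    # database, earlier versions trust the package.json inside the tarball.
    if parse_version(npm_version) >= (10, 4, 0):
        return database_scripts
    return tarball_scripts


if __name__ == "__main__":
    database_scripts = {"postinstall": "husky install"}
    tarball_scripts = {"_postinstall": "husky install"}
    for version in ("10.3.0", "10.4.0"):
        chosen = scripts_npm_would_use(version, database_scripts, tarball_scripts)
        print(version, "runs postinstall:", "postinstall" in chosen)
```

With the two example dictionaries, only the 10.4.0 case ends up with a postinstall script to run.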
Tools publish packages to the npm repository by sending a PUT request to npmjs.com. This PUT request contains package metadata in the form of a JSON document and, separately, the package tarball itself.
When the npm repository receives a valid PUT request, it writes the provided package metadata into the database, and stores the package tarball in the repository as-is. Crucially, the npm repository does not check that the scripts field in the package metadata matches the scripts field in package.json inside the provided tarball. (It may or may not check some of the other fields; I have not tested.)
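For illustration, the kind of consistency check that is missing could look roughly like the sketch below. It only compares two scripts dictionaries that have already been parsed; the function name `diff_scripts` is hypothetical, and nothing here reflects how the npm repository is actually implemented.

```python
def diff_scripts(metadata_scripts: dict, tarball_scripts: dict) -> dict:
    """Return {script_name: (metadata_value, tarball_value)} for every mismatch.

    A value of None means the script is absent on that side.
    """
    mismatches = {}
    for name in sorted(set(metadata_scripts) | set(tarball_scripts)):
        in_metadata = metadata_scripts.get(name)
        in_tarball = tarball_scripts.get(name)
        if in_metadata != in_tarball:
            mismatches[name] = (in_metadata, in_tarball)
    return mismatches


if __name__ == "__main__":
    metadata = {"test": "jest", "postinstall": "husky install"}
    tarball = {"test": "jest", "_postinstall": "husky install"}
    for name, (db_value, tar_value) in diff_scripts(metadata, tarball).items():
        print(f"{name}: database={db_value!r} tarball={tar_value!r}")
```

An empty result means the two sides agree on every script; any entry is a discrepancy worth looking at.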
Obviously, it is possible to craft a PUT request that intentionally creates a package with metadata that differs between the database and the package tarball. However, a much more common scenario occurs when developers accidentally engineer this situation using common tools that publish to npm.
Simplified, the publishing process using most tools (including npm and yarn) works like this:
1. Tool reads package.json to gather package metadata.
2. Tool runs the prepack script if present.
3. Tool generates a tarball for the package (including package.json, which is read from the disk again during this step).
4. Tool makes a PUT request to npmjs.com containing:
   - The package metadata gathered in step 1.
   - The package tarball created in step 3.
5. Tool runs the postpack script if present.
The package metadata from step 1 is what goes in the npm database, and the package tarball from step 3 is what tools will download when a developer installs the package.
Crucially, if the prepack script (step 2) modifies package.json, then the package metadata gathered in step 1 will differ from the metadata that actually goes into the tarball in step 3. It is quite common for developers to do this intentionally to prepare package.json for publishing, not realising that this inconsistency will result. For example, the tool pinst is widely used and is specifically intended to do this.
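The publishing steps above can be simulated in memory to show where the two copies of the metadata drift apart. This is a toy model under stated assumptions: `simulate_publish` and `disable_postinstall` are hypothetical names, no real tarball or PUT request is produced, and the prepack step only imitates the pinst-style rename of postinstall to _postinstall.

```python
import copy
import json


def disable_postinstall(package_json: dict) -> None:
    """Hypothetical stand-in for a pinst-style prepack step (edits in place)."""
    scripts = package_json.get("scripts", {})
    if "postinstall" in scripts:
        scripts["_postinstall"] = scripts.pop("postinstall")


def simulate_publish(package_json: dict, prepack) -> tuple:
    on_disk = copy.deepcopy(package_json)
    # Step 1: the tool reads package.json and keeps that snapshot as metadata.
    metadata = copy.deepcopy(on_disk)
    # Step 2: prepack runs and may rewrite package.json on disk.
    prepack(on_disk)
    # Step 3: the tarball is built from package.json as it is on disk now.
    tarball_package_json = json.loads(json.dumps(on_disk))
    # Step 4: both pieces would be sent in the same PUT request.
    return metadata, tarball_package_json


if __name__ == "__main__":
    source = {"name": "example", "scripts": {"postinstall": "husky install"}}
    metadata, in_tarball = simulate_publish(source, disable_postinstall)
    print("database scripts:", metadata["scripts"])
    print("tarball scripts: ", in_tarball["scripts"])
```

The snapshot from step 1 still carries postinstall, while the copy that ends up in the tarball only has _postinstall.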
The most common outcome of this problem that I have personally seen is that some packages unintentionally contain a postinstall script in their metadata on the npm database. This happens when the package source contains a postinstall script that is only intended for use during development, and the developer runs a tool like pinst in their prepack script intending for the postinstall script to be removed before publishing. In this case, when another developer installs the package, if they are using npm 10.4.0 or later, the postinstall script will be run even though it is not contained in the package tarball, causing unexpected behaviour.
You can query the npm database for a particular package's scripts from the command line using npm view. For example, pinst itself suffers from this problem as of v3.0.0:
npm view pinst@3.0.0 scripts
{
test: 'jest',
lint: 'eslint .',
fix: 'npm run lint -- --fix',
postinstall: 'husky install',
preversion: 'npm test && npm run lint',
postversion: 'git push && git push --tags && npm publish',
prepack: 'node bin --disable',
postpack: 'node bin --enable'
}
Notice the postinstall script. That shouldn't be there, and if you download the corresponding pinst tarball, you will find it has been renamed to _postinstall as intended.
The npm view command has existed in npm for a long time and it will always query the database, and never the contents of the package tarball. So for npm v10.4.0 and later, npm view {pkg} scripts will tell you what scripts npm will actually run. But for npm v10.3.x and earlier, it will just tell you what's in the database, which differs from what those versions of npm will actually run when installing the package.
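Since npm view only covers the database side, checking the tarball side means reading package.json out of the archive yourself. The sketch below assumes the tarball is a gzipped tar with the manifest at package/package.json (the conventional layout, but treat it as an assumption); `scripts_from_tarball` and `build_demo_tarball` are hypothetical helpers, and the demo archive is built in memory rather than downloaded.

```python
import io
import json
import tarfile

# Assumed location of the manifest inside a published tarball.
MANIFEST_PATH = "package/package.json"


def scripts_from_tarball(tarball_bytes: bytes) -> dict:
    """Return the scripts field of the package.json stored in a .tgz archive."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as archive:
        member = archive.extractfile(MANIFEST_PATH)  # KeyError if missing
        if member is None:
            raise ValueError(f"{MANIFEST_PATH} is not a regular file")
        manifest = json.load(member)
    return manifest.get("scripts", {})


def build_demo_tarball(manifest: dict) -> bytes:
    """Build a tiny in-memory .tgz containing only the manifest (demo only)."""
    payload = json.dumps(manifest).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name=MANIFEST_PATH)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    demo = build_demo_tarball({"name": "demo", "scripts": {"_postinstall": "husky install"}})
    print(scripts_from_tarball(demo))
```

Comparing this result with the output of npm view for the same version shows whether the two sources disagree.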
An attacker can exploit this flaw to obfuscate a malicious script that will be run when installing their package, for example a malicious postinstall script. The attacker must control an npm package that will be installed by third parties.
The attacker assumes that developers are less likely to notice a malicious postinstall script if it is stored in the npm database but not contained within the package tarball. The attacker also accepts that the attack will only affect the subset of developers who use npm 10.4.0 or later (this may be considered an advantage, since an attack that affects a small subset of developers is also less likely to be noticed).
The attacker prepares a tarball for their package that either does not contain a postinstall script, or that contains an innocent postinstall script. The attacker then crafts a PUT request to npmjs.com that publishes the innocent package tarball, but contains a malicious postinstall script within the metadata.
Developers and security researchers who download and inspect the package tarball will find nothing malicious and may conclude that it is safe. Only those who inspect the npm database itself will find evidence of a malicious script, but relatively few developers or security researchers will know to do this.
When a developer installs the package using npm 10.4.0 or later, npm will run the malicious postinstall script. The script may then perform arbitrary actions (install malicious software, exfiltrate private data, etc.) with the permissions of the developer.
Isn't this publishing a different manifest with extra steps?
https://blog.vlt.sh/blog/the-massive-hole-in-the-npm-ecosystem
Not trying to downplay what you found, but this feels eerily familiar.