The following is an altenative way to do dependency analysis for Node projects but using SQL.
Note
This project is not intended to be comprehensive by any means but is simply a demonstration of what is possible.
The approach centers around reading the dependency tree using the the output of the npm ls
command and loading it
into SQLite.
npm ls --all --production --json > dumps.json
sqlite3 mydatabase.db
By default SQLite doesn't format the results in a very readable way. You will need to issue the following directives to ensure that the output is displayed in a tabulated form.
.headers ON .mode columns .width 50 50 30 50 50 .load http0 .load fileio
First need to save the dependency hierarchy to a file. The npm ls
command uses the package-lock.json under the hood but parsing the package-lock.json
is rather difficult.
Since all the metadata is available from the NPM registry, we can simply rely on the output of the npm ls
command. It contains both the installed versions and the hierarchy.
.shell npm ls --all --omit=dev --json > node_modules/.ls.json
We first load the JSON into a table using the READFILE
directive. This simply slurps the entire JSON blob without any
transformations.
Once the JSON is loaded, we can query it using the commands.
DROP
VIEW IF EXISTS deps_view
;
CREATE
VIEW deps_view
AS WITH RECURSIVE
deps_tree(id, name, version, parent, json_data, root_id)
AS
( -- Anchor: Start with top-level dependencies
SELECT '/' || JSON_EACH.key || '@' || JSON_EXTRACT(JSON_EACH.value, '$.version') AS id
, JSON_EACH.key AS name
, JSON_EXTRACT(JSON_EACH.value, '$.version') AS version
, NULL AS parent
, JSON_EACH.value AS json_data -- Store the entire JSON object for later use
, '/' || JSON_EACH.key || '@' || JSON_EXTRACT(JSON_EACH.value, '$.version') AS root_id
FROM packages
, JSON_EACH(JSON_EXTRACT(READFILE('dumps.json'), '$.dependencies'))
UNION
ALL
-- Recursive step: Traverse nested dependencies
SELECT deps_tree.id || '/' || JSON_EACH.key || '@' || JSON_EXTRACT(JSON_EACH.value, '$.version') AS id
, JSON_EACH.key AS name
, JSON_EXTRACT(JSON_EACH.value, '$.version') AS version
, deps_tree.id AS parent
, JSON_EACH.value AS json_data
, deps_tree.root_id
FROM deps_tree
JOIN JSON_EACH(JSON_EXTRACT(deps_tree.json_data, '$.dependencies')) -- Extract nested dependencies
WHERE JSON_VALID(JSON_EXTRACT(deps_tree.json_data, '$.dependencies')) -- Ensure there are nested dependencies
)
-- Final output: Select dependencies with their versions, parent, id, and scope
SELECT id
, name
, version
, parent
, CASE WHEN name LIKE '@%'
THEN SUBSTR(name, 1, INSTR(name, '/') - 1) -- Extract scope
ELSE NULL
END AS scope
, root_id
FROM deps_tree
ORDER
BY parent
, name
;
--Print all
SELECT *
FROM deps_view
;
Scoped packages often - not a rule-of-thumb, but often - require consistent versions. Having multiple versions of the same dependency can lead to issues later.
This query lists all scoped packages where multiple versions exist.
--Show me all scoped packages which have multiple versions
-- Query to find all dependencies with multiple distinct versions where scope is not null
SELECT name
, GROUP_CONCAT(version, ', ') AS versions
FROM
( SELECT name
, version
FROM deps_view
WHERE scope IS NOT NULL -- Filter for dependencies that are scoped
GROUP
BY name
, version -- Ensure distinct versions for each package
)
AS distinct_versions
GROUP
BY name
HAVING COUNT(version) > 1
;
Here's a project that provides this functionality https://www.npmjs.com/package/find-duplicate-dependencies
Similar to npm outdated
, this command will list all the root depenencies that are outdated. It does
not care about the version contraints and simply lists the latest availble version
WITH project_deps
AS
( SELECT name AS package_name
, version
FROM deps_view
WHERE parent IS NULL
ORDER
BY name ASC
)
, package_data
AS
( SELECT package_name
, HTTP_GET_BODY('https://registry.npmjs.org/' || package_name) AS response
FROM project_deps
)
SELECT pd.package_name
, npm_data.license AS license
, pd.version AS current_version
, npm_data.latest_version AS latest_version
FROM project_deps pd
JOIN
( SELECT pd.package_name
, JSON_EXTRACT(pd.response, '$.license') AS license
, MAX(JSON_EXTRACT(versions.value, '$.version')) AS latest_version
FROM package_data pd
JOIN JSON_EACH(JSON_EXTRACT(pd.response, '$.versions')) AS versions
GROUP
BY pd.package_name
) AS npm_data
ON pd.package_name = npm_data.package_name;
Here's some sample output
package_name license current_version latest_version
----------------- ------------ --------------- --------------
@langchain/openai MIT 0.2.5 0.3.5
dotenv BSD-2-Clause 16.4.5 9.0.2
zod MIT 3.23.8 4.0.0-beta.1
WITH deps_with_size
AS
(
SELECT dv.*
, JSON_EXTRACT(
HTTP_GET_BODY(
CONCAT_WS('/', 'https://registry.npmjs.org', dv.name, dv.version)), '$.dist.unpackedSize') AS file_size
FROM deps_view dv
)
SELECT dv.name
, dv.version
, dv.scope
, COALESCE(total_file_size, 0) AS total_file_size
FROM deps_view dv
LEFT
JOIN
(
SELECT root_id
, SUM(file_size) AS total_file_size
FROM deps_with_size
GROUP
BY root_id
)
AS aggregated_sizes
ON dv.id = aggregated_sizes.root_id
WHERE dv.parent IS NULL
;
name version scope total_file_size
----------------- ------- ---------- ---------------
@langchain/openai 0.2.5 @langchain 42232126
dotenv 16.4.5 79078
zod 3.23.8 666786