Skip to content

Instantly share code, notes, and snippets.

@mridang
Last active October 11, 2024 03:01
Show Gist options
  • Save mridang/54913bfa6db5c738a2a743ab125866cc to your computer and use it in GitHub Desktop.
Save mridang/54913bfa6db5c738a2a743ab125866cc to your computer and use it in GitHub Desktop.
Querying NPM packages using SQL

The following is an altenative way to do dependency analysis for Node projects but using SQL.

Note

This project is not intended to be comprehensive by any means but is simply a demonstration of what is possible.

The approach centers around reading the dependency tree using the the output of the npm ls command and loading it into SQLite.

npm ls --all --production --json > dumps.json 
sqlite3 mydatabase.db 

By default SQLite doesn't format the results in a very readable way. You will need to issue the following directives to ensure that the output is displayed in a tabulated form.

.headers ON
.mode columns
.width 50 50 30 50 50
.load http0
.load fileio

First need to save the dependency hierarchy to a file. The npm ls command uses the package-lock.json under the hood but parsing the package-lock.json is rather difficult.

Since all the metadata is available from the NPM registry, we can simply rely on the output of the npm ls command. It contains both the installed versions and the hierarchy.

.shell npm ls --all --omit=dev --json > node_modules/.ls.json

We first load the JSON into a table using the READFILE directive. This simply slurps the entire JSON blob without any transformations.

Once the JSON is loaded, we can query it using the commands.

DROP 
VIEW IF EXISTS deps_view
;

CREATE
  VIEW deps_view 
    AS   WITH RECURSIVE 
              deps_tree(id, name, version, parent, json_data, root_id) 
           AS 
            ( -- Anchor: Start with top-level dependencies
              SELECT '/' || JSON_EACH.key || '@' || JSON_EXTRACT(JSON_EACH.value, '$.version') AS id
                   , JSON_EACH.key AS name
                   , JSON_EXTRACT(JSON_EACH.value, '$.version') AS version
                   , NULL AS parent
                   , JSON_EACH.value AS json_data  -- Store the entire JSON object for later use
                   , '/' || JSON_EACH.key || '@' || JSON_EXTRACT(JSON_EACH.value, '$.version') AS root_id
                FROM packages
                   , JSON_EACH(JSON_EXTRACT(READFILE('dumps.json'), '$.dependencies'))

               UNION 
                 ALL

           -- Recursive step: Traverse nested dependencies
              SELECT deps_tree.id || '/' || JSON_EACH.key || '@' || JSON_EXTRACT(JSON_EACH.value, '$.version') AS id
                   , JSON_EACH.key AS name
                   , JSON_EXTRACT(JSON_EACH.value, '$.version') AS version
                   , deps_tree.id AS parent
                   , JSON_EACH.value AS json_data
                   , deps_tree.root_id
                FROM deps_tree
                JOIN JSON_EACH(JSON_EXTRACT(deps_tree.json_data, '$.dependencies'))   -- Extract nested dependencies
               WHERE JSON_VALID(JSON_EXTRACT(deps_tree.json_data, '$.dependencies'))  -- Ensure there are nested dependencies
            ) 
  
       -- Final output: Select dependencies with their versions, parent, id, and scope
       SELECT id
            , name
            , version
            , parent
            , CASE WHEN name LIKE '@%' 
                   THEN SUBSTR(name, 1, INSTR(name, '/') - 1)        -- Extract scope
                   ELSE NULL 
                    END AS scope
            , root_id
         FROM deps_tree
        ORDER 
           BY parent
            , name
;
--Print all
SELECT *
  FROM deps_view
;

Examples

List all scoped dependencies with version conflicts

Scoped packages often - not a rule-of-thumb, but often - require consistent versions. Having multiple versions of the same dependency can lead to issues later.

This query lists all scoped packages where multiple versions exist.

--Show me all scoped packages which have multiple versions
-- Query to find all dependencies with multiple distinct versions where scope is not null
SELECT name
     , GROUP_CONCAT(version, ', ') AS versions
  FROM 
     ( SELECT name
            , version
         FROM deps_view
        WHERE scope IS NOT NULL  -- Filter for dependencies that are scoped
        GROUP
           BY name
            , version  -- Ensure distinct versions for each package
     ) 
    AS distinct_versions
 GROUP 
    BY name
HAVING COUNT(version) > 1
;

Here's a project that provides this functionality https://www.npmjs.com/package/find-duplicate-dependencies

Listing the outdated dependencies

Similar to npm outdated, this command will list all the root depenencies that are outdated. It does not care about the version contraints and simply lists the latest availble version

 WITH project_deps 
    AS 
     ( SELECT name AS package_name
            , version
         FROM deps_view 
        WHERE parent IS NULL 
        ORDER 
           BY name ASC 
     )
     , package_data 
    AS 
     ( SELECT package_name
            , HTTP_GET_BODY('https://registry.npmjs.org/' || package_name) AS response
         FROM project_deps
     )
SELECT pd.package_name
     , npm_data.license AS license
     , pd.version AS current_version
     , npm_data.latest_version AS latest_version
  FROM project_deps pd
  JOIN 
     ( SELECT pd.package_name
            , JSON_EXTRACT(pd.response, '$.license') AS license
            , MAX(JSON_EXTRACT(versions.value, '$.version')) AS latest_version
         FROM package_data pd
         JOIN JSON_EACH(JSON_EXTRACT(pd.response, '$.versions')) AS versions
        GROUP 
           BY pd.package_name
     ) AS npm_data
    ON pd.package_name = npm_data.package_name;

Here's some sample output

package_name       license       current_version  latest_version
-----------------  ------------  ---------------  --------------
@langchain/openai  MIT           0.2.5            0.3.5         
dotenv             BSD-2-Clause  16.4.5           9.0.2         
zod                MIT           3.23.8           4.0.0-beta.1  

Show the bundle size

  WITH deps_with_size 
    AS
     (
        SELECT dv.*
             , JSON_EXTRACT(
                 HTTP_GET_BODY(
                   CONCAT_WS('/', 'https://registry.npmjs.org', dv.name, dv.version)), '$.dist.unpackedSize') AS file_size 
          FROM deps_view dv
     )
SELECT dv.name
     , dv.version
     , dv.scope
     , COALESCE(total_file_size, 0) AS total_file_size
  FROM deps_view dv
  LEFT 
  JOIN 
     (
       SELECT root_id
            , SUM(file_size) AS total_file_size
         FROM deps_with_size
        GROUP
           BY root_id
     ) 
    AS aggregated_sizes 
    ON dv.id = aggregated_sizes.root_id
 WHERE dv.parent IS NULL
;
name               version  scope       total_file_size
-----------------  -------  ----------  ---------------
@langchain/openai  0.2.5    @langchain  42232126       
dotenv             16.4.5               79078          
zod                3.23.8               666786    
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment