@belgattitude
Last active November 5, 2024 08:32
Composite github action to improve CI time with yarn 3+ / node-modules linker.

Why

Although actions/setup-node has a built-in cache option, it misses an opportunity regarding cache persistence. Depending on usage, the action below might give you faster installs and potentially reduce carbon emissions (♻️🌳❤️).

Requirements

Yarn 3+/4+ with nodeLinker: node-modules. (Not using yarn? See the corresponding pnpm 7/8+ action gist.)

Structure

.
└── .github
    ├── actions
    │   └── yarn-nm-install/action.yml (composite action)    
    └── workflows
        └── ci.yml (uses: ./.github/actions/yarn-nm-install)    

Composite action

Create the file .github/actions/yarn-nm-install/action.yml and paste the following:

########################################################################################
# "yarn install" composite action for yarn 3/4+ and "nodeLinker: node-modules"         #
#--------------------------------------------------------------------------------------#
# Requirement: actions/setup-node should be run before                                 #
#                                                                                      #
# Usage in workflows steps:                                                            #
#                                                                                      #
#      - name: 📥 Monorepo install                                                     #
#        uses: ./.github/actions/yarn-nm-install                                       #
#        with:                                                                         #
#          enable-corepack: false                   # (default = 'true')               #
#          cwd: ${{ github.workspace }}/apps/my-app # (default = '.')                  #
#          cache-prefix: add cache key prefix       # (default = 'default')            #
#          cache-node-modules: false                # (default = 'false')              #
#          cache-install-state: false               # (default = 'false')              #
#                                                                                      #
# Reference:                                                                           #
#   - latest: https://gist.github.com/belgattitude/042f9caf10d029badbde6cf9d43e400a    #
#                                                                                      #
# Versions:                                                                            #
#   - 1.2.1 - 25-05-2024 - fix a missing actions/cache not upgraded to v4              #
#   - 1.2.0 - 01-05-2024 - actions/cache upgraded to v4                                #
#   - 1.1.0 - 22-07-2023 - Option to enable npm global cache folder.                   #
#   - 1.0.4 - 15-07-2023 - Fix corepack was always enabled.                            #
#   - 1.0.3 - 05-07-2023 - YARN_ENABLE_MIRROR to false (speed up cold start)           #
#   - 1.0.2 - 02-06-2023 - install-state default to false                              #
#   - 1.0.1 - 29-05-2023 - cache-prefix doc                                            #
#   - 1.0.0 - 27-05-2023 - new input: cache-prefix                                     #
########################################################################################

name: 'Monorepo install (yarn)'
description: 'Run yarn install with node_modules linker and cache enabled'
inputs:
  cwd:
    description: "Changes node's process.cwd() if the project is not located at the root. Defaults to process.cwd()"
    required: false
    default: '.'
  cache-prefix:
    description: 'Add a specific cache-prefix'
    required: false
    default: 'default'
  cache-npm-cache:
    description: 'Cache npm global cache folder often used by node-gyp, prebuild binaries (invalidated on lock/os/node-version)'
    required: false
    default: 'true'
  cache-node-modules:
    description: 'Cache node_modules, might speed up link step (invalidated on lock/os/node-version/branch)'
    required: false
    default: 'false'
  cache-install-state:
    description: 'Cache yarn install state, might speed up resolution step when node-modules cache is activated (invalidated on lock/os/node-version/branch)'
    required: false
    default: 'false'
  enable-corepack:
    description: 'Enable corepack'
    required: false
    default: 'true'

runs:
  using: 'composite'

  steps:
    - name: ⚙️ Enable Corepack
      if: inputs.enable-corepack == 'true'
      shell: bash
      working-directory: ${{ inputs.cwd }}
      run: corepack enable

    - name: ⚙️ Expose yarn config as "$GITHUB_OUTPUT"
      id: yarn-config
      shell: bash
      working-directory: ${{ inputs.cwd }}
      env:
        YARN_ENABLE_GLOBAL_CACHE: 'false'
      run: |
        echo "CACHE_FOLDER=$(yarn config get cacheFolder)" >> "$GITHUB_OUTPUT"
        echo "CURRENT_NODE_VERSION=node-$(node --version)" >> "$GITHUB_OUTPUT"
        echo "CURRENT_BRANCH=$(echo "${GITHUB_REF#refs/heads/}" | sed -r 's,/,-,g')" >> "$GITHUB_OUTPUT"
        echo "NPM_GLOBAL_CACHE_FOLDER=$(npm config get cache)" >> "$GITHUB_OUTPUT"

    - name: ♻️ Restore yarn cache
      uses: actions/cache@v4
      id: yarn-download-cache
      with:
        path: ${{ steps.yarn-config.outputs.CACHE_FOLDER }}
        key: yarn-download-cache-${{ inputs.cache-prefix }}-${{ hashFiles(format('{0}/yarn.lock', inputs.cwd), format('{0}/.yarnrc.yml', inputs.cwd)) }}
        restore-keys: |
          yarn-download-cache-${{ inputs.cache-prefix }}-

    - name: ♻️ Restore node_modules
      if: inputs.cache-node-modules == 'true'
      id: yarn-nm-cache
      uses: actions/cache@v4
      with:
        path: ${{ inputs.cwd }}/**/node_modules
        key: yarn-nm-cache-${{ inputs.cache-prefix }}-${{ runner.os }}-${{ steps.yarn-config.outputs.CURRENT_NODE_VERSION }}-${{ steps.yarn-config.outputs.CURRENT_BRANCH }}-${{ hashFiles(format('{0}/yarn.lock', inputs.cwd), format('{0}/.yarnrc.yml', inputs.cwd)) }}

    - name: ♻️ Restore global npm cache folder
      if: inputs.cache-npm-cache == 'true'
      id: npm-global-cache
      uses: actions/cache@v4
      with:
        path: ${{ steps.yarn-config.outputs.NPM_GLOBAL_CACHE_FOLDER }}
        key: npm-global-cache-${{ inputs.cache-prefix }}-${{ runner.os }}-${{ steps.yarn-config.outputs.CURRENT_NODE_VERSION }}-${{ hashFiles(format('{0}/yarn.lock', inputs.cwd), format('{0}/.yarnrc.yml', inputs.cwd)) }}

    - name: ♻️ Restore yarn install state
      if: inputs.cache-install-state == 'true' && inputs.cache-node-modules == 'true'
      id: yarn-install-state-cache
      uses: actions/cache@v4
      with:
        path: ${{ inputs.cwd }}/.yarn/ci-cache
        key: yarn-install-state-cache-${{ inputs.cache-prefix }}-${{ runner.os }}-${{ steps.yarn-config.outputs.CURRENT_NODE_VERSION }}-${{ steps.yarn-config.outputs.CURRENT_BRANCH }}-${{ hashFiles(format('{0}/yarn.lock', inputs.cwd), format('{0}/.yarnrc.yml', inputs.cwd)) }}

    - name: 📥 Install dependencies
      shell: bash
      working-directory: ${{ inputs.cwd }}
      run: yarn install --immutable --inline-builds
      env:
        # Overrides/align yarnrc.yml options (v3, v4) for a CI context
        YARN_ENABLE_GLOBAL_CACHE: 'false' # Use local cache folder to keep downloaded archives
        YARN_ENABLE_MIRROR: 'false' # Prevent populating global cache for cache misses (local cache only)
        YARN_NM_MODE: 'hardlinks-local' # Reduce node_modules size
        YARN_INSTALL_STATE_PATH: '.yarn/ci-cache/install-state.gz' # Might speed up resolution step when node_modules present
        # Other environment variables
        HUSKY: '0' # By default do not run HUSKY install
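
For reference, the CURRENT_BRANCH output used in the node_modules and install-state cache keys is derived from GITHUB_REF. Below is a minimal standalone sketch of that transformation. The function name `sanitize_ref` is hypothetical and only introduced for illustration; it is not part of the action.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the CURRENT_BRANCH expression of the action:
# strip a leading "refs/heads/" and replace every "/" with "-".
sanitize_ref() {
  local ref="$1"
  printf '%s\n' "${ref#refs/heads/}" | sed 's,/,-,g'
}

sanitize_ref "refs/heads/feature/login"   # ref of a branch push
sanitize_ref "refs/pull/42/merge"         # ref of a pull_request event
```

On pull_request events GITHUB_REF has the form refs/pull/<number>/merge, so the refs/heads/ prefix is not stripped and every pull request ends up with its own key segment.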

Workflow action

To use it in a workflow:

    steps:
      - uses: actions/checkout@v4

      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}

      - name: 📥 Monorepo install
        uses: ./.github/actions/yarn-nm-install

Yarn config

Be sure that your .yarnrc.yml sets the nodeLinker: node-modules parameter:

nodeLinker: node-modules

#compressionLevel: 0  # Will give 10%-30% install speed up, but takes more space locally

# This line can be omitted if corepack is enabled (requires the packageManager field in package.json)
yarnPath: .yarn/releases/yarn-3.6.0.cjs # or 4.0.0-rc.45 (RCs seem quite stable imho)...
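
A quick way to verify the setting before wiring the action into CI is sketched below. It is a plain grep-based check that assumes nodeLinker is declared at the top level of .yarnrc.yml on a single line. The function name `uses_node_modules_linker` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given .yarnrc.yml declares
# "nodeLinker: node-modules" at the top level (optionally quoted,
# optionally followed by a comment). Fails if the file is missing.
uses_node_modules_linker() {
  local rc_file="${1:-.yarnrc.yml}"
  grep -Eqs "^nodeLinker:[[:space:]]*[\"']?node-modules[\"']?[[:space:]]*(#.*)?\$" "$rc_file"
}

if uses_node_modules_linker ".yarnrc.yml"; then
  echo "nodeLinker is node-modules"
else
  echo "nodeLinker is not node-modules (or .yarnrc.yml is missing)" >&2
fi
```

When yarn itself is available, `yarn config get nodeLinker` reports the effective value and is probably the more reliable check, since it also accounts for environment overrides.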

Input parameters

      - name: 📥 Monorepo install
        uses: ./.github/actions/yarn-nm-install
        with:
          cwd: '.'
          enable-corepack: false           
          cache-node-modules: true
          cache-install-state: true

| Parameter | Default | Comment |
|---|---|---|
| cwd | '.' | Run the install in a specific folder. |
| enable-corepack | true | Activate corepack. |
| cache-prefix | default | Allows multiple distinct installs. |
| cache-npm-cache | true | Cache the npm global cache folder (often used by node-gyp and prebuilt binaries). |
| cache-node-modules | false | Cache node_modules (only for exceptional use cases). |
| cache-install-state | false | Only useful if cache-node-modules is activated. |

This action always caches the folder returned by yarn config get cacheFolder to avoid fetching archives from the npm registry. Depending on the number of dependencies, this generally gives a 2x overall improvement. It affects the yarn fetch step and protects against npm outages as well. An example of the speed gain:

| CI Scenario | Install | CI fetch cache | Total | Cache size | CI persist cache |
|---|---|---|---|---|---|
| yarn4 with cache | 34s | 3s | 37s | 201MB | (±5s) |
| yarn4 without cache | 83s | N/A | 83s | N/A | N/A |

Link: https://github.com/belgattitude/compare-package-managers#-install-speed
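
The download cache step pairs an exact key (prefix plus a hash of yarn.lock and .yarnrc.yml) with a restore-keys prefix, so a changed lockfile can still restore an earlier archive set and only fetch what is missing. The sketch below imitates that lookup locally. It is an approximation under stated assumptions: `hashFiles` is replaced by a plain SHA-256 over the concatenated files, `sha256sum` is assumed to be installed, the saved keys are produced by the sketch itself, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Approximation of the key built by the "Restore yarn cache" step.
# NOTE: hashFiles() is emulated with a SHA-256 of the concatenated files.
download_cache_key() {
  local prefix="$1" dir="$2" digest
  digest="$(cat "$dir/yarn.lock" "$dir/.yarnrc.yml" | sha256sum | cut -d ' ' -f 1)"
  printf 'yarn-download-cache-%s-%s\n' "$prefix" "$digest"
}

# Print "hit <key>" for an exact match, "partial <key>" for the first saved
# key sharing the restore prefix, and return 1 when nothing matches.
lookup_cache() {
  local key="$1" restore_prefix="$2" saved
  shift 2
  for saved in "$@"; do
    [ "$saved" = "$key" ] && { printf 'hit %s\n' "$saved"; return 0; }
  done
  for saved in "$@"; do
    case "$saved" in
      "$restore_prefix"*) printf 'partial %s\n' "$saved"; return 0 ;;
    esac
  done
  return 1
}

# Simulate a lockfile change between two runs.
workdir="$(mktemp -d)"
printf 'lock v1\n' > "$workdir/yarn.lock"
printf 'nodeLinker: node-modules\n' > "$workdir/.yarnrc.yml"
old_key="$(download_cache_key default "$workdir")"
printf 'lock v2\n' > "$workdir/yarn.lock"
new_key="$(download_cache_key default "$workdir")"

# The new key was never saved, but the prefix still matches the old entry.
lookup_cache "$new_key" "yarn-download-cache-default-" "$old_key"
```

The node_modules, npm and install-state steps declare no restore-keys, so in this model they would only ever produce an exact hit or a miss.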

In some circumstances, you might achieve better install times by caching the node_modules folder as well. This affects the yarn link step, which is where yarn runs postinstall scripts (node-gyp, binary downloads...). Be aware that the time saved comes with extra overhead for cache fetch, compression and persistence (actions/cache has more to deal with). Use this option with care and only in exceptional circumstances (e.g. you have multiple dependent steps that run install). Also remember that the node_modules folder does not have the same portability/reliability as the recommended yarn cache folder. To get an idea of the performance gains, see this table:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| yarnCache:on - installState:off - nm:off | 22.717 ± 0.225 | 22.510 | 22.957 | 5.95 ± 0.25 |
| yarnCache:on - installState:on - nm:off | 21.350 ± 0.216 | 21.185 | 21.595 | 5.59 ± 0.24 |
| yarnCache:on - installState:off - nm:on | 11.676 ± 0.041 | 11.647 | 11.722 | 3.06 ± 0.13 |
| yarnCache:on - installState:on - nm:on | 3.820 ± 0.156 | 3.645 | 3.946 | 1.00 |

Link: https://github.com/belgattitude/compare-package-managers#compare-yarn-options

Results

On install, when only a few deps changed

image

Cost of actions/cache compression

image

Cleanup caches

When a PR is closed or merged, it is best to delete its install caches rather than letting GitHub reach the limit (10GB) and prune entries on its own.

image

Link: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#force-deleting-cache-entries

Here's an example (feel free to adapt it if you need to preserve some caches, e.g. gh actions-cache list -R $REPO -B $BRANCH | cut -f 1 | grep yarn will only clear yarn-related caches):

# https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#force-deleting-cache-entries
name: Cleanup caches for closed branches

on:
  pull_request:
    types:
      - closed
  workflow_dispatch:

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Cleanup
        run: |
          gh extension install actions/gh-actions-cache

          REPO=${{ github.repository }}
          BRANCH="refs/pull/${{ github.event.pull_request.number }}/merge"

          echo "Fetching list of cache key"
          cacheKeysForPR=$(gh actions-cache list -R $REPO -B $BRANCH | cut -f 1 )

          ## Setting this to not fail the workflow while deleting cache keys. 
          set +e
          echo "Deleting caches..."
          for cacheKey in $cacheKeysForPR
          do
              gh actions-cache delete $cacheKey -R $REPO -B $BRANCH --confirm
          done
          echo "Done"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
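
The yarn-only variant mentioned before the workflow can be tried without touching GitHub. In the sketch below, the tab-separated sample listing only imitates the assumed shape of the `gh actions-cache list` output (the keys, sizes and refs are invented), and `yarn_cache_keys` is a hypothetical helper wrapping the `cut -f 1 | grep yarn` filter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the cache keys (first tab-separated column)
# that contain "yarn", as in `... | cut -f 1 | grep yarn`.
yarn_cache_keys() {
  cut -f 1 | grep yarn
}

# Invented sample data imitating a tab-separated cache listing.
sample_listing="$(printf '%s\t%s\t%s\n' \
  'yarn-download-cache-default-abc123' '201 MB' 'refs/pull/42/merge' \
  'npm-global-cache-default-Linux-node-v20-abc123' '12 MB' 'refs/pull/42/merge' \
  'yarn-nm-cache-default-Linux-node-v20-refs-pull-42-merge-abc123' '350 MB' 'refs/pull/42/merge')"

# Prints the two yarn-related keys and skips the npm one.
printf '%s\n' "$sample_listing" | yarn_cache_keys
```

Note that grep exits with a non-zero status when nothing matches, so in a script running with `set -e` it may be safer to append `|| true` to the filter.
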
@danielkochengineer

@belgattitude Thanks! I can see your update in the gist but still I see actions/cache@v3 being used on the step "Restore node_modules". Is this on purpose or am I missing something?

@belgattitude
Author

Oh yes I see. Fixed

@quantizor

Why is CURRENT_BRANCH part of the cache key? If the yarn lock is the same shouldn't that be safe enough and allow for better reusability?

@quantizor

Btw to get this to work in our setup (deeply nested composite action) I had to switch from GITHUB_OUTPUT to GITHUB_ENV and update the interpolations accordingly because the post steps were losing the value for some reason

@MathiasVandePol

I noticed the same issue as @quantizor . Had to switch to GITHUB_ENV and additionally had to set the current dir (cwd) in the environment as well or I had these warnings popping up

Warning: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.

    - name: ⚙️ Expose yarn config as "$GITHUB_ENV"
      id: yarn-config
      shell: bash
      working-directory: ${{ inputs.cwd }}
      env:
        YARN_ENABLE_GLOBAL_CACHE: "false"
      run: |
        echo "CACHE_FOLDER=$(yarn config get cacheFolder)" >> "$GITHUB_ENV"
        echo "CURRENT_NODE_VERSION=node-$(node --version)" >> "$GITHUB_ENV"
        echo "CURRENT_BRANCH=$(echo "${GITHUB_REF#refs/heads/}" | sed -r 's,/,-,g')" >> "$GITHUB_ENV"
        echo "NPM_GLOBAL_CACHE_FOLDER=$(npm config get cache)" >> "$GITHUB_ENV"
        echo "CURRENT_DIR=$(pwd)" >> "$GITHUB_ENV"

and then use it as

    - name: ♻️ Restore yarn install state
      if: inputs.cache-install-state == 'true' && inputs.cache-node-modules == 'true'
      id: yarn-install-state-cache
      uses: actions/cache@v4
      with:
        path: ${{ env.CURRENT_DIR }}/.yarn/ci-cache
        key: yarn-install-state-cache-${{ inputs.cache-prefix }}-${{ runner.os }}-${{ env.CURRENT_NODE_VERSION }}-${{ hashFiles(format('{0}/yarn.lock', inputs.cwd), format('{0}/.yarnrc.yml', inputs.cwd)) }}

@belgattitude
Author

I haven't had the issue on "simple" repos... but nice to know. I would love to have more info about GITHUB_ENV vs GITHUB_OUTPUT.

Tested on https://github.com/belgattitude/httpx with the same action

image

@quantizor

quantizor commented Oct 8, 2024 via email

@mhp-borisbojic

Hi there,

we tried to implement this caching for our yarn 4 monorepo as well, which works so far.

But although we cache the yarn and npm global cache and the node_modules, the link step of yarn still takes over 30 seconds within the CI/CD workflow.

We have quite a lot of entries like:

xxx must be built because it never has been before or the last one failed

On my local machine, this is a matter of 2 seconds and we don't see the linking messages. What could be the issue here?

@belgattitude
Author

belgattitude commented Nov 4, 2024

Edit: I didn’t read your question carefully… too lazy to edit on my phone. Hope you’ll find answers.

Afaik there isn’t a way to prevent the link step from happening when node_modules isn’t present. That explains the difference when working locally.

I understand why it’s needed actually. One way would be to cache node_modules as well, but the trade-off is that cache retrieval and persistence would take more time. (Packages that download binaries and store them somewhere else are more difficult; see the cache-npm-cache parameter of the action.)

My recommendation is to look at the packages that require/trigger a link step… see if they can be deduped (e.g. 5 versions of esbuild). Check if you use, for example, an older version of sharp, some node-gyp based packages, nx with source analysis enabled, test containers, prisma with long-running generators…

This can reduce the link step time.

There’s more to say, but you can always copy-paste the list you see… I might have some better ideas.

@mhp-borisbojic

> Edit: I didn’t read your question carefully… too lazy to edit on my phone. Hope you’ll find answers.
>
> Afaik there isn’t a way to prevent the link step from happening when node_modules isn’t present. That explains the difference when working locally.
>
> I understand why it’s needed actually. One way would be to cache node_modules as well, but the trade-off is that cache retrieval and persistence would take more time. (Packages that download binaries and store them somewhere else are more difficult; see the cache-npm-cache parameter of the action.)

That's what we did - we also cached the node_modules, but it seems this didn't change the "need" of linking, which is odd.

@mhp-borisbojic

@belgattitude Update - I needed to add .yarn/install-state.gz and it worked - no linking is happening anymore.

Maybe you should add this to your action above (instead / in addition to /.yarn/ci-cache)?
