Turborepo has already landed the big algorithmic win — topological-wave parallel task hashing (commit b3c0f46da8) — but several implementation details are now the bottleneck:
- Parallelism is re-serialized by shared locks + deep clones: `TaskHashTrackerState` uses one `RwLock` guarding 6 independent HashMaps (crates/turborepo-task-hash/src/lib.rs:226-242), so any `insert_hash()` write blocks all concurrent reads. The visitor's wave hashing funnels results through `Arc<Mutex<HashMap>>` with per-wave serialization (crates/turborepo-lib/src/task_graph/visitor/mod.rs:233-303).
- Allocation hot loops dominate hashing and dispatch: `DetailedMap` (3 nested HashMaps) is deep-cloned on every `env_vars()` read (line 622), and `EnvironmentVariableMap` is cloned per task at line 388. Env patterns trigger repeated regex compilations (4 per task in `hashable_task_env`), glob patterns are recompiled O(packages) times, and SCM hashing allocates 10,000+ hex `String`s inside parallel loops.
- I/O paths leave easy wins on the table: cache archives use a single-threaded zstd encoder at level 0 (which zstd treats as its default, currently level 3), sequential file reads, and an 8 KB restore buffer.
- Already optimized: lockfile transitive closure (3.3x faster in recent commits), async/parallel workspace discovery, globwalk's internal regexes (cached with `OnceLock`), and `CompiledWildcards` for the builtins pass-through path.
- Existing tracing infrastructure: spans such as `precompute_task_hashes`, `task_cache_new`, `exec_context_new`, and `visit_recv_wait` already exist, and `TURBO_LOG_VERBOSITY=debug` enables filtering.
P0.1 Lock split + `Arc<DetailedMap>`
Files: crates/turborepo-task-hash/src/lib.rs:226-242, 585-661
| Before | After |
|---|---|
| `state: Arc<RwLock<TaskHashTrackerState>>` (single lock, 6 maps) | 6 independent `DashMap` fields or a per-field `RwLock` |
| `env_vars()` returns `state.get(id).cloned()`: a deep clone of 3 HashMaps | Store `Arc<DetailedMap>` and return a cheap `Arc::clone()` |
| `insert_hash()` write-locks all 6 maps | Each map is locked independently |
Verify perf: with `TURBO_LOG_VERBOSITY=debug`, compare the `precompute_task_hashes` span duration before and after on a large monorepo, using release builds. Run `cargo run -p turborepo -- run build --dry=json` with both the base and patched builds and diff the task hashes. They should be identical, and repeated runs of the patched build should also match.
Verify correctness: `cargo test -p turborepo-task-hash && cargo test -p turborepo-lib`
Estimated speedup: 15-30% on parallel hash computation
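A minimal std-only sketch of the P0.1 shape. It rests on a few assumptions:

- `TaskId` is simplified to a `String`.
- This three-map `DetailedMap` is a hypothetical stand-in for the real turborepo type.
- There is one `RwLock` per map rather than `DashMap`, so the sketch needs no dependencies.

The sketch shows that writing hashes no longer blocks env readers, and that `env_vars()` hands out an `Arc` instead of deep-cloning.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical, simplified stand-ins for the real turborepo types.
type TaskId = String;

#[derive(Debug, Default, PartialEq)]
pub struct DetailedMap {
    pub all: HashMap<String, String>,
    pub explicit: HashMap<String, String>,
    pub matching: HashMap<String, String>,
}

/// One lock per map instead of one lock for all of them: a writer on
/// `hashes` no longer blocks readers of `env_vars`, and vice versa.
#[derive(Default)]
pub struct TaskHashTracker {
    hashes: RwLock<HashMap<TaskId, String>>,
    env_vars: RwLock<HashMap<TaskId, Arc<DetailedMap>>>,
}

impl TaskHashTracker {
    pub fn insert_hash(&self, id: TaskId, hash: String, env: DetailedMap) {
        // Each write guard is a temporary dropped at the end of its statement,
        // so at most one map is locked at a time. Env goes in first, so a
        // reader that finds the hash will also find the env entry.
        self.env_vars.write().unwrap().insert(id.clone(), Arc::new(env));
        self.hashes.write().unwrap().insert(id, hash);
    }

    pub fn hash(&self, id: &str) -> Option<String> {
        self.hashes.read().unwrap().get(id).cloned()
    }

    /// Cheap handle: bumps a reference count instead of deep-cloning three maps.
    pub fn env_vars(&self, id: &str) -> Option<Arc<DetailedMap>> {
        self.env_vars.read().unwrap().get(id).map(Arc::clone)
    }
}
```

Once the maps are split, the two inserts are no longer atomic with respect to each other. The write order is what lets a reader that sees a hash rely on its env entry already being present.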
P0.2 Wave `Mutex` → `DashMap`
Files: crates/turborepo-lib/src/task_graph/visitor/mod.rs:233-303
| Before | After |
|---|---|
| `results: Arc<Mutex<HashMap<TaskId, (String, EnvironmentVariableMap)>>>` | `DashMap<TaskId, (String, Arc<EnvironmentVariableMap>)>` |
| Per-wave: collect results → lock mutex → insert all → unlock | Direct `results.insert()` inside the `par_iter()` closure |
| `.clone()` at line 388 deep-copies the env map | `Arc::clone()`: a pointer copy |
Verify perf: watch the `precompute_task_hashes` and `visit_recv_wait` spans and expect lower tail latency on large waves. Add a debug assertion: `precomputed.len() == engine.tasks().len()`.
Verify correctness: `cargo test -p turborepo-lib && cargo test -p turborepo-engine`
Estimated speedup: 10-25% on task dispatch
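The P0.2 pattern can be illustrated without external crates, under these assumptions:

- `ShardedMap` is a hypothetical stand-in for `DashMap`. Like `DashMap`, it locks one shard per insert rather than the whole map.
- `std::thread::scope` runs one thread per task, where the real visitor uses rayon.
- The real hash computation is replaced with a placeholder string.

Workers insert their results directly and share the env map through `Arc::clone`. The usage checks the one-entry-per-task invariant from the Verify step.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};
use std::thread;

type EnvMap = HashMap<String, String>;

/// Hypothetical sharded map: each key hashes to one shard with its own lock.
pub struct ShardedMap<V> {
    shards: Vec<Mutex<HashMap<String, V>>>,
}

impl<V> ShardedMap<V> {
    pub fn new(n_shards: usize) -> Self {
        let n = n_shards.max(1);
        Self {
            shards: (0..n).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    /// Locks only the shard that owns `key`.
    pub fn insert(&self, key: String, value: V) -> Option<V> {
        self.shard_for(&key).lock().unwrap().insert(key, value)
    }

    pub fn len(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}

/// Hashes one wave, inserting each result from its worker thread
/// (no collect-then-lock step at the end of the wave).
pub fn hash_wave(wave: &[String], results: &ShardedMap<(String, Arc<EnvMap>)>, env: &Arc<EnvMap>) {
    thread::scope(|s| {
        for task in wave {
            s.spawn(move || {
                // Placeholder for the real task-hash computation.
                let hash = format!("hash-of-{task}");
                // Arc::clone copies a pointer, not the env map.
                results.insert(task.clone(), (hash, Arc::clone(env)));
            });
        }
    });
}
```

After one wave of 32 tasks, `Arc::strong_count(&env)` is 33: the original plus one handle per entry. That is a quick way to confirm that every entry points at the same allocation.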
P0.3 Precompile env regexes
Files: crates/turborepo-env/src/lib.rs:273-319, 354-383, 493-496
| Before | After |
|---|---|
| `hashable_task_env()` calls `wildcard_map_from_wildcards()` → 4 `RegexBuilder::new().build()` per task | Pre-compile in `TaskHasher::new()` → `compiled_task_env: HashMap<TaskName, CompiledWildcards>` |
| `get_global_hashable_env_vars()` recompiles at lines 493, 496 | Pre-compile `DEFAULT_ENV_VARS` once and store it on `TaskHasher` |
| `CompiledWildcards` (lines 173-225) exists but is only used for the pass-through path | Extend it to all env wildcard matching paths |
Verify perf: add a tracing span around env hashing to confirm it no longer dominates CPU time, and count regex compilations per run with a debug counter.
Verify correctness: `cargo test -p turborepo-env && cargo test -p turborepo-task-hash`. Add a unit test comparing old-path and compiled-path output for the same env and patterns.
Estimated speedup: eliminates O(tasks × 4) regex compilations per build
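A std-only sketch of compiling env patterns once and reusing the compiled form for every task. Assumptions:

- `CompiledWildcards` here is a hypothetical simplification, not turborepo's type. It splits each pattern on `*` up front instead of building a regex.
- `!`-prefixed patterns are exclusions.
- `precompile` stands in for work that would move into `TaskHasher::new()`.

```rust
use std::collections::HashMap;

/// Hypothetical compiled form: literal pieces between `*`s, split once.
#[derive(Debug, Clone)]
pub struct CompiledWildcards {
    include: Vec<Vec<String>>,
    exclude: Vec<Vec<String>>,
}

fn compile_one(pattern: &str) -> Vec<String> {
    pattern.split('*').map(str::to_owned).collect()
}

/// Matches `name` against literal pieces that were separated by `*`.
fn segments_match(segments: &[String], name: &str) -> bool {
    let Some((first, rest)) = segments.split_first() else {
        return false;
    };
    let Some(mut remaining) = name.strip_prefix(first.as_str()) else {
        return false;
    };
    let Some((last, middle)) = rest.split_last() else {
        // No `*` in the pattern: the whole name must equal the literal.
        return remaining.is_empty();
    };
    for seg in middle {
        // Leftmost match for each inner piece leaves the most room for the suffix.
        match remaining.find(seg.as_str()) {
            Some(i) => remaining = &remaining[i + seg.len()..],
            None => return false,
        }
    }
    remaining.ends_with(last.as_str())
}

impl CompiledWildcards {
    pub fn compile(patterns: &[&str]) -> Self {
        let mut include = Vec::new();
        let mut exclude = Vec::new();
        for p in patterns {
            match p.strip_prefix('!') {
                Some(negated) => exclude.push(compile_one(negated)),
                None => include.push(compile_one(p)),
            }
        }
        Self { include, exclude }
    }

    pub fn is_match(&self, name: &str) -> bool {
        self.include.iter().any(|s| segments_match(s, name))
            && !self.exclude.iter().any(|s| segments_match(s, name))
    }
}

/// Compile every task's patterns once, at hasher construction time.
pub fn precompile(task_env: &HashMap<String, Vec<&str>>) -> HashMap<String, CompiledWildcards> {
    task_env
        .iter()
        .map(|(task, patterns)| (task.clone(), CompiledWildcards::compile(patterns)))
        .collect()
}

/// Per-task hot path: matching only, no compilation. Sorted for a stable hash input.
pub fn hashable_env(compiled: &CompiledWildcards, env: &HashMap<String, String>) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = env
        .iter()
        .filter(|(k, _)| compiled.is_match(k))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    out.sort();
    out
}
```

Because `hashable_env` returns sorted pairs, the old-path-vs-compiled-path unit test can compare the two outputs directly with `assert_eq!`.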
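P0.4 Cache archive tweaks
Files: crates/turborepo-cache/src/cache_archive/create.rs:115,151,175-177 and restore.rs:49
| Before | After |
|---|---|
| `zstd::Encoder::new(writer, 0)` (level 0 selects zstd's default level, currently 3) | An explicit, configurable level: 3 keeps today's output, 1 favors encode speed, higher levels trade CPU for smaller artifacts |
| Restore buffer: `[0; 8192]` (8 KB) | `[0; 65536]` (64 KB) |
| Single-threaded zstd encoder | Multithreaded encoding via the `zstd` crate's `zstdmt` feature and `Encoder::multithread(n_workers)` |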
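Verify perf: measure cache artifact sizes and archive create/restore times. For remote cache, also compare upload/download times.
Verify correctness: `cargo test -p turborepo-cache`. Add an integration test: create archive → restore → assert file contents match.
Estimated speedup: shorter archive creation on cache-miss tasks from multithreaded encoding, and faster restores from the larger buffer. Because level 0 already selects level 3, artifacts only shrink if a higher level is chosen, so measure sizes before claiming a reduction.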
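The zstd side of P0.4 needs the `zstd` crate (and its `zstdmt` feature for `Encoder::multithread`), so it is not sketched here. The restore side can be shown with std alone.

The sketch is a copy loop with a caller-chosen buffer size, using a hypothetical `RESTORE_BUF_SIZE` of 64 KiB, plus a round-trip check of the kind the correctness step calls for. The buffer is heap-allocated, so a larger size does not grow the stack frame.

```rust
use std::io::{self, Read, Write};

/// Hypothetical restore buffer size: 64 KiB instead of 8 KiB.
pub const RESTORE_BUF_SIZE: usize = 64 * 1024;

/// Streams `reader` into `writer` through a `buf_size`-byte heap buffer
/// and returns the number of bytes copied.
pub fn copy_with_buffer<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    buf_size: usize,
) -> io::Result<u64> {
    let mut buf = vec![0u8; buf_size.max(1)];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // EOF
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    writer.flush()?;
    Ok(total)
}
```

Running the same payload through several buffer sizes, as the test below does, guards against off-by-one errors at buffer boundaries when the size changes.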
P1.1 Glob cache
Files: crates/turborepo-scm/src/package_deps.rs:221,226,295,318,364-366
Change: add an `OnceLock<DashMap<String, CompiledGlob>>` cache keyed by pattern string, and plumb pre-compiled patterns to the `globwalk()` call sites.
Verify: add a `glob_compilations_total` tracing counter. In a 100+ package repo, confirm the count collapses to roughly the number of unique patterns.
Test: `cargo test -p turborepo-scm && cargo test --workspace`
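A minimal sketch of the P1.1 cache together with the proposed compilation counter. Assumptions:

- `CompiledGlob`, `compile_glob`, and `GLOB_COMPILATIONS` are hypothetical.
- The "compile" step only splits on `/`, where real code would build a glob matcher.
- A `Mutex<HashMap>` replaces `DashMap` so the sketch stays std-only.
- The key is the raw pattern. If compilation also depends on platform or base directory, those inputs belong in the key too.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical compiled form; a real implementation would hold a glob matcher.
#[derive(Debug)]
pub struct CompiledGlob {
    pub segments: Vec<String>,
}

/// Stand-in for the proposed `glob_compilations_total` counter.
pub static GLOB_COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

fn compile_glob(pattern: &str) -> CompiledGlob {
    GLOB_COMPILATIONS.fetch_add(1, Ordering::Relaxed);
    CompiledGlob {
        segments: pattern.split('/').map(str::to_owned).collect(),
    }
}

fn cache() -> &'static Mutex<HashMap<String, Arc<CompiledGlob>>> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<CompiledGlob>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the compiled glob for `pattern`, compiling at most once per distinct pattern.
pub fn compiled(pattern: &str) -> Arc<CompiledGlob> {
    let mut map = cache().lock().unwrap();
    if let Some(hit) = map.get(pattern) {
        return Arc::clone(hit);
    }
    let glob = Arc::new(compile_glob(pattern));
    map.insert(pattern.to_owned(), Arc::clone(&glob));
    glob
}
```

Holding the lock while compiling serializes first-time compilations. That is acceptable when each distinct pattern compiles once per run.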
P1.2 SCM allocations + walk merge
Files (under crates/turborepo-scm/src/): hash_object.rs:69, repo_index.rs:162,242,246-247,255,265-266, ls_tree.rs:74, manual.rs:119-202
Change: represent hashes internally as `[u8; 20]` (raw SHA-1 bytes) or `[u8; 40]` (hex ASCII), and format to a hex `String` only at UI/serialization boundaries. Merge the two `manual.rs` walks into one traversal when `include_default_files=true`. Replace `format!`-built range boundaries with reusable buffers.
Verify: track hash-string allocations with a debug counter, and validate that hashes match the previous implementation for a fixed repo state.
Test: `cargo test -p turborepo-scm && cargo test --workspace`
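A sketch of the P1.2 representation, assuming SHA-1 object ids (20 raw bytes, 40 hex characters). `Oid` is a hypothetical name.

Hashes stay as `[u8; 20]` internally. Hot paths append hex into a caller-owned buffer with `write_hex`, and `Display` is reserved for UI/JSON boundaries.

```rust
use std::fmt;

/// Hypothetical git object id held as 20 raw SHA-1 bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Oid(pub [u8; 20]);

const HEX_DIGITS: &[u8; 16] = b"0123456789abcdef";

impl Oid {
    /// Parses the 40-character hex form (as printed by git); None if malformed.
    pub fn from_hex(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.len() != 40 {
            return None;
        }
        let mut out = [0u8; 20];
        for (i, pair) in bytes.chunks_exact(2).enumerate() {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            out[i] = (hi * 16 + lo) as u8;
        }
        Some(Oid(out))
    }

    /// Appends lowercase hex to a caller-owned buffer: no new allocation
    /// once the buffer has capacity.
    pub fn write_hex(&self, buf: &mut String) {
        for b in self.0 {
            buf.push(HEX_DIGITS[(b >> 4) as usize] as char);
            buf.push(HEX_DIGITS[(b & 0x0f) as usize] as char);
        }
    }
}

/// Formatting for UI/JSON boundaries only.
impl fmt::Display for Oid {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for b in self.0 {
            write!(f, "{b:02x}")?;
        }
        Ok(())
    }
}
```

Round-tripping a fixed hex string through `from_hex` and back is the kind of golden check the Verify step asks for.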
P1.3 Engine builder parallelism
Files: crates/turborepo-engine/src/builder.rs:162-166, 173-193, 991-1021
Change: convert the serial workspace loops to rayon's `par_iter()`, then sort the results after the parallel load for deterministic ordering.
Verify: add spans around config discovery, parsing, and validation. In large monorepos, expect this phase to speed up roughly with core count, bounded by file I/O.
Test: `cargo test -p turborepo-engine && cargo test --workspace`
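The deterministic-ordering requirement in P1.3 can be sketched with scoped threads. Assumptions:

- `LoadedConfig` and the `load` callback are hypothetical stand-ins for turbo.json loading.
- One OS thread per package is used for brevity, where the real change would use rayon's `par_iter()`.

Successes are sorted by package path and errors by (path, message), so neither output nor error messages depend on thread scheduling.

```rust
use std::thread;

/// Hypothetical result of loading one workspace's turbo.json.
#[derive(Debug, PartialEq)]
pub struct LoadedConfig {
    pub package_path: String,
    pub task_count: usize,
}

/// Loads every package's config in parallel, then sorts successes and errors.
pub fn load_all<F>(package_paths: &[String], load: F) -> (Vec<LoadedConfig>, Vec<(String, String)>)
where
    F: Fn(&str) -> Result<LoadedConfig, String> + Sync,
{
    let load = &load;
    let results: Vec<(String, Result<LoadedConfig, String>)> = thread::scope(|s| {
        // One OS thread per package for brevity; rayon's pool would bound this.
        let handles: Vec<_> = package_paths
            .iter()
            .map(|path| s.spawn(move || (path.clone(), load(path.as_str()))))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("config loader panicked"))
            .collect()
    });

    let mut configs = Vec::new();
    let mut errors = Vec::new();
    for (path, result) in results {
        match result {
            Ok(config) => configs.push(config),
            Err(message) => errors.push((path, message)),
        }
    }
    configs.sort_by(|a, b| a.package_path.cmp(&b.package_path));
    errors.sort();
    (configs, errors)
}
```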
P1.4 Change mapper prefix index
Files: crates/turborepo-repository/src/change_mapper/package.rs:60-84
Change: replace the O(n×m) linear scan with a sorted prefix vec + binary search or a radix trie (note: `radix_trie` is in workspace deps). Build the index from package paths once in `DefaultPackageChangeMapper::new()`.
Verify: add a `detect_package_comparisons_total` counter. Run `--affected` with `TURBO_LOG_VERBOSITY=trace` to confirm changed-file classification remains correct.
Test: `cargo test -p turborepo-repository && cargo test --workspace`
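A sketch of a P1.4 index that swaps in a different technique from the sorted-vec and radix-trie options above. It uses a `HashMap` from package directory to package name and probes each ancestor of the changed file, deepest first. That costs O(path depth) lookups regardless of package count, and it returns the innermost package when packages nest. The current loop instead returns the first match in `packages()` iteration order.

Assumptions: `PackageIndex` is hypothetical, and paths are repo-relative and `/`-separated, whereas the real code works on `AnchoredSystemPath` components.

One behavior difference is worth a test. The current `zip`-based `is_file_in_package` also reports a match when the file path has fewer components than the package path, for example a root file `apps` against package `apps/web`. This sketch does not.

```rust
use std::collections::HashMap;

/// Hypothetical index from package directory (repo-relative, `/`-separated)
/// to package name, built once up front.
pub struct PackageIndex {
    by_dir: HashMap<String, String>,
}

impl PackageIndex {
    pub fn new<'a>(packages: impl IntoIterator<Item = (&'a str, &'a str)>) -> Self {
        let by_dir: HashMap<String, String> = packages
            .into_iter()
            .map(|(dir, name)| (dir.trim_end_matches('/').to_string(), name.to_string()))
            .collect();
        Self { by_dir }
    }

    /// Returns the innermost package whose directory contains `file`, or None
    /// (the caller then treats the change as global/root). One hash lookup
    /// per ancestor directory, independent of the number of packages.
    pub fn detect(&self, file: &str) -> Option<&str> {
        let mut dir = file;
        // "apps/web/src/page.tsx" -> "apps/web/src" -> "apps/web" -> "apps"
        while let Some(idx) = dir.rfind('/') {
            dir = &dir[..idx];
            if let Some(name) = self.by_dir.get(dir) {
                return Some(name.as_str());
            }
        }
        None
    }
}
```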
P2.x (only after P0/P1 land)
- Carry `TaskId` by reference in wave closures; store it once per task and reuse it. Test: `cargo test -p turborepo-lib`
- crates/turborepo-lockfiles/src/lib.rs:135: replace `DashMap::new()` with `DashMap::with_capacity(workspaces.len() * 50)`. Test: `cargo test -p turborepo-lockfiles`
- Sharing hashes/env via `Arc` reduces clones but can increase retained memory if references outlive their expected scope. Audit end-of-run cleanup paths.
- `DashMap` iteration order is unspecified, as it already is for `HashMap`. Any code serializing map contents (dry-run JSON, run summaries) must sort explicitly before output.
- Parallel engine config loading can emit validation errors in nondeterministic order. Sort validation results by package path.
- The SCM hash representation change (`[u8; 20]` or `[u8; 40]` instead of `String`) ripples through call sites. Convert at UI/JSON boundaries only, and add golden tests.
- Cache compression: level 0 already selects zstd's default (3), so making 3 the explicit default changes nothing by itself. Make the level configurable, and consider a heuristic that uses level 1 for tiny caches; higher levels trade more CPU for smaller artifacts.
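For the ordering risk, a minimal sketch of sorting at the serialization boundary. Collecting into a `BTreeMap` fixes the order regardless of how the source map iterates. `render_sorted` and its `task=hash` line format are hypothetical; real output would go through the dry-run JSON or run-summary serializers.

```rust
use std::collections::{BTreeMap, HashMap};

/// Renders task hashes in sorted order, whatever order the source map iterates in.
pub fn render_sorted(hashes: &HashMap<String, String>) -> String {
    let sorted: BTreeMap<&str, &str> = hashes
        .iter()
        .map(|(task, hash)| (task.as_str(), hash.as_str()))
        .collect();
    let mut out = String::new();
    for (task, hash) in sorted {
        out.push_str(task);
        out.push('=');
        out.push_str(hash);
        out.push('\n');
    }
    out
}
```

The same idea applies to serde-based output: a `BTreeMap` serializes its entries in key order.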
- Better scaling of task-hash computation: once `RwLock` contention and deep clones are removed, topological-wave parallelism should scale much closer to the available core count.
- Faster incremental/watch loops: the prefix index for `detect_package()` directly improves `--affected` and watch-mode responsiveness, the paths users feel most.
- More efficient caching: faster archive creation on cache misses and faster restores (plus smaller artifacts, if a level above zstd's default is adopted) make cache-hit runs meaningfully faster, not just "no work executed".
- Performance observability: with the existing spans (`precompute_task_hashes`, `task_cache_new`, `exec_context_new`) and `TURBO_LOG_VERBOSITY` controls, you can build a stable before/after perf dashboard.
| Change | Effort | Risk | Notes |
|---|---|---|---|
| P0.1 Lock split + `Arc<DetailedMap>` | Medium | Medium | Biggest parallelism blocker. Risk: lifetime/ownership bugs, nondeterministic iteration |
| P0.2 Wave Mutex → DashMap | Low-Medium | Low | Straightforward swap. Invariant: every task has exactly one entry after precompute |
| P0.3 Precompile env regexes | Medium | Low | Mostly plumbing. Compare old/new outputs in tests |
| P0.4 Cache archive tweaks | Low | Low | Buffers + compression defaults easy; failures obvious (restore mismatch) |
| P1.1 Glob cache | Medium | Medium | Must respect platform-specific glob behavior |
| P1.2 SCM allocations + walk merge | Medium-High | Medium | Hash-string refactor ripples through many call sites; needs golden tests |
| P1.3 Engine builder parallelism | Medium | Medium | Nondeterministic validation ordering; requires explicit sorting |
| P1.4 Change mapper prefix index | Medium | Low-Medium | Edge cases: nested packages, symlinks, path normalization |
| P2.x | Low | Low | Only after P0/P1 land |
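Global verification for any change: `cargo test --workspace` (full test suite). For perf regression detection, run `TURBO_LOG_VERBOSITY=debug cargo run --release -p turborepo -- run build --dry=json --cwd <path-to-monorepo>` against a representative monorepo and compare span timings and hash stability across runs. A dry run does not execute tasks, so cache-archive changes (P0.4) need timings from real `turbo run` invocations with cold and warm caches.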
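A small helper for the hash-stability half of that check. It assumes each run's `--dry=json` output has already been reduced to a task-id → hash map, for example from each task's `taskId` and `hash` fields (parsing not shown). `hash_diffs` is hypothetical.

An empty result means every task hash is unchanged. The output is sorted, so the diff itself is stable between runs.

```rust
use std::collections::{BTreeSet, HashMap};

/// Compares task→hash maps from two runs and returns sorted, readable differences.
pub fn hash_diffs(before: &HashMap<String, String>, after: &HashMap<String, String>) -> Vec<String> {
    let tasks: BTreeSet<&String> = before.keys().chain(after.keys()).collect();
    let mut diffs = Vec::new();
    for task in tasks {
        match (before.get(task), after.get(task)) {
            (Some(a), Some(b)) if a != b => diffs.push(format!("{task}: {a} -> {b}")),
            (Some(_), None) => diffs.push(format!("{task}: missing after")),
            (None, Some(_)) => diffs.push(format!("{task}: new after")),
            _ => {}
        }
    }
    diffs
}
```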
Answer the investigation question using the provided context. Keep recommendations specific to this repository and call out assumptions.
Question: Find every possible perf win with the goal of making turborepo builds as fast as possible and show how to verify the perf improvements and that they're not breaking anything
- 09e25577a7 release(turborepo): 2.8.11-canary.27 (#11975)
- db01cb4490 perf: Fast path for shallow wildcard glob patterns in workspace discovery (#11972)
- d82c6919d1 fix: Resolve git_root to worktree root in linked worktrees (#11974)
- e2bc393cec release(turborepo): 2.8.11-canary.26 (#11973)
- 4b5410b397 perf: Send engine callback before tracker bookkeeping and add tracing spans (#11970)
- 75406f62d0 release(turborepo): 2.8.11-canary.25 (#11971)
- b3c0f46da8 perf: Parallelize task hash computation across topological waves (#11969)
- 69a89b33a4 release(turborepo): 2.8.11-canary.24 (#11968)
Mode: research
Previously explored areas:
- engine-builder-parallel-config-loading
Files: crates/turborepo-engine/src/builder.rs, crates/turborepo-repository/src/package_graph/mod.rs, crates/turborepo-repository/src/discovery.rs
Maturity: stable
Relevance: The `EngineBuilder` BFS loads `turbo.json` configs serially per workspace during every `turbo run` cold start; parallelizing this and caching resolved configs would cut graph construction time proportionally to workspace count.
- cache-archive-parallel-compression
Files: crates/turborepo-cache/src/fs.rs, crates/turborepo-cache/src/cache_archive/create.rs, crates/turborepo-cache/src/cache_archive/mod.rs
Maturity: stable
Relevance: Cache archive creation uses single-threaded zstd at level 0 (which zstd maps to its default level, 3) with sequential file reads; enabling multi-threaded zstd compression and parallel I/O would directly cut wall-clock time on every cache-miss task, especially for large Next.js builds with hundreds of output chunks.
- globwalk-pattern-caching
Files: crates/turborepo-globwalk/src/lib.rs, crates/turborepo-run-cache/src/lib.rs, crates/turborepo-scm/src/package_deps.rs, crates/turborepo-types/src/lib.rs
Maturity: stable
Relevance: Glob patterns are recompiled from scratch on every `globwalk()` call with zero caching. In a 100-package monorepo the same `dist/**` pattern gets regex-compiled 100+ times per build across `save_outputs` and file-hashing paths, so a compiled-pattern cache is a low-risk, high-leverage win.
- scm-package-hash-alloc-reduction
Files: crates/turborepo-scm/src/hash_object.rs, crates/turborepo-scm/src/repo_index.rs, crates/turborepo-scm/src/manual.rs, crates/turborepo-scm/src/ls_tree.rs, crates/turborepo-task-hash/src/lib.rs
Maturity: stable
Relevance: Per-file `String` allocations from hex encoding, redundant hash clones across packages, and a double directory walk in manual mode create measurable overhead on every build; these are hot-path wins that scale with repo size.
Key findings: Based on the research across all three explorations, here's the highest-leverage untapped area:
- lockfile-transitive-closure
Files: crates/turborepo-lockfiles/src/lib.rs, crates/turborepo-lockfiles/src/berry/*.rs, crates/turborepo-lockfiles/src/pnpm/*.rs, crates/turborepo-lockfiles/src/npm.rs, crates/turborepo-lockfiles/src/bun.rs
Maturity: stable
Relevance: `all_transitive_closures` sits on the critical startup path (every `turbo run` blocks on it), and an unsized `DashMap` plus redundant per-workspace DFS walks over shared dependency subgraphs leave significant time on the table for large monorepos.
- task-env-regex-recompilation
Files: crates/turborepo-env/src/lib.rs, crates/turborepo-task-hash/src/lib.rs
Maturity: stable
Relevance: `hashable_task_env` recompiles regexes via `wildcard_map_from_wildcards` on every task even though `CompiledWildcards` already exists for the pass-through path; extending it here eliminates O(tasks) regex compilations in the hot loop.
- task-hash-tracker-rwlock-contention
Files: crates/turborepo-task-hash/src/lib.rs, crates/turborepo-lib/src/task_graph/visitor/mod.rs
Maturity: stable
Relevance: A single `RwLock` guards five independent HashMaps in `TaskHashTracker`, causing write-lock contention that serializes parallel rayon task-hash computation across every topological wave.
- task-graph-visitor-allocations
Files: crates/turborepo-lib/src/task_graph/visitor/mod.rs, crates/turborepo-lib/src/task_graph/visitor/exec.rs
Maturity: stable
Relevance: The task visitor's main loop double-clones `EnvironmentVariableMap` per task and redundantly clones `TaskId`/info 5+ times per task execution, all in the hottest path of every turbo run.
- change-mapper-linear-package-detection
Files: crates/turborepo-repository/src/change_mapper/package.rs, crates/turborepo-repository/src/change_mapper/mod.rs
Maturity: stable
Relevance: O(n×m) file-to-package detection in `detect_package()` blocks `--affected` and watch-mode scaling in large monorepos with hundreds of packages.
Raw Exploration Notes:
===== FILE: crates/turborepo-repository/src/change_mapper/package.rs =====
use thiserror::Error;
use turbopath::{AnchoredSystemPath, AnchoredSystemPathBuf};
use wax::{BuildError, Program};

use crate::{
    change_mapper::{AllPackageChangeReason, PackageInclusionReason},
    package_graph::{PackageGraph, PackageName, WorkspacePackage},
    package_manager::PackageManager,
};

pub enum PackageMapping {
    /// We've hit a global file, so all packages have changed
    All(AllPackageChangeReason),
    /// This change is meaningless, no packages have changed
    None,
    /// This change has affected one package
    Package((WorkspacePackage, PackageInclusionReason)),
}
/// Maps a single file change to affected packages. This can be a single
/// package (Package), none of the packages (None), or all of the packages
/// (All).
pub trait PackageChangeMapper {
fn detect_package(&self, file: &AnchoredSystemPath) -> PackageMapping;
}
impl<L, R> PackageChangeMapper for either::Either<L, R>
where
    L: PackageChangeMapper,
    R: PackageChangeMapper,
{
    fn detect_package(&self, file: &AnchoredSystemPath) -> PackageMapping {
        match self {
            either::Either::Left(l) => l.detect_package(file),
            either::Either::Right(r) => r.detect_package(file),
        }
    }
}
/// Detects package by checking if the file is inside the package.
///
/// Does not use the globalDependencies in turbo.json.
/// Since we don't have these dependencies, any file that is
/// not in any package will automatically invalidate all
/// packages. This is fine for builds, but less fine
/// for situations like watch mode.
pub struct DefaultPackageChangeMapper<'a> {
pkg_dep_graph: &'a PackageGraph,
}
impl<'a> DefaultPackageChangeMapper<'a> {
    pub fn new(pkg_dep_graph: &'a PackageGraph) -> Self {
        Self { pkg_dep_graph }
    }

    fn is_file_in_package(file: &AnchoredSystemPath, package_path: &AnchoredSystemPath) -> bool {
        file.components()
            .zip(package_path.components())
            .all(|(a, b)| a == b)
    }
}

impl PackageChangeMapper for DefaultPackageChangeMapper<'_> {
    fn detect_package(&self, file: &AnchoredSystemPath) -> PackageMapping {
        for (name, entry) in self.pkg_dep_graph.packages() {
            if name == &PackageName::Root {
                continue;
            }
            if let Some(package_path) = entry.package_json_path.parent()
                && Self::is_file_in_package(file, package_path)
            {
                return PackageMapping::Package((
                    WorkspacePackage {
                        name: name.clone(),
                        path: package_path.to_owned(),
                    },
                    PackageInclusionReason::FileChanged {
                        file: file.to_owned(),
                    },
                ));
            }
        }
PackageMapping::All(AllPackageChangeReason::GlobalDepsChanged {
file: file.to_owned(),
})
}
}
pub struct DefaultPackageChangeMapperWithLockfile<'a> {
    base: DefaultPackageChangeMapper<'a>,
}

impl<'a> DefaultPackageChangeMapperWithLockfile<'a> {
    pub fn new(pkg_dep_graph: &'a PackageGraph) -> Self {
        Self {
            base: DefaultPackageChangeMapper::new(pkg_dep_graph),
        }
    }
}

impl PackageChangeMapper for DefaultPackageChangeMapperWithLockfile<'_> {
    fn detect_package(&self, path: &AnchoredSystemPath) -> PackageMapping {
        // If we have a lockfile change, we consider this as a root package change,
        // since there's a chance that the root package uses a workspace package
        // dependency (this is cursed behavior but sadly possible). There's a chance
        // that we can make this more accurate by checking which package
        // manager, since not all package managers may permit root pulling from
        // workspace package dependencies
        if PackageManager::supported_managers()
            .iter()
            .any(|pm| pm.lockfile_name() == path.as_str())
        {
            PackageMapping::Package((
                WorkspacePackage {
                    name: PackageName::Root,
                    path: AnchoredSystemPathBuf::from_raw("").unwrap(),
                },
                PackageInclusionReason::ConservativeRootLockfileChanged,
            ))
        } else {
            self.base.detect_package(path)
        }
    }
}

#[derive(Error, Debug)]
pub enum Error {
    #[error(transparent)]
    InvalidFilter(#[from] BuildError),
}
/// A package detector.
///
/// It uses a global deps list to determine
/// if a file should cause all packages to be marked as changed.
/// This is less conservative than the DefaultPackageChangeMapper,
/// which assumes that any changed file that is not in a package
/// changes all packages. Since we have a list of global deps,
/// we can check against that and avoid invalidating in unnecessary cases.
pub struct GlobalDepsPackageChangeMapper<'a> {
base: DefaultPackageChangeMapperWithLockfile<'a>,
global_deps_matcher: wax::Any<'a>,
}
impl<'a> GlobalDepsPackageChangeMapper<'a> {
    pub fn new<S: wax::Pattern<'a>, I: Iterator<Item = S>>(
        pkg_dep_graph: &'a PackageGraph,
        global_deps: I,
    ) -> Result<Self, Error> {
        let base = DefaultPackageChangeMapperWithLockfile::new(pkg_dep_graph);
        let global_deps_matcher = wax::any(global_deps)?;
Ok(Self {
base,
global_deps_matcher,
})
}
}
impl PackageChangeMapper for GlobalDepsPackageChangeMapper<'_> {
fn detect_package(&self, path: &AnchoredSystemPath) -> PackageMapping {
match self.base.detect_package(path) {
// Since DefaultPackageChangeMapper is overly conservative, we can check here if
// the path is actually in globalDeps and if not, return it as
// PackageDetection::Package(WorkspacePackage::root()).
PackageMapping::All(_) => {
let cleaned_path = path.clean();
let in_global_deps = self.global_deps_matcher.is_match(cleaned_path.as_str());
if in_global_deps {
PackageMapping::All(AllPackageChangeReason::GlobalDepsChanged {
file: path.to_owned(),
})
} else {
PackageMapping::Package((
WorkspacePackage::root(),
PackageInclusionReason::FileChanged {
file: path.to_owned(),
},
))
}
}
result => result,
}
}
}
#[cfg(test)]
mod tests {
    use tempfile::tempdir;
    use turbopath::{AbsoluteSystemPath, AnchoredSystemPathBuf};
use super::{DefaultPackageChangeMapper, GlobalDepsPackageChangeMapper};
use crate::{
change_mapper::{
AllPackageChangeReason, ChangeMapper, LockfileContents, PackageChanges,
PackageInclusionReason,
},
discovery::{self, PackageDiscovery},
package_graph::{PackageGraphBuilder, WorkspacePackage},
package_json::PackageJson,
package_manager::PackageManager,
};
#[allow(dead_code)]
pub struct MockDiscovery;
impl PackageDiscovery for MockDiscovery {
async fn discover_packages(
&self,
) -> Result<discovery::DiscoveryResponse, discovery::Error> {
Ok(discovery::DiscoveryResponse {
package_manager: PackageManager::Npm,
workspaces: vec![],
})
}
async fn discover_packages_blocking(
&self,
) -> Result<discovery::DiscoveryResponse, discovery::Error> {
self.discover_packages().await
}
}
#[tokio::test]
async fn test_different_package_detectors() -> Result<(), anyhow::Error> {
let repo_root = tempdir()?;
let root_package_json = PackageJson::default();
let pkg_graph = PackageGraphBuilder::new(
AbsoluteSystemPath::from_std_path(repo_root.path())?,
root_package_json,
)
.with_package_discovery(MockDiscovery)
.build()
.await?;
let default_package_detector = DefaultPackageChangeMapper::new(&pkg_graph);
let change_mapper = ChangeMapper::new(&pkg_graph, vec![], default_package_detector);
let package_changes = change_mapper.changed_packages(
[AnchoredSystemPathBuf::from_raw("README.md")?]
.into_iter()
.collect(),
LockfileContents::Unchanged,
)?;
// We should return All because we don't have global deps and
// therefore must be conservative about changes
assert_eq!(
package_changes,
PackageChanges::All(AllPackageChangeReason::GlobalDepsChanged {
file: AnchoredSystemPathBuf::from_raw("README.md")?,
})
);
let turbo_package_detector =
GlobalDepsPackageChangeMapper::new(&pkg_graph, std::iter::empty::<&str>())?;
let change_mapper = ChangeMapper::new(&pkg_graph, vec![], turbo_package_detector);
let package_changes = change_mapper.changed_packages(
[AnchoredSystemPathBuf::from_raw("README.md")?]
.into_iter()
.collect(),
LockfileContents::Unchanged,
)?;
// We only get a root workspace change since we have global deps specified and
// README.md is not one of them
assert_eq!(
package_changes,
PackageChanges::Some(
[(
WorkspacePackage::root(),
PackageInclusionReason::FileChanged {
file: AnchoredSystemPathBuf::from_raw("README.md")?,
}
)]
.into_iter()
.collect()
)
);
Ok(())
}
}
===== END FILE: crates/turborepo-repository/src/change_mapper/package.rs =====
===== FILE: crates/turborepo-repository/src/change_mapper/mod.rs =====
//! Maps changed files to changed packages in a repository.
//! Used for both --filter and for isolated builds.
use std::{
    collections::{HashMap, HashSet},
    hash::Hash,
};

pub use package::{
    DefaultPackageChangeMapper, DefaultPackageChangeMapperWithLockfile, Error,
    GlobalDepsPackageChangeMapper, PackageChangeMapper, PackageMapping,
};
use tracing::debug;
use turbopath::{AbsoluteSystemPath, AnchoredSystemPathBuf};
use wax::Program;

use crate::package_graph::{
    ChangedPackagesError, ExternalDependencyChange, PackageGraph, PackageName, WorkspacePackage,
};
mod package;
const DEFAULT_GLOBAL_DEPS: &[&str] = ["package.json", "turbo.json", "turbo.jsonc"].as_slice();
// We may not be able to load the lockfile contents, but we
// still want to be able to express a generic change.
pub enum LockfileChange {
    Empty,
    ChangedPackages(HashSet<turborepo_lockfiles::Package>),
}
/// This describes the state of a change to a lockfile.
pub enum LockfileContents {
/// We know the lockfile did not change
Unchanged,
/// We know the lockfile changed but don't have the file contents of the
/// previous lockfile (i.e. git status, or perhaps a lockfile that was
/// deleted or otherwise inaccessible with the information we have)
UnknownChange,
/// We know the lockfile changed and have the contents of the previous
/// lockfile
Changed(Vec<u8>),
}
#[derive(Debug, PartialEq, Eq, Hash, Clone)]
pub enum PackageInclusionReason {
    /// All the packages are invalidated
    All(AllPackageChangeReason),
    /// Root task was run
    RootTask { task: String },
    /// We conservatively assume that the root package is changed because
    /// the lockfile changed.
    ConservativeRootLockfileChanged,
    /// The lockfile changed and caused this package to be invalidated
    LockfileChanged {
        removed: Vec<turborepo_lockfiles::Package>,
        added: Vec<turborepo_lockfiles::Package>,
    },
    /// A transitive dependency of this package changed
    DependencyChanged { dependency: PackageName },
    /// A transitive dependent of this package changed
    DependentChanged { dependent: PackageName },
    /// A file contained in this package changed
    FileChanged { file: AnchoredSystemPathBuf },
    /// The filter selected a directory which contains this package
    InFilteredDirectory { directory: AnchoredSystemPathBuf },
    /// Package is automatically included because of the filter (or lack
    /// thereof)
    IncludedByFilter { filters: Vec<String> },
}
#[derive(Debug, PartialEq, Eq, Hash, Clone)]
pub enum AllPackageChangeReason {
GlobalDepsChanged {
file: AnchoredSystemPathBuf,
},
/// A file like package.json or turbo.json changed
DefaultGlobalFileChanged {
file: AnchoredSystemPathBuf,
},
LockfileChangeDetectionFailed,
LockfileChangedWithoutDetails,
RootInternalDepChanged {
root_internal_dep: PackageName,
},
GitRefNotFound {
from_ref: Option<String>,
to_ref: Option<String>,
},
}
pub fn merge_changed_packages<T: Hash + Eq>(
    changed_packages: &mut HashMap<T, PackageInclusionReason>,
    new_changes: impl IntoIterator<Item = (T, PackageInclusionReason)>,
) {
    for (package, reason) in new_changes {
        changed_packages.entry(package).or_insert(reason);
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum PackageChanges {
    All(AllPackageChangeReason),
    Some(HashMap<WorkspacePackage, PackageInclusionReason>),
}

pub struct ChangeMapper<'a, PD> {
    pkg_graph: &'a PackageGraph,
ignore_patterns: Vec<String>,
package_detector: PD,
}
impl<'a, PD: PackageChangeMapper> ChangeMapper<'a, PD> {
    pub fn new(
        pkg_graph: &'a PackageGraph,
        ignore_patterns: Vec<String>,
        package_detector: PD,
    ) -> Self {
        Self {
            pkg_graph,
            ignore_patterns,
            package_detector,
        }
    }
fn default_global_file_changed(
changed_files: &HashSet<AnchoredSystemPathBuf>,
) -> Option<&AnchoredSystemPathBuf> {
changed_files
.iter()
.find(|f| DEFAULT_GLOBAL_DEPS.iter().any(|dep| *dep == f.as_str()))
}
pub fn changed_packages(
&self,
changed_files: HashSet<AnchoredSystemPathBuf>,
lockfile_contents: LockfileContents,
) -> Result<PackageChanges, ChangeMapError> {
if let Some(file) = Self::default_global_file_changed(&changed_files) {
debug!("global file changed");
return Ok(PackageChanges::All(
AllPackageChangeReason::DefaultGlobalFileChanged {
file: file.to_owned(),
},
));
}
// get filtered files and add the packages that contain them
let filtered_changed_files = self.filter_ignored_files(changed_files.iter())?;
// calculate lockfile_change here based on changed_files
match self.get_changed_packages(filtered_changed_files.into_iter()) {
PackageChanges::All(reason) => Ok(PackageChanges::All(reason)),
PackageChanges::Some(mut changed_pkgs) => {
match lockfile_contents {
LockfileContents::Changed(previous_lockfile_contents) => {
// if we run into issues, don't error, just assume all packages have changed
let Ok(lockfile_changes) =
self.get_changed_packages_from_lockfile(&previous_lockfile_contents)
else {
debug!(
"unable to determine lockfile changes, assuming all packages \
changed"
);
return Ok(PackageChanges::All(
AllPackageChangeReason::LockfileChangeDetectionFailed,
));
};
debug!(
"found {} packages changed by lockfile",
lockfile_changes.len()
);
merge_changed_packages(
&mut changed_pkgs,
lockfile_changes.into_iter().map(|change| {
let ExternalDependencyChange {
package,
added,
removed,
} = change;
(
package,
PackageInclusionReason::LockfileChanged { added, removed },
)
}),
);
Ok(PackageChanges::Some(changed_pkgs))
}
// We don't have the actual contents, so just invalidate everything
LockfileContents::UnknownChange => {
// this can happen in a blobless checkout
debug!(
"we know the lockfile changed but we don't have the contents so we \
have to assume all packages changed and rebuild everything"
);
Ok(PackageChanges::All(
AllPackageChangeReason::LockfileChangedWithoutDetails,
))
}
// We don't know if the lockfile changed or not, so we can't assume anything
LockfileContents::Unchanged => {
debug!("the lockfile did not change");
Ok(PackageChanges::Some(changed_pkgs))
}
}
}
}
}
fn filter_ignored_files<'b>(
&self,
changed_files: impl Iterator<Item = &'b AnchoredSystemPathBuf> + 'b,
) -> Result<HashSet<&'b AnchoredSystemPathBuf>, ChangeMapError> {
let matcher = wax::any(self.ignore_patterns.iter().map(|s| s.as_str()))?;
Ok(changed_files
.filter(move |f| !matcher.is_match(f.as_path()))
.collect())
}
// note: this could probably be optimized by using a hashmap of package paths
fn get_changed_packages<'b>(
&self,
files: impl Iterator<Item = &'b AnchoredSystemPathBuf>,
) -> PackageChanges {
let root_internal_deps = self.pkg_graph.root_internal_package_dependencies();
let mut changed_packages = HashMap::new();
for file in files {
match self.package_detector.detect_package(file) {
// Internal root dependency changed so global hash has changed
PackageMapping::Package((pkg, _)) if root_internal_deps.contains(&pkg) => {
debug!(
"{} changes root internal dependency: \"{}\"\nshortest path from root: \
{:?}",
file.to_string(),
pkg.name,
self.pkg_graph.root_internal_dependency_explanation(&pkg),
);
return PackageChanges::All(AllPackageChangeReason::RootInternalDepChanged {
root_internal_dep: pkg.name.clone(),
});
}
PackageMapping::Package((pkg, reason)) => {
debug!("{} changes \"{}\"", file.to_string(), pkg.name);
changed_packages.insert(pkg, reason);
}
PackageMapping::All(reason) => {
debug!("all packages changed due to {file:?}");
return PackageChanges::All(reason);
}
PackageMapping::None => {}
}
}
PackageChanges::Some(changed_packages)
}
fn get_changed_packages_from_lockfile(
&self,
lockfile_content: &[u8],
) -> Result<Vec<ExternalDependencyChange>, ChangeMapError> {
// We pass None for yarnrc since we're only comparing lockfiles for changes,
// not resolving package dependencies. Catalog resolution isn't needed here.
let previous_lockfile = self.pkg_graph.package_manager().parse_lockfile(
self.pkg_graph.root_package_json(),
lockfile_content,
None,
)?;
let additional_packages = self
.pkg_graph
.changed_packages_from_lockfile(previous_lockfile.as_ref())?;
Ok(additional_packages)
}
pub fn lockfile_changed(
turbo_root: &AbsoluteSystemPath,
changed_files: &HashSet<AnchoredSystemPathBuf>,
lockfile_path: &AbsoluteSystemPath,
) -> bool {
let lockfile_path_relative = turbo_root
.anchor(lockfile_path)
.expect("lockfile should be in repo");
changed_files.iter().any(|f| f == &lockfile_path_relative)
}
}
#[derive(thiserror::Error, Debug)]
pub enum ChangeMapError {
    #[error(transparent)]
    Wax(#[from] wax::BuildError),
    #[error("Package manager error: {0}")]
    PackageManager(#[from] crate::package_manager::Error),
    #[error("No lockfile")]
    NoLockfile,
    #[error("Lockfile error: {0}")]
    Lockfile(turborepo_lockfiles::Error),
}

impl From<ChangedPackagesError> for ChangeMapError {
    fn from(value: ChangedPackagesError) -> Self {
        match value {
            ChangedPackagesError::NoLockfile => Self::NoLockfile,
            ChangedPackagesError::Lockfile(e) => Self::Lockfile(e),
        }
    }
}

#[cfg(test)]
mod test {
    use test_case::test_case;
use super::ChangeMapper;
use crate::change_mapper::package::DefaultPackageChangeMapper;
#[cfg(unix)]
#[test_case("/a/b/c", &["package.lock"], "/a/b/c/package.lock", true ; "simple")]
#[test_case("/a/b/c", &["a", "b", "c"], "/a/b/c/package.lock", false ; "lockfile unchanged")]
fn test_lockfile_changed(
turbo_root: &str,
changed_files: &[&str],
lockfile_path: &str,
expected: bool,
) {
let turbo_root = turbopath::AbsoluteSystemPathBuf::new(turbo_root).unwrap();
let lockfile_path = turbopath::AbsoluteSystemPathBuf::new(lockfile_path).unwrap();
let changed_files = changed_files
.iter()
.map(|s| turbopath::AnchoredSystemPathBuf::from_raw(s).unwrap())
.collect();
let changes = ChangeMapper::<DefaultPackageChangeMapper>::lockfile_changed(
&turbo_root,
&changed_files,
&lockfile_path,
);
assert_eq!(changes, expected);
}
#[cfg(windows)]
#[test_case("C:\\\\a\\b\\c", &["package.lock"], "C:\\\\a\\b\\c\\package.lock", true ; "simple")]
#[test_case("C:\\\\a\\b\\c", &["a", "b", "c"], "C:\\\\a\\b\\c\\package.lock", false ; "lockfile unchanged")]
fn test_lockfile_changed(
turbo_root: &str,
changed_files: &[&str],
lockfile_path: &str,
expected: bool,
) {
let turbo_root = turbopath::AbsoluteSystemPathBuf::new(turbo_root).unwrap();
let lockfile_path = turbopath::AbsoluteSystemPathBuf::new(lockfile_path).unwrap();
let changed_files = changed_files
.iter()
.map(|s| turbopath::AnchoredSystemPathBuf::from_raw(s).unwrap())
.collect();
let changes = ChangeMapper::<DefaultPackageChangeMapper>::lockfile_changed(
&turbo_root,
&changed_files,
&lockfile_path,
);
// we don't want to implement PartialEq on the error type,
// so simply compare the debug representations
assert_eq!(changes, expected);
}
}
===== END FILE: crates/turborepo-repository/src/change_mapper/mod.rs =====
IMPORTANT: Structure your response using the exact section headings above.