@lalunamel
Last active October 22, 2024 02:01
Understanding how Xcode builds (llbuild)

Understanding how Xcode builds

Raw notes are at the top

Human-readable blog post is in the middle

Raw Notes:

llbuild is the build system used by Xcode and the Swift Package Manager.

Understanding llbuild will allow you to understand how Xcode builds your app.

Enabling debug features in Xcode to make llbuild spit out logging information

Run this in your terminal:

defaults write com.apple.dt.XCBuild EnableBuildDebugging -bool YES

This will tell XCBuild to enable build debugging, which will in turn tell llbuild to enable build debugging. You'll want to turn this off when you're done as it will produce build artifacts that take up a fair amount of disk space if you're building many times a day, every day.

Artifacts produced by EnableBuildDebugging

  • manifest.xcbuild
    • This is the build file that describes your build as a yml file with a bunch of json in it.
  • build.db
    • This is an SQLite database that holds build information. Essentially, this is the cache behind the build that enables incremental compilation. It stores cache keys related to the artifacts that were built previously, which are stored in the derived data folder.
  • build.trace
    • The trace file is the log of what llbuild actually did as it executed manifest.xcbuild
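Since build.db is a regular SQLite file, Python's built-in sqlite3 module is enough for a first look at it. Here's a minimal sketch that lists every table and its row count. The function name and the path in the usage note are placeholders of mine, and the sketch deliberately assumes nothing about llbuild's column layout.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Open read-only so the database is never modified by this inspection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `list_tables("/path/to/build.db")` with the location reported in your build log should give you a quick overview before you start writing more specific queries.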

manifest.xcbuild (build file)

This file describes your build. It has a few different parts: client, target, nodes, and commands.

I've not seen client or target contain particularly useful information. nodes and commands are where it's at.

You can find the source documentation for each of these things here, but I'll cover them in my own words now.

Node

A node represents some input or output of the build process. Typically, this is a file.

Nodes can have attributes.

Misc facts about nodes

  • By default, if a node ends in a /, then it is considered a directory.
  • By default, if a node matches the pattern <.*> (where .* is the regex for "any number of characters"), like <link>, then it is considered "virtual" and is assumed to not represent a file.
  • If a node has the attribute is-command-timestamp, then it will be used to hold the timestamp when a command was completed. This timestamp can be used to see if a command that was run in a previous build needs to be re-run.
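The default naming rules above can be sketched in a few lines of Python. This only models the defaults described in these notes. `classify_node` is a made-up helper, not an llbuild API, and it ignores attributes that can override the defaults.

```python
import re

# Matches names that are entirely of the form <...>, e.g. "<link>".
VIRTUAL_PATTERN = re.compile(r"<.*>")


def classify_node(name):
    """Classify a node name using the default rules only."""
    if name.endswith("/"):
        return "directory"
    if VIRTUAL_PATTERN.fullmatch(name):
        return "virtual"
    return "file"
```

For example, `classify_node("<link>")` returns "virtual", while a path ending in a slash is treated as a directory.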

Command

A command is some task to be run by the build system. This can be something simple like "copy the contents of this framework somewhere else" (ProcessXCFramework), or "compile these swift files" (Debug:CompileSwiftSources).

A command has a tool, description, probably inputs and outputs and then a few other attributes depending on what tool is specified.

There are a few built in tools listed here.

Here are all the tools used by a build I just ran:

phony
mkdir
shell
stale-file-removal

auxiliary-file
copy-plist
copy-strings-file
create-build-directory
embed-swift-stdlib
file-copy
info-plist-processor
process-product-entitlements
process-xcframework
register-execution-policy-exception

Inputs and Outputs

The inputs and outputs of a command are quite important: they determine whether the command should be run, or if the output from a previous run of the command is sufficient and we can all save a bunch of time by skipping it. This is what's known as "incremental compilation" or "compilation avoidance".

In the parlance of llbuild, a command creates a rule, which creates a task. commands will always be run unless the output they produce is the same from one build to the next. rules will run if their "signature" has changed from one build to the next OR any of their inputs have different "file info" from one build to the next.

Running Commands

A Shell command is the type of command that will compile your source code. Since compiling source code is probably the biggest thing you want to avoid, let's take a look at what causes Shell commands to be re-run on an incremental build.

Firstly, a Shell command is a subclass of ExternalCommand - you can find more information on when an ExternalCommand is run by looking at the source code here.

An ExternalCommand is run if:

  • it's deemed to be always out of date, which is an attribute that can be specified on a command within the build file.
  • if the prior value wasn't "successful", where "successful" means that the command was 1. executed and 2. for every output specified for the command, that output's stat information is recorded
  • for each output of a command
    • if the output is "virtual", don't consider that output
    • if the output has different "information" than before, run the command. "information" is the result of the unix stat command and is retrieved here. It includes a file's st_dev, st_ino, st_mode, st_size, st_mtime - you can read more about what each of those means here.

If a command is run (which is controlled by the outputs for the command), it will create a new rule. The rule may or may not be run based on its inputs.
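To make the output check concrete, here's a rough Python model of it: record the stat fields for each output, then compare them against the file system on the next build. This is a sketch of the idea, not llbuild's code. `FileInfo`, `get_file_info`, and `outputs_needing_rerun` are names I made up, and treating a missing file as "changed" is my assumption.

```python
import os
from collections import namedtuple

# The stat fields the notes above list as a file's "information".
FileInfo = namedtuple("FileInfo", "device inode mode size mtime_ns")


def get_file_info(path):
    st = os.stat(path)
    return FileInfo(st.st_dev, st.st_ino, st.st_mode, st.st_size, st.st_mtime_ns)


def outputs_needing_rerun(recorded, virtual=()):
    """Return the outputs whose current stat info differs from the recorded one.

    recorded maps each output path to the FileInfo saved by the prior build.
    Outputs named in virtual are skipped, mirroring the rule above.
    """
    changed = []
    for path, old_info in recorded.items():
        if path in virtual:
            continue
        try:
            new_info = get_file_info(path)
        except FileNotFoundError:
            # Assumption: a missing output counts as different information.
            changed.append(path)
            continue
        if new_info != old_info:
            changed.append(path)
    return changed
```

A non-empty result means the command has to run again. Note that nothing here reads file contents: touching an output without changing it is enough to trigger a re-run.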

Running Rules that are created by Commands

  • BuildEngine#scanRule
    • Check if it needs to run ("checking-rule-needs-to-run")
      • Does it need to run because the signature changed? ("signature-changed")
        • ruleInfo.rule->signature != ruleInfo.result.signature
          • ruleInfo.rule is from this run

          • ruleInfo.result is from last run

            • how are rules identified between runs? do they have a stable ID?
              • they are identified through their KeyID
              • a KeyID is a sequential database ID (integer) that maps to an actual key (the string representation of a rule, like "N/Users/.../GraphQLSchema.swift") through the key_names table
            • how are rules created?
              • BuildEngine#addRule
              • a rule is looked up from the db (SQLiteBuildDB#lookupRuleResult)
                • if it's found, return it
                • if it's not, don't do anything and allow the newly created rule to be empty
          • Where do these rule signatures get computed?

            • BuildSystem#lookupRule is where new rules get their values (including their signature)
              • BuildKey::Kind::Command - command->getSignature()
              • BuildKey::Kind::Node - node->getSignature()
            • Only Commands and Nodes have signatures
          • What's in a signature?

            • BuildKey::Kind::Command - command->getSignature()
              • combine the name of the command, the names of all the inputs to the command, the names of all the outputs, whether or not the command allows missing inputs, whether or not the command allows modified outputs, and whether or not the command is always out of date
            • BuildKey::Kind::Node - node->getSignature()
              • combine the type of the node with the names of all the producers for the node
                • what is the type of a node? BuildNode.h - Plain, Directory, DirectoryStructure, Virtual
                • what's a producer for a node?
                  • a producer is a Command that can produce a node
        • So based on the above, the signature is not going to change when you, e.g., change the contents of a file. It will change when you rename a file, or add or remove one. It's more of a structural thing that will only ever change when files or dependencies are added, removed, or renamed.
      • Does it need to run because it's "invalid"? ("invalid-value")
        • How does this differ from the signature check above?
          • The signature is computed based on file names, compiler arguments, and the like
          • Whether or not a rule is valid has to do with whether the contents or modification times of the files involved have changed since last time
        • BuildKey::Kind::Command - CommandTask::isResultValid(engine, *command, BuildValue::fromData(value))
          • Delegate to the command - command.isResultValid(buildSystem.getBuildSystem(), value)
            • See above for this chain of events - checking if a command is valid is essentially checking if the outputs are all there
        • BuildKey::Kind::Node - FileInputNodeTask::isResultValid(engine, *node, BuildValue::fromData(value))
          • BuildSystem.cpp#FileInputNodeTask#isResultValid
            • compares "value" to current file "info"
              • value
                • is stored in the db
                • where's it created?
                  • anywhere getFileInfo is called
              • info
                • BuildNode.cpp#BuildNode::getFileInfo
                  • FileSystem.cpp#FileInfo getFileInfo
                    • FileInfo.cpp#FileInfo::getInfoForPath
                    • the resulting FileInfo contains:
                      • device: st_dev (device the file's inode resides on)
                      • inode: st_ino (inode number)
                      • mode: st_mode (inode protection mode)
                      • size: st_size (size, in bytes, of the file)
                      • modTime.seconds: st_mtim.tv_sec (time of last file modification in seconds)
                      • modTime.nanoseconds: st_mtim.tv_nsec (time of last file modification in nanoseconds)
      • Does it need to run because the input has been rebuilt? ("input-rebuilt")
        • The system that enqueues the rules for running (BuildEngine.cpp#executeTasks) is basically a big while loop. It goes like this:
          • While there are new rules in the queue, pick the top one off the stack
            • If the rule is picked up but has not finished scanning, skip it and pick up the next rule ("rule-scanning-deferred-on-input")
            • Do all the checks above (signature changed, invalid value)
            • Check to make sure that the rule has all the inputs it needs to run
              • Inputs from previous builds are still valid (this is how caching/derived data works)
              • If it doesn't, enqueue new rules to scan all its inputs and then mark the rule as "deferred" and put it back in the queue
                • The trace includes "rule-scanning-deferred-on-task" when this happens
                • When the inputs are all scanned and built and the task is picked up again, it needs to be run and "input-rebuilt"
              • If it does, run it
      • Do any of the inputs for a rule need to be run? ("rule-does-not-need-to-run")
        • If nothing depends on the rule, or all the inputs for a rule have already been run, it does not need to be run.
      • If the rule passes all the tests above and does actually need to run, enqueue rule for scanning (aka task execution in BuildEngine.cpp#executeTasks) ("rule-scheduled-for-scanning")
    • Scan / Execute the rule (BuildEngine.cpp#executeTasks)
      • For every rule enqueued for scanning
        • BuildEngine.cpp#processRuleScanRequest
          • For every input for the rule
            • "Check if it needs to run" (see above). If it does, scan it.
            • If the rule has been scanned already and does in fact need to run, scan the input ("rule-scanning-next-input")
              • BuildEngine.cpp#demandRule
                • Assert that the rule has already been scanned
                • If the rule is "complete", exit
                • If the rule is "in progress" already, exit
                • If the rule doesn't actually need to run ("rule-does-not-need-to-run" from before), mark it "complete" and exit
                • If all of those checks don't cause an early exit, create a new task for this rule ("created-task-for-rule")
                  • BuildSystem.cpp#createTask
                    • the type of task created for each rule is determined in BuildSystem.cpp#BuildSystemEngineDelegate::lookupRule
                    • the most relevant tasks for investigating build slowness are probably VirtualInputNodeTask, DirectoryInputNodeTask, DirectoryStructureInputNodeTask, FileInputNodeTask, ProducedNodeTask
                • Start the task by enqueuing it into a queue called readyTaskInfos

BuildEngine.cpp#executeTasks is the thing that writes the rule results to the db with setRuleResult.

BuildEngine.cpp#taskIsComplete is where task results (signatures and values and computed_at) are recorded.
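The order of the "does it need to run?" checks above can be condensed into a small Python model. This is a heavily simplified sketch and not llbuild's implementation. `Rule`, `PriorResult`, and `reason_to_run` are illustrative names, and it assumes every input has already been scanned and brought up to date, so only the computed_at comparison is left for the input-rebuilt case.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PriorResult:
    signature: int
    value: object
    computed_at: int  # build in which the value was last computed


@dataclass
class Rule:
    key: str
    signature: int
    is_result_valid: Callable[[object], bool]
    inputs: List["Rule"] = field(default_factory=list)
    prior: Optional[PriorResult] = None  # None if nothing was found in the db


def reason_to_run(rule):
    """Return a trace-style reason, or None if the prior result can be reused."""
    prior = rule.prior
    if prior is None:
        return "never-built"
    if rule.signature != prior.signature:
        return "signature-changed"
    if not rule.is_result_valid(prior.value):
        return "invalid-value"
    for inp in rule.inputs:
        # Assumes inp is already up to date, so its computed_at is final.
        if inp.prior is not None and inp.prior.computed_at > prior.computed_at:
            return "input-rebuilt"
    return None
```

The returned strings match the reasons that show up next to rule-needs-to-run in build.trace, which makes the model handy as a mental checklist while reading a trace.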

build.trace (trace file)

All the trace keywords and what they mean

  • new-task - when a new task is created (BuildEngine.cpp#ruleInfo.rule->createTask)
  • new-rule - when a new rule is created (BuildSystem.cpp#new BuildSystemRule)
  • build-started - when the build is started (BuildEngine.cpp#trace->buildStarted())
  • handling-build-input-request - when a build input request is handled from BuildEngine.cpp#inputRequests. A "build input request" is a request made by a rule to do some work
  • created-task-for-rule - when a rule creates a task to do some work on its behalf
  • handling-task-input-request - when a task input request is handled from BuildEngine.cpp#inputRequests. A "task input request" is a request made by a task to do some work
  • paused-input-request-for-rule-scan - when a rule is scanned, but already marked as "pending scan", so it's skipped and not scanned twice
  • readying-task-input-request - when a rule's inputs are computed/completed and the work that the rule represents is enqueued
  • added-rule-pending-task - when a rule's inputs are not computed/completed and the work that the rule represents is attempted to be enqueued (but fails because its inputs are not ready)
  • completed-task-input-request - when a rule is dequeued after it's been enqueued by "readying-task-input-request"
  • updated-task-wait-count - when a task is no longer waiting on an input (tasks wait on all their inputs before they're run)
  • unblocked-task - when a task is no longer waiting on any inputs (happens right after "updated-task-wait-count")
  • readied-task - when a task is dequeued from readyTaskInfos queue and ready to run. The readyTaskInfos queue contains tasks that are waiting on no inputs
  • finished-task - when a task is dequeued from finishedTaskInfos. Tasks are placed on this queue when they are completed. A task is "changed" if its value was computed in the current build (and not pulled from a prior build).
  • build-ended - when the build ends
  • checking-rule-needs-to-run - when a rule is scanned to determine whether or not it needs to be run
  • rule-scheduled-for-scanning - when it is determined that a rule needs to be run and it is enqueued for processing (where its inputs are checked to make sure it's ready to run, then it's executed)
  • rule-scanning-next-input - while a rule is processed, when one of its inputs is retrieved and enqueued for scanning, and has been scanned already
  • rule-scanning-deferred-on-input - while a rule is processed, when one of its inputs is retrieved, has not been scanned, and is therefore enqueued for scanning
  • rule-scanning-deferred-on-task - when a rule is processed, when one of its inputs is retrieved, has been scanned already, but the task representing that input has not been completed
  • rule-needs-to-run, never-built - when a rule is scanned and has not been run and therefore is marked as "needs to run"
  • rule-needs-to-run, signature-changed - when a rule is scanned and the file associated with the rule for this run has a different signature than that of the previous cached build
  • rule-needs-to-run, invalid-value - when a rule is scanned and the file associated with the rule for this run has a different stat output (file modification time and other file metadata) than that of the previous cached build
  • rule-needs-to-run, input-missing - this is a possible trace output, but it isn't currently used anywhere
  • rule-needs-to-run, input-rebuilt - when the rule has been computed at a certain time, but has an input that's been computed more recently
  • rule-does-not-need-to-run - if the rule has no dependencies
  • cycle-force-rule-needs-to-run - force a rule to be run in order to break a build cycle llbuild has detected
  • cycle-supply-prior-value - when a rule is forced to be run in order to break a build cycle and the value from the previous build is set as the rule result
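A quick way to get a feel for a large build.trace is to count how often each keyword appears. The sketch below does that in Python. It assumes each trace line holds a brace-wrapped list of double-quoted strings with the keyword first, and that rule-needs-to-run lines carry the reason as their third string. `keyword_histogram` is a made-up helper.

```python
import re
from collections import Counter

# Pulls out every double-quoted field on a line, allowing escaped quotes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def keyword_histogram(lines):
    """Count trace keywords; rule-needs-to-run is split up by its reason."""
    counts = Counter()
    for line in lines:
        fields = QUOTED.findall(line)
        if not fields:
            continue  # skip brackets and blank lines
        key = fields[0]
        if key == "rule-needs-to-run" and len(fields) >= 3:
            key = f"{key}, {fields[2]}"
        counts[key] += 1
    return counts
```

Passing an open file works because files iterate line by line, e.g. `keyword_histogram(open("build.trace"))`. A large count for one of the rule-needs-to-run reasons is usually a good place to start digging.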

A "signature" is computed by hashing some inputs with hash_combine like in ExternalCommand#getSignature
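To illustrate why a signature is structural, here's a small Python sketch that combines the same ingredients listed earlier for a command signature: the command name, input names, output names, and three flags. It uses SHA-256 from hashlib as a stand-in for hash_combine, so the values will not match anything llbuild computes. `command_signature` and its parameters are illustrative only.

```python
import hashlib


def command_signature(name, inputs, outputs, allow_missing_inputs=False,
                      allow_modified_outputs=False, always_out_of_date=False):
    """Hash the structural description of a command (names and flags only)."""
    parts = [name, "inputs", *inputs, "outputs", *outputs,
             str(allow_missing_inputs), str(allow_modified_outputs),
             str(always_out_of_date)]
    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(8, "little"))
        digest.update(data)
    return digest.hexdigest()
```

Because only names and flags go into the hash, editing the contents of an input leaves the signature untouched, while adding, removing, or renaming an input changes it.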

Questions left to answer:

  • What's the difference between the BuildEngine and the BuildSystem?
    • The BuildEngine seems to be the thing doing things - enqueuing commands, rules, tasks, telling them to run, printing trace output
    • The BuildSystem seems to be the thing that contains the logic for all the parts - "what does this type of task do?", "how does this rule know it needs to run"?

Things left to do:

  • Approach this document as if I didn't know anything and I had a goal: I notice my ios build is taking too long, what can I do about it?
  • Maybe create an architecture diagram of how all the data moves around and where it's stored





Cleaned up blog post version of the above

Debugging Xcode build performance by understanding llbuild

Xcode is a powerful tool that allows developers to create amazing apps, games, and full featured applications. I've used it as a mobile developer in the past and now as a mobile infrastructure engineer. Most of the time it does its job without complaint, but sometimes it doesn't, and doesn't give much indication as to why.

I've explored the nitty gritty of how Android apps get built efficiently and here I'll document the same, but for the build system used by Xcode called llbuild.

llbuild was integrated into Xcode fairly recently (Xcode 10, 2018) and is still sometimes referred to as "the new build system". The next version is in the works, but that's probably a long way out and will change a fair bit before it's used.

Both build systems for Android and iOS (Gradle and llbuild) operate with similar fundamental concepts. Gradle operates on "tasks" with "inputs" and "outputs". llbuild uses "commands", "rules", and "tasks" - all of which also have "inputs" and "outputs". In order for builds to operate efficiently, the inputs and outputs from a build are recorded and stored. On subsequent builds those outputs are reused if the inputs haven't changed. That's the basic idea behind incremental compilation, also called compilation avoidance.

So if a clean build takes 10 minutes, subsequent builds with no changes should hypothetically run in seconds. The power of caching!

But what happens when the dream of incremental compilation doesn't come true? Up until recently I chalked it up to "just the way things are". Today, though, let's dive in and figure out what's really going on and how we can reach the promised land of build system performance and 0 second incremental compilation.

Investigating a slow incremental build

Finding the slow part

Before we get into the details, let's find a part of the build to investigate. For this I recommend Xcode's "build with timing summary" or the third party tool XCLogParser. A ton has already been written on how to use these tools to find slow aspects of a build, so I won't duplicate that here - a quick google will turn up some great guides.

The information we'll be working with

Now we'll get some files in front of us and explore what they do and how they're structured.

Firstly, run this in your terminal: defaults write com.apple.dt.XCBuild EnableBuildDebugging -bool YES.

That will tell Xcode to enable build debugging output, which will tell llbuild to do the same. Once you're done debugging your builds, change that YES to a NO and rerun the command. The artifacts that get produced are a bit less than 100 MB per run, which can really add up if you're building all day every day.

Now, build your project with Xcode. The build logs (View > Navigators > Reports) will tell you about a few new files:

  • manifest.xcbuild
    • This is the build file that describes your build as a yml file with a bunch of json in it. It's fed to the build system as a set of instructions.
  • build.db
    • This is an SQLite database that holds prior build metadata. It's one half of the cache that allows incremental compilation. The other half is the actual files stored in DerivedData.
  • build.trace
    • The trace file is the log of what llbuild actually did as it executed the instructions in manifest.xcbuild

Investigating

Now, let's look at those files and investigate what's going on. Open up the manifest.xcbuild and build.trace in your text editor of choice (careful - they're big!). You can find the exact location of those files by looking at the build log file in the Report Navigator.

This next part will be fairly freeform - you're going to need to put on your detective hat and explore:

  1. Start by taking a look at the build.trace and try and find any log lines related to the slow aspect of the build you're investigating.

Eventually you should find a line that mentions the framework you're interested in and whatever operation is taking a long time - maybe the string :Debug:CompileSwiftSources if you're curious about why the framework is being recompiled. Try and find the line that describes the entire slow operation you're interested in and not just one part, i.e. compiling an entire framework rather than an individual file.

  2. After you've located the slow operation, the next question to answer is "why did this happen?" To answer this, just employ the standard process you use every day to debug regular programs: start from the observed behavior, walk backwards to find its cause, and repeat until you find the root of the problem.

As you're working through the trace, use the handy glossary at the bottom to understand what each line means.

Eventually you'll reach a trace line that makes you say "huh, I didn't modify that file!" or "why'd that change?". What you do next will totally depend on the aspect of the build you're investigating, the cause of the slowness, and so on. I'll leave that up to you!
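Walking backwards by hand gets tedious in a trace with thousands of rules, so a small script can do the first pass. The Python sketch below collects every rule-needs-to-run line and swaps rule IDs for their keys using the new-rule lines. It assumes the line shapes shown in the glossary, that the reason is the third field, and that any fields after the reason are further rule IDs (for example the input that was rebuilt). `explain_reruns` is a made-up name.

```python
import re

# Pulls out every double-quoted field on a line, allowing escaped quotes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def explain_reruns(lines):
    """Return (rule key, reason, related rule keys) for every re-run rule."""
    keys = {}     # rule ID -> rule key, learned from new-rule lines
    reruns = []   # (rule ID, reason, extra fields) in trace order
    for line in lines:
        fields = QUOTED.findall(line)
        if len(fields) < 3:
            continue
        if fields[0] == "new-rule":
            keys[fields[1]] = fields[2]
        elif fields[0] == "rule-needs-to-run":
            reruns.append((fields[1], fields[2], fields[3:]))
    # Resolve IDs last so rules declared later in the trace are still named.
    return [
        (keys.get(rule_id, rule_id), reason, [keys.get(x, x) for x in extra])
        for rule_id, reason, extra in reruns
    ]
```

Filtering the result for the framework you care about gives you a short list of "this ran because of that" pairs to follow back to the root cause.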

Wrapping Up

Assuming you've followed the steps above, you've walked backwards from the observed slow behavior and found its root cause, whether that's an errant file modification, mis-configured build settings, or something else. Hopefully you've been able to remedy that root cause and are on your way to faster, more consistent builds.

Remember that you don't have to tackle every single slowdown all at once! Any non-trivial Xcode project will have multiple steps that can be improved or optimized.

Good luck, and keep working towards 0 second incremental builds in Xcode!

Glossary

Anatomy of a trace line
{ "new-rule", "R7897", "N/Users/blah/foo/bar/file.json" }

The first element is the "trace keyword" - it describes what this trace line is doing
The second element is the rule ID, formatted something like R###
The third element is the rule key, and it usually looks like the path to a file. The N at the front is a marker for the build system. It might be a different capital letter sometimes.

{ "rule-scanning-next-input", "R7853", "R7854" }

The first element is the trace keyword
The second element is a rule ID for the rule that's being scanned
The third element is a rule ID for the input of the rule that's being scanned. 

Think of it this way: the third element is an input to the second, and therefore to scan the second element, all its inputs must be scanned as well - that's what's happening here.
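Both example lines above have the same shape: a brace-wrapped list of double-quoted strings with the keyword first. A minimal Python sketch for splitting such a line into its keyword and remaining elements might look like this. `parse_trace_line` is a made-up helper, and it only assumes the shape shown in the two examples.

```python
import re

# Pulls out every double-quoted field on a line, allowing escaped quotes.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_trace_line(line):
    """Split a trace line into (keyword, [other elements]), or None."""
    fields = QUOTED.findall(line)
    if not fields:
        return None  # not a trace entry, e.g. a lone bracket or blank line
    return fields[0], fields[1:]
```

For the second example above this returns the keyword rule-scanning-next-input together with the two rule IDs, the rule being scanned first and its input second.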
Trace keywords
  • new-task - when a new task is created
  • new-rule - when a new rule is created
  • build-started - when the build is started
  • handling-build-input-request - when a build input request is handled from the BuildEngine. A "build input request" is a request made by a rule to do some work
  • created-task-for-rule - when a rule creates a task to do some work on its behalf
  • handling-task-input-request - when a task input request is handled from the BuildEngine. A "task input request" is a request made by a task to do some work
  • paused-input-request-for-rule-scan - when a rule is scanned, but already marked as "pending scan", so it's skipped and not scanned twice
  • readying-task-input-request - when a rule's inputs are computed/completed and the work that the rule represents is enqueued
  • added-rule-pending-task - when a rule's inputs are not computed/completed and the work that the rule represents is attempted to be enqueued (but fails because its inputs are not ready)
  • completed-task-input-request - when a rule is dequeued after it's been enqueued by "readying-task-input-request"
  • updated-task-wait-count - when a task is no longer waiting on an input (tasks wait on all their inputs before they're run)
  • unblocked-task - when a task is no longer waiting on any inputs (happens right after "updated-task-wait-count")
  • readied-task - when a task is dequeued from readyTaskInfos queue and ready to run. The readyTaskInfos queue contains tasks that are waiting on no inputs
  • finished-task - when a task is dequeued from finishedTaskInfos. Tasks are placed on this queue when they are completed. A task is "changed" if its value was computed in the current build (and not pulled from a prior build).
  • build-ended - when the build ends
  • checking-rule-needs-to-run - when a rule is scanned to determine whether or not it needs to be run
  • rule-scheduled-for-scanning - when it is determined that a rule needs to be run and it is enqueued for processing (where its inputs are checked to make sure it's ready to run, then it's executed)
  • rule-scanning-next-input - while a rule is processed, when one of its inputs is retrieved and enqueued for scanning, and has been scanned already
  • rule-scanning-deferred-on-input - while a rule is processed, when one of its inputs is retrieved, has not been scanned, and is therefore enqueued for scanning
  • rule-scanning-deferred-on-task - when a rule is processed, when one of its inputs is retrieved, has been scanned already, but the task representing that input has not been completed
  • rule-needs-to-run, never-built - when a rule is scanned and has not been run and therefore is marked as "needs to run"
  • rule-needs-to-run, signature-changed - when a rule is scanned and the file associated with the rule for this run has a different signature than that of the previous cached build
  • rule-needs-to-run, invalid-value - when a rule is scanned and the file associated with the rule for this run has a different stat output (file modification time and other file metadata) than that of the previous cached build
  • rule-needs-to-run, input-missing - this is a possible trace output, but it isn't currently used anywhere
  • rule-needs-to-run, input-rebuilt - when the rule has been computed at a certain time, but has an input that's been computed more recently
  • rule-does-not-need-to-run - if the rule has no dependencies
  • cycle-force-rule-needs-to-run - force a rule to be run in order to break a build cycle llbuild has detected
  • cycle-supply-prior-value - when a rule is forced to be run in order to break a build cycle and the value from the previous build is set as the rule result

manifest.xcbuild (build file)

This file describes your build. It has a few different parts: client, target, nodes, and commands.

I've not seen client or target contain particularly useful information. nodes and commands are where it's at.

You can find the source documentation for each of these things here, but I'll cover them in my own words now.

Nodes

A node represents some input or output of the build process. Typically, this is a file.

Nodes can have attributes.

Sources
