This example walks through analyzing a Time Profiler trace of two micro-benchmarks (addStrings and foldArray) that exercise an Expo modules JS↔Swift bridge. The goal is to show what each diagnostic step actually produces and how the analysis evolves.
addStrings('hello ', 'world') × 500,000 iterations. The native body is a + b — one Swift string concat. So essentially everything in the profile is bridge overhead, not the work itself. Trace was ~3.4s wall-clock.
xcrun xctrace export --input strings-fold-array.trace --toc | head -40Output revealed:
- 2 runs, both Time Profiler.
- Run 1: 3.4s, attached to BareExpo (the test app).
- Schemas under
<data>:time-sample,time-profile,kdebug-signpost, etc.
First instinct was to export time-sample (the per-millisecond samples). Result: 986 rows, but every backtrace was a <kperf-bt> element with raw text-addresses="4342336580 4342304496 6446776812 ...". No symbol names. Useless.
xcrun xctrace export --input strings-fold-array.trace \
--xpath '/trace-toc/run[@number="1"]/data/table[@schema="time-profile"]' \
--output /tmp/run1-tp.xml963 rows, each with <tagged-backtrace> and resolved <frame name="...">. This is the table to analyze.
The first parse attempt without a ref-resolver returned 0 useful rows — every <thread>, <backtrace>, and most <frame> elements were references like <thread ref="32"/> to entries earlier in the file. Adding a resolver fixed it:
id_map = {el.get('id'): el for el in root.iter() if 'id' in el.attrib}
def resolve(el):
if el is None: return None
if 'ref' in el.attrib:
return id_map.get(el.get('ref'))
return elPer thread:
841 com.facebook.react.runtime.JavaScript (...)
98 Main Thread (...)
22 BareExpo.thread.0 (...)
2 com.apple.uikit.eventfetch-thread (...)
84% of samples on the JS thread. That's the workload. Filter to that thread for everything that follows.
40 4.8% swift::RefCounts<...>::doDecrementSlow<(swift::PerformDeinit)1>(...)
37 4.4% hermes::vm::Interpreter::interpretFunction<false, false>(...)
24 2.9% swift_weakLoadStrong
24 2.9% _xzm_free
19 2.3% String._slowFromCodeUnits<A, B>(_:encoding:repair:)
19 2.3% static String._slowFromCodeUnits<A, B>(_:encoding:repair:)
18 2.1% swift_release
...
The leaders are all low-level helpers. Reporting "doDecrementSlow is 4.8%" tells the user nothing useful. Need to bucket.
The boundary frame for this codebase is SyncFunctionDefinition.runBody (where the bridge calls user code). Frames above it are "the work". Categories were chosen by reading the actual top stacks for addStrings-shaped work:
def categorize(frames_top_to_bottom):
above_sync = []
for f in frames_top_to_bottom:
if 'SyncFunctionDefinition' in f or 'createFunctionClosure' in f:
break
above_sync.append(f)
text = ' || '.join(above_sync)
if 'DynamicStringType.cast' in text or 'MainValueConverter.toNative' in text:
if '_allASCII' in text or '_slowFromCodeUnits' in text or 'parseScalar' in text or 'UTF16' in text:
return 'Arg-conv: jsi::String -> Swift String (UTF16 decode)'
if 'JavaScriptValue.getString' in text:
return 'Arg-conv: JavaScriptValue.getString'
if 'utf16(' in text:
return 'Arg-conv: jsi::String::utf16'
return 'Arg-conv: DynamicStringType.cast (other)'
if 'JavaScriptValuesBuffer' in text:
return 'Arg-conv: JavaScriptValuesBuffer access'
if 'IndexingIterator' in text or 'formIndex(after:)' in text or 'getTupleTypeMetadata' in text:
return 'Arg-conv: arg loop / tuple metadata'
if 'tryCast' in text or 'swift_dynamicCast' in text:
return 'Arg-conv: swift_dynamicCast'
if 'doDecrementSlow' in text or 'swift_release' in text or '_xzm_free' in text or '_free' in text:
return 'Swift ARC: release / dealloc'
if 'swift_retain' in text or 'swift_bridgeObjectRetain' in text:
return 'Swift ARC: retain / weak'
if 'swift_weakLoadStrong' in text:
return 'Swift weak load'
if 'swift_isClassOrObjCExistentialType' in text or 'MetadataCacheKey' in text or 'pthread_getspecific' in text:
return 'Swift runtime overhead (metadata/exclusivity/TLS)'
if '_xzm_xzone_malloc' in text or 'malloc' in text:
return 'Allocator: malloc'
if 'Interpreter::interpretFunction' in text:
return 'Hermes: interpretFunction (JS-side dispatch)'
if not above_sync:
return 'Inside SyncFunctionDefinition.runBody (no frame above)'
return f'Other: {above_sync[0][:80]}'Iteration matters. The first version of categorize() had ~5 categories. Running it produced too many Other: ... samples — meaning samples weren't being grouped meaningfully. Each pass added more pattern matches. Final version above is the one that produced actionable output.
=== Time breakdown by category ===
139 16.5% Arg-conv: jsi::String -> Swift String (UTF16 decode)
118 14.0% Arg-conv: DynamicStringType.cast (other)
82 9.8% Swift ARC: release / dealloc
74 8.8% Arg-conv: arg loop / tuple metadata
60 7.1% Swift runtime overhead (metadata/exclusivity/TLS)
31 3.7% Arg-conv: JavaScriptValuesBuffer access
20 2.4% Allocator: malloc
20 2.4% Arg-conv: swift_dynamicCast
19 2.3% Arg-conv: JavaScriptValue.getString
18 2.1% Swift ARC: retain / weak
11 1.3% Return-conv: JavaScriptValue.asJSIValue
...
This was the moment the analysis became useful. Roughly 39% of time is spent in JS-string → Swift-String conversion; another 12% is ARC churn around the temporaries; the actual a + b doesn't show up. The fix candidates are clear.
For 'Arg-conv: jsi::String -> Swift String (UTF16 decode)':
[19 samples]
String._slowFromCodeUnits<A, B>(_:encoding:repair:)
String._fromCodeUnits<A, B>(_:encoding:repair:)
String.init(_:)
DynamicStringType.cast(jsValue:appContext:)
protocol witness for AnyDynamicType.cast(jsValue:appContext:) in conformance DynamicStringType
MainValueConverter.toNative
SyncFunctionDefinition.runBody
The exact source location is DynamicStringType.cast(jsValue:) calling String(jsiString.utf16(runtime)). Reading that file confirmed: it goes through _slowFromCodeUnits (the always-validating UTF-16 path). Switching to String(jsiString.utf8(runtime)) uses Swift's UTF-8 fast path.
That fix alone dropped the benchmark by ~15%.
After the encoding switch:
=== Time breakdown by category ===
132 18.3% Outside benchmark loop
121 16.8% Arg-conv: DynamicStringType.cast (other)
111 15.4% Arg-conv: arg loop / tuple metadata
54 7.5% Swift ARC: release / dealloc / jsi::Value dtor
54 7.5% Swift runtime overhead (metadata/exclusivity/TLS)
37 5.1% Arg-conv: BorrowedValue.getString
...
11 1.5% Arg-conv: jsi::String -> Swift String (decode)
Notes worth catching:
- The 16.5% UTF-16 bucket dropped to 1.5%. The fix worked.
- "DynamicStringType.cast (other)" went from 14% → 16.8% — not because it got slower, but because the denominator shrank. Always verify by checking absolute sample counts: 121 vs the original 118 — basically unchanged. The bucket is the same absolute size; it just owns more of a smaller pie.
- "arg loop / tuple metadata" went from 8.8% → 15.4% — same effect, plus genuinely surfacing as the next-biggest fish.
This is why you re-trace after every change. The order of fixes is dictated by the current profile, not the old one.
Same skill, different workload. foldArray([1...10]) × 500k iterations exercises array decoding (10 elements per call × 500k = 5M element conversions).
The same categorize() function, with array-specific buckets added:
if 'DynamicArrayType' in text:
if 'JavaScriptArray' in text or 'getValue' in text or 'isArray' in text:
return 'Array: JS array iteration (length/getValue)'
if 'DynamicNumberType' in text:
return 'Array: per-element DynamicNumberType.cast'
if 'Collection.map' in text or 'reserveCapacity' in text:
return 'Array: build [Any] result'
return 'Array: DynamicArrayType.cast (other)'Result:
944 43.3% Array: JS array iteration (length/getValue)
252 11.6% Array: build [Any] result
204 9.4% Generic iteration / tuple metadata
156 7.2% swift_dynamicCast
130 6.0% Array: per-element DynamicNumberType.cast
One bucket is 43.3% of total time. Drilling in:
[114 samples]
swift_weakLoadStrong
JavaScriptArray.length.getter
closure #1 in JavaScriptArray.map<A>(_:)
...
[75 samples]
swift_weakLoadStrong
JavaScriptArray.getValue(at:)
closure #1 in JavaScriptArray.map<A>(_:)
...
Source inspection of JavaScriptArray.getValue(at:) showed two weak-runtime loads per call (one for guard let runtime, one inside the bounds-check length getter). Multiplied by 5M element accesses = 10M weak loads. Adding a non-bounds-checked internal accessor (getValueUnchecked(at:in:)) and hoisting runtime and length out of the loop body collapsed most of that bucket. The benchmark dropped 28%.
- Top-of-stack is a sanity check, not the answer. Always categorize by stack signature.
- The categorization function is the artifact. Spend time refining it. Treat it as the analysis output that's worth reusing across traces of the same codebase.
- Drill before recommending. A bucket name says "where", a stack signature says "exactly what to read in source".
- Re-trace after changes. Bucket percentages shift; the next-biggest fish moves; what was 5% might now be 15%.
- Distinguish % shift from absolute change. Surviving buckets get bigger as the total shrinks even when their work doesn't change. Note absolute sample counts when comparing before/after.
- Source-level fixes follow from stack signatures. "doDecrementSlow is 5%" is not actionable. "doDecrementSlow on a
JavaScriptValue.deinitcalled per element fromJavaScriptArray.map" is.