Owner: Stephen Margheim

Scope: mobile-rn only. Replace the custom OTA orchestration service with a near-defaults Expo Updates setup, and finish the user-visible OTA control surface started in #1891 (merged) + #1927 (open). No backend, no infra changes.

Linear: CORE-2252

Motivation: A production OTA published on 2026-06-07 failed to apply on a TestFlight build despite multiple restarts and a ~2-minute background gap. Root cause is bespoke logic that no other Expo team uses; see § Background. The fix also closes the loop on user-visible OTA state, making it trivial for a user (or support staff) to see what version they're on and force an update.
This plan removes ~145 lines of custom OTA orchestration in favour of Expo's documented useUpdates() hook plus a thin wrapper for the one piece Expo doesn't give us out of the box (critical-update gating). The current setup was built to solve a real problem — "don't reload mid-OTP-paste" — but the chosen mechanism (a 2-minute background timer before applying) is undocumented, untestable from the user side, and silently broken for the kill-and-reopen case.
Goals
- Apply OTAs reliably without user-visible silent failures.
- Replace the 2-minute timer heuristic with explicit user agency.
- Surface OTA state to the user — they should always be able to see what bundle they're on and trivially trigger a check + update. (Already half-shipped in #1891 + #1927.)
- Preserve the critical-update overlay path (regulatory/security holdback story).
- Cut the surface area: drop the bespoke status machine, shrink the parallel Zustand store to the two fields still needed, drop the manual `checkForUpdate` orchestration.
- Stay close to documented Expo behaviour so future Expo SDK upgrades don't surprise us.
Non-goals
- Changing the runtime-version policy in this plan. The static-prod / fingerprint-staging split is called out as follow-up work in § Follow-ups.
- Changing CI's runtime-bump automation.
- Changing how OTAs are published, branched, or channeled.
On 2026-06-07 11:51 AM, OTA 94d687f1 (commit 8832dc8, branch release-v3.30, runtime 46) was published to both the production and production-v3.30 channels. Stephen, on TestFlight build 3.30.0 (1382) running runtime 46 update 019e9772 from 2026-06-05, was unable to receive it after:
- Killing and reopening the app ~5 times
- Backgrounding for ~2 minutes
- Checking the Expo dashboard (which showed only 2 downloads cluster-wide)
Diagnosed cause: the custom OTA service only applies a downloaded update when the app transitions background → foreground with ≥ 120,000 ms between the two. Kill-and-reopen produces a fresh process with backgroundedAt = null, so the gate is never satisfied. A background gap of "about 2 minutes" sits right on the threshold and likely missed by milliseconds.
The same failure mode silently affects every user. There's no UI, log, or analytics event that fires when the gate denies a reload. We don't know how many users are perpetually one OTA behind because of this.
This plan composes with two PRs already opened against the same Linear ticket (CORE-2252):
- #1891: Show runtime + OTA version in Account footer (merged). Adds `getRuntimeVersion`, `getOTAUpdateId`, `isEmbeddedLaunch`, `getOTACreatedAt` wrappers in `deviceInfoService`, and renders the `RUNTIME: 46 · 019e9772 · 2026-06-05` line under the existing app version in `ContactUsFooter`. Read-only diagnostics: gives users + support staff the data needed to answer "what bundle am I running?". This is the line visible in the TestFlight screenshot from the 2026-06-07 incident.
- #1927: Add `OTAUpdateButton` for manual OTA checks (open). Renders a Tertiary button below the version rows that triggers a check/download cycle without waiting for the next foreground transition. Adds `lastCheckOutcome: 'none' | 'available' | 'error'` to `otaStore` so the button can report "up to date" / "ready to apply" / "failed". Built on top of the current service architecture, so it needs rebasing onto the new `useUpdates()`-based design (see § Migration).
Together they answer the two requirements that came out of today's incident debrief:
"It should be easy for a user to know whether they are on the most-up-to-date OTA version or not, and it should be easy for them to get up-to-date manually."
The simplification plan in this doc completes that story by making the manual-update path the canonical way to apply non-critical updates ahead of the next cold start — replacing the broken 2-minute background gate.
Across Expo's official docs, the UpdatesAPIDemo repo, blog posts from companies running production Expo apps, and GitHub/Discord discussion, the 2-minute background gate is not a pattern used by anyone else. The canonical patterns are:
- Apply silently on next cold start (Expo default behaviour).
- Apply immediately on launch behind a brief "updating" overlay.
- Apply on the next `AppState: active` transition (with no time gate).
- Show a non-blocking "Update ready, tap to restart" banner.
References — see § References.
| File | Lines | Role |
|---|---|---|
| `mobile-rn/app/services/ota_updates/service.ts` | 146 | Custom check → download → gated-apply state machine |
| `mobile-rn/app/services/ota_updates/types.ts` | 1 | `OTAUpdateStatus` enum (`idle \| checking \| downloading \| ready \| error`) |
| `mobile-rn/app/services/ota_updates/index.ts` | 2 | Barrel re-export |
| `mobile-rn/app/stores/otaStore.ts` | 43 | Zustand store mirroring service state |
| `mobile-rn/app/utils/getCriticalIndex.ts` | 18 | Read `criticalIndex` from manifest extras |
| `mobile-rn/app/components/CriticalOTAOverlay.tsx` | 104 | Blocking overlay for critical updates |
| `mobile-rn/config/initializers/ota.ts` | 10 | Boot-time `initializeOTA()` call |
| `mobile-rn/app/routes/_layout.tsx` | (excerpt) | Mounts `<CriticalOTAOverlay>`; reads store |
| `mobile-rn/tests/services/ota_updates/service.test.ts` | 242 | Service unit tests |
Total OTA-specific surface: ~566 lines including the 242-line test file (~324 lines excluding tests; the `_layout.tsx` excerpt is not counted).
- App cold start → `config/initializers/ota.ts` runs `initializeOTA()` once.
- `service.start()` subscribes to `onForeground` / `onBackground` (delegated to `appLifecycleTracker`), then synchronously runs `check()`.
- `check()`:
  - Bails if `__DEV__`, already checking, or `Updates.isEmergencyLaunch`.
  - Status → `checking`. `await Updates.checkForUpdateAsync()`. If `!update.isAvailable` → status `idle`.
  - Computes `critical = availableCriticalIndex > currentCriticalIndex`. Sets store flag.
  - Status → `downloading`. `await Updates.fetchUpdateAsync()`. Status → `ready`.
  - On any error: silently swallowed; status → `idle`. (No analytics, no logs.)
- On background → records `backgroundedAt = Date.now()`.
- On foreground:
  - If `isUpdateReady() && backgroundDurationMs ≥ 120_000` → `Updates.reloadAsync()`.
  - Else if `isUpdateReady()` → no-op (silent skip; this is the gap).
  - Else → run `check()` again.
- `<CriticalOTAOverlay>` renders when `isCritical && status ∈ {checking, downloading, ready}`. User taps "Restart" → `applyDownloadedUpdate()` → `Updates.reloadAsync()`.
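The foreground branch of the flow above can be reduced to a pure decision function, which makes the silent-skip gap easy to see. This is a minimal sketch, not the real service code: `decideOnForeground`, `ForegroundAction`, and `APPLY_GATE_MS` are hypothetical names, and treating a `null` `backgroundedAt` as a zero-length gap is an assumption about how the current service behaves in a fresh process.

```typescript
// Hypothetical model of the current foreground decision (names are illustrative).
type ForegroundAction = 'reload' | 'skip' | 'check';

const APPLY_GATE_MS = 120_000; // the 2-minute gate described above

function decideOnForeground(
  updateReady: boolean,
  backgroundedAt: number | null, // assumed null in a fresh process (kill-and-reopen)
  now: number,
): ForegroundAction {
  // No downloaded update yet: the service runs check() again.
  if (!updateReady) return 'check';
  // Assumption: with no recorded background timestamp the gap counts as 0 ms.
  const backgroundDurationMs = backgroundedAt === null ? 0 : now - backgroundedAt;
  // A ready update is applied only when the gate is met; otherwise nothing happens.
  return backgroundDurationMs >= APPLY_GATE_MS ? 'reload' : 'skip';
}
```

Under this model a ready update with a gap of 119,999 ms yields `'skip'`, and so does any fresh process, which matches the behaviour seen in the incident.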
```js
runtimeVersion:
  APP_ENV === 'production'
    ? '46' // Production: static version, auto-bumped by CI when native code changes
    : { policy: 'fingerprint' },
updates: {
  url: 'https://u.expo.dev/4cf9cec0-523d-4591-bc5b-6269dd64ee66',
  fallbackToCacheTimeout: 0,
  checkAutomatically: 'NEVER', // Manual check via useOTAUpdates hook
},
```

`extra.criticalIndex: 2` is also defined in `app.config.js`, exposed to OTAs via manifest extras.
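As an illustration of how a `criticalIndex` in manifest extras can drive the critical flag, here is a minimal sketch. It is not the contents of `getCriticalIndex.ts`: the `ManifestLike` shape (extras nested under `extra.expoClient.extra`) is an assumption that should be checked against the real manifests, and `readCriticalIndex` / `isCriticalUpdate` are hypothetical names.

```typescript
// Assumed manifest shape: app config `extra` nested under extra.expoClient.extra.
// Verify against real update manifests before relying on this.
interface ManifestLike {
  extra?: { expoClient?: { extra?: { criticalIndex?: unknown } } };
}

// Returns the numeric criticalIndex, or 0 when it is missing or malformed.
function readCriticalIndex(manifest: ManifestLike | null | undefined): number {
  const raw = manifest?.extra?.expoClient?.extra?.criticalIndex;
  return typeof raw === 'number' && Number.isFinite(raw) ? raw : 0;
}

// An available update is critical when its index is strictly higher than the running one.
function isCriticalUpdate(
  current: ManifestLike | null | undefined,
  available: ManifestLike | null | undefined,
): boolean {
  return readCriticalIndex(available) > readCriticalIndex(current);
}
```

Defaulting to 0 means an update published without the field is never treated as critical, which keeps the blocking overlay opt-in.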
| Behaviour | Status |
|---|---|
| Critical-update overlay (security/forced updates) | ✅ Works as intended |
| Non-critical OTA delivery on kill-and-reopen | ❌ Never applies (process restart bypasses the time gate; also nothing on disk if fetch didn't complete before kill) |
| Non-critical OTA delivery on background ≥ 2 min | |
| Non-critical OTA delivery on background < 2 min | ❌ Silently skipped |
| Failure visibility (analytics / logs) | ❌ Catch block swallows all errors |
| Mid-session OTP-paste reload protection | ✅ Works (this was the original motivation) |
Switch to Expo defaults:
```js
updates: {
  url: 'https://u.expo.dev/4cf9cec0-523d-4591-bc5b-6269dd64ee66',
  // Removed: fallbackToCacheTimeout (0 is the default)
  // Removed: checkAutomatically (ON_LOAD is the default; see Expo Updates SDK docs)
},
```

This means: on every cold start, Expo's native module checks the server in the background, downloads any available update, and the new bundle becomes active on the next cold start. This is the documented default flow, and the one most Expo apps rely on (Expo Download Updates docs).
Replace service.ts with a small hook driven by Expo's useUpdates(), plus a thin requestOTACheck() function for the manual button to share an in-flight guard with the foreground re-check. Sketch:
```ts
// mobile-rn/app/services/ota_updates/useOTAUpdates.ts
import { useEffect } from 'react';
import {
  useUpdates,
  checkForUpdateAsync,
  fetchUpdateAsync,
} from 'expo-updates';
import { useOTAStore } from '~/stores/otaStore';
import { getCriticalIndex } from '~/utils/getCriticalIndex';
import { onForeground } from '~/services/appLifecycleTracker';
// NOTE: `reportToSentry` is assumed to be an existing project helper;
// its import is omitted from this sketch.

let inFlight: Promise<void> | null = null;

/** Shared by foreground re-check and the manual button. Idempotent. */
export function requestOTACheck(): Promise<void> {
  if (__DEV__) return Promise.resolve();
  if (inFlight) return inFlight;
  const store = useOTAStore.getState();
  inFlight = (async () => {
    try {
      const res = await checkForUpdateAsync();
      if (!res.isAvailable) {
        store.setLastCheckOutcome('none');
        return;
      }
      await fetchUpdateAsync();
      store.setLastCheckOutcome('available');
    } catch (err) {
      store.setLastCheckOutcome('error');
      reportToSentry('ota_check_failed', err);
    } finally {
      inFlight = null;
    }
  })();
  return inFlight;
}

export function useOTAUpdates(): void {
  const { currentlyRunning, availableUpdate, downloadError, checkError } = useUpdates();

  // Drive the critical flag from manifest extras whenever an update is available.
  useEffect(() => {
    if (!availableUpdate) {
      useOTAStore.getState().setIsCritical(false);
      return;
    }
    const currentIdx = getCriticalIndex(currentlyRunning.manifest) ?? 0;
    const nextIdx = getCriticalIndex(availableUpdate.manifest) ?? 0;
    useOTAStore.getState().setIsCritical(nextIdx > currentIdx);
  }, [availableUpdate, currentlyRunning]);

  // Re-check when the app comes back to the foreground after a real background gap.
  useEffect(() => onForeground(() => void requestOTACheck()), []);

  // Observability: surface previously-silent failures.
  useEffect(() => {
    if (checkError) reportToSentry('ota_check_failed', checkError);
    if (downloadError) reportToSentry('ota_download_failed', downloadError);
  }, [checkError, downloadError]);
}
```

```ts
// mobile-rn/app/stores/otaStore.ts (simplified)
export interface OTAState {
  isCritical: boolean;
  lastCheckOutcome: 'none' | 'available' | 'error' | null;
}
```

Notes:
- `useUpdates()` from `expo-updates@29` exposes `isUpdateAvailable`, `isUpdatePending`, `availableUpdate`, `downloadError`, `checkError`, `currentlyRunning` reactively. This replaces our hand-rolled `OTAUpdateStatus` enum entirely.
- `isUpdatePending = true` means a new bundle is downloaded and will load on next reload; the manual button uses this to switch its label from "Check for updates" to "Restart to apply".
- The first check is now done automatically by expo-updates on cold start (because `checkAutomatically` defaults to `ON_LOAD`). The hook only adds the foreground re-check + the manual entry point.
- `requestOTACheck` is the single funnel for non-cold-start checks. Both the foreground listener and `OTAUpdateButton` call it, sharing an `inFlight` guard (the same pattern PR #1927 implements on the current service).
- The Zustand store reduces to `{ isCritical, lastCheckOutcome }`. The `idle | checking | downloading | ready | error` enum is gone, replaced by `useUpdates()` reactive fields for transient state and `lastCheckOutcome` for the post-check summary the button needs.
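The shared in-flight guard mentioned in the notes can be studied in isolation. The following is a minimal sketch of that pattern only; `singleFlight` is a hypothetical helper name, not something that exists in the codebase or in `expo-updates`.

```typescript
// Wraps an async task so that concurrent callers share one in-progress promise.
// `singleFlight` is an illustrative helper, not an existing project utility.
function singleFlight(task: () => Promise<void>): () => Promise<void> {
  let inFlight: Promise<void> | null = null;
  return () => {
    // A call made while a run is in progress reuses that run.
    if (inFlight) return inFlight;
    inFlight = (async () => {
      try {
        await task();
      } finally {
        // Clear the guard whether the task succeeded or failed.
        inFlight = null;
      }
    })();
    return inFlight;
  };
}
```

With this shape, a foreground event and a button tap that arrive together trigger a single check, and a later call starts a fresh one once the first has settled.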
Recommendation: Default + Critical + Manual. Three independent paths, each with a clear trigger:
1. Default path (non-critical OTAs). Expo's standard behaviour:
- On every cold start, expo-updates checks the server in the background (because `checkAutomatically: 'ON_LOAD'` is restored), downloads any new bundle, and applies it on the next cold start.
- On `AppState: active` (foreground after real background), the new hook also calls `checkForUpdateAsync` + `fetchUpdateAsync` so backgrounded apps don't go stale for days.
- No mid-session reload. Ever. The OTP-paste protection that motivated the 120 s gate is now structural, not heuristic: we simply never reload mid-session unless the user asks us to.

2. Critical path (criticalIndex bumped). Unchanged from today:
- `<CriticalOTAOverlay>` blocks UI; downloads; "Restart" button → `reloadAsync()`.
- The only forced mid-session reload path. Reserved for security/regulatory issues where interrupting is the point.

3. Manual path (user-initiated, from PR #1927). The third path closes the visibility + agency gap:
- The Account footer (under the runtime/OTA version rows from #1891) renders an `OTAUpdateButton`.
- Tap → `requestOTACheck()` → updates `lastCheckOutcome` to one of:
  - `none` → "You're up to date" (no update available)
  - `available` → button label changes to "Restart to apply" → tap → `reloadAsync()`
  - `error` → "Couldn't check for updates" (with a retry affordance)
- Replaces the silent download-and-wait of the old service with a visible, debuggable flow that staff can ask users to trigger when investigating a session.
This is the right answer because:
- Visibility: Combined with the version rows from #1891, every user can answer "am I on the latest?" without help. Today they can't.
- Agency: Users who want the latest fix right now can get it without killing the app. Staff debugging a session can ask "tap that button and tell me what it says."
- No surprise reloads: Non-critical updates never interrupt; the user opts into the restart. OTP flows are safe by construction.
- No bespoke heuristics: The 120 s gate is gone. There is nothing magical about when an update applies — it's either "next cold start" (default), "user tap" (manual), or "criticalIndex says now" (critical).
Rejected alternative: a passive "Update ready, tap to restart" banner. The Account-screen button is a strictly better fit — it lives next to the version rows users would already look at, and it doubles as a debugging tool. We don't need both.
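To make the manual path concrete, the mapping from check state to what the button shows could look roughly like the sketch below. The copy strings come from the manual-path description; `describeOTAButton` and `ButtonView` are hypothetical names, and the real `useOTAUpdateControl` hook from #1927 may be shaped differently.

```typescript
type LastCheckOutcome = 'none' | 'available' | 'error' | null;

// Hypothetical view model for the button; not the actual #1927 return type.
interface ButtonView {
  label: string;
  helperText: string | null;
  action: 'check' | 'reload';
}

function describeOTAButton(isUpdatePending: boolean, outcome: LastCheckOutcome): ButtonView {
  // A downloaded bundle always wins: the next tap restarts into it.
  if (isUpdatePending || outcome === 'available') {
    return { label: 'Restart to apply', helperText: null, action: 'reload' };
  }
  switch (outcome) {
    case 'none':
      return { label: 'Check for updates', helperText: "You're up to date", action: 'check' };
    case 'error':
      // Tapping again acts as the retry affordance.
      return { label: 'Check for updates', helperText: "Couldn't check for updates", action: 'check' };
    default:
      // No check has run yet in this session.
      return { label: 'Check for updates', helperText: null, action: 'check' };
  }
}
```

Keeping this as a pure function means the label logic can be unit-tested without mocking `expo-updates`.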
Delete:
- `mobile-rn/app/services/ota_updates/service.ts` (replaced by `useOTAUpdates.ts` + `requestOTACheck`)
- `mobile-rn/app/services/ota_updates/types.ts` (`OTAUpdateStatus` enum replaced by `useUpdates()` state)
- `mobile-rn/config/initializers/ota.ts` (no longer needed; `useUpdates()` is rendered inside `RootLayoutInner`)
- `mobile-rn/tests/services/ota_updates/service.test.ts` (242 lines, replaced by ~80 lines of hook tests)

Modify:
- `mobile-rn/app.config.js`: remove `checkAutomatically` and `fallbackToCacheTimeout` keys from `updates`.
- `mobile-rn/app/services/ota_updates/index.ts`: re-export `useOTAUpdates`, `requestOTACheck` instead of `initializeOTA` / `applyDownloadedUpdate`.
- `mobile-rn/app/stores/otaStore.ts`: strip down to `{ isCritical, lastCheckOutcome }`. The `status` / `setStatus` / `OTAUpdateStatus` references go away.
- `mobile-rn/app/components/CriticalOTAOverlay.tsx`: change `status: OTAUpdateStatus` prop to a boolean `isReady: boolean` (we no longer have the `'checking'` / `'downloading'` distinction; `isUpdatePending` is the only state that matters for the overlay).
- `mobile-rn/app/routes/_layout.tsx`: call `useOTAUpdates()`; render `<CriticalOTAOverlay>` based on `isCritical && isUpdatePending`.
- `mobile-rn/app/features/account/hooks/useOTAUpdateControl.ts` (from #1927): replace its read of `otaStore.status` with `useUpdates()` fields (`isUpdateAvailable`, `isUpdatePending`); keep its `requestOTACheck` invocation; keep its label/helper-text logic driven by `lastCheckOutcome`.
- `mobile-rn/app/features/account/components/OTAUpdateButton.tsx` (from #1927): no API change, only prop names if the control hook's return type shifts.

Keep as-is:
- `mobile-rn/app/utils/getCriticalIndex.ts`: manifest-extras reader, still useful.
- `mobile-rn/app/services/appLifecycleTracker.ts`: `onForeground` subscription model is independently useful and battle-tested.
- `mobile-rn/app/services/deviceInfoService.ts`: runtime/OTA version wrappers from #1891 are unaffected.
- `mobile-rn/app/features/account/components/ContactUsFooter.tsx`: the runtime/OTA version rows (#1891) and the button mount (#1927) are unchanged at the component level.
- `mobile-rn/eas.json`: channels and runtime config are unchanged.
The sequencing question is whether to land #1927 as-written (on the current service) and refactor afterwards, or rebase #1927 onto the new hook architecture and ship them together. Recommendation: rebase #1927 — its conceptual additions (button, control hook, lastCheckOutcome, in-flight guard) survive the refactor unchanged; only the underlying service plumbing it touches goes away.
Risk: low. Read-only addition.
- Add `mobile-rn/app/services/ota_updates/useOTAUpdates.ts` + `requestOTACheck` (sketch above).
- Add `lastCheckOutcome` to `otaStore` (matches the field #1927 adds; port it over verbatim with the same name + values for downstream-PR continuity).
- Add a tiny client feature flag `ota_v2_enabled` (default: false). When true, call `useOTAUpdates()` from `_layout.tsx`; when false, keep `initializeOTA()` from the initializer.
- Add observability: `Sentry.captureMessage` calls for `checkError` and `downloadError`. (Closes the silent-failure gap that hid today's incident.)
- Tests for the new hook: `useUpdates()` is mockable via `jest.mock('expo-updates', …)` (the same mock #1891 added in `jest.setup.ts`).
Ship and verify on staging by flipping the flag for internal builds.
Risk: low. UI-only change; net-new surface.
- Rebase `feat/CORE-2252-account-ota-update-button` onto the PR 1 branch.
- Rewrite `useOTAUpdateControl.ts` to consume `useUpdates()` fields + `useOTAStore` selectors instead of the old `otaStore.status` enum.
- The component (`OTAUpdateButton.tsx`), tests, and `ContactUsFooter` mount point land essentially as-is.
- Behind `ota_v2_enabled`, the button works against the new hook. When the flag is off, the button hides (or shows disabled with a tooltip).
This lets us ship the user-visible control surface even while the apply-behaviour change is still gated on staging.
Risk: medium. Behavioural change in prod.
- Flip `ota_v2_enabled` default to true.
- Remove `checkAutomatically: 'NEVER'` and `fallbackToCacheTimeout: 0` from `app.config.js`.
- Bump the runtime version (this is a native config change, required by Expo for `app.config.js` changes that affect `updates.*`).
- Submit a new TestFlight build (`3.30.X` → `3.31.0` or runtime `46` → `47` per CI's bump policy).
Verify on TestFlight before promoting to prod. The manual button from PR 2 is the verification tool — tap it, confirm lastCheckOutcome updates, confirm Restart to apply reloads into the new bundle.
Risk: zero (only runs after PR 3 is live and stable for ~1 week).
- Delete `service.ts`, `types.ts`, the boot initializer, and the old `service.test.ts`.
- Delete the `ota_v2_enabled` flag and its dead branch.
- Strip any vestigial `OTAUpdateStatus` references.
| Scenario | Today | After |
|---|---|---|
| User opens app, OTA available, no critical bump | Download starts, status → ready, never applies unless backgrounded ≥ 2 min | Download starts in background, applies on next cold start. User can also force-apply via Account → "Check for updates" button. |
| User opens app, OTA available, critical bump | `<CriticalOTAOverlay>` blocks UI; user taps Restart | Same: `<CriticalOTAOverlay>` blocks UI; user taps Restart |
| User backgrounds for 30 s then returns | Download triggers if not done; no apply (gate not met) | Foreground triggers `checkForUpdateAsync` re-check; no apply mid-session |
| User backgrounds for 5 min then returns | Apply fires if downloaded | No apply mid-session (deliberate, protects OTP flows) |
| User kills and reopens app | Apply only via Expo's native launch logic (if bundle was on disk pre-kill) | Apply on every cold start once download lands. Cleaner cache, no gate. |
| Mid-session OTP paste | Safe (gate prevents reload) | Safe (we never reload mid-session for non-critical) |
| `checkForUpdateAsync` fails | Silent | Sentry breadcrumb + error reported; button surfaces `lastCheckOutcome: 'error'` to user |
| User wants to know "am I on the latest?" | No way to tell | Account footer shows `RUNTIME: 46 · 019e9772 · …` (shipped in #1891) and the manual button reports "up to date" / "ready to apply" |
| Support staff debugging a user session | "Have you restarted the app?" | "Tap Check for updates and tell me what it says" |
Impact: A copy fix published Monday morning is seen by an active user on Tuesday morning when they re-open the app.
Mitigation: This is acceptable for the >95% of OTAs that are non-critical (copy, layout, analytics). The criticalIndex mechanism remains the escape hatch for "must apply now."
If unacceptable: ship the passive "Update ready, tap to restart" banner (the rejected alternative above) instead.
Impact: Unrelated to this change, but worth noting — runtime-version compatibility is unaffected by this refactor.
Mitigation: No change to runtime policy in this PR. See § Follow-ups.
Impact: We're on expo-updates@29.0.16 (SDK 54), which has stable useUpdates() semantics. A future SDK bump could rename fields.
Mitigation: The hook is small enough (~30 lines) that field renames are trivial to follow. Worse than current setup? No — Updates.checkForUpdateAsync is the same API that backs both implementations.
Impact: We have no data today on how often the 120 s gate denies reloads, so we can't quantify the improvement.
Mitigation: Add Sentry events for checkError and downloadError in PR 1, and emit a one-time analytics event when isUpdatePending transitions to true. Compare OTA adoption curves on the dashboard before and after PR 3, the PR that changes apply behaviour in production.
Impact: Removing checkAutomatically from updates is a native config change, not an OTA-able JS change.
Mitigation: Standard runtime-version bump + new TestFlight build. Coordinate with the next planned binary release; no need for an extraordinary out-of-cycle build.
- Land #1927 as-written first, or rebase onto the new architecture? Recommendation: rebase. The plumbing #1927 touches (`service.checkForUpdate`, `otaStore.status`) is going away, so landing it on the doomed service means rewriting the same code twice. Rebasing keeps the user-facing diff (button + control hook + tests) intact while sidestepping the deleted layer.
- Keep `appLifecycleTracker` integration? The new hook can either subscribe to our tracker (filters `inactive` blips) or to RN's `AppState` directly. Recommend tracker: it is consistent with the rest of the codebase and the `inactive`-blip filter is genuinely useful.
- Delete `otaStore` entirely? With `{ isCritical, lastCheckOutcome }` as the only state, it could live inside the hook via `useState`. But the button hook (`useOTAUpdateControl`) needs to read `lastCheckOutcome` from outside the `useOTAUpdates` mount point, which is awkward without a store. Recommend keeping the store at its reduced size.
- Button placement. #1927 puts it in `ContactUsFooter` below the version rows. Confirm this is where we want it long-term, or whether it deserves a dedicated row higher up.
app.config.js:96-99 uses static '46' in production and { policy: 'fingerprint' } in staging. Per the Expo runtime-versions docs, both work, but mixing means staging compatibility doesn't predict production compatibility — a native dep bump that shifts the staging fingerprint without bumping the static prod version can land an OTA on a prod binary it wasn't tested against.
Recommendation: move to policy: 'fingerprint' everywhere, with CI gating that fails if the published OTA's fingerprint doesn't match a currently-shipped binary. Separate PR.
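The CI gate in this recommendation could be as small as a lookup against the fingerprints of binaries currently in the field. This is a sketch under stated assumptions: how fingerprints are computed and where the list of shipped binaries comes from are not specified in this doc, and `ShippedBinary` / `checkOTAFingerprint` are hypothetical names.

```typescript
// One entry per binary currently installed in the field (source of this list is assumed).
interface ShippedBinary {
  appVersion: string;
  fingerprint: string;
}

interface FingerprintCheck {
  ok: boolean;
  matchingVersions: string[];
}

// Returns ok=false when the OTA's fingerprint matches no shipped binary.
// A CI step would fail the publish job in that case.
function checkOTAFingerprint(
  otaFingerprint: string,
  shipped: readonly ShippedBinary[],
): FingerprintCheck {
  const matchingVersions = shipped
    .filter((binary) => binary.fingerprint === otaFingerprint)
    .map((binary) => binary.appVersion);
  return { ok: matchingVersions.length > 0, matchingVersions };
}
```

Returning the matching versions rather than a bare boolean lets the CI log say which binaries an OTA will reach, which is useful when several releases share a fingerprint.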
We have no observability on OTA adoption rate (how quickly a published update reaches X% of installs). The Expo dashboard's "downloads" count is the only signal. With the new hook we can fire an analytics event on isUpdatePending → true and another on next cold-start launch — giving us a real adoption curve.
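The "fire once when `isUpdatePending` becomes true" part of this idea can be sketched as a small tracker that is fed the latest value on every render or store change. The event name `ota_update_pending` and the `createPendingTracker` helper are hypothetical; the real analytics client and event schema are not defined in this doc.

```typescript
// Emits a single event the first time isUpdatePending is observed as true.
// `emit` stands in for whatever analytics client the app uses (assumption).
function createPendingTracker(emit: (eventName: string) => void): (isUpdatePending: boolean) => void {
  let alreadyEmitted = false;
  return (isUpdatePending: boolean): void => {
    if (isUpdatePending && !alreadyEmitted) {
      alreadyEmitted = true;
      emit('ota_update_pending'); // hypothetical event name
    }
  };
}
```

Because the tracker only remembers whether it has fired, repeated `true` values from re-renders do not inflate the count, so each session contributes at most one point to the adoption curve.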
- Expo Updates SDK reference, `useUpdates()`: hook API and reactive fields (`isUpdateAvailable`, `isUpdatePending`, `availableUpdate`, `downloadError`, `checkError`).
- Expo Updates SDK reference, config: `checkAutomatically` values (`ON_LOAD` default, `ON_ERROR_RECOVERY`, `WIFI_ONLY`, `NEVER`); `fallbackToCacheTimeout` default `0`.
- Download updates lifecycle: default flow is to launch immediately with the cached bundle, check + download in background, apply on next cold start.
- Runtime versions: static vs `fingerprint` policy.
- `UpdatesAPIDemo` (Expo official demo): canonical pattern for `useUpdates()` + `AppState` + critical-update gating via manifest extras.
- Make EAS Updates apply immediately (Cathy Lai): when and how to opt out of the "next cold start" default.
- The Realities of OTA Updates with Expo (Abiodun Abdullahi): production lessons; mid-session reload UX pitfalls.
- `useUpdates()` hook guide (LogRocket): typical production hook shape.
- How I Manage Versioning in Expo Apps (Welcome Developer): runtime-version strategies.
- `expo/expo#16264`: `reloadAsync()` from an `AppState` foreground listener has historically had iOS edge cases after long backgrounding. Confirms that some foreground gating is defensible, but not the 120 s heuristic.
- Linear: CORE-2252
- In-flight PRs:
  - #1891, feat(CORE-2252): show runtime + OTA version in Account footer (merged). Adds `deviceInfoService` wrappers + `ContactUsFooter` rows.
  - #1927, feat(CORE-2252): add OTAUpdateButton for manual OTA checks (open). Adds `OTAUpdateButton`, `useOTAUpdateControl`, `requestOTACheck`, `lastCheckOutcome`. Rebase target for PR 2 of this plan.
- Current implementation: `mobile-rn/app/services/ota_updates/service.ts`, `mobile-rn/app/stores/otaStore.ts`, `mobile-rn/config/initializers/ota.ts`, `mobile-rn/app/components/CriticalOTAOverlay.tsx`, `mobile-rn/app.config.js:96-108`.
- Lifecycle tracker (preserve as-is): `mobile-rn/app/services/appLifecycleTracker.ts`.
- Manifest reader (preserve as-is): `mobile-rn/app/utils/getCriticalIndex.ts`.
- Device-info wrappers (preserve as-is, shipped in #1891): `mobile-rn/app/services/deviceInfoService.ts`.