At WWDC25 Apple opened up the on-device large language model that powers Apple Intelligence to every iOS, iPadOS, macOS and visionOS app via a new “Foundation Models” framework. The model is a compact ~3-billion-parameter LLM quantized to just 2 bits per weight, so it runs fast, offline and entirely on the user’s device, keeping data private while still handling tasks such as summarization, extraction, short-form generation and structured reasoning. ([developer.apple.com][1], [machinelearning.apple.com][2]) Below is a developer-focused English-language overview, based mainly on Apple’s own announcements, docs and WWDC sessions, followed by ready-to-paste Swift code samples.
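To give a flavor of what those samples look like, here is a minimal sketch of a single prompt/response round trip with the framework. It assumes the API surface Apple showed at WWDC (`SystemLanguageModel`, `LanguageModelSession`, `respond(to:)`); the `SummaryError` type is a placeholder introduced here for brevity.

```swift
import FoundationModels

// Placeholder error type for this sketch (not part of the framework).
enum SummaryError: Error { case modelUnavailable }

/// Minimal sketch: one on-device prompt/response round trip.
func summarize(_ text: String) async throws -> String {
    // Bail out early if Apple Intelligence / the on-device model isn't ready.
    guard case .available = SystemLanguageModel.default.availability else {
        throw SummaryError.modelUnavailable
    }

    // A session holds conversational context; instructions steer the model's behavior.
    let session = LanguageModelSession(
        instructions: "Summarize the user's text in two sentences or fewer."
    )
    let response = try await session.respond(to: text)
    return response.content
}
```

Everything runs locally, so the call works offline and no text leaves the device.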
Apple ships two sibling LLMs: a device-scale model (~3B parameters) that runs on Apple silicon and a server-scale mixture-of-experts model that runs inside Private Cloud Compute when more capability is required. ([machinelearning.apple.com][2]) The