FoundationModels code-along session Q&A (Sep 25, 2025)

Let me just say this format is AMAZINGLY well done, you guys. Is there a schedule somewhere of other "Code Along"s?

Thank you John, much appreciated! This is the only code along so far, but stay tuned to the Meet with Apple schedule for more events: https://developer.apple.com/events/view/upcoming-events

Will the recorded session be provided?

Great question! Yes, the code along will be available on-demand sometime after the event concludes. The code along will be made available on the Apple developer website and Youtube.

Is it possible to persist the session or conversation transcripts so that multi-turn interactions can continue after restarting the app?

Yes! Transcript conforms to Codable, You can obtain the transcript from the session and store it, then create a new session from a deserialized transcript.

What's the context limit for prompts?

4096 tokens. See: https://developer.apple.com/documentation/foundationmodels/languagemodelsession/generationerror/exceededcontextwindowsize(_:)

What is the recommended best practice for localization? Should we provide instructions and content for the Tools in the different languages, or is it preferable to keep them in one language only?

Localization can be a little complicated, so for all details see this guide: https://developer.apple.com/documentation/foundationmodels/support-languages-and-locales-with-foundation-models

Not a question, but thanks a lot for this session, really helpful! And it's nice to see the landmarks sample app back (edited)

We're glad you're enjoying it!

Can we use the Foundation Models to transcribe audios?

As of today, the model provided by the Foundation Models framework only supports text in text out. You can use the Speech framework to do audio transcription though.

Is this session going to be available later via recording?

Yes:)

Can you use this framework be used for Remote models? Especially useful if we want mix cloud and local foundation models?

The framework's LanguageModelSession API only supports the on-device model. However, you may be able to leverage @Generable and GeneratedContent to generate structured responses using 3rd-party server models such as OpenAl's models. For example, you can obtain the generation schema from a @Generable type, insert it into the prompt for OpenAl, and use GeneratedContent to convert the response JSON back to a Swift data structure.

I'm building out a prototype swim, bike and run workout app. It seems that the model is limited with the workouts it generates, being that it's a 3B parameter model, what should my next steps be to have a more viable product? Thanks for the workshop format. High votes for more of those.

Keep in mind that the model does not have world knowledge, including expertise in specific areas like fitness. Providing additional information to the model like a menu of workouts to select from (as an @Generable enum, perhaps), may improve results. As another example you could implement Tools that hook into other parts of your fitness app. If you want to ask this question in the Apple Developer Forums, I think there's a lot more to explore and discuss here!

Thanks for organizing this, I'm learning a lot already. Is the speaker live? I hope so.

→ ...and yes! This code along is occurring live!

Can apple see my users' prompts? or is it completely private?

Completely private. Apple cannot see what people prompt.

How many tools can be reliably provided to the model for it to work with?

As a rule of thumb, 3-4. The model will have an easier time if your tools are pretty different and have minimal overlap.

Is there any benefit to using @Generable if you are just creating a single text response?

There are some advantages still, such as being able to use @Guide to constrain the string response to a specific topic, format, or length. Additionally, failures in guided generation will throw more granular errors (such as Refusal) which your app can then adapt to.

Can i measure used tokens in runtime?

As of today, there is no API for that purpose. I'd suggest that you file a feedback report • You can Instruments.app to observe the tokens your app consumes. This forum post also mentions something relevant to this topic.

Not a technical question, but rather a HIG one: what is the consensus on Apple Intelligence-centered apps (i.e. apps that rely on it as a core part of their Ul/UX)? It's definitely always better to support all scenarios, but I'm thinking there could be some where Al is a core part of the app.

There is a section in the HIG for Generative Al that addresses this. You can find it at https://developer.apple.com/design/human-interface-guidelines/generative-ai

Can foundation models be used in a tool (for example to get an estimated cooking time from a cooking instruction)?

Yes! A tool can start a new session in order to provide its response.

A new API introduced in the current beta of iOs 26.1 also provides access to the transcript of the session from within the Tool, in case it's useful to you:. But please note that this is still in beta.

Since models are / can be on device, if I recall correctly, these features should be available offline, correct?

Hi Cristian thanks for your question. If you are using an on device model then you are correct, you should be able to be offline.

Had an issue the other day where it seemed the model was ignoring the instructions, but when I added them to the prompt it worked. Any idea what could have been going on?

The model has default instructions it's been optimized for. In this case, it's likely the model's default instructions were more effective to get it to follow your instructions. This is a general trick you can use. If you're noticing instruction following issues, try putting your instructions in the first prompt and send no Instructions so the model uses its default.

Is there only one local model available regardless of available system memory?

There is 1 system language model available in the Foundation Models framework. There are also other system models available such as a diffusion model, vision models, and speech models, which are available through the Image Playground framework, the Vision framework, and the Speech framework, respectively.

The language model and the diffusion model are both large in size, so the operating system will load and unload them dynamically to accommodate for memory used by apps and system features. Using the system language model or the system diffusion model will not increase the size of your app.

Rephrasing question: If I have too much data (tokens) to provide to the model, e.g.: a big blob of text I want it to reason through or compare against a prompt, what are things I can look at in terms of either making the source material shorter (like keywords?? or resumes?), or other alternatives?

Hi David that is a great question. You have a context window size of 4096 tokens for a given session. A good approach is to decompose your large request into multiple smaller ones. Remember that your input as well as output all counts towards your 4096 token budget. You can catch the LanguageModelSession.GenerationError. exception, summarize your previous session, and share your summary in a new session to get you a fresh 4096 collection of tokens. Best of luck!

Does defined Structured Output "eat up" some of the context tokens?

Yes, the structure is automatically included in the prompt in a format the model understands.

In section 6 we'll talk about 'includeSchemalnPrompt", to optimize for large structures! (https://developer.apple.com/events/resources/code-along-205/)

Are there any type of built-in moderation in the models? Like generating explicit text for example?

Yes! See this article: https://developer.apple.com/documentation/foundationmodels/improving-the-safety-of-generative-model-output

There are pretty strict guardrails in place

When generating a struct, does the model complete the "fields" from top to bottom? Is there any way to force it to complete them in a specific order? E.g. use: the model may first need to think (note: I'm not referring to the "reasoning" func found in certain Ilms), and then provide a final result.

Yes, the model will generate the properties in the order they are in your code. It can indeed make sense to change the order for your use case.

Is there a limit of model calling? Like what if I make thousands of calls to parse my dataset?

There is no rate limit when your device is connected to power, or when your app runs in the foreground

Can we not run and test this on Simulator?

Simulator is supported if you're running on macOS 26.0!

No such module 'FoundationModels'?

Please make sure you're using Xcode 26 (which has the iOS/macOS 26 SDKs)

What does the error: "Model assets are unavailable" mean?

This error is thrown when the model isn't ready, which could be for a variety of reasons.

You can do print (SystemLanguageModel.default.availability) to see the availability status.

Can Foundation Models perform OCR on a PDF file?

Use Vision and text recognition frameworks for that. Some good starting points are and.

Can we pass data from "Speech framework" to the model? (edited)

Hi Abraz, yes you can. The model is text based so you would need to ensure that the text output from the Speech framework is what gets provided to the model. If you have a more specific question on this topic please submit it as a new question, thanks alot!

Why or how can I make sure model does not ignore the tools available? I have noticed that if have more that one tools model decide to make things up and not use the tools to retrive the information asked. or is there documentation about this other apple official docs? a lot of pople has shared

Hi Hebert this is a great question. It is imperative that you provide good instructions in your tool definitions that help the model know when to call a specific tool. The more details the better. You can use terms like MUST to help further emphasize the importance to the model. Best of luck Herbert!

Is the generation always going return in Markdown?

Not necessarily. You can prompt the model to output the way you want. And you can also use @Generable to get structured output, which can help get a specific output format. https://developer.apple.com/documentation/foundationmodels/generating-swift-data-structures-with-guided-generation

The limit of 4096 token, is it applied per app per session? How many such session can an app create?

Correct, per session! And no limit on the amount of sessions.

Is there a way to get notified when model.availability changes? I'm thinking if the model is not set up yet on first run, we show a progress indicator. When it's all done, it goes away and the user can continue. Otherwise, we'll have to constantly poll it.

SystemLangaugeModel conforms to Observable so it works seamlessly in a SwiftUl view to trigger view updates. You can follow the instructions in Chapter 1.4 - so long as the model is a property of the view structure, the view body will be re-evaluated when the availability changes.

Does the content of macros like `@Guide(description: _)` counts towards our session token

Yes, it does. Guides will be inserted in to the prompt automatically to help steer the model's response.

Is there ever a possibility of the generation returning structured output that's not mappable to the Swift type requested? Do you handle any regenerations at the API level so we can expect 100% adherence at all times?

Great question! @Generable uses guided generation to make sure the model always outputs the correct format! For more information, you can watch the Generable section of our Deep Dive video: https://youtu.be/6Wgg7DIY29E?si=pVXw4x5Ao6lpvkgX&t=477

Is there a way to bulk-download the (excellent) Slido Q+A content? (Is "print to PDF" reasonable?) (edited)

Hello Jared and thank you for the question. The Slido Q&A is only available live during the code along. We're glad you're here with us to participate! If you have additional questions after the code along has concluded or would like to share what you learned with the community, we encourage you to take the conversation to the Apple developer forums. Here is a link to the "Foundation Models" area of the forums: https://developer.apple.com/forums/topics/machine-learning-and-ai/machine-learning-and-ai-foundation-models

If the model is on-device and fully offline, how often is its data updated? As in, how do I know I'm getting live/up-to-date data from the LLM?

Any information you need to be up-to-date, like news, word facts, or weather, you need to provide the model in your prompt or with tool calling. The LLM does not have web search so it's info is not guaranteed to be up-to-date.

Is a tool's response included in the context limit?

Good question! Yes it is.

I did not understand the PartiallyGenerated... why is it needed?

Good question! This is needed if you want to stream a @Generable type, because the model will generate it property-by-property. PartiallyGenerated will turn every property optional, where nil indicates the model hasn't generated it yet. That's especially useful for e.g. a Bool property, because using false as a default value would be confusing.

Can we expect 100% adherence in Structured Output even if the properties are PartiallyGenerated (in which case as you mentioned before it's marked as a Swift optional)?

Yes! @Generable uses guided generation to make sure the model always outputs the correct format! And this even works with PartiallyGenerated and optionals. For more information, you can watch the 'Generable' section of our Deep Dive video: https://youtu.be/6Wgg7DIY29E?si=pVXw4x5Ao6lpvkgX&t=477

What is the estimated efficiency of prewarm? I've been playing with the final app, l've noticed since I know the expected Ul, I can press the button and get the same delay compared to before I included prewarm.

prewarm will load the model into memory, ahead of making a request. If the model is already in memory (cached from a recent previous request), prewarm won't show a difference. But for the case where the model wasn't in memory yet, prewarm can easily save 500ms.

Why this tool is conforming to @Observable? Is it mandatory for them?

Not mandatory for all tools. For this specific example, we're adding state that we want to observe from our SwiftUl View.

Any apple guideline while submitting AppStore using foundation model ?

In addition to the App Review Guidelines (https://developer.apple.com/app-store/review/guidelines), apps using the on-device foundation model are subject to the acceptable use requirements for the Foundation Models framework (https://developer.apple.com/apple-intelligence/acceptable-use-requirements-for-the-foundation-models-framework).

The Safe guard get triggered

When you encounter an issue where the guardrails are triggered unexpectedly, please use LanguageModelSession.logFeedbackAttachment and file a report using Feedback Assistant ([http://feedbackassistant.apple.com/(http://feedbackassistant.apple.com/)])!

what are some best practices for custom adapters? how often are system models updated? thanks!

The system models are update along with OS updates, roughly every couple of OS

Are there multiple on-device models available or is there only one?

Foundation Models Framework gives your app access to one on-device large language model, and this model will be updated and improved over time as a part of OS updates. The API allows you to specify a use-case, and we also have a list of suggested model capabilities in our documentation: https://developer.apple.com/documentation/foundationmodels/generating-content-and-performing-tasks-with-foundation-models

on `print (SystemLanguageModel.default.availability)` getting error `unavailable(FoundationModels.SystemLanguageModel.Availability.UnavailableReason.applentelligenceNotEnabled)`

That's right, you need to enable Apple Intelligence in System Settings. If you are running on Simulator, you'll need to do that on the Mac where Xcode is running, too.

How the Foundation Model knows the correct enum to be selected?

Great question! @Generable uses guided generation to make sure the model outputs the correct format! For more information, you can watch the 'Generable' section of our Deep Dive video: https://youtu.be/6Wgg7DIY29E?si=pVXw4x5Ao6lpvkgX&t=477

I am a bit late to the session, but I constantly get a response from the LanguageModelSession "Detected content likely to be unsafe". How to convince the model to respond? © My Prompt: "Generate a 3 day itinerary to Paris."

It can be tricky to deal with the guardrails. For general guidance see this article: https://developer.apple.com/documentation/foundationmodels/improving-the-safety-of-generative-model-output

Are you prompting the model in English or another language? Sometimes language can impact guardrail sensitivity

Maybe this has already been asked, but is all the data for the landmarks already in the LLM or is that data requested from an off-device data source like an Internet search?

The code-along is using hardcoded data for the landmarks, and providing it to the model as a Tool. In a real app, you might use a service like MapKit to get real data to provide to the model.

Whats the best way to understand whats achievable with on device LL's 4096 token? Any guide on what are best practices and things to avoid?

We have some general guidance on model capabilities as well as requests to avoid here: https://developer.apple.com/documentation/foundationmodels/generating-content-and-performing-tasks-with-foundation-models

There's some additional advice on making the most of an on-device model in our Foundation Models WWDC session videos from this year!

Is there a way to achieve a RAG-like approach, where we store semantic values, that we can search against? i.e. Without adding all textual data in our LLM context, in order to produce a response.... which will blow up due to lack of tokens! - (Think of a notes app, or storage for text, etc.)

You can use a mix of speech-to-text, writing tools and the vision framework to achieve this. Additionally, you can look into incorporating Tools into your setup: https://developer.apple.com/documentation/foundationmodels/tool

Can you add @Generable support in an extension? For example if you import a Swift package with a struct that already has the structure you want for your Foundation Model responses?

@Generable is a macro that can only be applied to a type directly (not in an extension). However, you could do extension MyTypeFromPackage: Generable , and conform to it in the same way the macro does. For an example, you can right click on @Generable (e.g. on a struct) and select "Expand Macro" to use that as an example for how to conform to it.

What languages does the on-device LLM currently support?

See the full list here: https://support.apple.com/en-us/121115 More language guidance here: https://developer.apple.com/documentation/foundationmodels/support-languages-and-locales-with-foundation-models

I use guided generation with an array of strings. The results are shown in a picker intended to be suggestions in my app for next steps. I noticed the first result in the array are often similar or actually the same as a previous request. How can I tweak this so get more variety, different outputs?

You can try adjusting GenerationOptions like SamplingMode and temperature: https://developer.apple.com/documentation/foundationmodels/generationoptions

If you're still having trouble getting variety in your results, ask this question in the Developer Forums and we'll provide support there!

Does the 4K context limit also apply to returned data? What if it was asked to return 10 (or 100) itineraries? Also, is there a way to get the response back as JSON (say, for storage or transmission)?

The context size applies to input and output tokens. For conversion to JSON, a type can be both Generable and Codable. You can then use JSONEncoder to encode the type to JSON data for storage or transmission :)

Will the context limit also includes the description given coming form Tool?

Yes, the description of the tool is automatically included in the prompt when passing the tool to your session.

How come the model returns the same results even though we didn't use any sort of seed in our code?

The sampling GenerationOption is set to greedy. This helps the model choose the same tokens every time if given the same input.

Can (should) we run multiple requests in parallel to a model? what implications does that have on performance, etc? Should we only do one request at a time, or it works fine if we were to parallelise requests (and would they even parallelise?? (edited)

For one single session, you can't call the respond method while the model is responding. You can create multiple sessions to run multiple requests in parallel. See for more info.

What option/template was used for profiling?

You can use any template and add the Foundation Models instrument on top. You may find the SwiftUl template or Time Profiler useful, as they will allow you to profile the on-device LLM together with your UI.

stinger/FMCAQA.md