The Foundation Models framework is Apple's API, announced at WWDC 2025, that gives developers direct access to the on-device large language model powering Apple Intelligence. Released in beta on June 9, 2025, the framework lets apps integrate AI capabilities while keeping user data private through on-device processing.
Key benefits:
- On-device processing: All AI inference runs locally on the device
- Privacy-focused: No data leaves the device or is sent to cloud servers
- Offline capable: Works without an internet connection
- Zero cost: No API fees or cloud computing charges
- Small footprint: Built into the OS, doesn't increase app size
- Swift-native: Integrates seamlessly with Swift using as few as 3 lines of code
Supported platforms:
- iOS 26
- iPadOS 26
- macOS Tahoe 26
- visionOS 26
Supported devices:
- iPhone 16 (all models)
- iPhone 15 Pro and iPhone 15 Pro Max
- iPad mini (A17 Pro)
- iPad models with M1 chip or later
- Mac models with M1 chip or later
- Apple Vision Pro
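Because availability depends on the device, the OS, and whether Apple Intelligence is enabled, apps should check at runtime before showing AI features. A minimal sketch using the availability API shown at WWDC (the exact set of unavailability reasons may shift across betas):

```swift
import FoundationModels

let model = SystemLanguageModel.default

switch model.availability {
case .available:
    // Show AI-powered features
    break
case .unavailable(.deviceNotEligible):
    // Hardware can't run the model; hide the feature or fall back
    break
case .unavailable(.appleIntelligenceNotEnabled):
    // Ask the user to enable Apple Intelligence in Settings
    break
case .unavailable(.modelNotReady):
    // Model assets are still downloading; retry later
    break
case .unavailable(_):
    // Any other (or future) reason
    break
}
```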
Supported languages at launch:
- English, French, German, Italian, Portuguese (Brazil), Spanish, Japanese, Korean, Chinese (simplified)
Coming by end of 2025:
- Danish, Dutch, Norwegian, Portuguese (Portugal), Swedish, Turkish, Chinese (traditional), Vietnamese
Model specifications:
- Parameters: ~3 billion
- Quantization: 2-bit (with 3.5-3.7 bits-per-weight average using mixed 2-bit and 4-bit configuration)
- Architecture: Optimized for Apple Silicon
- Performance:
  - Time-to-first-token latency: ~0.6 ms per prompt token
  - Generation rate: 30 tokens per second on iPhone 15 Pro
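As a rough back-of-envelope using these figures, a 500-token prompt costs about 500 × 0.6 ms ≈ 0.3 s before the first token appears, and a 100-token response then takes roughly 100 / 30 ≈ 3.3 s to generate on an iPhone 15 Pro.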
The on-device model excels at:
- Text summarization
- Entity extraction
- Text understanding and refinement
- Short dialog generation
- Creative content generation
- Classification tasks
- Content tagging
- Natural language search
The model is NOT designed for:
- General world knowledge queries
- Advanced reasoning tasks
- Chatbot-style conversations
- Server-scale LLM tasks
Guided Generation is the framework's core feature that ensures reliable structured output from the model using Swift's type system.
```swift
import FoundationModels

@Generable
struct SearchSuggestions {
    @Guide(description: "A list of suggested search terms", .count(4))
    var searchTerms: [String]
}
```
Generable types can include:
- Primitives: String, Int, Double, Float, Decimal, Bool
- Arrays: [String], [Int], etc.
- Composed types: Nested structs
- Recursive types: Self-referencing structures (see the sketch below)
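Recursion works through containers such as arrays; a minimal sketch of a self-referencing Generable type:

```swift
import FoundationModels

// Each comment can carry nested replies of the same type
// (recursion via an array).
@Generable
struct Comment {
    var text: String
    var replies: [Comment]
}
```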
The @Guide macro provides constraints and natural-language descriptions for properties:
```swift
@Generable
struct Person {
    @Guide(description: "Person's full name")
    var name: String

    @Guide(description: "Age in years", .range(0...120))
    var age: Int

    // Constrain the string to a regex pattern
    @Guide(.pattern(/^[A-Z]{2}-\d{4}$/))
    var id: String
}
```
```swift
let session = LanguageModelSession()
let prompt = "Generate search suggestions for a travel app"

let response = try await session.respond(
    to: prompt,
    generating: SearchSuggestions.self
)
print(response.content.searchTerms)
```
The framework uses a unique snapshot-based streaming approach instead of traditional delta streaming.
The @Generable macro automatically generates a PartiallyGenerated type with all optional properties:
```swift
@Generable
struct Itinerary {
    var destination: String
    var days: [DayPlan]   // DayPlan is another @Generable type (sketched later in this section)
    var summary: String
}

// The macro also emits Itinerary.PartiallyGenerated,
// a mirror of Itinerary in which every property is optional.
```
```swift
import SwiftUI
import FoundationModels

struct ItineraryView: View {
    let session: LanguageModelSession
    @State private var itinerary: Itinerary.PartiallyGenerated?

    var body: some View {
        VStack {
            // UI that renders the partial itinerary as it fills in

            Button("Generate") {
                Task {
                    let stream = session.streamResponse(
                        to: "Plan a 3-day trip to Tokyo",
                        generating: Itinerary.self
                    )
                    // Each element is a progressively more complete snapshot
                    for try await partial in stream {
                        self.itinerary = partial
                    }
                }
            }
        }
    }
}
```
Streaming best practices:
- Use SwiftUI animations to hide latency
- Consider view identity when generating arrays
- Property order matters: properties are generated in declaration order
- Place summaries last for better quality output (see the sketch after this list)
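Because properties are generated in declaration order, a summary should be the last stored property so the model writes it only after the concrete fields exist. A minimal sketch (also defining the DayPlan type referenced earlier):

```swift
@Generable
struct DayPlan {
    var title: String
    var activities: [String]
    // Declared last on purpose: the model generates the summary
    // after it has already produced the fields above.
    var summary: String
}
```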
Tool calling allows the model to execute custom code to retrieve information or perform actions.
```swift
import FoundationModels

struct WeatherTool: Tool {
    let name = "getWeather"
    let description = "Get current weather for a location"

    @Generable
    struct Arguments {
        let city: String
        let unit: String?
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        // getTemperature(for:) is a placeholder for your own lookup,
        // e.g. via WeatherKit or another weather API.
        let temperature = try await getTemperature(for: arguments.city)
        return ToolOutput("The temperature in \(arguments.city) is \(temperature)°")
    }
}
```
```swift
let weatherTool = WeatherTool()
let session = LanguageModelSession(tools: [weatherTool])

let response = try await session.respond(
    to: "What's the weather like in San Francisco?"
)
// The model calls the weather tool automatically when the prompt needs it
```
Sessions maintain context across multiple interactions.
```swift
let session = LanguageModelSession(
    instructions: """
        You are a helpful travel assistant.
        Provide concise, actionable recommendations.
        Focus on local experiences and hidden gems.
        """
)

// First turn
let response1 = try await session.respond(to: "Recommend a restaurant in Paris")

// Second turn - the model remembers context from the first
let response2 = try await session.respond(to: "What about one with vegetarian options?")

// Access the conversation history
let transcript = session.transcript

// Check whether the model is currently generating
if session.isResponding {
    // Show a loading indicator
}
```
```swift
// Check availability (the property lives on a model instance,
// typically SystemLanguageModel.default)
if case .available = SystemLanguageModel.default.availability {
    // The model is available on this device
}
```
```swift
// Load the specialized content-tagging model; the initializer
// takes a use case: SystemLanguageModel(useCase: .contentTagging)
let tagger = SystemLanguageModel(useCase: .contentTagging)
let session = LanguageModelSession(model: tagger)

@Generable
struct ContentTags {
    var topics: [String]
    var sentiment: String
    var keywords: [String]
}

// userContent is whatever text your app wants to tag
let tags = try await session.respond(
    to: userContent,
    generating: ContentTags.self
)
```
Test prompts directly in your code with the #Playground macro:
```swift
import FoundationModels
import Playgrounds

#Playground {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Generate a haiku about coding"
    )
}
```
Use the new Foundation Models template in Instruments to:
- Profile model request latency
- Identify optimization opportunities
- Quantify performance improvements
Generation errors are thrown as cases of LanguageModelSession.GenerationError:

```swift
do {
    let response = try await session.respond(to: prompt)
    print(response.content)
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // Handle a safety violation
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
    // Handle an unsupported language
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Handle a context that has grown too long
} catch {
    // Handle other errors
}
```
Prompting:
- Keep prompts focused - break complex tasks into smaller pieces
- Use instructions wisely - static developer guidance, not user input
- Leverage guided generation - let the framework handle output formatting
- Test extensively - use Xcode Playgrounds for rapid iteration
Performance:
- Prewarm models when appropriate (see the sketch after these lists)
- Stream responses for better perceived performance
- Use appropriate verbosity in prompts
- Profile with Instruments to identify bottlenecks
Safety:
- Never interpolate user input into instructions
- Use tool calling for external data instead of prompt injection
- Handle errors gracefully, including guardrail violations
- Validate generated content before using it in production
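When you know a session will be used soon (for example, when the user opens a compose screen), prewarming loads model resources before the first request. A minimal sketch using the session's prewarm() method:

```swift
import FoundationModels

let session = LanguageModelSession(
    instructions: "You are a concise writing assistant."
)

// Load model resources ahead of time so the first
// respond(to:) call starts faster.
session.prewarm()
```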
Example: generating a quiz from a student's notes:

```swift
@Generable
struct Quiz {
    var questions: [Question]

    @Generable
    struct Question {
        var text: String
        var options: [String]
        var correctAnswer: Int
    }
}

// studentNotes holds the user's study material
let quiz = try await session.respond(
    to: "Generate a quiz about \(studentNotes)",
    generating: Quiz.self
)
```
Example: modeling a travel itinerary:

```swift
@Generable
struct TravelItinerary {
    var destination: String
    var duration: Int
    var activities: [Activity]
    var estimatedBudget: Decimal

    @Generable
    struct Activity {
        var name: String
        var description: String
        var duration: String
        var cost: Decimal?
    }
}
```
Automattic's Day One app uses the framework for:
- Intelligent prompts based on user entries
- Privacy-preserving content analysis
- Mood and theme detection
For structures whose shape is only known at runtime, build a DynamicGenerationSchema. The sketch below is a best-effort reading of the beta API; exact initializer signatures may differ:

```swift
// Describe an object with a string "name" and a numeric "values" array
let root = DynamicGenerationSchema(
    name: "Record",
    properties: [
        .init(name: "name", schema: DynamicGenerationSchema(type: String.self)),
        .init(name: "values",
              schema: DynamicGenerationSchema(arrayOf: DynamicGenerationSchema(type: Double.self)))
    ]
)

// Compile the dynamic schema into a GenerationSchema
let schema = try GenerationSchema(root: root, dependencies: [])

let response = try await session.respond(to: prompt, schema: schema)
// response.content is a GeneratedContent value you can inspect at runtime
```
For specialized use cases, train custom adapters using Apple's Python toolkit:
- Rank 32 LoRA adapters
- Must be retrained with each base model update
- Consider only after exhausting base model capabilities
The framework must NOT be used for:
- Illegal activities or law violations
- Generating pornographic or sexual content
- Child exploitation or abuse
- Employment-related assessments
- Circumventing safety guardrails
- Reverse engineering training data
- Generating harmful or discriminatory content
- Developer Beta (available now):
  - Join the Apple Developer Program
  - Download from developer.apple.com
- Public Beta (July 2025):
  - Join the Apple Beta Software Program
  - Download from beta.apple.com
Quick start:

```swift
import FoundationModels

// Create a session
let session = LanguageModelSession()

// Generate a response
let response = try await session.respond(to: "Summarize this text: \(userText)")

// Use the response
print(response.content)
```
WWDC25 sessions:
- Meet the Foundation Models framework
- Deep dive into the Foundation Models framework
- Integrating Foundation Models into your app
- Prompt design and safety
- Available through Apple Developer Program
- Xcode 26 includes playground templates
- Use Feedback Assistant with the framework's Encodable attachment structure for structured reports
- Help improve models by sharing use cases
The Foundation Models framework represents a significant advancement in on-device AI for Apple platforms. By combining powerful language models with Swift's type system and Apple's focus on privacy, developers can create intelligent features that work offline, protect user data, and provide responsive experiences without cloud infrastructure costs.
As the framework evolves, Apple plans to improve model capabilities and add new specialized adapters based on developer feedback. The tight integration with Swift and Apple's development tools makes it easier than ever to add AI-powered features to apps while maintaining the high standards of user experience and privacy that Apple users expect.