Anemll

ANE INT8 W8A8 Benchmark: ~1.7-1.9x FP16 Throughput on Apple Silicon

Demonstrates that the Apple Neural Engine (ANE) achieves roughly 1.7-1.9x the throughput of FP16 when running INT8 W8A8 (8-bit weight, 8-bit activation) quantized models, consistent with native INT8 datapath support.
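W8A8 means both the weights and the activations are stored as 8-bit integers, so the inner multiply-accumulate can run entirely in integer arithmetic. The plain-Swift sketch below illustrates that idea with symmetric per-tensor quantization and Int32 accumulation. It is a generic illustration under those assumptions, not a description of how the ANE or Core ML implements INT8; `quantize` and `int8MatVec` are hypothetical helpers.

```swift
// Generic W8A8 sketch (hypothetical helpers, not an ANE or Core ML API):
// symmetric per-tensor INT8 quantization with Int32 accumulation.

/// Quantizes values to Int8 with one symmetric scale (max |x| maps to 127).
func quantize(_ values: [Float]) -> (q: [Int8], scale: Float) {
    let maxAbs = values.map { abs($0) }.max() ?? 0
    let scale = maxAbs > 0 ? maxAbs / 127 : 1
    let q = values.map { v -> Int8 in
        let r = (v / scale).rounded()
        return Int8(max(-127, min(127, r)))   // clamp to the symmetric INT8 range
    }
    return (q, scale)
}

/// y = W x computed with INT8 operands and Int32 accumulators, then rescaled.
/// `weights` is row-major with `cols` columns.
func int8MatVec(weights: [Float], cols: Int, input: [Float]) -> [Float] {
    precondition(weights.count % cols == 0 && input.count == cols)
    let (qw, sw) = quantize(weights)   // W8: 8-bit weights
    let (qx, sx) = quantize(input)     // A8: 8-bit activations
    let rows = weights.count / cols
    var out = [Float](repeating: 0, count: rows)
    for r in 0..<rows {
        // Each Int8 x Int8 product fits in 16 bits; Int32 leaves headroom for the sum.
        var acc: Int32 = 0
        for c in 0..<cols {
            acc += Int32(qw[r * cols + c]) * Int32(qx[c])
        }
        out[r] = Float(acc) * sw * sx  // dequantize with the combined scale
    }
    return out
}

// Exact FP32 result would be [0.0, 1.25]; the INT8 path lands close to it.
let y = int8MatVec(weights: [0.5, -1.0, 0.25, 2.0], cols: 2, input: [1.0, 0.5])
print(y)
```

The integer accumulate is the part a native INT8 datapath can speed up; the float rescale happens once per output element.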

Results (M5, h17g, single ANE cluster)

Summary

Method	FP16	INT8 W8A8	Ratio
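The Ratio column is presumably INT8 W8A8 throughput divided by FP16 throughput, so a value above 1 means INT8 is faster. A minimal sketch of that arithmetic, assuming each method is measured as a mean latency per inference. `BenchmarkRow` is a hypothetical type, and the numbers in the usage line are illustrative inputs, not results from this benchmark.

```swift
import Foundation

// Hypothetical helper: derives throughput and the Ratio column from latencies.
struct BenchmarkRow {
    let method: String
    let fp16MillisPerIter: Double   // mean latency per inference, FP16 model
    let int8MillisPerIter: Double   // mean latency per inference, INT8 W8A8 model

    /// Throughput in inferences per second.
    var fp16Throughput: Double { 1000 / fp16MillisPerIter }
    var int8Throughput: Double { 1000 / int8MillisPerIter }

    /// INT8 throughput relative to FP16 (> 1 means INT8 is faster).
    var ratio: Double { int8Throughput / fp16Throughput }
}

// Illustrative inputs only, not measured results.
let row = BenchmarkRow(method: "example", fp16MillisPerIter: 10, int8MillisPerIter: 5)
print(row.method, row.fp16Throughput, row.int8Throughput, String(format: "%.2fx", row.ratio))
```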

Pi Coding Agent + Apple Foundation Models

This gist shows a working local Pi provider setup for Apple's fm serve Chat Completions endpoint.

It supports both Apple Foundation Models exposed by the fm CLI:

fm/system: on-device Apple Foundation Model, configured with a 4K-token context window
fm/pcc: Private Cloud Compute model, configured with a 32K-token context window
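A Chat Completions endpoint accepts a JSON body containing a model name and a `messages` array, which is what the provider sends to fm serve. The Swift sketch below builds such a body with Codable and adds a rough check of a prompt against the two configured context sizes. Several details are assumptions, not values from the fm CLI or Pi documentation: "4K" and "32K" are read as 4,096 and 32,768 tokens; the model name sent to the server ("system") is hypothetical; and the 4-characters-per-token estimate is a crude heuristic. `FMModel`, `ChatCompletionRequest`, and `fits` are illustrative names.

```swift
import Foundation

// Hypothetical model table mirroring the two provider entries above.
// Assumes "4K" = 4,096 tokens and "32K" = 32,768 tokens.
struct FMModel {
    let id: String
    let contextWindow: Int
}

let fmModels = [
    FMModel(id: "fm/system", contextWindow: 4_096),   // on-device model
    FMModel(id: "fm/pcc", contextWindow: 32_768),     // Private Cloud Compute model
]

// Minimal Chat Completions body: {"model": ..., "messages": [{"role": ..., "content": ...}]}.
struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatCompletionRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

/// Rough fit check using an assumed ~4 characters per token (rounded up).
func fits(_ prompt: String, in model: FMModel) -> Bool {
    (prompt.count + 3) / 4 <= model.contextWindow
}

let request = ChatCompletionRequest(
    model: "system",   // hypothetical: the exact model name fm serve expects is not given here
    messages: [ChatMessage(role: "user", content: "Hello")]
)
let encoder = JSONEncoder()
encoder.outputFormatting = .sortedKeys   // stable key order for readable output
let body = try encoder.encode(request)
print(String(decoding: body, as: UTF8.self))
```

A real provider would also need the server's base URL and port, which are not stated here.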


	import Foundation
	import FoundationModels
	import Playgrounds

	// Runs in Xcode's canvas via the #Playground macro from the Playgrounds module.
	#Playground {
		let session = LanguageModelSession()
		let start = Date()
		let response = try await session.respond(to: "What is Apple Neural Engine and how to use it?")
		let end = Date()   // stop timing before printing
		// Response<String>.content holds the generated text.
		let responseText = response.content
		print(responseText)
		print("Elapsed: \(end.timeIntervalSince(start)) s")
	}
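The snippet above times generation with two `Date` values, which read the wall clock and can jump if the system time changes. For benchmarking, a monotonic clock is a safer choice. Below is a sketch of a hypothetical generic helper, `timed`, that wraps any async throwing call (such as `session.respond(to:)`) with the standard library's `ContinuousClock`. It is not part of FoundationModels. `fakeRespond` is a stand-in so the sketch runs without the framework.

```swift
// Hypothetical helper (not a FoundationModels API): times an async call with a
// monotonic clock and returns the result plus elapsed seconds.
func timed<T>(_ work: () async throws -> T) async rethrows -> (value: T, seconds: Double) {
    let clock = ContinuousClock()
    let start = clock.now
    let value = try await work()
    let elapsed = start.duration(to: clock.now)
    // Duration stores whole seconds plus attoseconds (1e-18 s).
    let seconds = Double(elapsed.components.seconds)
        + Double(elapsed.components.attoseconds) / 1e18
    return (value, seconds)
}

// Stand-in for a model call so the sketch runs anywhere; in an app, the closure
// would call `try await session.respond(to: prompt).content` instead.
func fakeRespond(to prompt: String) async throws -> String {
    try await Task.sleep(nanoseconds: 50_000_000)   // simulate 50 ms of work
    return "echo: \(prompt)"
}

let (text, seconds) = try await timed { try await fakeRespond(to: "Hi") }
print("\(text) in \(seconds) s")
```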