2026-06-06
./models/synthetic/amplified_conv_tie_w8a8.mlpackage
Built from the synthetic amplified conv/SILU near-tie model. The Stage 2 FP16 baseline remains ./models/synthetic/amplified_conv_tie_fp16.mlpackage.
- Converted the PyTorch synthetic conv/SILU model to a Core ML MLProgram with FP16 compute precision.
- Applied Core ML-side activation quantization with coremltools.optimize.coreml.linear_quantize_activations, using deterministic calibration data.
- Applied Core ML-side int8 weight quantization with coremltools.optimize.coreml.linear_quantize_weights.
- Inspected the saved MLProgram for quantization evidence.
- Inspected Core ML compute placement for the generated model.
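- Compared repeated predictions under the CPU_ONLY, CPU_AND_GPU, CPU_AND_NE, and ALL compute units, both within each compute unit and against CPU_ONLY.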
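The activation step leaves quantize/dequantize pairs in the graph, with scales derived from the calibration data. As a rough mental model only, the sketch below shows one such pair for a single tensor. It assumes symmetric per-tensor int8 with a zero point of 0 and a scale of max|x| / 127 over the calibration samples; the mode, granularity, and calibration statistic that coremltools actually used here are not recorded in this note, and the helper names are hypothetical.

```python
# Minimal sketch of a quantize -> dequantize (QDQ) pair for one activation tensor.
# Assumptions (hypothetical, not the coremltools implementation): symmetric
# per-tensor int8, zero point 0, scale = max|x| over calibration data / 127.

INT8_MIN, INT8_MAX = -128, 127


def calibrate_scale(calibration_batches):
    """Pick a per-tensor scale from the largest magnitude seen in calibration."""
    max_abs = max(abs(v) for batch in calibration_batches for v in batch)
    return max_abs / INT8_MAX if max_abs > 0 else 1.0


def quantize(values, scale):
    """Real value -> int8 code: clamp(round(x / scale), -128, 127)."""
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """int8 code -> real value: code * scale (what downstream ops would read)."""
    return [c * scale for c in codes]


# Deterministic "calibration data" (fixed values, no RNG), then one QDQ pass.
calibration = [[0.5, -1.0, 2.54], [1.27, -2.0, 0.0]]
scale = calibrate_scale(calibration)
x = [0.013, -1.0, 2.54, 3.0]  # 3.0 lies outside the calibrated range
codes = quantize(x, scale)
x_hat = dequantize(codes, scale)
errors = [abs(a - b) for a, b in zip(x, x_hat)]
print("scale:", scale)
print("codes:", codes)
print("errors:", errors)
```

In-range values come back within half a quantization step, while values beyond the calibrated range saturate at the int8 limit and lose more precision. That clipping and rounding is one plausible source of the backend-dependent differences reported below, though this sketch does not model how any specific backend executes the pair.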
uv run python make_synthetic_coreml_amp_w8a8.py
uv run python probe_4_quant_evidence.py ./models/synthetic/amplified_conv_tie_w8a8.mlpackage --hits 30
gtimeout 60 uv run python probe_2_plan.py ./models/synthetic/amplified_conv_tie_w8a8.mlpackage
uv run python probe_3_drift.py ./models/synthetic/amplified_conv_tie_w8a8.mlpackage
uv run python -m py_compile make_synthetic_coreml_amp_w8a8.py probe_2_plan.py probe_3_drift.py probe_4_quant_evidence.py
All commands completed successfully.
probe_4_quant_evidence.py found:
== OP TYPE COUNTS ==
68 const
11 quantize
11 dequantize
6 constexpr_affine_dequantize
5 conv
5 silu
1 mul
1 reduce_mean
1 reshape
1 linear
== OUTPUT DTYPE COUNTS ==
60 FLOAT16
22 INT32
16 STRING
11 INT8
1 BOOL
== CLASSIFICATION ==
weight_quantization_evidence: yes
weight_quantization_ops: 6
activation_quantization_evidence: yes
activation_quantize_ops: 11
activation_dequantize_ops: 11
real_ops_consuming_dequantized_values: 11
real_ops_consuming_quantized_values: 0
activation_quantization_interpretation: quantize/dequantize pairs are present before real ops
== REAL OPS CONSUMING DEQUANTIZED VALUES ==
5 conv
1 linear
5 silu
Interpretation: this is not merely filename evidence and not merely weight-only compression. The saved MLProgram contains int8 activation quantize ops, matching dequantize ops, and weight constexpr_affine_dequantize ops. The real conv/SILU/linear ops consume dequantized values, so this is activation-quantized graph evidence, not proof that every arithmetic kernel internally executes as integer MACs.
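To make the classification criteria concrete, here is a hypothetical sketch of the bookkeeping behind the CLASSIFICATION block. It counts op types, then checks whether each real op reads from a dequantize output or directly from a quantize output. The flattened (output name, op type, input names) representation, the REAL_OP_TYPES set, and the toy graph are all assumptions for illustration. This is not probe_4_quant_evidence.py's actual implementation, which this note does not show.

```python
from collections import Counter

# Hypothetical set of "real" compute ops; quantize/dequantize/const are excluded.
REAL_OP_TYPES = {"conv", "silu", "linear", "mul", "reduce_mean", "reshape"}


def classify(ops):
    """ops: list of (output_name, op_type, input_names) in a hypothetical flat form."""
    producer = {name: op_type for name, op_type, _ in ops}
    counts = Counter(op_type for _, op_type, _ in ops)
    consumes_dq = Counter()
    consumes_q = 0
    for _, op_type, inputs in ops:
        if op_type not in REAL_OP_TYPES:
            continue
        # Op types that produced this op's inputs (None for graph inputs).
        sources = {producer.get(i) for i in inputs}
        if "dequantize" in sources:
            consumes_dq[op_type] += 1
        if "quantize" in sources:
            consumes_q += 1
    return {
        "weight_quantization_ops": counts["constexpr_affine_dequantize"],
        "activation_quantize_ops": counts["quantize"],
        "activation_dequantize_ops": counts["dequantize"],
        "real_ops_consuming_dequantized_values": sum(consumes_dq.values()),
        "real_ops_consuming_quantized_values": consumes_q,
        "by_type": dict(consumes_dq),
    }


# Toy graph: input QDQ -> conv (with decompressed weights) -> QDQ -> silu.
toy_graph = [
    ("x_q", "quantize", ["x"]),
    ("x_dq", "dequantize", ["x_q"]),
    ("w0", "constexpr_affine_dequantize", []),
    ("h0", "conv", ["x_dq", "w0"]),
    ("h0_q", "quantize", ["h0"]),
    ("h0_dq", "dequantize", ["h0_q"]),
    ("a0", "silu", ["h0_dq"]),
]
report = classify(toy_graph)
print(report)
```

A nonzero real_ops_consuming_quantized_values would indicate real ops reading int8 tensors directly; in the saved model that count is 0, which is why the evidence is described as QDQ-style activation quantization rather than integer-kernel evidence.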
probe_2_plan.py found 110 total MLProgram ops. Relevant output:
== PREFERRED DEVICE COUNTS ==
74 None
36 MLNeuralEngineComputeDevice
== OP TYPE COUNTS ==
68 const
11 ios17.dequantize
11 ios17.quantize
6 ios16.constexpr_affine_dequantize
5 ios16.silu
5 ios17.conv
1 ios16.reduce_mean
1 ios17.linear
1 ios17.mul
1 ios17.reshape
== OP x PREFERRED DEVICE ==
11 ios17.dequantize MLNeuralEngineComputeDevice
11 ios17.quantize MLNeuralEngineComputeDevice
5 ios16.silu MLNeuralEngineComputeDevice
5 ios17.conv MLNeuralEngineComputeDevice
1 ios16.reduce_mean MLNeuralEngineComputeDevice
1 ios17.linear MLNeuralEngineComputeDevice
1 ios17.mul MLNeuralEngineComputeDevice
1 ios17.reshape MLNeuralEngineComputeDevice
Interpretation: the activation quantize/dequantize ops and the real conv/SILU/mul/reduce_mean/reshape/linear ops are all NE-preferred and CPU/GPU/NE-supported in the compute plan.
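As a bookkeeping check on the compute-plan output, the sketch below re-tallies the reported counts. It shows that the 74 ops with no preferred device are exactly the 68 const and 6 constexpr_affine_dequantize ops, so every op that computes on activations is NE-preferred. The counts are transcribed from the output above; the two None rows are inferred from the totals, and the dictionary layout is an illustrative assumption rather than probe_2_plan.py's data model.

```python
from collections import Counter

NE = "MLNeuralEngineComputeDevice"

# (op type, preferred device) -> count, transcribed from the probe_2_plan.py output.
# The two None rows are inferred: they are the only op types missing from the
# OP x PREFERRED DEVICE table, and 68 + 6 = 74 matches the None total.
plan = {
    ("const", None): 68,
    ("ios16.constexpr_affine_dequantize", None): 6,
    ("ios17.dequantize", NE): 11,
    ("ios17.quantize", NE): 11,
    ("ios16.silu", NE): 5,
    ("ios17.conv", NE): 5,
    ("ios16.reduce_mean", NE): 1,
    ("ios17.linear", NE): 1,
    ("ios17.mul", NE): 1,
    ("ios17.reshape", NE): 1,
}

by_device = Counter()
for (_op_type, device), n in plan.items():
    by_device[device] += n

total_ops = sum(by_device.values())
# Op types with no preferred device: constants and weight decompression only.
unplaced_types = sorted(op for (op, dev) in plan if dev is None)
print(total_ops, dict(by_device), unplaced_types)
```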
probe_3_drift.py found stable same-backend repeated runs for all tested compute units:
== CPU_ONLY ==
raw_drift_seen: False
argmax_drift_seen: False
worst_abs_diff: 0.0
ref_logits: [3.1015625 3.5722656]
ref_margin_0_minus_1: -0.470703125
ref_argmax: 1
== CPU_AND_GPU ==
raw_drift_seen: False
argmax_drift_seen: False
worst_abs_diff: 0.0
ref_logits: [3.1015625 3.5722656]
ref_margin_0_minus_1: -0.470703125
ref_argmax: 1
== CPU_AND_NE ==
raw_drift_seen: False
argmax_drift_seen: False
worst_abs_diff: 0.0
ref_logits: [4.4414062 3.6835938]
ref_margin_0_minus_1: 0.7578125
ref_argmax: 0
== ALL ==
raw_drift_seen: False
argmax_drift_seen: False
worst_abs_diff: 0.0
ref_logits: [4.4414062 3.6835938]
ref_margin_0_minus_1: 0.7578125
ref_argmax: 0
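The internals of probe_3_drift.py are not shown here. The following is a hypothetical sketch of how the reported fields (raw_drift_seen, argmax_drift_seen, worst_abs_diff, ref_margin_0_minus_1, ref_argmax) can be computed, assuming `predict` is a zero-argument callable that runs one Core ML prediction on a fixed input under one compute unit. The run count of 20 is an arbitrary placeholder, and the two stand-in predictors exist only to exercise the logic.

```python
from itertools import cycle


def argmax(xs):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(xs)), key=xs.__getitem__)


def drift_report(predict, runs=20):
    """Call predict() repeatedly and compare every run against the first."""
    ref = list(predict())
    ref_arg = argmax(ref)
    worst = 0.0
    argmax_drift = False
    for _ in range(runs - 1):
        out = list(predict())
        worst = max(worst, max(abs(a - b) for a, b in zip(out, ref)))
        argmax_drift = argmax_drift or argmax(out) != ref_arg
    return {
        "raw_drift_seen": worst > 0.0,
        "argmax_drift_seen": argmax_drift,
        "worst_abs_diff": worst,
        "ref_margin_0_minus_1": ref[0] - ref[1],
        "ref_argmax": ref_arg,
    }


# Stand-in for a stable backend: always returns the CPU_ONLY reference logits.
stable = drift_report(lambda: [3.1015625, 3.572265625])
# Stand-in for a drifting backend: alternates between two outputs.
outputs = cycle([[1.0, 2.0], [2.5, 2.0]])
drifting = drift_report(lambda: next(outputs))
print(stable)
print(drifting)
```

Comparing against the first run (rather than against another backend) is what makes these fields a same-backend stability check; the cross-backend comparison is a separate step.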
Cross-backend comparison versus CPU_ONLY:
CPU_AND_GPU
array_equal: True
max_abs_diff: 0.0
argmax_equal: True
margin_delta_vs_cpu: 0.0
CPU_AND_NE
array_equal: False
max_abs_diff: 1.33984375
argmax_equal: False
delta_vs_cpu: [1.3398438 0.11132812]
margin_delta_vs_cpu: 1.228515625
ALL
array_equal: False
max_abs_diff: 1.33984375
argmax_equal: False
delta_vs_cpu: [1.3398438 0.11132812]
margin_delta_vs_cpu: 1.228515625
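The cross-backend numbers can be re-derived from the two reference logit pairs alone. This sketch assumes the printed logits are FP16 values displayed at float32 precision; snapping them back to FP16 with Python's struct half-precision format ('e') reproduces the printed margins and deltas exactly. It also makes the flip condition explicit: the CPU_ONLY margin (logit 0 minus logit 1) is -0.470703125, so a positive margin shift larger than 0.470703125 is enough to move the argmax from class 1 to class 0, and the observed shift is 1.228515625.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def argmax(xs):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(xs)), key=xs.__getitem__)


# Reference logits as printed by probe_3_drift.py, snapped back to FP16.
cpu = [to_fp16(v) for v in (3.1015625, 3.5722656)]  # CPU_ONLY
ne = [to_fp16(v) for v in (4.4414062, 3.6835938)]   # CPU_AND_NE and ALL

delta = [b - a for a, b in zip(cpu, ne)]            # delta_vs_cpu
max_abs_diff = max(abs(d) for d in delta)
cpu_margin = cpu[0] - cpu[1]
ne_margin = ne[0] - ne[1]
margin_delta = ne_margin - cpu_margin               # equals delta[0] - delta[1]
flipped = argmax(cpu) != argmax(ne)
print(delta, max_abs_diff, cpu_margin, ne_margin, margin_delta, flipped)
```

Both per-logit shifts are positive; it is their difference (1.33984375 - 0.111328125) that moves the margin across zero.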
Stage 3 passes the planned gates:
- Activation quantization evidence: yes, 11 quantize and 11 dequantize ops with INT8 quantize outputs.
- Weight quantization evidence: yes, 6 constexpr_affine_dequantize ops.
- NE placement: yes, quantize/dequantize and real ops are NE-preferred in the compute plan.
- Same-backend stability: yes, repeated outputs were stable for CPU_ONLY, CPU_AND_GPU, CPU_AND_NE, and ALL.
- CPU-vs-ANE delta: max absolute logit delta was 1.33984375; margin delta was 1.228515625.
- CPU-vs-ANE argmax flip: yes. CPU_ONLY/CPU_AND_GPU returned class 1; CPU_AND_NE/ALL returned class 0.
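The stability and flip gates separate two different failure modes: run-to-run nondeterminism within one backend, and deterministic results that differ between backends. The hypothetical helper below (not part of the probes) expresses that distinction as a decision rule over the per-compute-unit summaries transcribed from the probe_3_drift.py output.

```python
# Per-compute-unit summaries transcribed from the probe_3_drift.py output.
per_backend = {
    "CPU_ONLY": {"worst_abs_diff": 0.0, "argmax": 1},
    "CPU_AND_GPU": {"worst_abs_diff": 0.0, "argmax": 1},
    "CPU_AND_NE": {"worst_abs_diff": 0.0, "argmax": 0},
    "ALL": {"worst_abs_diff": 0.0, "argmax": 0},
}


def verdict(results):
    """Hypothetical rule: check within-backend drift first, then cross-backend agreement."""
    if any(r["worst_abs_diff"] > 0.0 for r in results.values()):
        return "run-to-run drift within at least one backend"
    if len({r["argmax"] for r in results.values()}) > 1:
        return "stable per backend; argmax differs across backends"
    return "stable per backend; argmax agrees across backends"


print(verdict(per_backend))
```

Checking within-backend drift first matters: if any backend drifted, a cross-backend argmax mismatch could simply be an unlucky sample rather than a systematic backend difference.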
Honest claim: a Core ML MLProgram with activation quantize/dequantize ops, int8 activation tensors, compressed int8 weights, and NE-preferred placement can be stable within each backend while producing CPU-vs-ANE logit differences large enough to flip a near-tie argmax. This does not show run-to-run nondeterminism, and it does not by itself prove the exact low-level arithmetic kernel implementation inside ANE.