@donnywals
Created June 8, 2018 12:27

What’s New in Core ML, Part 1

If you want to know about tools, check out part 2 of this talk.

Model size

Running a model on a device has some advantages:

  • Privacy: data remains on the device
  • Speed: the model runs in real time
  • No extra servers needed
  • No internet connection needed; the model is always available

The downside is that a large model results in a larger app size, and users are typically not happy about large apps.

It’s possible to keep models out of the app bundle, download them later, and compile them on the device. This results in a smaller initial download, but in the end the app still takes up just as much space.
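
As a minimal sketch of that download-and-compile flow (the remote URL, function name, and storage location are hypothetical, and error handling is kept to a minimum):

```swift
import Foundation
import CoreML

/// Downloads a .mlmodel file and compiles it on the device.
func installModel(from remoteURL: URL, completion: @escaping (MLModel?) -> Void) {
    URLSession.shared.downloadTask(with: remoteURL) { tempURL, _, error in
        guard let tempURL = tempURL, error == nil else {
            completion(nil)
            return
        }
        do {
            // compileModel(at:) produces a compiled .mlmodelc in a temporary
            // directory; move it somewhere permanent before relying on it.
            let compiledURL = try MLModel.compileModel(at: tempURL)
            let permanentURL = try FileManager.default
                .url(for: .applicationSupportDirectory, in: .userDomainMask,
                     appropriateFor: nil, create: true)
                .appendingPathComponent(compiledURL.lastPathComponent)
            if FileManager.default.fileExists(atPath: permanentURL.path) {
                try FileManager.default.removeItem(at: permanentURL)
            }
            try FileManager.default.moveItem(at: compiledURL, to: permanentURL)
            completion(try MLModel(contentsOf: permanentURL))
        } catch {
            completion(nil)
        }
    }.resume()
}
```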

Some factors that contribute to the size of a Core ML app:

  • The number of models
  • The number of weights (parameters) in the model
  • The size of the weights in the model

Weight sizes

In iOS 11, a neural network stored its weights as 32-bit floats. iOS 11.2 added support for storing weights as 16-bit floats. In iOS 12, models can be quantised, meaning the weights are stored in only as many bits as they need.

Quantisation makes it possible to slim down a model tremendously. Going from 32-bit to 8-bit weights already makes the weights roughly four times smaller, and lower bit sizes shrink the model even further.

You can quantise a model after training, or train it in quantised form straight away, and then convert the quantised model to Core ML.

Quantisation does trade some accuracy for the smaller file size. You must always verify that a quantised model is still accurate enough for your purposes.

Number of models

Typically a model performs just a single task. However, some models can be merged to perform multiple tasks. For instance, if you want to use a model on images of different sizes, you don’t need one model per size; you can merge these models to reduce both the number of models and the app size.

This practice of merging models uses flexible shapes. Flexible shapes allow developers to pass inputs of multiple supported sizes to the model. The flexibility can be defined as a range of sizes or as an enumeration of specific sizes. Using an enumeration is better because Core ML knows the exact sizes in advance and can optimise for them.

Flexible shapes are supported by fully convolutional neural networks. Core ML Tools can help you check whether your model qualifies.
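
As a sketch, here’s how you might inspect at runtime which input sizes a flexible model accepts, using the iOS 12 size-constraint APIs. The input name "image" is hypothetical; adjust it for your own model:

```swift
import CoreML

/// Prints the size constraint of a model's image input.
func printSupportedImageSizes(of model: MLModel) {
    guard let imageConstraint = model.modelDescription
        .inputDescriptionsByName["image"]?.imageConstraint else { return }

    let sizeConstraint = imageConstraint.sizeConstraint
    switch sizeConstraint.type {
    case .enumerated:
        // The model supports exactly these sizes; Core ML can optimise for them.
        for size in sizeConstraint.enumeratedImageSizes {
            print("supported size: \(size.pixelsWide) x \(size.pixelsHigh)")
        }
    case .range:
        print("widths: \(sizeConstraint.pixelsWideRange), heights: \(sizeConstraint.pixelsHighRange)")
    default:
        // .unspecified: the input has a single fixed size.
        print("fixed size: \(imageConstraint.pixelsWide) x \(imageConstraint.pixelsHigh)")
    }
}
```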

Performance and customisation

Core ML is highly optimised under the hood: the models, hardware, and software have all been tuned by Apple. Your workload, however, is unknown to Core ML, so Apple can’t optimise that part for you.

In iOS 12, you can supply a batch of inputs to a model to make sure the GPU is utilised constantly and efficiently. Running predictions one by one leaves idle gaps on the GPU; those gaps can be closed by running predictions as a batch, and Core ML makes sure the batch is processed quickly and efficiently.
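
A minimal sketch of the batch API, assuming you already have an array of prepared feature providers (for example, generated model input objects):

```swift
import CoreML

/// Runs a batch of inputs through a model in one call so Core ML
/// can keep the GPU busy instead of going idle between predictions.
func predictBatch(model: MLModel, inputs: [MLFeatureProvider]) throws -> [MLFeatureProvider] {
    let batch = MLArrayBatchProvider(array: inputs)
    let results = try model.predictions(fromBatch: batch)
    return (0..<results.count).map { results.features(at: $0) }
}
```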

Sometimes a layer in a neural network is not supported by Core ML. In iOS 12 you can define custom layers that integrate seamlessly with the inference engine. To provide a custom layer, you conform a class to MLCustomLayer. You can also add Metal shader-based layers as custom layers.

Custom layers only work for neural networks that use multi-array inputs.
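
As a sketch, a minimal CPU-only custom layer could look like this. The ReLU behaviour and class name are hypothetical, chosen purely for illustration; the class name must match the one declared in the model’s custom layer spec:

```swift
import CoreML
import Accelerate

@objc(ReLUCustomLayer)
class ReLUCustomLayer: NSObject, MLCustomLayer {

    required init(parameters: [String : Any]) throws {
        super.init()
    }

    // This layer has no learned weights, so there is nothing to store.
    func setWeightData(_ weights: [Data]) throws {}

    // An element-wise activation leaves the shape unchanged.
    func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws -> [[NSNumber]] {
        return inputShapes
    }

    // CPU path: clamp every element to zero from below.
    // Assumes the multi-arrays hold 32-bit floats.
    func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
        for (input, output) in zip(inputs, outputs) {
            let src = input.dataPointer.assumingMemoryBound(to: Float.self)
            let dst = output.dataPointer.assumingMemoryBound(to: Float.self)
            var zero: Float = 0
            // vDSP_vthres zeroes out all values below the threshold (here 0).
            vDSP_vthres(src, 1, &zero, dst, 1, vDSP_Length(input.count))
        }
    }
}
```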

You can even create entirely custom models for Core ML. The model class must conform to MLCustomModel. Custom models are added with a workflow very similar to that of normal models; you add your custom model class (and layers if needed) to the dependencies section in Xcode’s model editor.
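
A minimal sketch of such a class; the identity behaviour and class name are hypothetical, just to show the two required protocol methods:

```swift
import CoreML

@objc(IdentityCustomModel)
class IdentityCustomModel: NSObject, MLCustomModel {

    required init(modelDescription: MLModelDescription,
                  parameterDictionary: [String : Any]) throws {
        super.init()
    }

    // For demonstration, this "model" simply echoes its input features back.
    func prediction(from input: MLFeatureProvider,
                    options: MLPredictionOptions) throws -> MLFeatureProvider {
        return input
    }
}
```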
