microsoft/phi-2 is a small language model with 2.7 billion parameters. Thanks to its small size, its open-source license, and fine-tuning techniques like QLoRA, one can fairly quickly fine-tune the base model on a downstream task and create an expert phi-2 model. It is therefore interesting to combine several such individual experts into a Mixture of Experts (MoE) that can perform the tasks of all of them. Follow the steps below to create your own MoE based on phi-2.
- Special mention to Maxime Labonne, Aratako, and Paul Ilioaica for showing the open-source community that mergekit can be tweaked to make a MoE out of phi-2 experts.
- Big shoutout to Charles O. Goddard, the author of mergekit, for creating it and letting us play with it.
mergekit in its original flavour does not support microsoft/phi-2 (at the time of writing this article) because of a mismatch in layer names. This fork makes mergekit work with microsoft/phi-2-based SLMs to create a "Phi2-Mixture of Experts" model. Follow the instructions below to create your own Mixture of Experts from multiple individual phi-2 experts. Please check out the `phi2xtral` branch to start with.
- Check out the `phi2xtral` branch of this repository.
- Create a merge configuration like `config_moe_phi2.yaml`, where each expert is referenced either by the absolute path to its local model directory or by its Hugging Face repository id. A minimal sketch of such a configuration is shown below.
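  The following is a minimal, hypothetical example of what `config_moe_phi2.yaml` could look like, assuming the usual mergekit-moe schema (a base model, a gating mode, and a list of experts with positive prompts); the expert model names and prompts below are placeholders, not files shipped with this repository.

  ```yaml
  # Hypothetical merge configuration (placeholder expert names and prompts)
  base_model: microsoft/phi-2              # base model the experts were fine-tuned from
  gate_mode: cheap_embed                   # how the router gates are initialized
  dtype: float16
  experts:
    - source_model: your-hf-name/phi-2-code-expert    # or an absolute local path
      positive_prompts:
        - "Write a Python function"
    - source_model: your-hf-name/phi-2-math-expert
      positive_prompts:
        - "Solve this math problem"
  ```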
- Run `phi2_moe.py` by passing the following arguments to it:
  - the merge configuration: `config_moe_phi2.yaml`
  - the path to the output folder: for example, you can use the folder `output_phi2_moe` (as it has the configuration files needed for inferencing the MoE model)
  - `--load-in-4bit`
  - `--trust-remote-code`
  - The run command therefore looks like: `python phi2_moe.py config_moe_phi2.yaml output_phi2_moe --load-in-4bit --trust-remote-code`
- This should now create the Mixture of Experts model, built from your individual experts as per the merge configuration, inside the output directory.
- Note: if you are using your own custom fine-tuned phi-2 that was fine-tuned with a technique like QLoRA, merge the adapter weights back into the base model before using it as one of the experts in mergekit (a minimal sketch is shown below).
- You can read about merging adapters into the base model here: https://kaitchup.substack.com/p/lora-adapters-when-a-naive-merge
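  As a rough illustration, the snippet below merges QLoRA adapter weights back into the phi-2 base model using the `peft` library; the adapter path and output directory are hypothetical placeholders, and this is a generic sketch rather than a script that ships with this repository.

  ```python
  # Sketch: fold QLoRA adapter weights back into the base model (paths are placeholders)
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  base = AutoModelForCausalLM.from_pretrained(
      "microsoft/phi-2",
      torch_dtype=torch.float16,
      trust_remote_code=True,
  )
  model = PeftModel.from_pretrained(base, "path/to/your/qlora-adapter")
  merged = model.merge_and_unload()   # merges the LoRA weights into the base weights

  tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
  merged.save_pretrained("phi2-expert-merged")
  tokenizer.save_pretrained("phi2-expert-merged")
  ```

  The saved `phi2-expert-merged` folder can then be referenced as one of the experts in the merge configuration.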
- You will find the customized configuration and modelling files inside the `output_phi2_moe` folder. You will need `configuration_phi_2.py` and `modelling_phi_2.py` for inference of your Phi2MoE.
- Create your inference script as you would normally do using the Hugging Face `transformers` library and load the MoE you just created above (a minimal sketch is shown below).
- The model will use the customized configuration files present inside the output folder, as referenced by the `config.json` that gets created in the previous step by `phi2_moe.py`.
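  For reference, here is a minimal inference sketch using the `transformers` library; it assumes the MoE was written to `output_phi2_moe` and that `trust_remote_code=True` is required so the customized configuration and modelling files are picked up. The prompt and generation settings are only examples.

  ```python
  # Sketch: load and run the merged Phi2MoE (output folder name is an assumption)
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_dir = "output_phi2_moe"

  tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
      model_dir,
      torch_dtype=torch.float16,
      device_map="auto",
      trust_remote_code=True,   # picks up configuration_phi_2.py / modelling_phi_2.py
  )

  prompt = "Instruct: Write a haiku about a mixture of experts.\nOutput:"
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=64)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```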
- Enjoy your Phi2MoE!