microsoft/phi-2 is a small language model with 2.7 billion parameters. Thanks to its small size, its open-source license, and fine-tuning techniques like QLoRA, one can fairly quickly fine-tune the base model on a downstream task and create an expert phi-2 model. It is therefore interesting to combine several such individual experts into a Mixture of Experts (MoE) that can perform the tasks of all of them. Follow the steps below to create your own MoE based on phi-2.
- Special mention to Maxime Labonne, Aratako, and Paul Ilioaica for showing the open-source community that mergekit can be tweaked to make a MoE out of phi-2 experts.
- Big shoutout to Charles O. Goddard, the author of mergekit, for creating it and letting us play with it.
mergekit in its original flavour does not support microsoft/phi-2 (at the time of writing this article) because of a mismatch in layer names. This fork makes mergekit work with microsoft/phi-2-based SLMs to create a "Phi2-Mixture of Experts" model. Follow the instructions below to create your own Mixture of Experts from multiple individual phi-2 experts. Please check out the `phi2xtral` branch to start with.
- Check out the `phi2xtral` branch of this repository.
- Create a merge configuration like `config_moe_phi2.yaml`, where each expert is referenced either by the absolute path to its local model directory or by its Hugging Face repository id. A minimal sketch of such a configuration is shown below.
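  The following is a minimal, hypothetical example of what `config_moe_phi2.yaml` could look like, assuming the usual mergekit-moe schema (a base model, a gating mode, and a list of experts with positive prompts); the expert model names and prompts below are placeholders, not files shipped with this repository.

  ```yaml
  # Hypothetical merge configuration (placeholder expert names and prompts)
  base_model: microsoft/phi-2              # base model the experts were fine-tuned from
  gate_mode: cheap_embed                   # how the router gates are initialized
  dtype: float16
  experts:
    - source_model: your-hf-name/phi-2-code-expert    # or an absolute local path
      positive_prompts:
        - "Write a Python function"
    - source_model: your-hf-name/phi-2-math-expert
      positive_prompts:
        - "Solve this math problem"
  ```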
- Run `phi2_moe.py` by passing the following arguments to it:
  - the merge configuration: `config_moe_phi2.yaml`
  - the path to the output folder: for example, you can use the folder `output_phi2_moe` (as it has the configuration files needed for inferencing the MoE model)
  - `--load-in-4bit`
  - `--trust-remote-code`
  - The run command therefore looks like: `python phi2_moe.py config_moe_phi2.yaml output_phi2_moe --load-in-4bit --trust-remote-code`
- This should now create the Mixture of Experts model, built from your individual experts as per the merge configuration, inside the output directory.
- Note: if you are using your own custom fine-tuned phi-2 that was fine-tuned with a technique like QLoRA, merge the adapter weights back into the base model before using it as one of the experts in mergekit (a minimal sketch is shown below).
- You can read about merging adapters into the base model here: https://kaitchup.substack.com/p/lora-adapters-when-a-naive-merge
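  As a rough illustration, the snippet below merges QLoRA adapter weights back into the phi-2 base model using the `peft` library; the adapter path and output directory are hypothetical placeholders, and this is a generic sketch rather than a script that ships with this repository.

  ```python
  # Sketch: fold QLoRA adapter weights back into the base model (paths are placeholders)
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  base = AutoModelForCausalLM.from_pretrained(
      "microsoft/phi-2",
      torch_dtype=torch.float16,
      trust_remote_code=True,
  )
  model = PeftModel.from_pretrained(base, "path/to/your/qlora-adapter")
  merged = model.merge_and_unload()   # merges the LoRA weights into the base weights

  tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
  merged.save_pretrained("phi2-expert-merged")
  tokenizer.save_pretrained("phi2-expert-merged")
  ```

  The saved `phi2-expert-merged` folder can then be referenced as one of the experts in the merge configuration.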
- You will find the customized configuration and modelling files inside the `output_phi2_moe` folder. You will need `configuration_phi_2.py` and `modelling_phi_2.py` for inference of your Phi2MoE.
- Create your inference script as you would normally do using the Hugging Face `transformers` library and load the MoE you just created above (a minimal sketch is shown below).
- The model will use the customized configuration files present inside the output folder, as referenced by the `config.json` that gets created in the previous step by `phi2_moe.py`.
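  For reference, here is a minimal inference sketch using the `transformers` library; it assumes the MoE was written to `output_phi2_moe` and that `trust_remote_code=True` is required so the customized configuration and modelling files are picked up. The prompt and generation settings are only examples.

  ```python
  # Sketch: load and run the merged Phi2MoE (output folder name is an assumption)
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_dir = "output_phi2_moe"

  tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
      model_dir,
      torch_dtype=torch.float16,
      device_map="auto",
      trust_remote_code=True,   # picks up configuration_phi_2.py / modelling_phi_2.py
  )

  prompt = "Instruct: Write a haiku about a mixture of experts.\nOutput:"
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=64)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```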
- Enjoy your Phi2MoE!