microsoft/phi-2 is a small language model with 2.7 billion parameters. Thanks to its small size, its open-source license, and finetuning techniques like QLoRA, one can fairly quickly finetune the base model on a downstream task and produce an expert phi-2 model. It is then natural to combine several such experts into a Mixture of Experts (MoE) so that a single model can perform the tasks of the individual experts. Follow the steps below to create your own MoE built from phi-2 experts.
- Special mention to Maxime Labonne, Aratako, and Paul Ilioaica for showing the open-source community that mergekit can be tweaked to build a MoE out of phi-2 experts.
- Big shoutout to Charles O. Goddard, the author of mergekit, for creating it and letting us play with it.
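As a preview of what the merge step looks like, here is a minimal sketch of a mergekit-moe style configuration. The expert model names and positive prompts are hypothetical placeholders, and the exact schema may differ slightly in the tweaked fork of mergekit used for phi-2 in the steps below.

```yaml
# Minimal sketch of a mergekit-moe config (expert model names are placeholders).
base_model: microsoft/phi-2        # shared base model providing the non-expert weights
gate_mode: cheap_embed             # initialize router gates from the positive prompts
dtype: float16
experts:
  - source_model: your-org/phi-2-expert-math    # placeholder: a QLoRA-finetuned phi-2 expert
    positive_prompts:
      - "solve this math problem step by step"
  - source_model: your-org/phi-2-expert-code    # placeholder: another finetuned phi-2 expert
    positive_prompts:
      - "write a python function"
```

With standard mergekit, a config like this is typically run via the `mergekit-moe` command (e.g. `mergekit-moe config.yaml ./phi-2-moe`); the phi-2-specific tweaks credited above adapt this flow to phi-2's architecture, as detailed in the steps below.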