RoBERTa + CSPT (single model)
We first train a generation model to produce synthetic data from ConceptNet. We then build the commonsense pre-trained model by fine-tuning the RoBERTa-large model on this synthetic data together with the Open Mind Common Sense (OMCS) corpus. The final model is fine-tuned from the commonsense pre-trained model on CSQA.
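As a rough illustration of the final stage, the sketch below scores a CSQA-style question by pairing it with each of its five answer choices using `RobertaForMultipleChoice`. The checkpoint name (`cspt-roberta-large`), the question, and the choices are placeholders, not values from this work; actual fine-tuning would train this same multiple-choice head with a cross-entropy loss over the choices.

```python
import torch
from transformers import RobertaTokenizerFast, RobertaForMultipleChoice

# Placeholder checkpoint standing in for the commonsense pre-trained model.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-large")
model = RobertaForMultipleChoice.from_pretrained("cspt-roberta-large")
model.eval()

# Illustrative CSQA-style question with five answer choices (made up here).
question = "Where would you find a pillow in a house?"
choices = ["bedroom", "kitchen", "garage", "driveway", "garden"]

# Encode the question against each choice, then stack into a batch of shape
# (1, num_choices, seq_len) as expected by RobertaForMultipleChoice.
enc = tokenizer([question] * len(choices), choices,
                padding=True, truncation=True, return_tensors="pt")
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, num_choices)
print(choices[logits.argmax(dim=-1).item()])
```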
Commonsense Pre-training:
- epochs: 5
- maximum sequence length: 35
- learning rate: 3e-5
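For concreteness, here is a minimal sketch of how these pre-training hyperparameters could be wired into Hugging Face utilities. The output path and batch size are illustrative assumptions rather than reported values, and the maximum sequence length is applied at tokenization time rather than through `TrainingArguments`.

```python
from transformers import RobertaTokenizerFast, TrainingArguments

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-large")

pretrain_args = TrainingArguments(
    output_dir="cspt-roberta-large",   # hypothetical path for the pre-trained checkpoint
    num_train_epochs=5,                # epochs: 5
    learning_rate=3e-5,                # learning rate: 3e-5
    per_device_train_batch_size=32,    # assumed; not listed above
)

def encode(texts):
    # maximum sequence length: 35 is enforced here, at tokenization time
    return tokenizer(texts, truncation=True, max_length=35, padding="max_length")
```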