
@avidale
Created April 30, 2021 21:51
create_rut5-base.ipynb

avidale commented Nov 25, 2024

Hi @MuaazAnsari!
I have two comments on your question:

  1. I don't expect the optimal hyperparameter values to depend on whether the model vocabulary has been pruned or not.
  2. Unfortunately, they do depend on the dataset you train on (including the input and output lengths, the dataset size, and the task difficulty) and on your hardware. For example, if your GPU memory is limited, you would have to decrease the batch size, and then you might want to decrease the learning rate or increase gradient accumulation to compensate (see the sketch after this list). Thus, I cannot suggest parameters that would be universally good.
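
As a concrete illustration of the second point, here is a minimal sketch using Hugging Face `Seq2SeqTrainingArguments`. The checkpoint name and all numeric values are illustrative assumptions, not a recommended recipe:

```python
# A minimal sketch of how batch size, learning rate, and gradient
# accumulation interact when fine-tuning a seq2seq model.
# The checkpoint name and all values below are illustrative assumptions.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainingArguments,
)

model_name = "cointegrated/rut5-base"  # assumed checkpoint; use your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# If limited GPU memory forces a smaller per-device batch, raising
# gradient_accumulation_steps keeps the effective batch size
# (per_device_train_batch_size * gradient_accumulation_steps) constant,
# so the learning rate can often stay unchanged.
args = Seq2SeqTrainingArguments(
    output_dir="rut5-finetuned",
    per_device_train_batch_size=4,   # reduced to fit limited GPU memory
    gradient_accumulation_steps=8,   # effective batch size = 4 * 8 = 32
    learning_rate=1e-4,              # a common starting point, not universal
    num_train_epochs=3,
)
```

The point of the sketch is only the trade-off itself: these numbers would still need to be tuned for your specific dataset and task.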
