
@avidale
Created April 30, 2021 21:51
create_rut5-base.ipynb

avidale commented Nov 25, 2024

Hi @MuaazAnsari!
I have two comments on your question:

  1. I don't expect the optimal hyperparameter values to depend on whether the model vocabulary has been pruned or not.
  2. Unfortunately, they do depend on the dataset you train on (including the input and output lengths, the dataset size, and the task difficulty) and on your hardware. For example, if your GPU memory is limited, you would have to decrease the batch size, and then you might want to decrease the learning rate or increase gradient accumulation to compensate (see the sketch after this list). Thus, I cannot suggest parameters that would be universally good.
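
As a concrete illustration of the second point, here is a minimal sketch using Hugging Face `Seq2SeqTrainingArguments`. The checkpoint name and all numeric values are illustrative assumptions, not a recommended recipe:

```python
# A minimal sketch of how batch size, learning rate, and gradient
# accumulation interact when fine-tuning a seq2seq model.
# The checkpoint name and all values below are illustrative assumptions.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainingArguments,
)

model_name = "cointegrated/rut5-base"  # assumed checkpoint; use your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# If limited GPU memory forces a smaller per-device batch, raising
# gradient_accumulation_steps keeps the effective batch size
# (per_device_train_batch_size * gradient_accumulation_steps) constant,
# so the learning rate can often stay unchanged.
args = Seq2SeqTrainingArguments(
    output_dir="rut5-finetuned",
    per_device_train_batch_size=4,   # reduced to fit limited GPU memory
    gradient_accumulation_steps=8,   # effective batch size = 4 * 8 = 32
    learning_rate=1e-4,              # a common starting point, not universal
    num_train_epochs=3,
)
```

The point of the sketch is only the trade-off itself: these numbers would still need to be tuned for your specific dataset and task.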
