create_rut5-base.ipynb
Hi @avidale. Hope you are doing great! Great work, indeed! I have applied the above code to Urdu and want to fine-tune the resulting model for text summarisation. I just wanted to ask about the arguments that need to be passed during training (for example, learning rate, weight decay, etc.). Given below are the training arguments I used for fine-tuning the mt5-base model on an Urdu dataset. What would you suggest for the argument values for the reduced model (39.9% of the original model's parameters)? That would be a great help.
```python
from transformers import Seq2SeqTrainingArguments

batch_size = 8
num_train_epochs = 8

# Show the training loss with every epoch
# (tokenized_datasets and model_checkpoint are defined in earlier cells)
logging_steps = len(tokenized_datasets["train"]) // batch_size
model_name = model_checkpoint.split("/")[-1]

args = Seq2SeqTrainingArguments(
    output_dir=f"{model_name}-finetuned_urdu_mt5-base",
    evaluation_strategy="epoch",
    learning_rate=5.6e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=num_train_epochs,
    predict_with_generate=True,
    logging_steps=logging_steps,
    push_to_hub=True,
)
```
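These arguments then go into a `Seq2SeqTrainer` roughly as sketched below (this is the standard summarization fine-tuning setup; the split name `"validation"` and the earlier definitions of `model_checkpoint` and `tokenized_datasets` are assumed from my notebook):

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
)

# Assumed setup: model_checkpoint and tokenized_datasets come from earlier cells.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
```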
I tried using the same arguments on the reduced model and it overfit.

Hi @MuaazAnsari!
I have two comments on your question:
- I don't expect the optimal hyperparameter values to depend on whether the model vocabulary has been pruned or not.
- Unfortunately, they do depend on the dataset you train on (including the input and output lengths, dataset size, and task difficulty) and on your hardware (e.g. if your GPU memory is limited, you would have to decrease the batch size, but then you might want to decrease the learning rate or increase gradient accumulation to compensate; see the sketch below). Thus, I cannot suggest parameters that would be universally good.
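For instance, trading a smaller per-device batch for gradient accumulation while keeping the effective batch size might look like this. This is only a rough sketch, not a recommendation: every number here (batch size, accumulation steps, learning rate, epoch count, output directory) is illustrative and would need tuning on your data.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative only: halve the per-device batch size to fit into GPU memory,
# and accumulate gradients over 2 steps to keep the effective batch size at 8.
per_device_batch_size = 4
gradient_accumulation_steps = 2

args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-reduced-finetuned-urdu",  # hypothetical output directory
    evaluation_strategy="epoch",
    learning_rate=3e-5,          # illustrative: often lowered when the model overfits
    per_device_train_batch_size=per_device_batch_size,
    per_device_eval_batch_size=per_device_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    weight_decay=0.01,
    num_train_epochs=4,          # illustrative: fewer epochs can also reduce overfitting
    predict_with_generate=True,
    save_total_limit=3,
)
```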