create_rut5-base.ipynb
Hi @avidale. Hope you are doing great! Great work, indeed! I have applied the above code to Urdu and want to fine-tune the resulting model for text summarisation. I just wanted to ask about the arguments that need to be passed during training (for example, learning rate, weight decay, etc.). Given below are the training arguments I used for fine-tuning the mt5-base model on an Urdu dataset. What would you suggest for the argument values for the reduced model (39.9% of the original model's parameters)? That would be a great help.
```python
from transformers import Seq2SeqTrainingArguments

batch_size = 8
num_train_epochs = 8

# Show the training loss with every epoch
# (tokenized_datasets and model_checkpoint are defined in earlier cells)
logging_steps = len(tokenized_datasets["train"]) // batch_size
model_name = model_checkpoint.split("/")[-1]

args = Seq2SeqTrainingArguments(
    output_dir=f"{model_name}-finetuned_urdu_mt5-base",
    evaluation_strategy="epoch",
    learning_rate=5.6e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=num_train_epochs,
    predict_with_generate=True,
    logging_steps=logging_steps,
    push_to_hub=True,
)
```
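These arguments then go into a `Seq2SeqTrainer` roughly as sketched below (this is the standard summarization fine-tuning setup; the split name `"validation"` and the earlier definitions of `model_checkpoint` and `tokenized_datasets` are assumed from my notebook):

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
)

# Assumed setup: model_checkpoint and tokenized_datasets come from earlier cells.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
```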
I tried using the same arguments on the reduced model and it overfit.

Hi @MuaazAnsari!
I have two comments on your question:
- I don't expect the optimal hyperparameter values to depend on whether the model vocabulary has been pruned or not.
- Unfortunately, they do depend on the dataset you train on (including the input and output lengths, dataset size, and task difficulty) and on your hardware (e.g. if your GPU memory is limited, you would have to decrease the batch size, but then you might want to decrease the learning rate or increase gradient accumulation to compensate; see the sketch below). Thus, I cannot suggest parameters that would be universally good.
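For instance, trading a smaller per-device batch for gradient accumulation while keeping the effective batch size might look like this. This is only a rough sketch, not a recommendation: every number here (batch size, accumulation steps, learning rate, epoch count, output directory) is illustrative and would need tuning on your data.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative only: halve the per-device batch size to fit into GPU memory,
# and accumulate gradients over 2 steps to keep the effective batch size at 8.
per_device_batch_size = 4
gradient_accumulation_steps = 2

args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-reduced-finetuned-urdu",  # hypothetical output directory
    evaluation_strategy="epoch",
    learning_rate=3e-5,          # illustrative: often lowered when the model overfits
    per_device_train_batch_size=per_device_batch_size,
    per_device_eval_batch_size=per_device_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    weight_decay=0.01,
    num_train_epochs=4,          # illustrative: fewer epochs can also reduce overfitting
    predict_with_generate=True,
    save_total_limit=3,
)
```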