Hi @carlstrath,
(Sorry, I’ve been a bit busy lately, so I wasn’t active.)
This gist was written for specific versions of the transformers and tokenizers libraries. Can you try it with the versions mentioned at the start?
Meanwhile, I guess it’s about time I updated this gist to reflect the changes in its dependencies.
Thanks
Aditya
Also, when cloning from git, please make sure you use this GitHub repo (https://github.com/huggingface/transformers/tree/v2.5.0) instead. The gist is compatible with that version of Hugging Face transformers; the newer one probably doesn’t contain the required run_language_modeling file.
I ran into issues while following the directions from the 2020 blog post https://huggingface.co/blog/how-to-train. This gist was more helpful. Thank you 👍
For anyone interested in running through training with an updated transformers: I have a write-up here of a complete example on training from scratch using transformers 4.10 and the updated run_language_modeling.py script (https://github.com/huggingface/transformers/blob/4a872caef4e70595202c64687a074f99772d8e92/examples/legacy/run_language_modeling.py), committed on Jun 25, 2021.
https://github.com/sv-v5/train-roberta-ua
Python package versions are locked with pipenv so the example remains reproducible. Tested on Linux and Windows, on both GPU and CPU.
Happy training
Hi,
That’s great to hear! Also, thanks a lot for making an updated training script. I’ve been busy lately (earlier with work and now my Master’s), so your updated script is much appreciated.
FYI, I had to update step #26 from tokenizer.save to tokenizer.save_model:
tokenizer.save_model("/content/models/smallBERTa", "smallBERTa")
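If the notebook has to run against more than one tokenizers release, a small wrapper can pick whichever method the installed version offers. This is only a sketch: the helper name save_tokenizer_files is made up for illustration, and it assumes that releases without save_model still accept save(directory, name), which is how the gist originally called it.

```python
import os


def save_tokenizer_files(tokenizer, directory, prefix):
    """Write the tokenizer's vocab/merges files into `directory`.

    Hypothetical helper (not part of tokenizers or transformers).
    Assumption: a tokenizer without `save_model` still accepts the older
    `save(directory, name)` call used in the original gist.
    """
    # Make sure the target directory exists before saving into it.
    os.makedirs(directory, exist_ok=True)
    if hasattr(tokenizer, "save_model"):
        # Newer call, as reported in this thread.
        return tokenizer.save_model(directory, prefix)
    # Older call, as used in the original notebook step.
    return tokenizer.save(directory, prefix)


# Usage in the notebook would look like:
# save_tokenizer_files(tokenizer, "/content/models/smallBERTa", "smallBERTa")
```

Pinning the library versions the gist was written for avoids the need for this branch entirely; the wrapper only helps when pinning is not an option.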
Sorry to bother everyone again. I am now getting this error in ln27:
python3: can't open file '/content/transformers/examples/run_language_modeling.py': [Errno 2] No such file or directory
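Going by the comments above, one possible cause is that the cloned checkout is newer than v2.5.0: the 2021 commit linked earlier keeps the script under examples/legacy/ rather than examples/. A minimal sketch for checking which location exists in a checkout follows. The helper name find_lm_script is hypothetical, and the two candidate paths are taken only from this thread, so other versions may place the file elsewhere.

```python
from pathlib import Path
from typing import Optional

# Candidate locations relative to the root of a transformers checkout.
# Both come from this thread: the first is the path in the error message,
# the second is the path of the script in the 2021 commit linked above.
CANDIDATES = (
    "examples/run_language_modeling.py",
    "examples/legacy/run_language_modeling.py",
)


def find_lm_script(repo_root: str) -> Optional[Path]:
    """Return the first candidate path that exists as a file, else None."""
    root = Path(repo_root)
    for rel in CANDIDATES:
        path = root / rel
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # /content/transformers is the clone location shown in the error message.
    script = find_lm_script("/content/transformers")
    if script is None:
        print("run_language_modeling.py not found; check which version was cloned")
    else:
        print(script)
```

If the script turns up under examples/legacy/, the checkout is not the v2.5.0 tree the gist targets, so other steps may need adjusting as well.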