Click here for the article that I used as a reference. The reference code is almost the same.
Install the Python package googletrans on Google Colab and turn an English text file into a file that pairs each English line with its Japanese translation (other target languages work too).
about googletrans
If you paste subtitle files (.srt, .sbv, and so on) into Google Translate and look at the result, the timecodes and counter indexes are changed unpredictably: digits may turn into kanji, and the colon (:) becomes full-width. You therefore have to keep those parts out of the translation and put them back into the text once the text itself has been translated. In French, for example, a space is inserted before the number.
If the translation is done through an API that sends the result back to your program, this processing can be programmed on your own side.
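The escape-and-restore step described above could be sketched as follows. This is only a minimal sketch, not the program used in this article: the names TIMECODE, COUNTER and translate_subtitle_lines are made up for illustration, the patterns assume .sbv- and .srt-style timecodes, and translate stands for any function that translates a single string.

```python
import re

# Assumed timecode shapes: "0:00:00.320,0:00:06.320" (.sbv) and
# "00:00:00,320 --> 00:00:06,320" (.srt).
TIMECODE = re.compile(
    r'^\d{1,2}:\d{2}:\d{2}[.,]\d{3}\s*(?:,|-->)\s*\d{1,2}:\d{2}:\d{2}[.,]\d{3}$'
)
# A line holding only digits is treated as an .srt counter index.
COUNTER = re.compile(r'^\d+$')


def translate_subtitle_lines(lines, translate):
    """Translate text lines; pass timecodes, counters and blank lines through."""
    result = []
    for line in lines:
        text = line.strip()
        if not text or TIMECODE.match(text) or COUNTER.match(text):
            result.append(text)  # never sent to the translator
        else:
            result.append(translate(text))
    return result
```

With googletrans, translate could be a small wrapper that calls Translator().translate(text, dest="ja") and returns its .text attribute. Because timecodes never reach the translator, they cannot come back with kanji digits or full-width colons.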
The idea is that you run googletrans on Google Colab, upload a text file, and download the translated version.
It runs on a Google cloud machine, so you don't need a Python runtime environment at hand.
Usage: https://youtu.be/tEJDsapYFr8
If you want to use a local Python instead of Google Colab, please refer to the page linked at the bottom of this article (footnote 1).
About [Google Colab](https://research.google.com/colaboratory/faq.html)
The package to install is 4.0.0-rc1, which has the TKK fix patch applied (probably; I have not investigated it). It did not result in an error.
With version 3.0.0, which is what pip install googletrans installs, you get this error:
```
code = unicode(self.RE_TKK.search(r.text).group(1)).replace('var ', '')
AttributeError: 'NoneType' object has no attribute 'group'
```
(As of 2021.1.27.) The same problem often comes up with the google-translate program for Emacs. Fixes can only chase each change to the token specification of the Google Translate service, so the method shown here is a temporary one and there will always be changes in the future. Please check the issue linked at the end of this article; it should have updates on the problem and tips such as workarounds from volunteers.
If you specify the version on Google Colab and install the patched googletrans, it works. If you already have a package installed without a pinned version, uninstall it first, then install googletrans on Google Colab:
```
pip install googletrans==4.0.0-rc1
```
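To see which version is currently installed before deciding whether to uninstall, something like the following could be run in a cell. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version('googletrans')
if found is None:
    print('googletrans is not installed')
else:
    # pip normally reports 4.0.0-rc1 in the normalized form "4.0.0rc1".
    print('googletrans', found)
```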
The code is below.
ipynb (IPython) (Note: this first version does not work well!)
```python
from google.colab import files
from googletrans import Translator
import sys

# Upload a text file from the local machine to the Colab runtime.
uploaded = files.upload()
filename = ''
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
    filename = fn

#args= sys.argv
args = [('translate.py'), filename, '>', 'translated-jp.txt']
if len(args) < 2:
    print('python3 translate.py textfile.txt > output_textfile.txt')
else:
    print('open ' + args[1])
    f = open(args[1])
    lines = f.readlines()
    f.close()

    translator = Translator()
    for line in lines:
        translated = translator.translate(line, dest="ja");
        print(line) # Original
        print(translated.text) # translated
        print()
    print('EOF')

files.download(filename)
```
class googletrans.models.Translated(src, dest, origin, text, pronunciation, extra_data=None, **kwargs)
Translate result object
Parameters:
- src: source language (default: auto)
- dest: destination language (default: en)
- origin: original text
- text: translated text
- pronunciation: pronunciation
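As a small illustration of how these fields can be read, here is a sketch that builds a summary line from a result object. The function name describe is made up, and the SimpleNamespace object is only a stand-in with the same attribute names; a real result would come from Translator().translate(...).

```python
from types import SimpleNamespace


def describe(result):
    """Build a one-line summary from the documented fields of a result object."""
    return '[{src} -> {dest}] {origin} => {text}'.format(
        src=result.src, dest=result.dest,
        origin=result.origin, text=result.text)


# Stand-in object for illustration only; not a real googletrans result.
sample = SimpleNamespace(src='en', dest='ja', origin='hello',
                         text='こんにちは', pronunciation=None)
print(describe(sample))
```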
However, a problem turned up when I tried it. After using and verifying it for several hours, I found that a blank line in the text to be translated causes IndexError: list index out of range.
In other words, suppose the text to be translated is
```
00:00:00.320,00:00:06.320
welcome all you super amazing hardware addicts
i am so excited to share this project with you

00:00:06.880,00:00:11.920
after we got that letter in from the listener
talking about how they put lineage os

00:00:11.920,00:00:17.840
on their fire hd tablet i just had to do
it and the kids have loved this change
```
In such a case, the program fails with an error when it reaches the blank line on the 4th line.
```
0:00:00.320,0:00:06.320
0:00:00.320,0:00:06.320
welcome all you super amazing hardware addicts
超素晴らしいハードウェア中毒者を歓迎します
i am so excited to share this project with you
このプロジェクトをあなたと共有できることをとてもうれしく思います
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-23-e8018cddf127> in <module>()
23 translator = Translator()
24 for line in lines:
---> 25 translated = translator.translate(line, dest="ja");
26 print(line) # Original
27 print(translated.text) # Japanese
1 frames
/usr/local/lib/python3.6/dist-packages/googletrans/client.py in <lambda>(part)
220 # not sure
221 should_spacing = parsed[1][0][0][3]
--> 222 translated_parts = list(map(lambda part: TranslatedPart(part[0], part[1] if len(part) >= 2 else []), parsed[1][0][0][5]))
223 translated = (' ' if should_spacing else '').join(map(lambda part: part.text, translated_parts))
224
IndexError: list index out of range
```
But if you remove the blank lines before uploading, like this:
```
00:00:00.320,00:00:06.320
welcome all you super amazing hardware addicts
i am so excited to share this project with you
00:00:06.880,00:00:11.920
after we got that letter in from the listener
talking about how they put lineage os
00:00:11.920,00:00:17.840
on their fire hd tablet i just had to do
it and the kids have loved this change
```
no error occurs. It's a simple problem, so I think it will be improved soon.
Removing line breaks ('\n') and whitespace (' ') is not googletrans's job at all, so I improved the program so that the list passed to googletrans contains no line breaks and no empty or whitespace-only lines.
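The cleanup described above can be sketched as a standalone function. This is a minimal sketch under one assumption: it uses strip(), so lines that contain only spaces are dropped as well as truly empty ones. The name clean_lines is made up for illustration.

```python
def clean_lines(lines):
    """Strip surrounding whitespace and drop lines that end up empty."""
    stripped = [line.strip() for line in lines]
    return [line for line in stripped if line != '']


raw = ['00:00:00.320,00:00:06.320\n', 'welcome all\n', '\n', '   \n', 'last line']
print(clean_lines(raw))
```

Keeping this step in its own function makes it easy to check without calling the translation service at all.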
ipynb (IPython)
```
pip install googletrans==4.0.0-rc1
```
translate.ipynb
```python
from google.colab import files
from googletrans import Translator
import sys

# Upload a text file from the local machine to the Colab runtime.
uploaded = files.upload()
filename = ''
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
    filename = fn

#args= sys.argv
args = [('translate.py'), filename]
if len(args) < 2:
    print('python3 translate.py textfile.txt output_textfile.txt')
else:
    print('open ' + args[1])
    with open(args[1]) as f:
        line = f.readlines()

    # Strip line breaks and surrounding whitespace, then drop empty lines,
    # so that googletrans never receives a blank string.
    line[:] = [l.strip() for l in line]
    line[:] = [a for a in line if a != '']

    translator = Translator()
    # Overwrite the uploaded file with original/translation pairs.
    with open(filename, 'w') as f:
        for l in line:
            translated = translator.translate(l, dest="ja")
            print(l)  # Original
            f.write(l + '\n')
            print(translated.text)  # dest lang
            f.write(translated.text + '\n')
            print()
            f.write('\n')
    print('EOF')

files.download(filename)
```
[github gist google-colab_googletrans.py](https://gist.githubusercontent.com/dauuricus/83926ee296f487a8b78e6e156e168a3d/raw/4dbcc382eb4c4e65a3a8ebd9c071e076978e9138/google-colab_googletrans.py)
Cf. [how to remove newline character from a list in python](https://www.kite.com/python/answers/how-to-remove-newline-character-from-a-list-in-python)
Cf. list comprehension: https://realpython.com/lessons/writing-your-first-list-comprehension/
English to French https://youtu.be/WlZbQnKOCMk
Full list of language codes
LANGUAGES = {
'af': 'afrikaans',
'sq': 'albanian',
'am': 'amharic',
'ar': 'arabic',
'hy': 'armenian',
'az': 'azerbaijani',
'eu': 'basque',
'be': 'belarusian',
'bn': 'bengali',
'bs': 'bosnian',
'bg': 'bulgarian',
'ca': 'catalan',
'ceb': 'cebuano',
'ny': 'chichewa',
'zh-cn': 'chinese (simplified)',
'zh-tw': 'chinese (traditional)',
'co': 'corsican',
'hr': 'croatian',
'cs': 'czech',
'da': 'danish',
'nl': 'dutch',
'en': 'english',
'eo': 'esperanto',
'et': 'estonian',
'tl': 'filipino',
'fi': 'finnish',
'fr': 'french',
'fy': 'frisian',
'gl': 'galician',
'ka': 'georgian',
'de': 'german',
'el': 'greek',
'gu': 'gujarati',
'ht': 'haitian creole',
'ha': 'hausa',
'haw': 'hawaiian',
'iw': 'hebrew',
'he': 'hebrew',
'hi': 'hindi',
'hmn': 'hmong',
'hu': 'hungarian',
'is': 'icelandic',
'ig': 'igbo',
'id': 'indonesian',
'ga': 'irish',
'it': 'italian',
'ja': 'japanese',
'jw': 'javanese',
'kn': 'kannada',
'kk': 'kazakh',
'km': 'khmer',
'ko': 'korean',
'ku': 'kurdish (kurmanji)',
'ky': 'kyrgyz',
'lo': 'lao',
'la': 'latin',
'lv': 'latvian',
'lt': 'lithuanian',
'lb': 'luxembourgish',
'mk': 'macedonian',
'mg': 'malagasy',
'ms': 'malay',
'ml': 'malayalam',
'mt': 'maltese',
'mi': 'maori',
'mr': 'marathi',
'mn': 'mongolian',
'my': 'myanmar (burmese)',
'ne': 'nepali',
'no': 'norwegian',
'or': 'odia',
'ps': 'pashto',
'fa': 'persian',
'pl': 'polish',
'pt': 'portuguese',
'pa': 'punjabi',
'ro': 'romanian',
'ru': 'russian',
'sm': 'samoan',
'gd': 'scots gaelic',
'sr': 'serbian',
'st': 'sesotho',
'sn': 'shona',
'sd': 'sindhi',
'si': 'sinhala',
'sk': 'slovak',
'sl': 'slovenian',
'so': 'somali',
'es': 'spanish',
'su': 'sundanese',
'sw': 'swahili',
'sv': 'swedish',
'tg': 'tajik',
'ta': 'tamil',
'te': 'telugu',
'th': 'thai',
'tr': 'turkish',
'uk': 'ukrainian',
'ur': 'urdu',
'ug': 'uyghur',
'uz': 'uzbek',
'vi': 'vietnamese',
'cy': 'welsh',
'xh': 'xhosa',
'yi': 'yiddish',
'yo': 'yoruba',
'zu': 'zulu',
}
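A table like this can be used to check a destination language before sending anything to the translator. The following is a minimal sketch: the dict here is only a short excerpt of the list above (with googletrans installed, the full table can be imported with from googletrans import LANGUAGES), and the helper name to_language_code is made up for illustration.

```python
# Short excerpt of the table above, used here so the example is self-contained.
LANGUAGES = {
    'en': 'english',
    'fr': 'french',
    'ja': 'japanese',
    'zh-cn': 'chinese (simplified)',
}


def to_language_code(value, languages=LANGUAGES):
    """Accept a code ('ja') or a name ('Japanese') and return the code."""
    key = value.strip().lower()
    if key in languages:
        return key
    for code, name in languages.items():
        if name == key:
            return code
    raise ValueError('unsupported language: {!r}'.format(value))


print(to_language_code('French'))
```

The returned code is what would be passed as dest to translator.translate.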
Reference for the TKK error: [stackoverflow.com "googletrans stopped working with error nonetype object has no attribute group"](https://stackoverflow.com/questions/52455774/googletrans-stopped-working-with-error-nonetype-object-has-no-attribute-group)
[py-googletrans/issues/234 (comment)](https://github.com/ssut/py-googletrans/issues/234)
Footnotes
1. googletrans with local python (Qiita page)