@dauuricus
Last active January 30, 2021 14:09

This is based on an article that I used as a reference; the reference code is almost the same.

Install the Python package googletrans on Google Colab and turn an English text file into a text file that interleaves English and Japanese (or another language).

About googletrans

If you paste subtitle files (.srt, .sbv, etc.) into Google Translate and look at the result, the timecodes and counter indexes are unpredictably converted to kanji, and the colon (:) becomes full-width. You therefore need to protect those parts from translation and put them back into the text after the text has been translated. In French, for example, a space is inserted before the number.

If you can call the translation function through an API and get the result sent back, that processing can be programmed on your own side.
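A minimal sketch of that idea, assuming timecode lines look like the `00:00:00.320,00:00:06.320` lines used later in this article (or the .srt `-->` form) and that counter indexes are lines holding only digits. `TIMECODE_RE`, `INDEX_RE`, `is_protected`, and `translate_lines` are hypothetical names, and `translate` is any function you supply; none of this is part of googletrans.

```python
import re
from typing import Callable, Iterable, List

# Assumption: an .sbv-style pair such as "0:00:00.320,0:00:06.320",
# or an .srt-style pair such as "00:00:00,320 --> 00:00:06,320".
TIMECODE_RE = re.compile(
    r"^\s*\d{1,2}:\d{2}:\d{2}[.,]\d{3}\s*(,|-->)\s*\d{1,2}:\d{2}:\d{2}[.,]\d{3}\s*$"
)
# Assumption: a counter index is a line that holds only digits.
INDEX_RE = re.compile(r"^\s*\d+\s*$")


def is_protected(line: str) -> bool:
    """Return True for lines that should be passed through untranslated."""
    return (
        line.strip() == ""
        or TIMECODE_RE.match(line) is not None
        or INDEX_RE.match(line) is not None
    )


def translate_lines(lines: Iterable[str], translate: Callable[[str], str]) -> List[str]:
    """Translate only the text lines; copy protected lines unchanged."""
    return [line if is_protected(line) else translate(line) for line in lines]
```

Because `translate` is passed in, the filtering can be tried with a dummy function such as `str.upper` before any request is sent to the translation service.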

The idea is that when you run googletrans on Google Colab, you can upload a text file and download the translated version. It runs on a Google cloud computer, so you don't need a Python runtime environment at hand.

If you want to use a local Python instead of Google Colab, please refer to the page linked at the bottom of this article (footnote 1).

About [google colab](https://research.google.com/colaboratory/faq.html)

The package to install is the one with the TKK fix patch applied (probably; I have not verified this). With 4.0.0-rc1 no error occurred. Version 3.0.0, which a plain pip install googletrans installs, fails with:

code = unicode(self.RE_TKK.search(r.text).group(1)).replace('var', '')
AttributeError: 'NoneType' object has no attribute 'group'

(As of 2021-01-27.) The same problem often shows up in Emacs's google-translate program as well. Fixes have to chase changes in the token specification of the Google Translate service, so this is only a temporary workaround and things will certainly change again. Please check the issue linked at the end of this article, where updates on the problem and tips such as workarounds by volunteers are posted.

If you specify the version on Google Colab and install the patched googletrans, it works. If you already have a version installed without pinning, uninstall it first, then install googletrans on Google Colab:

pip install googletrans==4.0.0-rc1
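To see which version is currently installed before deciding whether to uninstall, something like the following could be used. It is only a sketch that relies on the standard library's `importlib.metadata` (Python 3.8 or later); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("googletrans")
if found is None:
    print("googletrans is not installed")
else:
    print("googletrans version:", found)
```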

The code is here:

ipynb (ipython) (Note: this first version does not work well!)

from google.colab import files
from googletrans import Translator
import sys

uploaded = files.upload()

filename = ''
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  filename = fn

#args= sys.argv
args= [('translate.py'),filename,'>','translated-jp.txt']
if len(args) < 2:
    print('python3 translate.py textfile.txt > output_textfile.txt')
else:
    print('open '+args[1])
    f = open(args[1])
    lines = f.readlines()
    f.close()

    translator = Translator()
    for line in lines:
        translated = translator.translate(line, dest="ja");
        print(line) # Original
        print(translated.text) # translated
        print()
    print('EOF')
    files.download(filename)

class googletrans.models.Translated(src, dest, origin, text, pronunciation, extra_data=None, **kwargs)

Translate result object

Parameters:
  • src – source language (default: auto)
  • dest – destination language (default: en)
  • origin – original text
  • text – translated text
  • pronunciation – pronunciation
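The fields listed above are read as plain attributes of the result. The sketch below formats one result on a single line; `describe` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in (an assumption, not a real googletrans object) filled with a sentence pair from the sample run shown later in this article, so the example runs without a network call.

```python
from types import SimpleNamespace


def describe(result) -> str:
    """Format the main fields of a Translated-like object on one line."""
    return '[{src} -> {dest}] {origin} => {text}'.format(
        src=result.src, dest=result.dest, origin=result.origin, text=result.text)


# Stand-in object (assumption): mimics the documented attributes so the
# example runs offline; a real run would pass the translator's return value.
fake = SimpleNamespace(
    src='en',
    dest='ja',
    origin='welcome all you super amazing hardware addicts',
    text='超素晴らしいハードウェア中毒者を歓迎します',
    pronunciation=None,
)
print(describe(fake))
```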

However, there was a problem when I tried it. After using it for several hours and checking, I found that whenever the text to be translated contains a blank line, it fails with IndexError: list index out of range.

For example, suppose the text to be translated is:

00:00:00.320,00:00:06.320
welcome all you super amazing hardware addicts
i am so excited to share this project with you

00:00:06.880,00:00:11.920
after we got that letter in from the listener
talking about how they put lineage os

00:00:11.920,00:00:17.840
on their fire hd tablet i just had to do
it and the kids have loved this change

In this case, the program stumbles on the blank line on the 4th line and raises the error.

0:00:00.320,0:00:06.320

00000.320,00006.320

welcome all you super amazing hardware addicts 

超素晴らしいハードウェア中毒者を歓迎します

i am so excited to share this project with you

このプロジェクトをあなたと共有できることをとてもうれしく思います

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-23-e8018cddf127> in <module>()
     23     translator = Translator()
     24     for line in lines:
---> 25         translated = translator.translate(line, dest="ja");
     26         print(line) # Original
     27         print(translated.text) # Japanese

1 frames
/usr/local/lib/python3.6/dist-packages/googletrans/client.py in <lambda>(part)
    220         # not sure
    221         should_spacing = parsed[1][0][0][3]
--> 222         translated_parts = list(map(lambda part: TranslatedPart(part[0], part[1] if len(part) >= 2 else []), parsed[1][0][0][5]))
    223         translated = (' ' if should_spacing else '').join(map(lambda part: part.text, translated_parts))
    224 

IndexError: list index out of range

But if you remove the blank lines and then upload the file:

00:00:00.320,00:00:06.320
welcome all you super amazing hardware addicts
i am so excited to share this project with you
00:00:06.880,00:00:11.920
after we got that letter in from the listener
talking about how they put lineage os
00:00:11.920,00:00:17.840
on their fire hd tablet i just had to do
it and the kids have loved this change

This version does not cause an error. It is a simple problem, so I think it will be fixed soon.

(Addition) Improvement. (2021-01-28)

Removing line breaks ('\n') and blank lines is not a googletrans problem at all, so I improved the program so that the list passed to googletrans contains no line breaks or blank entries.
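The cleaning step on its own can be sketched as follows, using a few lines from the subtitle sample above. `clean_lines` is a hypothetical helper name; dropping whitespace-only lines as well as empty ones is an assumption about what should count as blank.

```python
from typing import List


def clean_lines(raw_lines: List[str]) -> List[str]:
    """Strip trailing newlines and drop lines that are empty or only whitespace."""
    stripped = [line.rstrip("\n") for line in raw_lines]
    return [line for line in stripped if line.strip() != ""]


raw = [
    "00:00:00.320,00:00:06.320\n",
    "welcome all you super amazing hardware addicts\n",
    "i am so excited to share this project with you\n",
    "\n",
    "00:00:06.880,00:00:11.920\n",
]
print(clean_lines(raw))
```

Both steps are list comprehensions: the first maps every line to its newline-free form, the second keeps only the entries that still contain text.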

ipynb (ipython)

pip install googletrans==4.0.0-rc1

translate.ipynb

from google.colab import files
from googletrans import Translator
import sys

# Upload a text file from the local machine to the Colab runtime.
uploaded = files.upload()

filename = ''
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
    filename = fn

# Emulate sys.argv so the same code can also run as a local script.
#args = sys.argv
args = ['translate.py', filename]
if len(args) < 2:
    print('python3 translate.py textfile.txt output_textfile.txt')
else:
    print('open ' + args[1])
    with open(args[1]) as f:
        lines = f.readlines()

    # Remove the trailing newline from every line, then drop blank lines,
    # because a blank line makes googletrans raise IndexError.
    lines = [l.rstrip('\n') for l in lines]
    lines = [l for l in lines if l.strip() != '']

    translator = Translator()
    # Note: this overwrites the uploaded file with the bilingual text.
    with open(filename, 'w') as f:
        for l in lines:
            translated = translator.translate(l, dest="ja")
            print(l)  # Original
            print(translated.text)  # dest lang
            print()
            # Write the original line, the translation, then a blank separator.
            f.write(l + '\n')
            f.write(translated.text + '\n')
            f.write('\n')
    print('EOF')

    files.download(filename)

[github gist google-colab_googletrans.py](https://gist.githubusercontent.com/dauuricus/83926ee296f487a8b78e6e156e168a3d/raw/4dbcc382eb4c4e65a3a8ebd9c071e076978e9138/google-colab_googletrans.py)

Cf. [how to remove newline character from a list in python](https://www.kite.com/python/answers/how-to-remove-newline-character-from-a-list-in-python)

Cf. list comprehension: https://realpython.com/lessons/writing-your-first-list-comprehension/

https://youtu.be/WehZC2g-FdU

English to French https://youtu.be/WlZbQnKOCMk

Full list of language codes:

LANGUAGES = {
    'af': 'afrikaans',
    'sq': 'albanian',
    'am': 'amharic',
    'ar': 'arabic',
    'hy': 'armenian',
    'az': 'azerbaijani',
    'eu': 'basque',
    'be': 'belarusian',
    'bn': 'bengali',
    'bs': 'bosnian',
    'bg': 'bulgarian',
    'ca': 'catalan',
    'ceb': 'cebuano',
    'ny': 'chichewa',
    'zh-cn': 'chinese (simplified)',
    'zh-tw': 'chinese (traditional)',
    'co': 'corsican',
    'hr': 'croatian',
    'cs': 'czech',
    'da': 'danish',
    'nl': 'dutch',
    'en': 'english',
    'eo': 'esperanto',
    'et': 'estonian',
    'tl': 'filipino',
    'fi': 'finnish',
    'fr': 'french',
    'fy': 'frisian',
    'gl': 'galician',
    'ka': 'georgian',
    'de': 'german',
    'el': 'greek',
    'gu': 'gujarati',
    'ht': 'haitian creole',
    'ha': 'hausa',
    'haw': 'hawaiian',
    'iw': 'hebrew',
    'he': 'hebrew',
    'hi': 'hindi',
    'hmn': 'hmong',
    'hu': 'hungarian',
    'is': 'icelandic',
    'ig': 'igbo',
    'id': 'indonesian',
    'ga': 'irish',
    'it': 'italian',
    'ja': 'japanese',
    'jw': 'javanese',
    'kn': 'kannada',
    'kk': 'kazakh',
    'km': 'khmer',
    'ko': 'korean',
    'ku': 'kurdish (kurmanji)',
    'ky': 'kyrgyz',
    'lo': 'lao',
    'la': 'latin',
    'lv': 'latvian',
    'lt': 'lithuanian',
    'lb': 'luxembourgish',
    'mk': 'macedonian',
    'mg': 'malagasy',
    'ms': 'malay',
    'ml': 'malayalam',
    'mt': 'maltese',
    'mi': 'maori',
    'mr': 'marathi',
    'mn': 'mongolian',
    'my': 'myanmar (burmese)',
    'ne': 'nepali',
    'no': 'norwegian',
    'or': 'odia',
    'ps': 'pashto',
    'fa': 'persian',
    'pl': 'polish',
    'pt': 'portuguese',
    'pa': 'punjabi',
    'ro': 'romanian',
    'ru': 'russian',
    'sm': 'samoan',
    'gd': 'scots gaelic',
    'sr': 'serbian',
    'st': 'sesotho',
    'sn': 'shona',
    'sd': 'sindhi',
    'si': 'sinhala',
    'sk': 'slovak',
    'sl': 'slovenian',
    'so': 'somali',
    'es': 'spanish',
    'su': 'sundanese',
    'sw': 'swahili',
    'sv': 'swedish',
    'tg': 'tajik',
    'ta': 'tamil',
    'te': 'telugu',
    'th': 'thai',
    'tr': 'turkish',
    'uk': 'ukrainian',
    'ur': 'urdu',
    'ug': 'uyghur',
    'uz': 'uzbek',
    'vi': 'vietnamese',
    'cy': 'welsh',
    'xh': 'xhosa',
    'yi': 'yiddish',
    'yo': 'yoruba',
    'zu': 'zulu',
}
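A small sketch of how such a table can be used to validate a destination code before calling the translator. `LANGUAGES_EXCERPT` copies only a few entries from the list above for brevity, and `check_dest` is a hypothetical helper name, not a googletrans function.

```python
# A few entries copied from the full table, to keep the example short.
LANGUAGES_EXCERPT = {
    'en': 'english',
    'fr': 'french',
    'ja': 'japanese',
    'zh-cn': 'chinese (simplified)',
}


def check_dest(code: str, table: dict = LANGUAGES_EXCERPT) -> str:
    """Return the normalized language code, or raise ValueError if unknown."""
    normalized = code.strip().lower()
    if normalized not in table:
        raise ValueError('unsupported destination language: ' + repr(code))
    return normalized


print(check_dest('JA'))      # ja
print(check_dest('zh-CN'))   # zh-cn
```

Checking the code locally gives a clear error message for a typo instead of a failure from the translation request.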

Reference for tkk error: [stackoverflow.com "googletrans stopped working with error nonetype object has no attribute group"](https://stackoverflow.com/questions/52455774/googletrans-stopped-working-with-error-nonetype-object-has-no-attribute-group)

py-googletrans/issues/234: ssut/py-googletrans#234 (comment)

Footnotes

  1. googletrans with local python qiita page
