| from googleapiclient.discovery import build | |
| service = build('translate', 'v2', developerKey='') | |
| import youtube_dl | |
| def download_vtt(url, lang): | |
|     # Fetch only the auto-generated subtitles for `lang`; skip the video itself | |
|     ydl_opts = { | |
|         'quiet': True, | |
|         'subtitleslangs': [lang], | |
|         'writeautomaticsub': True, | |
|         'skip_download': True | |
|     } | |
|     with youtube_dl.YoutubeDL(ydl_opts) as ydl: |
| from faker import Faker | |
| fake = Faker('es_MX') | |
| for n in range(10): | |
|     print(fake.name()) | |
| ''' | |
| Humberto Menchaca Berríos | |
| Lic. Irma Menchaca | |
| Elisa Barrera |
| from faker import Faker | |
| fake = Faker('es_MX') | |
| for n in range(10): | |
|     print(fake.job()) | |
| ''' | |
| Geologist, wellsite | |
| Sports development officer | |
| Telecommunications researcher |
| from faker import Faker | |
| from translate import Translator | |
| fake = Faker('es_MX') | |
| translator = Translator(to_lang="es") | |
| for n in range(10): | |
|     print(translator.translate(fake.job())) | |
| ''' |
| def strip_punct(line): | |
|     # Lowercase up front so the result is lowercased even when there is nothing to strip | |
|     line = str(line).lower() | |
|     # Every distinct character that is neither a letter nor a space | |
|     punct = {ch for ch in line if not ch.isalpha() and ch != ' '} | |
|     for ch in punct: | |
|         line = line.replace(ch, ' ') |
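The `strip_punct` snippet above is cut off before its return statement. A minimal self-contained sketch of the same idea follows, assuming the missing tail simply returns the cleaned line (the gist preview does not show it, so that part is a guess):

```python
def strip_punct(line):
    """Lowercase a line and replace every non-letter, non-space character with a space."""
    line = str(line).lower()
    # Assumption: the truncated original returns the cleaned string.
    return ''.join(ch if ch.isalpha() or ch == ' ' else ' ' for ch in line)


# Replaced characters become spaces, so split() is a convenient way to get clean tokens.
tokens = strip_punct("¡Hola, mundo! 123").split()
print(tokens)
```

Digits are treated like punctuation here, as in the original, because `str.isalpha()` is false for them.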
| import stanfordnlp | |
| MODELS_DIR = 'C:\\Users\\user\\stanfordnlp_resources\\' | |
| nlp = stanfordnlp.Pipeline(processors='tokenize,pos,lemma', models_dir=MODELS_DIR, lang='es') | |
| def get_lemmas(line): | |
|     # Keep only the lemmas of adverbs, adjectives and verbs | |
|     doc = nlp(line) | |
|     tagged = [[w.lemma for w in sent.words if w.pos in ('ADV', 'ADJ', 'VERB')] | |
|               for sent in doc.sentences] | |
|     return ' '.join(w for sent in tagged for w in sent) |
| # Get bilingual data from the European Commission translation memories | |
| # https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory#More%20details%20/%20Reference%20publication | |
| # I needed to extract just the EN-ES bilingual data from the TMX files for my machine translation experiment. | |
| # Their Java TM exporter was not working on my side, | |
| # so I wrote this script to get the data. | |
| import xmltodict | |
| import pandas as pd | |
| import os |
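The rest of the script is not shown in the preview. As a rough illustration of the extraction step the comments describe, here is a minimal sketch that pulls EN-ES segment pairs out of a TMX document. It swaps in the standard library's `xml.etree.ElementTree` instead of `xmltodict` and `pandas` so that it runs without extra packages; the function name `extract_pairs`, the sample document, and the language codes in it are illustrative assumptions, not the author's code or real DGT data.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_pairs(tmx_text, src_prefix="EN", tgt_prefix="ES"):
    """Return (source, target) segment pairs from a TMX document string.

    Hypothetical helper: translation units lacking either language are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").upper()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup
                segs[lang[:2]] = "".join(seg.itertext()).strip()
        if src_prefix in segs and tgt_prefix in segs:
            pairs.append((segs[src_prefix], segs[tgt_prefix]))
    return pairs


# Made-up sample in TMX layout (not taken from the DGT files).
SAMPLE_TMX = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="EN-GB"><seg>Hello</seg></tuv>
      <tuv xml:lang="ES-ES"><seg>Hola</seg></tuv>
      <tuv xml:lang="FR-FR"><seg>Bonjour</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN-GB"><seg>No Spanish here</seg></tuv>
      <tuv xml:lang="DE-DE"><seg>Kein Spanisch hier</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = extract_pairs(SAMPLE_TMX)
print(pairs)  # [('Hello', 'Hola')]
```

The resulting list of tuples can be passed to `pd.DataFrame(pairs, columns=['en', 'es'])` if a table is wanted, which is presumably what the `pandas` import above is for.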