@sshleifer
Created August 19, 2020 01:00
Language Tagging Process

README.md

Language tags are based on the table of released models at https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/models/released-models.txt and on ISO language-code mappings.

metadata.json

{'hf_name': 'zlw-zlw',
 'source_languages': 'zlw',
 'target_languages': 'zlw',
 'opus_readme_url': 'https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/zlw-zlw/README.md',
 'original_repo': 'Tatoeba-Challenge',
 'tags': ['translation'],
 'languages': ['pl', 'cs', 'zlw', 'multilingual_src', 'multilingual_tgt'],
 'src_constituents': {'ces', 'csb_Latn', 'dsb', 'hsb', 'pol'},
 'tgt_constituents': {'ces', 'csb_Latn', 'dsb', 'hsb', 'pol'},
 'prepro': ' normalization + SentencePiece (spm32k,spm32k)',
 'url_model': 'https://object.pouta.csc.fi/Tatoeba-MT-models/zlw-zlw/opus-2020-07-27.zip',
 'url_test_set': 'https://object.pouta.csc.fi/Tatoeba-MT-models/zlw-zlw/opus-2020-07-27.test.txt',
 'src_alpha3': 'zlw',
 'tgt_alpha3': 'zlw',
 'short_pair': 'zlw-zlw',
 'chrF2_score': 0.58,
 'bleu': 38.8,
 'brevity_penalty': 0.99,
 'ref_len': 7792.0,
 'src_name': 'West Slavic languages',
 'tgt_name': 'West Slavic languages',
 'train_date': '2020-07-27',
 'src_alpha2': 'zlw',
 'tgt_alpha2': 'zlw',
 'prefer_old': False,
 'long_pair': 'zlw-zlw',
 'helsinki_git_sha': '480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535',
 'transformers_git_sha': '6bdf998dffa70030e42f512a586f33a15e648edd',
 'port_machine': 'brutasse',
 'port_time': '2020-08-18-21:36'}
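
In the record above, the `languages` field looks like a combination of three things: two-letter codes for the constituents that have one (`ces` to `cs`, `pol` to `pl`), the group code itself (`zlw`), and the `multilingual_src` / `multilingual_tgt` flags. The following is a minimal sketch of that reading, not the actual conversion code. The function name `language_tags`, the partial `ALPHA3_TO_ALPHA2` table, and the rule "more than one constituent means multilingual" are all assumptions made for illustration.

```python
# Hypothetical, partial three-letter -> two-letter code table.
# Only the codes needed for this example are listed; csb, dsb and hsb
# have no two-letter code, so they are absent on purpose.
ALPHA3_TO_ALPHA2 = {"ces": "cs", "pol": "pl"}


def language_tags(group_code, src_constituents, tgt_constituents):
    """Build a tag list from a group code and its constituent languages.

    Assumed rule, inferred from the example record rather than taken from
    the real script: emit two-letter codes where they exist, then the group
    code, then a multilingual flag for each side with several constituents.
    """
    tags = []
    for code in sorted(src_constituents | tgt_constituents):
        base = code.split("_")[0]  # drop a script suffix: csb_Latn -> csb
        alpha2 = ALPHA3_TO_ALPHA2.get(base)
        if alpha2 and alpha2 not in tags:
            tags.append(alpha2)
    tags.append(group_code)
    if len(src_constituents) > 1:
        tags.append("multilingual_src")
    if len(tgt_constituents) > 1:
        tags.append("multilingual_tgt")
    return tags


constituents = {"ces", "csb_Latn", "dsb", "hsb", "pol"}
tags = language_tags("zlw", constituents, constituents)
print(tags)
```

For the `zlw-zlw` record this yields the same five tags as the `languages` field, although the order of the two-letter codes may differ because the sketch sorts the constituents.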