@ilius
Last active December 17, 2025 12:52
Convert mwscrape's CouchDB dump json into PyGlossary's Tabfile
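import os
import sys
import json

from pyglossary.text_utils import joinByBar, escapeNTB

# Path to the dump file, given as the first command-line argument.
filename = sys.argv[1]

# The output is written next to the input, with ".txt" appended to its name.
with open(filename, encoding="utf-8") as file, open(filename+".txt", "w", encoding="utf-8") as outFile:
    # Each line of the input is parsed as a JSON list of items.
    for toplevelLine in file:
        for item in json.loads(toplevelLine):
            if "parse" not in item:
                # print(f"No 'parse' in {item}")
                continue
            parse = item["parse"]
            defi = parse["text"]["*"]
            # The title is the main term; aliases are added as alternates.
            terms = [parse["title"]]
            if "aliases" in item:
                for alias in item["aliases"]:
                    if not alias:
                        continue
                    # Take the second element of a two-element alias,
                    # otherwise the first.
                    if len(alias) == 2:
                        alt = alias[1]
                    else:
                        alt = alias[0]
                    terms.append(alt)
                    assert "\n" not in alt, alt
            # One output line per entry: terms joined by "|", a tab, then the definition.
            termsEscaped = [escapeNTB(term) for term in terms]
            outFile.write(joinByBar(termsEscaped) + "\t" + escapeNTB(defi) + "\n")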