@kiriappeee
Forked from janithl/getlang.py
Created December 15, 2016 04:59
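from collections import defaultdict

# Unicode block ranges per language. range() excludes its end value,
# so each end is one past the last code point of the block.
UNICODE_BLOCKS = {
    'en': range(0x0000, 0x02B0),  # Basic Latin through IPA Extensions
    'si': range(0x0D80, 0x0E00),  # Sinhala
    'ta': range(0x0B80, 0x0C00),  # Tamil
    'dv': range(0x0780, 0x07C0),  # Thaana (Dhivehi)
}


def getlang(text):
    """Get language via Unicode range. Partially based on:
    https://github.com/kent37/guess-language/blob/master/guess_language/guess_language.py#L344
    """
    run_types = defaultdict(int)
    # Count alphabetic characters falling in each block.
    for c in text:
        if c.isalpha():
            for block in UNICODE_BLOCKS:
                if ord(c) in UNICODE_BLOCKS[block]:
                    run_types[block] += 1
    # No alphabetic character matched any block: nothing to report.
    if not run_types:
        return None
    return max(run_types, key=run_types.get)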
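
A minimal usage sketch of the same counting idea, written to stand alone. The names `BLOCKS`, `script_counts`, and `guess` are hypothetical and not part of the gist; the block ends are assumed to be inclusive of the last code point, and returning a default for text with no matching letters is an assumption about the desired behaviour. Note that `str.isalpha()` is false for combining vowel signs, so only base letters are counted.

```python
from collections import Counter

# Assumed block boundaries (end is one past the last code point).
BLOCKS = {
    'en': range(0x0000, 0x02B0),
    'si': range(0x0D80, 0x0E00),
    'ta': range(0x0B80, 0x0C00),
    'dv': range(0x0780, 0x07C0),
}


def script_counts(text):
    """Count alphabetic characters per block (hypothetical helper)."""
    counts = Counter()
    for c in text:
        if c.isalpha():
            for lang, block in BLOCKS.items():
                if ord(c) in block:
                    counts[lang] += 1
    return counts


def guess(text, default=None):
    """Return the block with the most letters, or default if none match."""
    counts = script_counts(text)
    return counts.most_common(1)[0][0] if counts else default


if __name__ == "__main__":
    print(guess("hello world"))
    print(guess("\u0BA4\u0BAE\u0BB4"))   # three Tamil letters
    print(guess("12345"))                # no letters at all
```

Keeping the per-block counts separate from the final choice makes it easy to inspect mixed-script text before picking a winner.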