@dwcaraway
Created November 20, 2013 20:06
Python script that generates a JavaScript regular expression (regex) for BCP 47 / RFC 5646 language tags. It is a port of http://stackoverflow.com/questions/7035825/regular-expression-for-a-language-tag-as-defined-by-bcp47
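# Grandfathered tags, listed literally in RFC 5646. The RFC treats tags as
# case-insensitive, so these literals only match the casing shown unless the
# resulting regex is used with a case-insensitive flag.
regular = "(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang)"
irregular = "(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)"
grandfathered = "(" + irregular + "|" + regular + ")"
# Private-use subtags: "x" followed by one or more 1-8 character subtags.
privateUse = "(x(-[A-Za-z0-9]{1,8})+)"
# Extension singleton: any single alphanumeric character except "x"/"X".
singleton = "[0-9A-WY-Za-wy-z]"
extension = "(" + singleton + "(-[A-Za-z0-9]{2,8})+)"
# Subtags of a regular langtag, built up in the order they appear in a tag.
variant = "([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})"
region = "([A-Za-z]{2}|[0-9]{3})"
script = "([A-Za-z]{4})"
extlang = "([A-Za-z]{3}(-[A-Za-z]{3}){0,2})"
language = "(([A-Za-z]{2,3}(-" + extlang + ")?)|[A-Za-z]{4}|[A-Za-z]{5,8})"
langtag = "(" + language + "(-" + script + ")?" + "(-" + region + ")?" + "(-" + variant + ")*" + "(-" + extension + ")*" + "(-" + privateUse + ")?" + ")"
# A full tag is a regular langtag, a private-use tag, or a grandfathered tag.
languageTag = "^(" + langtag + "|" + privateUse + "|" + grandfathered + ")$"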
print(languageTag)
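As a quick sanity check, the sketch below rebuilds the same pattern string and compiles it with Python's `re` module. This is not a JavaScript engine, so it only shows how the pattern behaves for these inputs in Python. The helper names `build_language_tag_pattern` and `is_language_tag` are hypothetical, introduced here for illustration; they are not part of the original script.

```python
import re


def build_language_tag_pattern():
    """Rebuild the pattern string from the same pieces the script uses."""
    regular = "(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang)"
    irregular = ("(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|"
                 "i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)")
    grandfathered = "(" + irregular + "|" + regular + ")"
    private_use = "(x(-[A-Za-z0-9]{1,8})+)"
    singleton = "[0-9A-WY-Za-wy-z]"
    extension = "(" + singleton + "(-[A-Za-z0-9]{2,8})+)"
    variant = "([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})"
    region = "([A-Za-z]{2}|[0-9]{3})"
    script = "([A-Za-z]{4})"
    extlang = "([A-Za-z]{3}(-[A-Za-z]{3}){0,2})"
    language = "(([A-Za-z]{2,3}(-" + extlang + ")?)|[A-Za-z]{4}|[A-Za-z]{5,8})"
    langtag = ("(" + language + "(-" + script + ")?" + "(-" + region + ")?"
               + "(-" + variant + ")*" + "(-" + extension + ")*"
               + "(-" + private_use + ")?" + ")")
    return "^(" + langtag + "|" + private_use + "|" + grandfathered + ")$"


# Compile once; the pattern is anchored with ^ and $, so match() checks the whole tag.
LANGUAGE_TAG_RE = re.compile(build_language_tag_pattern())


def is_language_tag(tag):
    """Return True if `tag` is accepted by the generated pattern."""
    return LANGUAGE_TAG_RE.match(tag) is not None


if __name__ == "__main__":
    for tag in ["en", "en-GB", "zh-Hant-TW", "x-private", "i-klingon", "en_GB", ""]:
        print("%r -> %s" % (tag, is_language_tag(tag)))
```

Under these assumptions, `en`, `en-GB`, `zh-Hant-TW`, `x-private`, and `i-klingon` are accepted, while `en_GB` (underscore separator) and the empty string are rejected.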
@oravecz

oravecz commented Nov 23, 2014

I tried this out on http://regex101.com/#javascript, but it didn't match even simple language tags such as 'en' or 'en-GB'.
