
@Humoud
Last active March 21, 2025 00:55
Detecting Arabic characters with regex.

Detect Arabic characters:

/[\u0600-\u06ff\u0750-\u077f\ufb50-\ufbc1\ufbd3-\ufd3f\ufd50-\ufd8f\ufd92-\ufdc7\ufe70-\ufefc\uFDF0-\uFDFD]/
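A minimal Python sketch of using the same ranges to test whether a string contains Arabic, assuming Python 3's re module (which accepts \uXXXX escapes in str patterns). The names ARABIC_RE and contains_arabic are hypothetical, chosen only for illustration.

```python
import re

# The same ranges as the gist's pattern, written as one character class.
ARABIC_RE = re.compile(
    r"[\u0600-\u06ff\u0750-\u077f\ufb50-\ufbc1\ufbd3-\ufd3f"
    r"\ufd50-\ufd8f\ufd92-\ufdc7\ufe70-\ufefc\ufdf0-\ufdfd]"
)

def contains_arabic(text: str) -> bool:
    """Return True if any character of `text` falls in one of the ranges."""
    return ARABIC_RE.search(text) is not None

print(contains_arabic("hello"))          # False
print(contains_arabic("مرحبا"))          # True
print(contains_arabic("price: ٣٥ KWD"))  # True: Arabic-Indic digits are U+0660-U+0669
```

Putting all ranges in a single class is equivalent to alternating several one-range classes, and it is easier to extend or negate (as in the re.sub example further down).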

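Summary of the Arabic-related Unicode blocks (character counts are from an older Unicode version; later versions added characters to several of these blocks). The pattern above covers the Arabic and Arabic Supplement blocks and most of both Presentation Forms blocks, but not Arabic Extended-A or the two blocks outside the Basic Multilingual Plane: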

  Arabic (0600–06FF, 225 characters)

  Arabic Supplement (0750–077F, 48 characters)

  Arabic Extended-A (08A0–08FF, 39 characters)

  Arabic Presentation Forms-A (FB50–FDFF, 608 characters)

  Arabic Presentation Forms-B (FE70–FEFF, 140 characters)

  Rumi Numeral Symbols (10E60–10E7F, 31 characters)

  Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF, 143 characters)
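To also match the blocks that the regex leaves out, one option is to work directly from the block ranges listed above. This is a sketch, assuming Python 3, where str is indexed by code point so characters above U+FFFF need no surrogate handling. ARABIC_BLOCKS, arabic_block and ALL_BLOCKS_RE are hypothetical names, and the ranges are whole blocks, so they include unassigned code points too.

```python
import re
from typing import Optional

# Inclusive code point ranges of the blocks listed in the summary.
ARABIC_BLOCKS = {
    "Arabic": (0x0600, 0x06FF),
    "Arabic Supplement": (0x0750, 0x077F),
    "Arabic Extended-A": (0x08A0, 0x08FF),
    "Arabic Presentation Forms-A": (0xFB50, 0xFDFF),
    "Arabic Presentation Forms-B": (0xFE70, 0xFEFF),
    "Rumi Numeral Symbols": (0x10E60, 0x10E7F),
    "Arabic Mathematical Alphabetic Symbols": (0x1EE00, 0x1EEFF),
}

def arabic_block(ch: str) -> Optional[str]:
    """Return the name of the listed block that contains `ch`, or None."""
    cp = ord(ch)
    for name, (lo, hi) in ARABIC_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None

# One character class built from the same ranges; \UXXXXXXXX escapes
# let Python's re module express code points above U+FFFF.
ALL_BLOCKS_RE = re.compile(
    "[" + "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in ARABIC_BLOCKS.values()) + "]"
)

print(arabic_block("ب"))                     # Arabic
print(arabic_block("\U0001EE00"))            # Arabic Mathematical Alphabetic Symbols
print(arabic_block("A"))                     # None
print(bool(ALL_BLOCKS_RE.search("\u08A0")))  # True
```

Matching whole blocks also matches U+FEFF (ZERO WIDTH NO-BREAK SPACE, the byte order mark) at the end of Presentation Forms-B, which the gist's pattern excludes by stopping at U+FEFC.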

For more info, see the Wikipedia article on the Arabic Unicode block:

https://en.wikipedia.org/wiki/Arabic_(Unicode_block)

References:

http://stackoverflow.com/questions/11323596/regular-expression-for-arabic-language

@abousselmi

abousselmi commented Mar 29, 2020

Very useful, thanks!

I used it in a Python re.sub call to keep Arabic characters and digits and replace everything else with a space:

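import re
...
# Replace each run of characters that are neither ASCII digits nor Arabic with a single space
t = re.sub(r'[^0-9\u0600-\u06ff\u0750-\u077f\ufb50-\ufbc1\ufbd3-\ufd3f\ufd50-\ufd8f\ufd92-\ufdc7\ufe70-\ufefc\uFDF0-\uFDFD]+', ' ', text)
...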
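A self-contained sketch of this cleaning step, using the same ranges as the gist's main pattern. The function name keep_arabic_and_digits is hypothetical, and the trailing .strip() is an addition so the result has no leading or trailing spaces.

```python
import re

# Any run of characters that are neither ASCII digits nor in the Arabic ranges.
NON_ARABIC_RUN = re.compile(
    r"[^0-9\u0600-\u06ff\u0750-\u077f\ufb50-\ufbc1\ufbd3-\ufd3f"
    r"\ufd50-\ufd8f\ufd92-\ufdc7\ufe70-\ufefc\ufdf0-\ufdfd]+"
)

def keep_arabic_and_digits(text: str) -> str:
    """Replace each disallowed run with one space, then trim the ends."""
    return NON_ARABIC_RUN.sub(" ", text).strip()

cleaned = keep_arabic_and_digits("Hello مرحبا, 2024!")
print(cleaned)
```

Because the class is negated and followed by +, a whole run such as "Hello " or ", " collapses into one space instead of one space per character. Arabic-Indic digits (U+0660 to U+0669) are kept as well, since they sit inside the U+0600 to U+06FF range.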

@AhmedAbouelkher

Do you have an example in Go?

@Arifursdev

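In PHP:

// Ranges written back to back inside one class; the /u modifier is required for \x{...} escapes above \x{FF}
$arabic_regex = '/[\x{0600}-\x{06FF}\x{0750}-\x{077F}\x{FB50}-\x{FBC1}\x{FBD3}-\x{FD3F}\x{FD50}-\x{FD8F}\x{FD92}-\x{FDC7}\x{FE70}-\x{FEFC}\x{FDF0}-\x{FDFD}]/u';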
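A pitfall worth knowing when merging these ranges into one character class: inside [...], | is an ordinary literal character, not alternation, so writing it between ranges makes the class also match a pipe. A quick check in Python's re (the same rule holds in PCRE, which PHP's preg_* functions use); the variable names are illustrative only.

```python
import re

# '|' between ranges inside a class: the pipe itself becomes a member of the set.
with_pipes = re.compile(r"[\u0600-\u06ff|\u0750-\u077f]")
# Ranges written back to back: only the two ranges are members.
without_pipes = re.compile(r"[\u0600-\u06ff\u0750-\u077f]")

print(bool(with_pipes.search("a|b")))     # True: matched the literal '|'
print(bool(without_pipes.search("a|b")))  # False
print(bool(without_pipes.search("ب")))    # True
```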
