Last active May 27, 2024 16:44
Alex-Just/e86110836f3f93fe7932290526529cd1
Python regex to strip emoji from a string
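import re

# http://stackoverflow.com/a/13752628/6762004
# Character class covering every code point from U+10000 to U+10FFFF (outside the
# Basic Multilingual Plane), which is where most emoji are encoded.
RE_EMOJI = re.compile('[\U00010000-\U0010ffff]', flags=re.UNICODE)

def strip_emoji(text):
    # Replace each matched code point with the empty string.
    return RE_EMOJI.sub(r'', text)

print(strip_emoji('🙄🤔'))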
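To make the scope of that character class concrete, here is a small check with sample strings chosen for illustration. The range U+10000 to U+10FFFF removes every supplementary-plane code point, emoji or not, and leaves alone emoji that are encoded in the Basic Multilingual Plane. `flags=re.UNICODE` is left out because it is already the default for `str` patterns in Python 3, and `ascii()` is used so the printed result does not depend on the console encoding.

```python
import re

# Same character class as the gist: every code point from U+10000 to U+10FFFF.
RE_EMOJI = re.compile('[\U00010000-\U0010ffff]')


def strip_emoji(text):
    return RE_EMOJI.sub('', text)


# Astral-plane emoji are removed: U+1F644 FACE WITH ROLLING EYES, U+1F914 THINKING FACE.
print(ascii(strip_emoji('ok \U0001F644\U0001F914')))  # 'ok '

# Astral-plane characters that are not emoji are removed as well:
# U+1D400 MATHEMATICAL BOLD CAPITAL A and U+20000 (a CJK Extension B ideograph).
print(ascii(strip_emoji('x\U0001D400\U00020000y')))  # 'xy'

# Emoji code points inside the BMP survive: U+2600 BLACK SUN WITH RAYS, U+2764 HEAVY BLACK HEART.
print(ascii(strip_emoji('\u2600\u2764')))  # '\u2600\u2764'
```

So the pattern both over-matches (mathematical letters, rare CJK ideographs) and under-matches (older BMP emoji), which is the trade-off of using a single code point range.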
@clichedmoog you are totally right, everything here is a simplification. For a complete/accurate emoji remover for Python I recommend the library https://github.com/bsolomon1124/demoji, which downloads the latest emoji specification to build the pattern. It's not super fast, but it's exhaustive.
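The general idea behind a specification-driven remover can be sketched with the standard library alone: collect the full emoji sequences, sort them longest first, escape each one, and join them into a single alternation. Sorting matters because `re` tries alternatives left to right, so a multi-code-point sequence has to come before the single code point it starts with. This is only a sketch of the technique, not demoji's API: the hard-coded `SAMPLE_SEQUENCES` tuple and the `build_emoji_pattern` / `remove_emoji` helpers are hypothetical, and a real tool would read thousands of sequences from the Unicode emoji data (for example `emoji-test.txt`) instead.

```python
import re

# Hypothetical stand-in for sequences parsed from the Unicode emoji data files.
SAMPLE_SEQUENCES = (
    '\U0001F644',                  # face with rolling eyes
    '\U0001F914',                  # thinking face
    '\u2764\uFE0F',                # red heart (U+2764 + VARIATION SELECTOR-16)
    '\u2764',                      # heart without the variation selector
    '\U0001F468\u200D\U0001F4BB',  # man technologist (ZWJ sequence)
    '\U0001F468',                  # man
    '\U0001F44D\U0001F3FD',        # thumbs up + medium skin tone modifier
    '\U0001F44D',                  # thumbs up
)


def build_emoji_pattern(sequences):
    # Longest sequences first (ties broken alphabetically for a stable order),
    # so a full sequence wins over any shorter sequence that is its prefix.
    ordered = sorted(set(sequences), key=lambda seq: (-len(seq), seq))
    return re.compile('|'.join(re.escape(seq) for seq in ordered))


EMOJI_PATTERN = build_emoji_pattern(SAMPLE_SEQUENCES)


def remove_emoji(text, repl=''):
    # Replace every whole emoji sequence found in the sample list.
    return EMOJI_PATTERN.sub(repl, text)


print(ascii(remove_emoji('coding \U0001F468\u200D\U0001F4BB love \u2764\uFE0F')))
# 'coding  love '
```

Unlike the single character class, this removes the joiner and the variation selector together with the emoji they belong to, at the cost of a much larger pattern.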
Sorry to say this, but I think @mgaitan's regex is not perfect.
Recent emoji include various combinations and sequences, so a complete pattern would need a more complex expression.
A good implementation example in JavaScript is https://github.com/mathiasbynens/emoji-regex
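The point about combinations can be checked directly. Many emoji are sequences rather than single code points, and the gist's character class only removes the parts that fall outside the BMP. The snippet below runs the same pattern on three sequences picked for illustration and prints what is left behind; `ascii()` makes the invisible joiner and variation selector visible as escapes.

```python
import re

# The gist's pattern: any code point from U+10000 to U+10FFFF.
RE_EMOJI = re.compile('[\U00010000-\U0010ffff]')


def strip_emoji(text):
    return RE_EMOJI.sub('', text)


# "Man technologist": U+1F468, U+200D ZERO WIDTH JOINER, U+1F4BB.
man_technologist = '\U0001F468\u200D\U0001F4BB'
# "Red heart": U+2764 followed by U+FE0F VARIATION SELECTOR-16 (both in the BMP).
red_heart = '\u2764\uFE0F'
# "Keycap: #": U+0023, U+FE0F, U+20E3 COMBINING ENCLOSING KEYCAP (all in the BMP).
keycap_hash = '#\uFE0F\u20E3'

print(ascii(strip_emoji(man_technologist)))  # '\u200d'
print(ascii(strip_emoji(red_heart)))         # '\u2764\ufe0f'
print(ascii(strip_emoji(keycap_hash)))       # '#\ufe0f\u20e3'
```

A stray zero width joiner is easy to miss because it renders as nothing, which is one reason libraries such as emoji-regex match whole sequences instead of individual code points.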