Alex-Just/strip_emoji.py

joc32 · 2018-06-26T05:10:09Z

Does not work when the emoji is at the end of a sentence.

GMTernence · 2019-01-13T07:53:21Z

Thanks a lot
It works for me

swjmj · 2019-08-20T06:03:52Z

Thanks, works very well

gabriel19913 · 2019-10-18T17:31:37Z

In this question on Stack Overflow, a user said that this function doesn't cover all emojis, so it is better to use:

```python
import re

def strip_emoji(text):
    # Misc Symbols + Dingbats, Misc Symbols and Pictographs + Emoticons, Transport and Map Symbols
    RE_EMOJI = re.compile(u'([\U00002600-\U000027BF])|([\U0001f300-\U0001f64F])|([\U0001f680-\U0001f6FF])')
    return RE_EMOJI.sub(r'', text)
```
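A quick check of what these three ranges actually cover. This is a sketch, and the sample characters are my own choice: ☀ (U+2600) falls in the first range and 🚀 (U+1F680) in the third, but 🤔 (U+1F914) sits in the Supplemental Symbols and Pictographs block, which none of the three ranges include.

```python
import re

# Same three ranges as the pattern above, merged into one character class.
RE_EMOJI = re.compile('[\U00002600-\U000027BF\U0001F300-\U0001F64F\U0001F680-\U0001F6FF]')

def strip_emoji(text):
    return RE_EMOJI.sub('', text)

# ascii() makes any surviving non-ASCII characters visible as escapes.
print(ascii(strip_emoji('sun \u2600 rocket \U0001F680')))  # 'sun  rocket '
print(ascii(strip_emoji('thinking \U0001F914')))           # 'thinking \U0001f914'
```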

mgaitan · 2020-03-11T12:40:06Z

For the record, this is the pattern we are using:

```python
import re

# https://en.wikipedia.org/wiki/Unicode_block
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F700-\U0001F77F"  # alchemical symbols
    "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
    "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\U0001FA00-\U0001FA6F"  # Chess Symbols
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
    "\U00002702-\U000027B0"  # Dingbats
    "\U000024C2-\U0001F251"  # very broad: also matches CJK and other non-emoji text
    "]+"
)
```

mghayour · 2020-04-01T14:01:31Z

@mgaitan it works perfectly for me, thanks a lot 💖

```python
import re

def add_space_between_emojies(text):
    # Ref: https://gist.github.com/Alex-Just/e86110836f3f93fe7932290526529cd1#gistcomment-3208085
    # Ref: https://en.wikipedia.org/wiki/Unicode_block
    EMOJI_PATTERN = re.compile(
        "(["
        "\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F700-\U0001F77F"  # alchemical symbols
        "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
        "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
        "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
        "\U0001FA00-\U0001FA6F"  # Chess Symbols
        "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
        "\U00002702-\U000027B0"  # Dingbats
        "])"
    )
    # No "+" quantifier, so each emoji is surrounded by spaces individually.
    text = re.sub(EMOJI_PATTERN, r' \1 ', text)
    return text
```

EDIT:
I deleted the last range, "\U000024C2-\U0001F251", because it matches Persian characters, which caused a bug for me.
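Why that last range causes trouble: "\U000024C2-\U0001F251" spans everything from U+24C2 up to U+1F251, which includes large blocks of ordinary text. A minimal check, using Chinese and Korean samples (my own choice, since CJK ideographs and Hangul syllables fall well inside the range):

```python
import re

# Only the broad range that was deleted from the pattern above.
BROAD_RANGE = re.compile('[\U000024C2-\U0001F251]')

for sample in ['中文', '한국어', 'hello']:
    # True means every character of the sample is matched by the range.
    print(sample, all(BROAD_RANGE.fullmatch(ch) for ch in sample))
```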

nestukh · 2020-04-03T20:32:00Z

Hello, I credited your work for a workaround in a youtube-dl issue:
ytdl-org/youtube-dl#5042 (comment)
It has helped a lot, thank you.

Shellbye · 2020-06-09T10:28:39Z

If you have `from __future__ import unicode_literals` at the top of the file, you need to escape "-" like this:

```python
import re

EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F700-\U0001F77F"  # alchemical symbols
    "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
    "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\U0001FA00-\U0001FA6F"  # Chess Symbols
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
    "\U00002702-\U000027B0"  # Dingbats
    "\U000024C2-\U0001F251"
    "]+"
)
```

Otherwise you will get a "bad character range" error, as in this Stack Overflow question.

Lakril · 2021-04-07T23:34:03Z

Thanks for your help.

```python
import re

from advertools.emoji import EMOJI

def add_space_between_emojies(text):
    '''Remove every emoji matched by the advertools EMOJI pattern.

    >>> add_space_between_emojies('Python is fun 💚')
    'Python is fun '
    '''
    return re.sub(EMOJI, r'', text)
```

clichedmoog · 2021-05-05T09:56:30Z

Sorry to say this, but I think @mgaitan's regex is not perfect.
Recent emoji include various combinations and sequences of code points, so a complete pattern would be a much more complex expression.
A good implementation example in JavaScript is https://github.com/mathiasbynens/emoji-regex
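To illustrate the point about combinations: many emoji are sequences of several code points joined by ZERO WIDTH JOINER (U+200D) or followed by VARIATION SELECTOR-16 (U+FE0F). The sketch below reuses the ranges from @mgaitan's pattern (without the broad "\U000024C2-\U0001F251" range) and shows the glue characters that range-based stripping leaves behind. Adding U+200D and U+FE0F to the class is my own assumption, not a complete fix, and it would also strip ZWJ from scripts that use it legitimately.

```python
import re

# Ranges from the pattern above, minus the broad "\U000024C2-\U0001F251" range.
EMOJI_RANGES = (
    "\U0001F1E0-\U0001F1FF"
    "\U0001F300-\U0001F5FF"
    "\U0001F600-\U0001F64F"
    "\U0001F680-\U0001F6FF"
    "\U0001F700-\U0001F77F"
    "\U0001F780-\U0001F7FF"
    "\U0001F800-\U0001F8FF"
    "\U0001F900-\U0001F9FF"
    "\U0001FA00-\U0001FA6F"
    "\U0001FA70-\U0001FAFF"
    "\U00002702-\U000027B0"
)

RANGES_ONLY = re.compile("[" + EMOJI_RANGES + "]+")
# Assumption: also strip ZWJ (U+200D) and VS16 (U+FE0F), the usual sequence glue.
WITH_JOINERS = re.compile("[" + EMOJI_RANGES + "\u200d\ufe0f]+")

family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man ZWJ woman ZWJ girl
heart = "\u2764\ufe0f"  # HEAVY BLACK HEART + VARIATION SELECTOR-16

print(ascii(RANGES_ONLY.sub("", family)))   # '\u200d\u200d'
print(ascii(RANGES_ONLY.sub("", heart)))    # '\ufe0f'
print(ascii(WITH_JOINERS.sub("", family)))  # ''
```

Even with the extra characters this stays a simplification; keycap sequences, tag sequences and newer emoji outside these blocks still need a generated pattern like the ones the libraries mentioned in this thread build.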

mgaitan · 2021-05-13T18:17:18Z

@clichedmoog you are totally right, everything here is a simplification. For a complete/accurate emoji remover for Python I recommend the library https://github.com/bsolomon1124/demoji, which downloads the latest emoji specification to build the pattern. It's not super fast, but it's exhaustive.

```python
import re

# http://stackoverflow.com/a/13752628/6762004
# Matches any code point outside the Basic Multilingual Plane (U+10000 and above).
RE_EMOJI = re.compile('[\U00010000-\U0010ffff]', flags=re.UNICODE)

def strip_emoji(text):
    return RE_EMOJI.sub(r'', text)

print(strip_emoji('🙄🤔'))
```
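Because this pattern removes every code point above U+FFFF, it is both too broad and too narrow. A small sketch (sample characters chosen by me): it strips the non-emoji letter 𝔸 (MATHEMATICAL DOUBLE-STRUCK CAPITAL A, U+1D538) and leaves BMP emoji such as ☀ (U+2600) untouched, which matches the "doesn't cover all emojis" remark earlier in the thread.

```python
import re

# Same catch-all pattern: every code point outside the BMP.
RE_EMOJI = re.compile('[\U00010000-\U0010ffff]', flags=re.UNICODE)

def strip_emoji(text):
    return RE_EMOJI.sub(r'', text)

# Non-emoji astral character is removed as well.
print(ascii(strip_emoji('set \U0001D538')))  # 'set '
# BMP emoji (BLACK SUN WITH RAYS) is left in place.
print(ascii(strip_emoji('sunny \u2600')))    # 'sunny \u2600'
```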
