Last active
June 25, 2021 17:39
Emoji (oov) replacement from source
# ResponseOptions is expected to come from the translation service's Python
# bindings; the import is not shown in the original snippet.
def replace_unk_from_source(service):
    """Return a translate function that copies unknown (OOV) source tokens into the target."""

    def __replace_unk_from_source(source):
        opts = ResponseOptions()
        opts.alignment = True
        opts.alignmentThreshold = 1.0  # hardAlignment
        response = service.translate(source, opts)

        # Collect a (target range, source range) pair for every alignment
        # point whose source token is unknown to the vocabulary.
        replace_ops = []
        for sentenceIdx, alignment in enumerate(response.alignments):
            for point in alignment:
                if response.source.isUnknown(sentenceIdx, point.src):
                    sourceByteRange = response.source.wordAsByteRange(sentenceIdx, point.src)
                    targetByteRange = response.target.wordAsByteRange(sentenceIdx, point.tgt)
                    replace_ops.append((targetByteRange, sourceByteRange))

        # Splice on UTF-8 bytes, walking the target from left to right.
        replace_ops = sorted(replace_ops, key=lambda x: x[0].begin)
        sourceBytes = bytearray(response.source.text.encode())
        targetBytes = bytearray(response.target.text.encode())
        previous = 0
        replaced = bytearray()
        for tbr, sbr in replace_ops:
            replaced += targetBytes[previous:tbr.begin]
            replaced += sourceBytes[sbr.begin:sbr.end]
            previous = tbr.end
        replaced += targetBytes[previous:]
        return replaced.decode("utf-8")

    return __replace_unk_from_source
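The splicing step can be tried in isolation, without a translation service. The following is a minimal sketch under stated assumptions: `ByteRange`, `splice_from_source`, and `byte_range_of` are hypothetical helpers invented for illustration, and the `<unk>` placeholder in the target string is made up rather than taken from real model output. It only shows why the replacement works on UTF-8 byte offsets instead of character indices.

```python
from collections import namedtuple

# Hypothetical stand-in for the byte-range objects returned by wordAsByteRange.
ByteRange = namedtuple("ByteRange", ["begin", "end"])


def splice_from_source(source_text, target_text, replace_ops):
    """Copy source byte ranges over target byte ranges.

    replace_ops holds (target_range, source_range) pairs whose offsets are
    half-open [begin, end) positions in the UTF-8 encoding of each text.
    """
    source_bytes = source_text.encode("utf-8")
    target_bytes = target_text.encode("utf-8")
    out = bytearray()
    previous = 0
    # Process replacements in target order so the untouched gaps are copied once.
    for tbr, sbr in sorted(replace_ops, key=lambda op: op[0].begin):
        out += target_bytes[previous:tbr.begin]
        out += source_bytes[sbr.begin:sbr.end]
        previous = tbr.end
    out += target_bytes[previous:]
    return out.decode("utf-8")


def byte_range_of(text, word):
    """Locate the first occurrence of word in text as UTF-8 byte offsets."""
    needle = word.encode("utf-8")
    begin = text.encode("utf-8").index(needle)
    return ByteRange(begin, begin + len(needle))


source = "I was bored at work 👌"
target = "Ich langweilte mich bei der Arbeit <unk>"  # made-up placeholder output
ops = [(byte_range_of(target, "<unk>"), byte_range_of(source, "👌"))]
result = splice_from_source(source, target, ops)
print(result)
```

The emoji occupies four bytes but a single code point, so slicing the decoded strings with these offsets would cut in the wrong place; encoding both texts first keeps the offsets and the slices in the same unit.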
-- Sample 1 ----
[src] > Pleading Face is the third most popular emoji used on Twitter, and the most commonly found emoji in tweets that include hearts. Used in sequence with pointing hands to indicate a bashful or shy pose (🥺👉👈) particularly on TikTok.
[tgt] < Pleading Gesicht ist die drittbeliebteste Emoji auf Twitter verwendet, und die am weiteesten gefunden E-E-Emoji in Tweets, die Herzen einschließen. Verwendet in der Sequenz mit punktierenden Händchen, um eine verschämte oder scheutliche Pose (🥺👉👈 besonders auf TikTok anzudeuten. .
-- Sample 2 ----
[src] > നിങ്ങളുടെ ഈ വിഡിയോ കണ്ടതിനു ശേഷം Michael jackson ന്റെ ഗാനമാണ് ഓർമ വന്നത് just because you read it in a magazine Or see it on the TV screen Don't make it factual, actual
[tgt] < Nur weil Sie es auf dem TV-Fernsehlein auf den Fernsehbildschirm lesen. Machen Sie es nicht s sachlich, tatsächlich, aktuell, tatsächlich
-- Sample 3 ----
[src] > 🥺 Face with Pleading Eyes
[tgt] < 🥺 mit plädierenden Sehnsüchten
-- Sample 4 ----
[src] > I'm sure I would be!😘
[tgt] < Ich bin mir sicher, dass ich es sein würde!😘
-- Sample 5 ----
[src] > I was bored at work 👌
[tgt] < Ich langweilte mich bei der Arbeit👌👌👌👌
-- Sample 6 ----
[src] > Have a great day 😊 Damn.
[tgt] < Haben Sie einen tollen Tag😊dammt. Verdammt.
-- Sample 7 ----
[src] > <---------
[tgt] < <---------------
-- Sample 8 ----
[src] > ¯\\_(ツ)_/¯
[tgt] < "¯\\_(ツ)_/ ¯¯
-- Sample 9 ----
[src] > ;)
[tgt] < ; ;)
-- Sample 10 ----
[src] > :-)
[tgt] < :--)
-- Sample 11 ----
[src] > :/
[tgt] < ://
-- Sample 12 ----
[src] > :/
[tgt] < ://
-- Sample 13 ----
[src] > :’)
[tgt] < :’)
-- Sample 14 ----
[src] > :(
[tgt] < : .
-- Sample 15 ----
[src] > :)
[tgt] < :) .
-- Sample 16 ----
[src] > !”
[tgt] < !”””
-- Sample 17 ----
[src] > !🤣
[tgt] < !🤣🤣🤣
-- Sample 18 ----
[src] > !
[tgt] < ! !
-- Sample 19 ----
[src] > ???
[tgt] < . . ???
-- Sample 20 ----
[src] > ?😋
[tgt] < . ?😋
-- Sample 21 ----
[src] > ?
[tgt] < ? .
-- Sample 22 ----
[src] > .
[tgt] < . . .
-- Sample 23 ----
[src] > "*
[tgt] < " "*
-- Sample 24 ----
[src] > (:
[tgt] < ( Folgendes: Die
-- Sample 25 ----
[src] > (?)
[tgt] < (?)?)?
-- Sample 26 ----
[src] > ):
[tgt] < : .
-- Sample 27 ----
[src] > )...
[tgt] < ) ....
-- Sample 28 ----
[src] > ).
[tgt] < . ).
-- Sample 29 ----
[src] > [?]
[tgt] < [?]?]?]
-- Sample 30 ----
[src] > @
[tgt] < @: @.
-- Sample 31 ----
[src] > * 😂
[tgt] < * 😂 😂
-- Sample 32 ----
[src] > *-
[tgt] < *--
-- Sample 33 ----
[src] > ***
[tgt] < *** - ***
-- Sample 34 ----
[src] > **
[tgt] < ** .n.
-- Sample 35 ----
[src] > *
[tgt] < * . -
-- Sample 36 ----
[src] > 👌?
[tgt] < 👌 -?
-- Sample 37 ----
[src] > ༽つ**
[tgt] < ༽つ༽つ**
-- Sample 38 ----
[src] > 😔🤦🏾♂️
[tgt] < 😔🤦🏾♂️ 😔🤦🏾♂️
-- Sample 39 ----
[src] > 🤷🏼♀️
[tgt] < 🤷🏼♀️ 🤷🏼♀️
-- Sample 40 ----
[src] > 🤷♀️
[tgt] < 🤷♀️ 🤷♀️
-- Sample 41 ----
[src] > 😂😂
[tgt] < 😂😂 😂😂
-- Sample 42 ----
[src] > 😅😂
[tgt] < 😅😂 😅😂
-- Sample 43 ----
[src] > ♥♥
[tgt] < ♥♥ ♥♥
-- Sample 44 ----
[src] > 😒
[tgt] < 😒 😒
-- Sample 45 ----
[src] > 😂
[tgt] < 😂 😂
-- Sample 46 ----
[src] > 0.
[tgt] < 0. 0..