bramswenson/04d6bb7ccade1ad2b283
Last active August 29, 2015 14:15
Replace UTF-8 characters longer than 3 bytes
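The formula below is a reference restatement of the standard UTF-8 length rule (RFC 3629); it is not part of the gist. UTF-8 picks the number of bytes from the code point c alone. So the characters that need more than 3 bytes are exactly those above U+FFFF, which includes emoji such as 😊 (U+1F60A). This is why `char.bytesize > 3` and `char.ord > 65535` select the same characters. It also explains the regex's character class, which leaves out the surrogate block U+D800 to U+DFFF: those code points never appear in well-formed UTF-8.

```latex
\mathrm{len}_{\text{UTF-8}}(c) =
\begin{cases}
1, & \text{U+0000} \le c \le \text{U+007F} \\
2, & \text{U+0080} \le c \le \text{U+07FF} \\
3, & \text{U+0800} \le c \le \text{U+FFFF},\ c \notin [\text{U+D800}, \text{U+DFFF}] \\
4, & \text{U+10000} \le c \le \text{U+10FFFF}
\end{cases}
\qquad\Longrightarrow\qquad
\mathrm{len}_{\text{UTF-8}}(c) > 3 \iff c > \text{U+FFFF} = 65535
```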
# encoding: utf-8
require 'gemoji'
require 'benchmark'
require 'benchmark/ips'

task :default => :benchmark

task :benchmark do
  # Matches any character outside U+0000..U+D7FF and U+E000..U+FFFF; in a
  # valid UTF-8 string that means code points above U+FFFF (4 bytes each).
  regex = /([^\u0000-\uD7FF\uE000-\uFFFF])/
  # U+FFFD REPLACEMENT CHARACTER (3 bytes in UTF-8)
  replacement = "�"
  test_string = "this is my 😊, there are many like it but this one is mine"

  Benchmark.ips do |x|
    x.report 'String#gsub regex' do
      test_string.gsub(regex, replacement)
    end

    x.report 'String#gsub with gemoji' do
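      # Emoji.find_by_unicode returns nil for characters gemoji does not know
      # (e.g. non-emoji symbols above U+FFFF), so fall back to the replacement.
      test_string.gsub(regex) do |e|
        emoji = Emoji.find_by_unicode(e)
        emoji ? ":#{emoji.aliases.first}:" : replacement
      end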
    end

    x.report 'String#chars bytesize' do
      "".tap do |out_str|
        test_string.chars.each do |char|
          if char.bytesize > 3
            out_str << replacement
          else
            out_str << char
          end
        end
      end
    end

    x.report 'Demoji Gem Style' do
      "".tap do |out_str|
        # Index with a for loop instead of split/join, for performance.
        for i in (0...test_string.length)
          char = test_string[i]
          char = replacement if char.ord > 65535
          out_str << char
        end
      end
    end

    x.compare!
  end
end
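The Rakefile above expresses "more than 3 bytes" in three ways: the regex character class, `char.bytesize > 3`, and `char.ord > 65535`. The sketch below checks that these agree on one character of each UTF-8 length, using only plain Ruby and no gems. It also tests the explicit supplementary-plane range `/[\u{10000}-\u{10FFFF}]/` as an assumed equivalent spelling of the regex. The helper names are illustrative and do not come from the gist, and `String#match?` assumes Ruby 2.4 or later.

```ruby
# Sketch: the Rakefile's three "more than 3 bytes" tests, side by side.
# Helper names are illustrative; they are not part of the gist.

# Regex from the Rakefile (capture group dropped): anything outside
# U+0000..U+D7FF and U+E000..U+FFFF.
GIST_REGEX = /[^\u0000-\uD7FF\uE000-\uFFFF]/
# For valid UTF-8 strings, the same set written as the supplementary-plane range.
SUPPLEMENTARY = /[\u{10000}-\u{10FFFF}]/

def four_byte_by_regex?(char)
  char.match?(GIST_REGEX)
end

def four_byte_by_range?(char)
  char.match?(SUPPLEMENTARY)
end

def four_byte_by_bytesize?(char)
  char.bytesize > 3
end

def four_byte_by_ord?(char)
  char.ord > 0xFFFF # 0xFFFF == 65535
end

# 1-, 2-, 3- and 4-byte characters, plus a 4-byte character that is not an emoji.
SAMPLES = ["a", "\u00E9", "\u20AC", "\u{1F60A}", "\u{1D400}"].freeze

SAMPLES.each do |c|
  checks = [four_byte_by_regex?(c), four_byte_by_range?(c),
            four_byte_by_bytesize?(c), four_byte_by_ord?(c)]
  puts format("U+%04X  %d byte(s)  %p", c.ord, c.bytesize, checks)
end
```

Each row should show all four checks agreeing: false for the first three samples and true for U+1F60A and U+1D400.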
rake benchmark
Calculating -------------------------------------
   String#gsub regex    27.135k i/100ms
String#gsub with gemoji
                        21.825k i/100ms
String#chars bytesize
                         6.193k i/100ms
    Demoji Gem Style     5.812k i/100ms
-------------------------------------------------
   String#gsub regex    295.730k (± 4.4%) i/s -      1.492M
String#gsub with gemoji
                        266.580k (± 4.0%) i/s -      1.331M
String#chars bytesize
                         84.961k (± 4.3%) i/s -    427.317k
    Demoji Gem Style     59.164k (± 3.2%) i/s -    296.412k

Comparison:
   String#gsub regex:   295730.2 i/s
String#gsub with gemoji:   266580.3 i/s - 1.11x slower
String#chars bytesize:    84961.4 i/s - 3.48x slower
    Demoji Gem Style:    59164.3 i/s - 5.00x slower
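In these results the "Demoji Gem Style" loop is the slowest. One plausible factor, assumed here and not measured in this gist, is that `test_string[i]` on a non-ASCII UTF-8 string must scan from the start to find character i, so the loop does roughly quadratic work. The hypothetical `replace_four_byte` helper below streams characters with `each_char` instead. It is a sketch for comparison only; it was not part of the original run and has not been benchmarked here.

```ruby
# Sketch: a hypothetical each_char variant of the "Demoji Gem Style" loop.
# It walks the string once with each_char rather than calling str[i] per index.
# Assumes valid UTF-8 input: String#ord raises ArgumentError on invalid bytes.

REPLACEMENT = "\uFFFD" # U+FFFD, the same character as the Rakefile's replacement

def replace_four_byte(str, replacement = REPLACEMENT)
  out = String.new(encoding: Encoding::UTF_8)
  str.each_char do |char|
    # For valid UTF-8, ord > 0xFFFF selects the same characters as bytesize > 3
    out << (char.ord > 0xFFFF ? replacement : char)
  end
  out
end

test_string = "this is my \u{1F60A}, there are many like it but this one is mine"
puts replace_four_byte(test_string)
```

With the gist's test string this returns "this is my �, there are many like it but this one is mine", the same result as the `String#gsub regex` report. Whether it beats the indexed loop would need to be confirmed by adding it as another `x.report` in the Rakefile.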