@bramswenson
Last active August 29, 2015 14:15
Replace UTF-8 characters that take more than 3 bytes
# encoding: utf-8
# Rakefile: benchmarks four ways of replacing characters that need more than
# 3 bytes in UTF-8 (code points above U+FFFF, such as most emoji).
require 'gemoji'
require 'benchmark'
require 'benchmark/ips'

task :default => :benchmark

task :benchmark do
  # Matches any character outside the Basic Multilingual Plane.
  regex       = /([^\u0000-\uD7FF\uE000-\uFFFF])/
  replacement = "�" # U+FFFD REPLACEMENT CHARACTER
  test_string = "this is my 😊, there are many like it but this one is mine"

  Benchmark.ips do |x|
    x.report 'String#gsub regex' do
      test_string.gsub(regex, replacement)
    end

    x.report 'String#gsub with gemoji' do
      # Swaps each match for its :alias:. Emoji.find_by_unicode returns nil
      # for characters gemoji does not know, so real input needs a nil guard.
      test_string.gsub(regex) { |e| ":#{Emoji.find_by_unicode(e).aliases.first}:" }
    end

    x.report 'String#chars bytesize' do
      "".tap do |out_str|
        test_string.chars.each do |char|
          if char.bytesize > 3
            out_str << replacement
          else
            out_str << char
          end
        end
      end
    end

    x.report 'Demoji Gem Style' do
      "".tap do |out_str|
        # Use a for loop instead of split and join, for performance.
        for i in (0...test_string.length)
          char = test_string[i]
          char = replacement if char.ord > 65535
          out_str << char
        end
      end
    end

    x.compare!
  end
end
rake benchmark
Calculating -------------------------------------
   String#gsub regex    27.135k i/100ms
String#gsub with gemoji
                        21.825k i/100ms
String#chars bytesize
                         6.193k i/100ms
    Demoji Gem Style     5.812k i/100ms
-------------------------------------------------
   String#gsub regex    295.730k (± 4.4%) i/s -      1.492M
String#gsub with gemoji
                        266.580k (± 4.0%) i/s -      1.331M
String#chars bytesize
                         84.961k (± 4.3%) i/s -    427.317k
    Demoji Gem Style     59.164k (± 3.2%) i/s -    296.412k

Comparison:
      String#gsub regex:   295730.2 i/s
String#gsub with gemoji:   266580.3 i/s - 1.11x slower
  String#chars bytesize:    84961.4 i/s - 3.48x slower
       Demoji Gem Style:    59164.3 i/s - 5.00x slower
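
The benchmark compares speed only. As a quick sanity check, the sketch below suggests that the three detection rules it uses (the regex, bytesize > 3, and ord > 65535) select the same characters for valid UTF-8 input. The constant and method names here are hypothetical and not part of the gist, and the gemoji variant is left out so the example needs only the standard library.

```ruby
# Hypothetical names for illustration; none of these appear in the gist.
ASTRAL      = /[^\u0000-\uD7FF\uE000-\uFFFF]/ # same character class as the benchmark regex
REPLACEMENT = "\uFFFD"                        # U+FFFD REPLACEMENT CHARACTER

# Rule 1: regex match on anything outside the Basic Multilingual Plane.
def strip_with_regex(str)
  str.gsub(ASTRAL, REPLACEMENT)
end

# Rule 2: a character needs more than 3 bytes in UTF-8.
def strip_with_bytesize(str)
  str.each_char.map { |c| c.bytesize > 3 ? REPLACEMENT : c }.join
end

# Rule 3: the code point is above U+FFFF (65535).
def strip_with_ord(str)
  str.each_char.map { |c| c.ord > 0xFFFF ? REPLACEMENT : c }.join
end

# One 4-byte emoji, one 2-byte letter (U+00E9), one 3-byte symbol (U+20AC).
sample  = "this is my \u{1F60A}, caf\u00E9 \u20AC"
results = [strip_with_regex(sample), strip_with_bytesize(sample), strip_with_ord(sample)]

puts results.uniq.length # 1 when all three rules agree
puts results.first
```

Only the 4-byte emoji is replaced here; the 2-byte and 3-byte characters pass through unchanged, which is why the rules are interchangeable for this purpose and the choice can be made on speed.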