@julp
Last active December 24, 2015 04:19
Benchmarks between 2 implementations of Collator::replace and Regexp::replaceCallback

// #define UTF8_REPLACEMENT 1

Changes:

  1. convert the replacement from UTF-8 to UTF-16 (before processing)
  2. copy (strdup) the UTF-16 version of the subject as the result (before processing)
  3. convert the result back from UTF-16 to UTF-8 (before returning)
  4. we work directly on UTF-16 code unit offsets (internal to replacements)
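
Working on UTF-16 code unit offsets means a match can be replaced with plain buffer arithmetic, with no offset translation. The following is only a minimal sketch of that idea in plain C; `utf16_splice` is a hypothetical helper name, not a function from the benchmarked extension, which presumably relies on ICU for its UTF-16 handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original code): returns a newly
 * allocated copy of `subject` (subject_len UTF-16 code units) in which
 * the code units [start, end) are replaced by `repl` (repl_len units).
 * The new length is stored in *out_len. Returns NULL if malloc fails. */
uint16_t *utf16_splice(const uint16_t *subject, size_t subject_len,
                       size_t start, size_t end,
                       const uint16_t *repl, size_t repl_len,
                       size_t *out_len)
{
    assert(start <= end && end <= subject_len);

    size_t new_len = subject_len - (end - start) + repl_len;
    /* Allocate at least one unit so an empty result is still a valid pointer. */
    uint16_t *result = malloc((new_len > 0 ? new_len : 1) * sizeof *result);
    if (result == NULL) {
        return NULL;
    }
    /* Prefix before the match. */
    memcpy(result, subject, start * sizeof *result);
    /* Replacement text. */
    if (repl_len > 0) {
        memcpy(result + start, repl, repl_len * sizeof *result);
    }
    /* Suffix after the match. */
    memcpy(result + start + repl_len, subject + end,
           (subject_len - end) * sizeof *result);
    *out_len = new_len;
    return result;
}
```

Because the offsets reported by the matcher and the offsets into the buffer are in the same unit, each replacement only costs the copies shown here.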

$string = str_repeat('Hayır İyi', 1);

1 iteration

real 0m0.022s user 0m0.018s sys 0m0.004s

1 000 000 iterations

real 0m18.278s user 0m18.261s sys 0m0.010s

$string = str_repeat('Hayır İyi', 10000);

1 iteration

real 0m0.093s user 0m0.085s sys 0m0.008s

10 000 iterations

real 17m4.348s user 16m59.614s sys 0m4.092s

#define UTF8_REPLACEMENT 1

Changes:

  1. we don't need to convert the replacement to UTF-16
  2. copy (strdup) the UTF-8 version of the subject as the result (before processing)
  3. the result is already in UTF-8 (before returning)
  4. we need to convert lengths to code points (UTF-16 => UTF-8, to map offsets) (internal to replacements)
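
The offset mapping in the last item can be sketched as follows. This is an illustration under stated assumptions, not the extension's actual code: `utf16_to_utf8_offset` is a hypothetical name, the input is assumed to be valid UTF-8, and the mapping is done by scanning from the start of the string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original code): given valid UTF-8
 * `s` of `len` bytes, return the byte offset that corresponds to
 * `u16_offset` UTF-16 code units from the start of the string.
 * Returns `len` when the requested offset lies past the end. */
size_t utf16_to_utf8_offset(const unsigned char *s, size_t len,
                            size_t u16_offset)
{
    size_t bytes = 0; /* position in the UTF-8 string, in bytes */
    size_t units = 0; /* same position, counted in UTF-16 code units */

    assert(s != NULL);
    while (bytes < len && units < u16_offset) {
        unsigned char c = s[bytes];
        if (c < 0x80) {        /* 1-byte sequence: 1 code unit */
            bytes += 1; units += 1;
        } else if (c < 0xE0) { /* 2-byte sequence: 1 code unit */
            bytes += 2; units += 1;
        } else if (c < 0xF0) { /* 3-byte sequence: 1 code unit */
            bytes += 3; units += 1;
        } else {               /* 4-byte sequence: surrogate pair */
            bytes += 4; units += 2;
        }
    }
    return bytes < len ? bytes : len;
}
```

For 'Hayır İyi', UTF-16 offset 6 (the 'İ') maps to byte offset 7, since 'ı' takes two bytes in UTF-8. A scan like this costs time proportional to the offset for every match, which may be part of why the UTF-8 variant degrades so much on the 10000-repeat subject.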

$string = str_repeat('Hayır İyi', 1);

1 iteration

real 0m0.022s user 0m0.014s sys 0m0.008s

1 000 000 iterations

real 0m21.291s user 0m21.263s sys 0m0.015s

$string = str_repeat('Hayır İyi', 10000);

1 iteration

real 0m4.960s user 0m4.951s sys 0m0.006s

10 000 iterations

too long (more than 13h ?)

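<?php
// Benchmark script for Collator::replace.
// Adjust the str_repeat() count and the loop bound for each run.
$coll = new Collator('tr_tr');
$coll->setStrength(Collator::SECONDARY);
$search = 'I';
$string = str_repeat('Hayır İyi', 1);
for ($i = 0; $i < 1; $i++) {
    $coll->replace($string, $search, '<replaced>');
}

<?php
// Benchmark script for Regexp::replaceCallback.
// Adjust the str_repeat() count and the loop bound for each run.
$re = new Regexp('i', 'i');
$string = str_repeat('Hayır İyi', 1);
for ($i = 0; $i < 1; $i++) {
    $re->replaceCallback(
        $string,
        function ($matches) {
            static $m = 0; // running count of replacements made so far
            return '<replacement n°' . $m++ . '>';
        }
    );
}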

// #define UTF8_REPLACEMENT 1

Changes:

  1. copy (strdup) the UTF-16 version of the subject as the result (before processing)
  2. for each match, convert the replacement (the value returned by the callback) from UTF-8 to UTF-16 (while processing)
  3. convert the result back from UTF-16 to UTF-8 (before returning)
  4. we work directly on UTF-16 code unit offsets (internal to replacements)
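
The per-match conversion of the callback's return value can be pictured with the sketch below. It is a simplified, assumption-laden illustration: `utf8_to_utf16` is a hypothetical name, the input is assumed to be valid UTF-8 (no validation is performed), and the real extension presumably calls ICU's conversion routines instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original code): decode valid UTF-8
 * into UTF-16 code units. `out` must have room for at least `len` code
 * units, since UTF-16 never needs more units than UTF-8 needs bytes.
 * Returns the number of code units written. */
size_t utf8_to_utf16(const unsigned char *in, size_t len, uint16_t *out)
{
    size_t i = 0; /* read position in bytes */
    size_t n = 0; /* write position in code units */

    assert(in != NULL && out != NULL);
    while (i < len) {
        unsigned char c = in[i++];
        uint32_t cp;
        size_t extra; /* number of continuation bytes that follow */

        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }

        /* Append the low 6 bits of each continuation byte. */
        for (; extra > 0 && i < len; extra--) {
            cp = (cp << 6) | (uint32_t)(in[i++] & 0x3F);
        }

        if (cp >= 0x10000) {
            /* Outside the BMP: encode as a surrogate pair. */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}
```

In this variant the conversion runs once per match, on the short replacement string only, so its cost does not depend on the length of the subject.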

$string = str_repeat('Hayır İyi', 1);

1 iteration

real 0m0.023s user 0m0.015s sys 0m0.007s

1 000 000 iterations

real 0m14.406s user 0m14.394s sys 0m0.006s

$string = str_repeat('Hayır İyi', 10000);

1 iteration

real 0m0.140s user 0m0.131s sys 0m0.009s

10 000 iterations

real 28m54.813s user 28m47.893s sys 0m5.892s

#define UTF8_REPLACEMENT 1

Changes:

  1. we don't need to convert the replacement to UTF-16
  2. copy (strdup) the UTF-8 version of the subject as the result (before processing)
  3. for each match, do the replacement inline in the result (the callback returns UTF-8 and the result is in UTF-8, so no conversion is needed)
  4. the result is already in UTF-8 (before returning)

$string = str_repeat('Hayır İyi', 1);

1 iteration

real 0m0.021s user 0m0.014s sys 0m0.006s

1 000 000 iterations

real 0m14.365s user 0m14.351s sys 0m0.005s

$string = str_repeat('Hayır İyi', 10000);

1 iteration

real 0m5.595s user 0m5.585s sys 0m0.006s

10 000 iterations

too long (more than 16h ?)
