Hello software engineers and computer scientists! :)
I'm working on an open source JavaScript project that has a CSS parser as one of its modules. Node --perf shows me that the critical part is the regexp used to find the next word boundary.
const RE_WORD_END = /[ \n\t\r\f\(\)\{\}:;@!'"\\\]\[#]|\/(?=\*)/g;
// pos - current boundary
// next - next boundary which the code has to find
RE_WORD_END.lastIndex = pos + 1;
RE_WORD_END.test(css);
if (RE_WORD_END.lastIndex === 0) {
  next = css.length - 1;
} else {
  next = RE_WORD_END.lastIndex - 2;
}
My first guess was that regexps are slow, so I refactored this part as follows:
const WORD_END_CHARS_CODES = [ /* sorted list of char codes which correspond to word end symbols */ ];
for (let i = pos + 1; i < css.length; i++) {
  if (binarysearch(WORD_END_CHARS_CODES, css.charCodeAt(i))) {
    return i - 1;
  }
}
return css.length - 1;
It became slower! (~1.5x slower: 6ms for the regexp vs 10ms for the loop.)
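In case it helps to reproduce this, the two variants can be written as self-contained functions roughly like the ones below. This is only a minimal sketch, not the actual postcss code: the function names nextWordEndRegexp and nextWordEndLoop, the body of binarysearch, and the char code list (derived from the regexp above) are assumptions for illustration. The explicit check for "/*" in the loop is also an addition, so that both variants return the same result.

```javascript
// Variant 1: the regexp with a manually positioned lastIndex.
const RE_WORD_END = /[ \n\t\r\f\(\)\{\}:;@!'"\\\]\[#]|\/(?=\*)/g;

function nextWordEndRegexp(css, pos) {
  RE_WORD_END.lastIndex = pos + 1;
  RE_WORD_END.test(css);
  // A failed match resets lastIndex to 0: the word runs to the end of input.
  return RE_WORD_END.lastIndex === 0
    ? css.length - 1
    : RE_WORD_END.lastIndex - 2;
}

// Variant 2: a loop over char codes with a binary search.
// Assumption: the list mirrors the single-character class in RE_WORD_END.
const WORD_END_CHARS_CODES = Array.from(
  ' \n\t\r\f(){}:;@!\'"\\][#',
  (c) => c.charCodeAt(0)
).sort((a, b) => a - b);

// Hypothetical helper: true if value is present in the sorted array.
function binarysearch(sorted, value) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] === value) return true;
    if (sorted[mid] < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}

const SLASH = 47; // '/'
const STAR = 42;  // '*'

function nextWordEndLoop(css, pos) {
  for (let i = pos + 1; i < css.length; i++) {
    const code = css.charCodeAt(i);
    // The second condition mirrors the \/(?=\*) branch of the regexp.
    if (
      binarysearch(WORD_END_CHARS_CODES, code) ||
      (code === SLASH && css.charCodeAt(i + 1) === STAR)
    ) {
      return i - 1;
    }
  }
  return css.length - 1;
}
```

Both functions take the current boundary pos and return the index of the last character before the next boundary, so they can be timed against each other on the same input.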
Calling all parser ninjas: what approach should I try?
P.S. The project is postcss, in case you were going to ask.