Well, this is weird. On http://regexpal.com/, quantifiers are not allowed after assertions in Firefox, Chrome, or Opera. But on http://regex.alf.nu/, they're allowed after all assertions in Opera, and allowed after lookaheads in all three.
As for all the other bugs in Opera, and the octal escape ugliness in all three browsers, they happen in both sites.
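One way to take the two sites out of the picture is to ask the engine's own RegExp constructor directly. This is only a rough sketch: the helper name `accepts` is made up for illustration, and the results noted in the comments are what I'd expect from a current V8-based engine (Node.js or Chrome) in non-Unicode mode, not necessarily what the browser versions discussed above did.

```javascript
// Hypothetical helper: reports whether the engine accepts a pattern source
// by trying to compile it with the native RegExp constructor.
function accepts(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false; // a SyntaxError means the engine rejected the pattern
  }
}

// Quantifier after a lookahead (assumed accepted in non-Unicode mode).
console.log(accepts('(?=a)*'));
// Quantifier after the ^ assertion (assumed rejected).
console.log(accepts('^*'));
// Quantifier after the \b assertion (assumed rejected).
console.log(accepts('\\b+'));
```

If a site shows something different for the same pattern in the same browser, the site is probably running its own parsing or validation on top of the native engine.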
There's another apparent break from the ECMAScript specification. It states that { and } are not valid PatternCharacters. However, in all three browsers, any { or } that is not in the precise form of a quantifier is treated as a literal. For example, x{} matches x{}, x{,5} matches x{,5}, and the regexes x{1\}, x\{1}, and x{[1]} all match the string x{1}. (On the other hand, a quantifier that is in the precise form, but in the wrong context, is treated as an error.) The same happens in Perl and PCRE. I think this is a nice feature.
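The brace examples above can be checked with a few lines of JavaScript. This is a minimal sketch, assuming a browser or Node.js engine running patterns in non-Unicode mode; the helper names `matchesWhole` and `compiles` are made up for illustration.

```javascript
// Each pair is [pattern source, subject]; the claim is that each pattern
// matches its subject in full, with the braces treated as literals.
const cases = [
  ['x{}', 'x{}'],
  ['x{,5}', 'x{,5}'],
  ['x{1\\}', 'x{1}'],
  ['x\\{1}', 'x{1}'],
  ['x{[1]}', 'x{1}'],
];

// Hypothetical helper: anchors the pattern so only a whole-string match counts.
function matchesWhole(source, subject) {
  return new RegExp('^(?:' + source + ')$').test(subject);
}

// Hypothetical helper: true if the engine compiles the pattern at all.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false;
  }
}

for (const [source, subject] of cases) {
  console.log(source, 'matches', subject, ':', matchesWhole(source, subject));
}

// A well-formed quantifier with nothing to repeat is a syntax error.
console.log('{1} compiles:', compiles('{1}'));
```

Note that with the u flag the same patterns would be expected to fail to compile, since Unicode mode does not allow the lenient brace handling.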
We're just going to have to agree to disagree.
Found another bizarre PCRE bug:
$ echo 'abc def'|pcregrep -o '^.*?\b'
abc
$ echo 'abc def'|pcregrep -o '\babc'
abc
$ echo 'abc def'|pcregrep -o 'abc def\b'
abc def
$ echo 'aaa'|pcregrep -o '^.*?(?=a)'
a
$ echo 'aaa'|pcregrep -o '^.*?(?=aaa)'
$ echo 'aaa'|pcregrep -o '^.*(?=a)'
aa
$ echo 'aaa'|pcregrep -o '^.*?(^|$)'
aaa
$ echo 'aaa'|pcregrep -o '^.*?a'
a
It seems that a lazy quantifier with a minimum count of 0 tries a count of 1 as the first possibility if the subpattern following it is zero-length, only backtracking to a count of 0 if it has to.
Perl does not have this bug:
$ echo 'abc def'|perl -E '@m = <> =~ /^.*?\b/g; print @m[0]'
$ echo 'abc def'|perl -E '@m = <> =~ /^.*\b/g; print @m[0]'
abc def
$ echo 'aaa'|perl -E '@m = <> =~ /^.*?(?=a)/g; print @m[0]'
$ echo 'aaa'|perl -E '@m = <> =~ /^.*(?=a)/g; print @m[0]'
aa
$ echo 'aaa'|perl -E '@m = <> =~ /^.*?(^|$)/g; print @m[0]'
$ echo 'aaa'|perl -E '@m = <> =~ /^.*?a/g; print @m[0]'
a
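For a third data point, the same patterns can be run through a JavaScript engine. This is just a sketch for comparison, not a claim about what PCRE does internally; `firstMatch` is a made-up helper, and the subject strings have no trailing newline, unlike the echo output above. The empty strings correspond to the blank outputs in the Perl session: a lazy quantifier with a minimum of 0 takes zero characters when whatever follows it can match right there.

```javascript
// Hypothetical helper: returns the text of the first match, or null if none.
function firstMatch(pattern, subject) {
  const m = subject.match(pattern);
  return m === null ? null : m[0];
}

// JSON.stringify makes empty matches visible as "".
const show = (pattern, subject) =>
  console.log(String(pattern), JSON.stringify(firstMatch(pattern, subject)));

show(/^.*?\b/, 'abc def');   // ""  (\b already matches at offset 0)
show(/^.*\b/, 'abc def');    // "abc def"
show(/^.*?(?=a)/, 'aaa');    // ""
show(/^.*?(?=aaa)/, 'aaa');  // ""
show(/^.*(?=a)/, 'aaa');     // "aa"
show(/^.*?(^|$)/, 'aaa');    // ""  (^ matches at offset 0)
show(/^.*?a/, 'aaa');        // "a"
```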
Weird. I'm surprised that there are so many bugs in common regex engines.
I implemented character classes :) and of course the first thing I tried was our robust Triples solution. It works perfectly.
Brilliant.
Hi teukon!
Well, I finally got it releasable and posted my regex engine on GitHub. The name isn't final. There's still some polishing to be done (especially adding parser error messages), but it is quite usable. Hopefully you'll be able to compile it without any trouble.
Great. I'll put this on my to-do list but I'm currently snowed under with work.
This gist has gotten very long, so I've started a new one to continue the discussion.
Ugh! So ugly.