Skip to content

Instantly share code, notes, and snippets.

@Integralist
Created August 3, 2011 16:23
Show Gist options
  • Save Integralist/1123058 to your computer and use it in GitHub Desktop.
Save Integralist/1123058 to your computer and use it in GitHub Desktop.
Problems with different regex implementations of lookbehind assertions
# taken from http://www.regular-expressions.info/lookaround.html
"The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. Therefore, the regular expression engine needs to be able to figure out how many steps to step back before checking the lookbehind.
Therefore, many regex flavors, including those used by Perl and Python, only allow fixed-length strings. You can use any regex of which the length of the match can be predetermined. This means you can use literal text and character classes. You cannot use repetition or optional items. You can use alternation, but only if all options in the alternation have the same length.
PCRE is not fully Perl-compatible when it comes to lookbehind. While Perl requires alternatives inside lookbehind to have the same length, PCRE allows alternatives of variable length. Each alternative still has to be fixed-length.
Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths. Unfortunately, the JDK 1.4 and 1.5 have some bugs when you use alternation inside lookbehind. These were fixed in JDK 1.6.
The only regex engines that allow you to use a full regular expression inside lookbehind, including infinite repetition, are the JGsoft engine and the .NET framework RegEx classes"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment