Table Of Contents
- What's a pattern ?
- Positional characters
- Common matching characters
- Ranges, character sets and logic
- Quantifiers
- Capture groups
- Examples
A pattern is a string in which certain characters are recognized and given a special meaning by a "regex engine". Patterns are used for matching, i.e. checking whether a "subject string" complies with the pattern.
In patterns, there are two positional characters: ^ and $.
- ^ asserts the start of the subject string.
- $ asserts the end of the subject string.
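The two anchors can be tried out with a minimal sketch, assuming the patterns described here are Lua patterns (the "%" escape used in this document suggests so) and that the standard string library is available. The variable name subject and its content are made up for illustration.

```lua
-- Assumption: standard Lua string library; `subject` is a hypothetical example string.
local subject = "apples and pears"

-- ^ anchors the match to the start of the subject string.
print(string.find(subject, "^apples"))  -- 1  6   (start and end index of the match)
print(string.find(subject, "^pears"))   -- nil    ("pears" is not at the start)

-- $ anchors the match to the end of the subject string.
print(string.find(subject, "pears$"))   -- 12 16
print(string.find(subject, "apples$"))  -- nil    ("apples" is not at the end)
```

string.find returns nil when the pattern does not match, so it can be used directly as a condition.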
Common matching characters:
- . : Any character
- %w : Any character that is alphanumeric (a letter from A to Z in either case, or a digit from 0 to 9)
- %W : Any character that is NOT alphanumeric
- %d : Any character that is a digit (from 0 to 9)
- %D : Any character that is NOT a digit
- %s : Any character that is whitespace (" ", tab, line breaks)
- %S : Any character that is NOT whitespace
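As a rough illustration of these character classes, here is a small sketch that assumes Lua patterns and the standard string.match function, which returns the first matching part of the subject string. The example string is hypothetical.

```lua
-- Assumption: standard Lua string library; `subject` is a made-up example string.
local subject = "Room 42, floor B"

print(string.match(subject, "%d"))    -- 4     (first digit)
print(string.match(subject, "%D"))    -- R     (first non-digit)
print(string.match(subject, "%s"))    -- " "   (first whitespace character)
print(string.match(subject, "%W"))    -- " "   (first non-alphanumeric character)
print(string.match(subject, "R..m"))  -- Room  (each "." stands for any character)
print(string.match("___", "%w"))      -- nil   (underscore is not alphanumeric)
```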
Inside a pattern, you can add "ranges" in brackets.
- [ab] will search for a character that is either "a" OR "b"
- [^ab] will search for a character that is neither "a" nor "b"
- [a-i] will search for a character that is between "a" and "i" in the alphabet
- [6-8] will search for a character that is between "6" and "8"
- [a-zA-Z0-9] will search for a character that is between "a" and "z" OR between "A" and "Z" OR between "0" and "9"
The previous pattern is the same as %w. If you want to match the [ or ] character literally, put a "%" before it.
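The bracket forms above can be sketched as follows, assuming Lua patterns and the standard string library. The example strings are invented, and the comparison between [a-zA-Z0-9] and %w assumes the default "C" locale.

```lua
-- Assumption: standard Lua string library, default "C" locale.
print(string.match("good luck", "[dl]uck"))  -- luck
print(string.match("abc", "[^ab]"))          -- c   (first character that is neither a nor b)
print(string.match("xyz7", "[6-8]"))         -- 7
print(string.match("list[3]", "%["))         -- [   (escaped, so matched literally)
print(string.match("list[3]", "%]$"))        -- ]

-- string.gsub returns the number of replacements as its second value,
-- which gives a simple way to count matching characters.
local _, with_set   = string.gsub("a-B_9!", "[a-zA-Z0-9]", "")
local _, with_class = string.gsub("a-B_9!", "%w", "")
print(with_set, with_class)                  -- 3   3
```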
Quantifiers are special characters that go after any character, common matching character, range or character set.
- * will match the previous item 0 or more times.
- + will match the previous item 1 or more times.
- ? will match the previous item 0 or 1 time.
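A short sketch of the three quantifiers, again assuming Lua patterns and the standard string.match function. The subject strings are hypothetical. Note that a pattern made only of an item followed by * can succeed with an empty match, because zero repetitions are allowed.

```lua
-- Assumption: standard Lua string library; all subject strings are made up.
print(string.match("aaab", "a+"))         -- aaa
print(string.match("baaa", "a*"))         -- (empty string: zero "a" at position 1 already matches)
print(string.match("apple", "apples?"))   -- apple
print(string.match("apples", "apples?"))  -- apples
print(string.match("id 2024", "%d+"))     -- 2024
print(string.match("x", "[0-9]+"))        -- nil   (at least one digit is required)
```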
Without capture groups, the whole matched part of the subject string is returned. Sometimes only certain parts are required.
Capture groups define what to return. With multiple capture groups, multiple values are returned.
To make a capture group, enclose what you want to return in parentheses.
If you want to match the ( or ) character literally, put a "%" before it.
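The effect of capture groups on the returned values can be sketched like this, assuming Lua patterns and the standard string.match function. The "key = value" string and the variable names are invented for illustration.

```lua
-- Assumption: standard Lua string library; `subject`, `key` and `value` are hypothetical names.
local subject = "key = value"

-- No capture group: the whole matched part is returned.
print(string.match(subject, "%w+ = %w+"))      -- key = value

-- One capture group: only the captured part is returned.
print(string.match(subject, "(%w+) = %w+"))    -- key

-- Two capture groups: two values are returned.
local key, value = string.match(subject, "(%w+) = (%w+)")
print(key, value)                              -- key   value

-- Escaped parentheses are matched literally; the inner pair still captures.
print(string.match("f(x)", "%((%w+)%)"))       -- x
```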
Match a string that...
- contains "apples" anywhere in it :
apples
- contains "apples" at the start of the string :
^apples
- contains "apples" at the end of the string :
apples$
- has any 3 characters between "app" and "les" :
app...les
- contains a letter or a digit at any position :
%w
- contains a digit at the end of the string :
%d$
- contains either "duck" or "luck" at any position :
[dl]uck
- starts with a character that is not a digit between 0 and 4 :
^[^0-4]
- contains "]" at the end of the string :
%]$
- is either "apple" or "apples" :
^apples?$
- is "a", or "aa", or "aaa", or "aaaa"... :
^a+$
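Several of the patterns listed above can be checked with a small table-driven sketch. It assumes Lua patterns and the standard string.find function. The helper name matches and all subject strings are hypothetical, chosen only to exercise each pattern.

```lua
-- Hypothetical helper: true when `pattern` matches somewhere in `subject`.
local function matches(subject, pattern)
  return string.find(subject, pattern) ~= nil
end

-- Each entry: pattern, made-up subject string, expected result.
local checks = {
  { "apples",    "green apples here", true  },
  { "^apples",   "green apples",      false },
  { "apples$",   "green apples",      true  },
  { "app...les", "app123les",         true  },
  { "%d$",       "route 66",          true  },
  { "[dl]uck",   "pluck",             true  },
  { "^[^0-4]",   "3 apples",          false },
  { "%]$",       "items[2]",          true  },
}

for _, check in ipairs(checks) do
  local pattern, subject, expected = check[1], check[2], check[3]
  -- Fail loudly, naming the pattern, if a result differs from the expectation.
  assert(matches(subject, pattern) == expected, pattern)
  print(pattern, subject, expected)
end
```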
Is there a way to find out how many capturing groups are present in any pattern?
I am generating the pattern for a given string on the fly, and it may have any number n of capturing groups, e.g.