A regular expression (regex) is a sequence of characters that defines a pattern the code can use to search for specific strings within text. Regular expressions are used for several different things, such as finding certain kinds of text within a string and verifying user-entered data such as card numbers, phone numbers, and email addresses.
The regex I have chosen to explain is one for verifying a Social Security number. The regex itself is ^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$.
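As a quick illustration of how this pattern might be used, here is a minimal Python sketch built on the standard re module. The helper name is_valid_ssn and the sample numbers are assumptions made up for the example, not part of the regex itself.

```python
import re

# The pattern from this tutorial, compiled once so it can be reused.
SSN_PATTERN = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")


def is_valid_ssn(text: str) -> bool:
    """Hypothetical helper: True when text fits the SSN pattern."""
    return SSN_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    # Made-up sample values: one acceptable, three excluded by the look-aheads.
    for sample in ["123-45-6789", "666-45-6789", "123-00-6789", "123-45-0000"]:
        print(sample, is_valid_ssn(sample))
```

The pattern only checks the shape of the number; it cannot tell whether a number was ever actually issued.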
- Anchors
- Quantifiers
- OR Operator
- Character Classes
- Flags
- Grouping and Capturing
- Bracket Expressions
- Greedy and Lazy Match
- Boundaries
- Back-references
- Look-ahead and Look-behind
- Final analysis
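The regex is anchored at both ends: ^ marks the beginning of the line, and $ marks the end. Together they require the whole input to be a Social Security number, with nothing before or after it.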
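To see what the anchors contribute, the following Python sketch compares a simplified pattern with and without ^ and $. The simplified pattern (digits and dashes only) and the sample text are assumptions chosen to isolate the anchors.

```python
import re

# Simplified shape-only patterns, used here just to isolate the anchors.
UNANCHORED = re.compile(r"\d{3}-\d{2}-\d{4}")
ANCHORED = re.compile(r"^\d{3}-\d{2}-\d{4}$")

text = "SSN: 123-45-6789 (on file)"

# Without anchors, the pattern is found anywhere inside the text.
found_unanchored = UNANCHORED.search(text) is not None   # True

# With anchors, surrounding text makes the match fail.
found_anchored = ANCHORED.search(text) is not None       # False

# The anchored pattern accepts the bare number.
exact = ANCHORED.search("123-45-6789") is not None       # True

# Python-specific detail: $ also matches just before a trailing newline.
trailing_newline = ANCHORED.search("123-45-6789\n") is not None  # True

# fullmatch() insists on consuming the entire string, newline included.
strict = ANCHORED.fullmatch("123-45-6789\n") is not None          # False
```

In Python, using fullmatch() is one way to avoid accepting a stray trailing newline; other regex engines may treat $ differently.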
The first quantifier appears inside (?!666|000|9\d{2}), which means that the first three characters may not be 666 or 000. There, 9\d{2} uses {2} to require exactly two digits after the 9, so the first three digits cannot be between 900 and 999.
Next comes \d{3}, requiring exactly three digits to start. After that, \d{2} and \d{4} require exactly two and four digits after the dashes.
We also use a quantifier to guarantee that the last four digits are not all zeros: (?!0{4}).
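The {n} quantifier used throughout the pattern means "exactly n repetitions". A small Python sketch, with made-up inputs, shows the effect:

```python
import re

THREE_DIGITS = re.compile(r"\d{3}")
FOUR_ZEROS = re.compile(r"0{4}")

# \d{3} accepts exactly three digits, no fewer and no more.
exactly_three = THREE_DIGITS.fullmatch("123") is not None   # True
too_short = THREE_DIGITS.fullmatch("12") is not None        # False
too_long = THREE_DIGITS.fullmatch("1234") is not None       # False

# 0{4} is shorthand for 0000.
all_zeros = FOUR_ZEROS.fullmatch("0000") is not None        # True
three_zeros = FOUR_ZEROS.fullmatch("000") is not None       # False
```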
The OR operator (|) is used in the opening section, (?!666|000|9\d{2}), to list the values that the first three digits may not take: 666, OR 000, OR 900-999.
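The alternation can be tried on its own by testing only the first three digits. In this Python sketch, the name area_allowed and the sample values are assumptions made for illustration.

```python
import re

# Only the first section of the full pattern: the exclusions plus three digits.
FIRST_THREE = re.compile(r"(?!666|000|9\d{2})\d{3}")


def area_allowed(digits: str) -> bool:
    """Hypothetical helper: True when the first three digits are permitted."""
    return FIRST_THREE.fullmatch(digits) is not None


# Each alternative separated by | rules out one case.
rejected = [d for d in ["666", "000", "900", "999"] if not area_allowed(d)]
accepted = [d for d in ["001", "665", "667", "899"] if area_allowed(d)]

print(rejected)  # ['666', '000', '900', '999']
print(accepted)  # ['001', '665', '667', '899']
```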
The \d character class is used to designate that only digits are accepted.
No flags are required or used.
We use () to group the restrictions on each section of digits. These groups are look-aheads, written (?!...), so they group without capturing anything. (?!666|000|9\d{2}) groups the values that the first three digits must not be. (?!00) groups the requirement for the second set of digits and states that they cannot be equal to 00. Then, we have another grouping, (?!0{4}), to say that the last set of digits cannot be 0000.
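A short Python sketch can confirm how the parentheses in this pattern behave: the (?!...) groups test the upcoming text but do not capture it. The second pattern below, with plain capturing parentheses around each run of digits, is a hypothetical variant added only for contrast.

```python
import re

SSN_PATTERN = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")

# The (?!...) groups are not counted as capturing groups.
group_count = SSN_PATTERN.groups                            # 0
captured = SSN_PATTERN.search("123-45-6789").groups()       # ()

# Hypothetical variant: plain parentheses added to capture each section.
CAPTURING = re.compile(
    r"^(?!666|000|9\d{2})(\d{3})-(?!00)(\d{2})-(?!0{4})(\d{4})$"
)
parts = CAPTURING.search("123-45-6789").groups()            # ('123', '45', '6789')
```

The capturing variant might be handy if the three sections need to be read out separately, for example to mask all but the last four digits.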
No bracket expressions are required.
The regex does not rely on greedy or lazy matching, because every quantifier in it is a fixed count. In the phrase (?!666|000|9\d{2}), \d{2} matches exactly two digits after the 9. The \d{3} matches exactly three digits (except the ones already excluded). Then we have another match, \d{2}, for exactly two digits (except 00, which is excluded just before it). Then the last set of digits is matched by \d{4} (except for 0000), so it matches anything from 0001 to 9999.
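Greedy and lazy behavior only differs when a quantifier allows a range of repetitions. The Python sketch below contrasts a range quantifier, \d{2,4}, with the fixed \d{4} used in the SSN pattern; the range pattern and the sample text are assumptions for illustration and do not appear in the regex.

```python
import re

text = "123456"

# A range quantifier can be greedy (take as many as possible)...
greedy = re.match(r"\d{2,4}", text).group()        # '1234'

# ...or lazy, with a trailing ?, taking as few as possible.
lazy = re.match(r"\d{2,4}?", text).group()         # '12'

# A fixed count has only one possible length, so the ? changes nothing.
fixed = re.match(r"\d{4}", text).group()           # '1234'
fixed_lazy = re.match(r"\d{4}?", text).group()     # '1234'
```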
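No word boundaries (\b) are used.
No back-references are used.
The regex uses three negative look-aheads, written (?!...): (?!666|000|9\d{2}), (?!00), and (?!0{4}). Each one checks the digits that follow without consuming them, and the match fails if those digits fit the excluded pattern. No look-behind is used.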
We start and end with the anchors: ^ and $. The first section uses a negative look-ahead to state that the first three digits cannot be 666, 000, or 900-999: (?!666|000|9\d{2}). Then, we require any other three digits, with \d{3}.
Then, we match against a dash with -.
The second group of numbers cannot be 00. We use the following to state that: (?!00). We add \d{2} after it to require any two digits from 01 to 99.
Then another dash is required.
Lastly, we verify that the last four digits are not 0000 with (?!0{4}). However, we do require that there are four digits with \d{4}, which will match anything between 0001 and 9999.
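The walkthrough above can be written out as an annotated pattern. This Python sketch uses the re.VERBOSE flag purely so that the pattern can be spread over several lines with comments; the flag is an assumption of the example and is not needed by the regex itself. The helper name matches_ssn is also hypothetical.

```python
import re

# Same regex as above, laid out one piece per line.
SSN_VERBOSE = re.compile(
    r"""
    ^                    # start of the input
    (?!666|000|9\d{2})   # first three digits may not be 666, 000, or 900-999
    \d{3}                # any other three digits
    -                    # a dash
    (?!00)               # second group may not be 00
    \d{2}                # two digits, 01 to 99
    -                    # another dash
    (?!0{4})             # last four digits may not be 0000
    \d{4}                # four digits, 0001 to 9999
    $                    # end of the input
    """,
    re.VERBOSE,
)


def matches_ssn(text: str) -> bool:
    """Hypothetical helper: True when text fits the annotated pattern."""
    return SSN_VERBOSE.search(text) is not None


if __name__ == "__main__":
    # Made-up sample values covering each rule in the walkthrough.
    for sample in ["078-05-1120", "987-65-4321", "123-00-4567", "123-45-0000"]:
        print(sample, matches_ssn(sample))
```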
Jani Muhlestein is a software test engineer who is always interested in, and fascinated by, the software industry and the continuously changing technology that drives it. My GitHub is: https://github.com/janimuhlestein.