A regular expression (shortened as regex) is a sequence of characters that specifies a search pattern in text. These patterns are often used by string-search algorithms for "find" or "find and replace" operations, or for input validation (i.e. checking that input matches a requirement, e.g. that a value entered as a phone number is in fact a phone number and not the name of a country). See more on Wikipedia.
We have all been asked to create a password and been told that our password is not strong enough or does not contain a special character. As you might have guessed, a password regex is behind this! In this regex tutorial I will describe how to understand a strong password regex. A strong password regex looks like this:
^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#\$%\^&\*])(?=.{8,})
This regex validates the password to ensure it meets all five requirements:
- (?=.*[a-z]): at least 1 lowercase alphabetical character
- (?=.*[A-Z]): at least 1 uppercase alphabetical character
- (?=.*[0-9]): at least 1 numeric character
- (?=.*[!@#\$%\^&\*]): at least 1 special character
- (?=.{8,}): must be eight characters or longer
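To see the five requirements working together, here is a minimal Python sketch that applies the pattern above. The names STRONG_PASSWORD and is_strong are my own choices for illustration, and the sample passwords are made up.

```python
import re

# The pattern from this tutorial, written as a raw string so the
# backslashes reach the regex engine unchanged.
STRONG_PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#\$%\^&\*])(?=.{8,})"
)


def is_strong(password: str) -> bool:
    """Return True if the password passes all five lookahead checks."""
    return STRONG_PASSWORD.match(password) is not None


print(is_strong("Passw0rd!"))  # True: meets all five requirements
print(is_strong("password"))   # False: no uppercase, digit, or special character
print(is_strong("Pw0!"))       # False: shorter than 8 characters
```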
[Figure: graph visualizing the regex]
This tutorial covers the following regex components:
- Anchors
- Quantifiers
- OR Operator
- Character Classes
- Flags
- Grouping and Capturing
- Bracket Expressions
- Greedy and Lazy Match
- Boundaries
- Back-references
- Look-ahead and Look-behind
Anchors are special characters in regex. They assert that the engine's current position in the string matches a determined location: for instance, the beginning of the string, or the end of a line. In our case, we use the start anchor: ^
The caret sign matches the start of the string that the regex pattern is applied to. In other words, this is to say we are validating the password starting from the beginning of the text.
Another anchor that is not used here but that you might often see is $. The dollar sign matches at the end of the string the regex pattern is applied to.
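As a quick illustration of both anchors, here is a small Python sketch. The sample text and variable names are made up for this example.

```python
import re

text = "regex is fun"

# ^ only allows a match at the start of the string.
starts_with_regex = re.search(r"^regex", text) is not None  # True
starts_with_fun = re.search(r"^fun", text) is not None      # False

# $ only allows a match at the end of the string.
ends_with_fun = re.search(r"fun$", text) is not None        # True
ends_with_regex = re.search(r"regex$", text) is not None    # False
```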
Quantifiers specify how many instances of a character, group, or character class must be present for a match. The quantifier in our example is {8,}, which specifies that at least 8 characters (or matches) are required to pass this validation. The comma means "at least". If we want to check for exactly 8 matches, we can use {8}; note that inside our lookahead it would have to be written as (?=.{8}$), since without the end anchor $ a longer password would still pass.
? is another common quantifier (it means zero or one), but in our example the ? in (?=.{8,}) is actually NOT a quantifier. I will explain what it is in a later section.
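The quantifiers mentioned here can be tried out with a short Python sketch. The pattern names and test strings are made up for illustration, and I anchor the patterns with ^ and $ so that the length checks apply to the whole string.

```python
import re

at_least_8 = re.compile(r"^.{8,}$")   # 8 or more characters
exactly_8 = re.compile(r"^.{8}$")     # exactly 8 characters
optional_u = re.compile(r"^colou?r$")  # ? makes the "u" optional (zero or one)

nine_chars = "abcdefghi"
eight_chars = "abcdefgh"

nine_passes_at_least = at_least_8.match(nine_chars) is not None   # True
nine_passes_exactly = exactly_8.match(nine_chars) is not None     # False
eight_passes_exactly = exactly_8.match(eight_chars) is not None   # True
both_spellings = all(optional_u.match(w) for w in ("color", "colour"))  # True
```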
There is no OR operator in this regex. But in general | means OR: a|b matches either a or b. In our example, we are using an "AND" notation instead, denoted by ()(): the lookahead groups are written one after another, which means all of these conditions need to be met. For a strong password, we have 5 conditions to be matched.
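To contrast the two ideas, here is a hedged Python sketch with made-up patterns: one uses | to accept either of two words, the other chains two lookaheads so that both words must appear.

```python
import re

# OR: the whole string must be either "cat" or "dog".
either = re.compile(r"^(cat|dog)$")

# AND: both lookaheads must succeed from the start of the string,
# so the text has to contain "cat" and "dog" in any order.
both = re.compile(r"^(?=.*cat)(?=.*dog)")

or_cat = either.match("cat") is not None              # True
or_bird = either.match("bird") is not None            # False
and_both = both.match("dog chases cat") is not None   # True
and_one = both.match("cat only") is not None          # False
```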
A character class is a set of characters, any one of which the regex may match at a given position. To define one, put the characters you'd like to match inside []. There are several in our example. [a-z] means to match one lowercase letter from a to z. [0-9] means to match one digit from 0 to 9, whereas [!@#\$%\^&\*] means to match one of the special characters inside the square brackets.
The . as in (?=.*[0-9]) is a metacharacter, and it means to match any single character (by default, any character except a line break).
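A small Python sketch can show what each character class from our regex picks out of a string. The sample password and variable names are made up for illustration.

```python
import re

sample = "Passw0rd!2"

lowercase = re.findall(r"[a-z]", sample)         # every lowercase letter
uppercase = re.findall(r"[A-Z]", sample)         # every uppercase letter
digits = re.findall(r"[0-9]", sample)            # every digit
specials = re.findall(r"[!@#\$%\^&\*]", sample)  # every listed special character

# The dot matches any single character, so it finds all of them.
any_char = re.findall(r".", sample)

print(lowercase)  # ['a', 's', 's', 'w', 'r', 'd']
print(uppercase)  # ['P']
print(digits)     # ['0', '2']
print(specials)   # ['!']
```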
Flags are tokens that modify the search behavior of a regex. We don't have any flags here, but one example is i, which means ignore case in the search.
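How a flag is written depends on the language. As a sketch in Python, the i flag corresponds to re.IGNORECASE (also available as re.I); the sample text is made up.

```python
import re

text = "My PASSWORD is secret"

# Without the flag, the lowercase pattern does not match the uppercase text.
case_sensitive = re.search(r"password", text)

# With the ignore-case flag, it does.
case_insensitive = re.search(r"password", text, re.IGNORECASE)

print(case_sensitive)             # None
print(case_insensitive.group(0))  # PASSWORD
```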
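Grouping and capturing is a way to treat several characters as a single unit by wrapping them in parentheses, and to keep the text they matched. We don't have capturing groups in our example; the parentheses in (?=...) are lookaheads and capture nothing.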
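Since our password regex has no capturing groups, here is a separate Python sketch of the idea. The phone-like format and the variable names are assumptions made up for this example.

```python
import re

# Two capturing groups: three digits, a dash, then four digits.
m = re.match(r"(\d{3})-(\d{4})", "555-1234")
first_part, second_part = m.groups()

# A group also lets a quantifier apply to several characters at once.
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

print(first_part)   # 555
print(second_part)  # 1234
print(repeated)     # True
```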
Bracket expressions are closely related to character classes. We enclose a list of characters in [ and ], and the string being validated needs to match any single character in that list.
A greedy match means to match the longest string possible. In other words, a greedy quantifier keeps consuming characters for as long as the condition is satisfied, and gives characters back only if the rest of the pattern cannot match otherwise.
A lazy match means to match the smallest string possible. That is to say, it stops searching as soon as it finds a match.
In our example, we are using the greedy quantifier *. In the context of (?=.*[a-z]), .* first consumes the whole string and then backtracks until [a-z] can match, so it ends up finding the last lowercase letter. But actually, we could have used the lazy quantifier *? and written our expression as (?=.*?[a-z]). This would have meant stopping as soon as we find the first lowercase character. It would have also worked for our purpose: to verify that the string has at least one lowercase character.
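The difference is easier to see when the pattern is allowed to consume text. In this Python sketch (sample string and names made up), the greedy version stops at the last lowercase letter and the lazy version at the first, while both lookahead forms give the same yes/no answer.

```python
import re

sample = "abC1d9"

# Without the lookahead wrapper, we can inspect what was actually matched.
greedy = re.match(r".*[a-z]", sample).group(0)
lazy = re.match(r".*?[a-z]", sample).group(0)

print(greedy)  # abC1d  (runs up to the last lowercase letter)
print(lazy)    # a      (stops at the first lowercase letter)

# Inside a lookahead, only the yes/no result matters, so both agree.
greedy_ok = re.match(r"^(?=.*[a-z])", sample) is not None
lazy_ok = re.match(r"^(?=.*?[a-z])", sample) is not None
```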
There are no boundaries in our regex, but word boundaries refer to the metacharacter \b, which is similar to ^ and $ in that it indicates a position rather than a character. Simply put, you wrap \b around a word so you can perform a whole-word search. E.g. \bword\b means to search for the whole word "word", so it would not match inside "sword" or "words".
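A whole-word search can be sketched in Python like this. The sample sentence and variable names are made up for illustration.

```python
import re

text = "word sword words word."

# With \b on both sides, only the standalone occurrences match.
whole_words = re.findall(r"\bword\b", text)

# Without boundaries, "word" is also found inside "sword" and "words".
anywhere = re.findall(r"word", text)

print(len(whole_words))  # 2
print(len(anywhere))     # 4
```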
Back-references match the same text previously matched by a capturing group; for example, \1 refers to whatever the first group matched. We don't use them here.
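Although our password regex has no back-references, a short Python sketch shows the idea: (\w)\1 looks for any word character that is immediately followed by the same character. The sample strings and names are made up.

```python
import re

# Group 1 captures one word character; \1 requires that same character again.
doubled = re.search(r"(\w)\1", "password")
no_double = re.search(r"(\w)\1", "regex")

print(doubled.group(0))  # ss
print(no_double)         # None
```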
Look-ahead and look-behind are collectively called "lookaround". Similar to boundaries, they assert something about a position in the string rather than matching characters. The difference is that they check whether a pattern matches ahead of (or behind) the current position without consuming any characters, so they don't return the matched text, only a "match" or "no match" result.
In our case, the lookaround is (?=...). To be precise, this is called a positive lookahead. ^(?=.*[0-9]) means: from the start of the string, look ahead for a character in the class [0-9]. What is worth noting is that this does not return which number(s) the password contains, only whether the password has a match or not, i.e. whether the password contains a number.
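The "match or no match" behavior can be checked with a small Python sketch. The sample strings and names are made up; the look-behind line is an extra illustration that is not part of our password regex.

```python
import re

# The lookahead succeeds but consumes nothing: the match is an empty string.
lookahead = re.match(r"^(?=.*[0-9])", "abc123")
print(repr(lookahead.group(0)))  # ''
print(lookahead.end())           # 0

# The same pattern without the lookahead wrapper consumes the text.
consuming = re.match(r"^.*[0-9]", "abc123")
print(consuming.group(0))        # abc123

# With no digit in the string, the lookahead fails and there is no match.
no_digit = re.match(r"^(?=.*[0-9])", "abcdef")
print(no_digit)                  # None

# Positive look-behind: digits only count if a "$" sits right before them.
amounts = re.findall(r"(?<=\$)[0-9]+", "cost $42 or 17")
print(amounts)                   # ['42']
```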
- Grace Liu (Check out my GitHub)
- Link to test your password https://www.debuggex.com/r/Se0ZWopvjRt8ay-m