Welcome to this regex tutorial! Today, we're diving into the world of regular expressions. Specifically, we'll be breaking down a regular expression that's widely used for validating email addresses. By understanding each part of this regular expression (or regex), you'll gain a useful tool for text manipulation and validation.
Our focus will be a regex designed to check whether an email address follows the typical structure: a local part, an `@` symbol, a domain, and an extension. Here is the regex we'll be working with:
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
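Before breaking the pattern apart, it can help to see it run. The following is a minimal sketch using JavaScript's built-in `RegExp.prototype.test`; the sample addresses are hypothetical and chosen only for illustration.

```javascript
// The regex from this tutorial, unchanged.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical sample inputs (not from the original tutorial).
const samples = [
  "jane.doe@example.com", // well-formed
  "jane.doe@example",     // no extension after the domain
  "Jane@example.com",     // uppercase letter; the pattern only allows a-z
];

const results = samples.map((s) => emailRegex.test(s));
console.log(results); // [ true, false, false ]
```

Note that the pattern has no `i` flag, so uppercase letters are rejected as written.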
We'll walk through each piece of this regex, demystifying its purpose and role in the validation process.
- Anchors
- Quantifiers
- Character Classes
- Grouping and Capturing
- Bracket Expressions
- Greedy and Lazy Match
Our regex opens with `^` and closes with `$`. These are known as anchors. Without the multiline flag, they mark the start and the end of the string, so the entire input has to match the pattern rather than just a part of it.
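To illustrate what the anchors contribute, here is a small sketch that compares the tutorial's regex with a hypothetical copy that has the `^` and `$` removed. The `unanchored` variant and the input string are assumptions made for this comparison only.

```javascript
// The tutorial's regex, anchored at both ends.
const anchored = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical variant with the anchors removed, for comparison only.
const unanchored = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

// An email address embedded in a longer string.
const input = "contact: jane@example.com now";

const anchoredResult = anchored.test(input);     // false: extra text around the address
const unanchoredResult = unanchored.test(input); // true: a matching substring is enough

console.log(anchoredResult, unanchoredResult);
```

For validation, the anchored form is the one you want, since a substring match would accept input with stray text around the address.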
You'll see `+` twice in our regex. It is a quantifier: it signals that the preceding character, bracket expression, or group must appear one or more times.
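A quick way to see "one or more" in action is to apply `+` to a simple bracket expression. The pattern `oneOrMore` below is a hypothetical helper for this illustration, not part of the tutorial's regex.

```javascript
// Hypothetical pattern: one or more lowercase letters, nothing else.
const oneOrMore = /^[a-z]+$/;

const single = oneOrMore.test("a");   // true: one letter satisfies +
const several = oneOrMore.test("abc"); // true: more than one is fine too
const empty = oneOrMore.test("");     // false: + requires at least one

console.log(single, several, empty);
```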
`\d` and `a-z` are the character classes in our regex. `\d` is a shorthand class that matches any digit, while `a-z` is a range that matches any lowercase letter when it appears inside a bracket expression.
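The sketch below checks single characters against each class on its own. The patterns `digit` and `lower` are hypothetical, isolated versions written only for this demonstration.

```javascript
// Hypothetical single-character patterns for illustration.
const digit = /^\d$/;    // exactly one digit
const lower = /^[a-z]$/; // exactly one lowercase letter

const checks = {
  digitOn7: digit.test("7"), // true
  digitOnX: digit.test("x"), // false
  lowerOnX: lower.test("x"), // true
  lowerOnUpperX: lower.test("X"), // false: the range is case-sensitive
  lowerOn7: lower.test("7"), // false
};

console.log(checks);
```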
The parentheses `()` you're seeing are for grouping and capturing. We're using them to "capture" or "remember" specific parts of the email address.
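As a sketch of what "capturing" means in practice, `String.prototype.match` returns the captured parts alongside the full match. The sample address is hypothetical.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical sample address.
const match = "jane.doe@example.com".match(emailRegex);

// Index 0 is the whole match; indexes 1..3 are the three groups, in order.
const [whole, localPart, domain, extension] = match;

console.log(whole);     // "jane.doe@example.com"
console.log(localPart); // "jane.doe"
console.log(domain);    // "example"
console.log(extension); // "com"
```

If the input does not match, `match` returns `null`, so real code should check for that before destructuring.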
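The bracket expressions, namely `[a-z0-9_\.-]`, `[\da-z\.-]`, and `[a-z\.]`, each match a single character from the set listed within the brackets. The quantifier that follows each one (`+` or `{2,6}`) controls how many such characters are matched.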
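To make the "single character" point concrete, the sketch below takes the first bracket expression without its quantifier and wraps it in anchors. The name `localChar` and the anchors are assumptions added for the demonstration.

```javascript
// First bracket expression from the tutorial, with no quantifier,
// anchored so the whole input must be exactly one allowed character.
const localChar = /^[a-z0-9_\.-]$/;

const allowed = ["a", "7", "_", ".", "-"].map((c) => localChar.test(c));
const rejected = ["ab", "!", "A"].map((c) => localChar.test(c));

console.log(allowed);  // [ true, true, true, true, true ]
console.log(rejected); // [ false, false, false ]
```

`"ab"` fails because a bracket expression on its own consumes exactly one character; it is the `+` in the full regex that lets the local part be longer.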
Our regex doesn't use lazy matches, but it does use greedy matches. The `+` quantifier matches as many characters as possible, giving some back only when the rest of the pattern would otherwise fail to match.
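Greediness has a visible effect on what the groups capture. The sketch below compares the tutorial's regex with a hypothetical lazy variant (`+?` in the domain group); the lazy version is not part of the tutorial and is shown only for contrast, and the sample address is made up.

```javascript
// The tutorial's regex: the domain group uses a greedy +.
const greedyRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical variant: the domain group uses a lazy +? instead.
const lazyRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+?)\.([a-z\.]{2,6})$/;

const address = "jane@example.co.uk";

const greedy = address.match(greedyRegex);
const lazy = address.match(lazyRegex);

// Greedy: the domain group takes as much as it can and backs off one dot.
console.log(greedy[2], greedy[3]); // "example.co" "uk"

// Lazy: the domain group stops at the first dot that lets the rest match.
console.log(lazy[2], lazy[3]); // "example" "co.uk"
```

Both versions accept the address; they differ only in how the text is split between the domain and extension groups.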
Now that we've gone through each type of component used in the regex, let's walk through the regex from beginning to end and break it down into its individual components:
As a reminder, here is our regex:
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Piece in focus: `^`
This is known as the start anchor. It asserts the start of the string (or of a line, in multiline mode), so the expression matches only when the pattern begins at the very start of the input.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Piece in focus: `([a-z0-9_\.-]+)`
This is a capturing group that matches one or more of the following characters (`+` is a quantifier that matches one or more of the preceding token):
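- `a-z`: any lowercase alphabetic character.
- `0-9`: any digit. This is equivalent to `\d`.
- `_`: an underscore.
- `\.`: a period. Outside brackets, an unescaped `.` means "any character" and must be escaped with a backslash; inside a bracket expression the period is already literal, so the backslash here is optional but harmless.
- `-`: a hyphen. Because it is the last character in the brackets, it is treated literally rather than as a range operator.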
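The list above can be checked by testing the first group in isolation. In this sketch the group is wrapped in `^` and `$` so it has to cover the whole input; that wrapping, the name `localPart`, and the sample strings are assumptions for illustration.

```javascript
// The first group from the tutorial's regex, anchored on its own.
const localPart = /^([a-z0-9_\.-]+)$/;

// Hypothetical candidate local parts.
const candidates = ["jane.doe", "j_d-99", "Jane", "jane doe", ""];

const outcomes = candidates.map((c) => localPart.test(c));
console.log(outcomes); // [ true, true, false, false, false ]
```

`"Jane"` fails on the uppercase letter, `"jane doe"` on the space, and the empty string because `+` needs at least one character.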
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Piece in focus: `@`
Now that we are done with the prefix (the local part) of the email address, we match the literal '@' character before we move on to the email domain.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Piece in focus: `([\da-z\.-]+)`
For the email domain, we capture another group that matches one or more of the following:
- `\d`: any digit. This is equivalent to `0-9`.
- `a-z`: any lowercase alphabetic character.
- `\.`: a period (again, written with a backslash).
- `-`: a hyphen.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Piece in focus: `\.`
Now we are done with the domain. The next part of the regex is `\.`, which matches a literal '.' character. Because this period sits outside any brackets, the backslash is required. It separates the domain from the extension in the email address.
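The difference between an escaped and an unescaped period is easy to demonstrate outside of brackets. Both patterns below are hypothetical miniatures written for this comparison.

```javascript
// Hypothetical patterns: "a", then a separator, then "b".
const literalDot = /^a\.b$/; // separator must be an actual period
const anyChar = /^a.b$/;     // separator may be any single character (except a line break)

const dotResults = {
  literalOnDot: literalDot.test("a.b"), // true
  literalOnX: literalDot.test("axb"),   // false
  anyOnDot: anyChar.test("a.b"),        // true
  anyOnX: anyChar.test("axb"),          // true
};

console.log(dotResults);
```

Without the backslash, the email regex would accept any character between the domain and the extension, not just a period.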
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Piece in focus: `([a-z\.]{2,6})`
This is the final group, which matches between 2 and 6 of the following characters (`{2,6}` is a quantifier that matches between 2 and 6 of the preceding token):
- `a-z`: any lowercase alphabetic character.
- `\.`: a period (again, written with a backslash). This is to allow multi-part extensions like .co.uk.
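The length bounds can be checked by testing the final group on its own. As before, wrapping the group in `^` and `$`, the name `extension`, and the sample strings are assumptions for illustration.

```javascript
// The final group from the tutorial's regex, anchored on its own.
const extension = /^([a-z\.]{2,6})$/;

// Hypothetical candidate extensions, with their lengths noted.
const extCandidates = [
  "io",      // 2 characters
  "co.uk",   // 5 characters, including the period
  "museum",  // 6 characters
  "c",       // 1 character: too short
  "travels", // 7 characters: too long
  "..",      // 2 periods
];

const extResults = extCandidates.map((c) => extension.test(c));
console.log(extResults); // [ true, true, true, false, false, true ]
```

The last case shows a limitation of this group: since periods are allowed anywhere in it, a string made only of periods also passes.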
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Piece in focus: `$`
This is the end anchor. It asserts the end of the string (or of a line, in multiline mode), so the expression matches only when the pattern runs to the very end of the input.
This tutorial was put together by Matt Turner