Decoding Email Validation with Regular Expressions

Welcome to this regex tutorial! Today, we're diving into the world of regular expressions. Specifically, we'll be breaking down a regular expression that's widely used for validating email addresses. By understanding each part of this regular expression (or regex), you'll gain a useful tool for text manipulation and validation.

Summary

Our focus is a regex designed to check whether an email address follows the typical structure: a local part, an @ sign, a domain name, and an extension. Here is the regex we'll be working with:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

We'll walk through each piece of this regex, demystifying its purpose and role in the validation process.
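
As a quick orientation before the walkthrough, here is a minimal sketch of the pattern in use. It assumes Python's re module, which takes the pattern without the surrounding / delimiters. The helper name looks_like_email and the sample addresses are illustrative and not part of the original tutorial.

```python
import re

# The tutorial's pattern, written without the / delimiters,
# which Python's re module does not use.
EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")


def looks_like_email(text):
    """Return True when text matches the tutorial's pattern."""
    return EMAIL_RE.match(text) is not None


print(looks_like_email("jane.doe@example.com"))  # True
print(looks_like_email("jane.doe@example"))      # False: no extension
print(looks_like_email("Jane@Example.com"))      # False: uppercase is not in any class
```

Note that the pattern only accepts lowercase letters, so an address would need to be lowercased first (or the pattern compiled with a case-insensitive flag) to accept mixed case.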

Anchors
Quantifiers
Character Classes
Grouping and Capturing
Bracket Expressions
Greedy and Lazy Match

Regex Components

Anchors

Our regex opens with ^ and closes with $. These are known as anchors. They mark the start and the end of the input (or of each line, when a multiline flag is set). We're using them to ensure the entire string matches the pattern, not just a portion of it.
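
To see what the anchors contribute, the sketch below compares the tutorial's pattern with a hypothetical copy that has ^ and $ removed. It assumes Python's re module; the sample sentence is made up for illustration.

```python
import re

anchored = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")
# Hypothetical variant without anchors, for comparison only.
unanchored = re.compile(r"([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})")

text = "contact: jane@example.com, thanks!"

# Without anchors, the pattern may match anywhere inside the text.
print(unanchored.search(text).group(0))   # jane@example.com

# With anchors, the whole text has to be an email address.
print(anchored.search(text) is not None)  # False
```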

Quantifiers

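You'll see a few + signs in our regex. The + is a quantifier: it signals that the preceding character, class, or group must appear one or more times. The {2,6} near the end of the regex is also a quantifier, allowing between 2 and 6 repetitions.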
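
A small sketch of the + quantifier on its own, assuming Python's re module. The pattern ^[a-z]+$ is a simplified stand-in, not a piece of the tutorial's regex.

```python
import re

# [a-z]+ needs at least one lowercase letter; ^ and $ make it cover the whole string.
one_or_more = re.compile(r"^[a-z]+$")

results = {s: bool(one_or_more.match(s)) for s in ["", "a", "abc"]}
print(results)  # {'': False, 'a': True, 'abc': True}
```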

Character Classes

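Our regex uses \d and a-z to describe sets of characters. \d is a shorthand character class that matches any digit. a-z is a range that matches any lowercase letter; a range like this only has that meaning inside a bracket expression.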
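
The following sketch, assuming Python's re module and a made-up sample string, shows what each of the two matches on its own.

```python
import re

sample = "Room 42b"

digits = re.findall(r"\d", sample)        # each digit is a separate match
lowercase = re.findall(r"[a-z]", sample)  # 'R' is skipped because it is uppercase

print(digits)     # ['4', '2']
print(lowercase)  # ['o', 'o', 'm', 'b']
```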

Grouping and Capturing

Those parentheses () you're seeing are for grouping and capturing. We're using them to "capture" or "remember" specific parts of the email address.
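
As an illustration of capturing, the sketch below reads back the three groups after a successful match. It assumes Python's re module; the variable names and the sample address are illustrative.

```python
import re

EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

match = EMAIL_RE.match("jane.doe@example.com")
local_part, domain, extension = match.groups()

print(match.group(0))  # jane.doe@example.com (the whole match)
print(local_part)      # jane.doe
print(domain)          # example
print(extension)       # com
```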

Bracket Expressions

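The bracket expressions, namely [a-z0-9_\.-], [\da-z\.-], and [a-z\.], each match a single character from the set listed within the brackets. The quantifier that follows each one (+ or {2,6}) controls how many such characters are matched in a row.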
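
To make the "single character" point concrete, this sketch tests the first bracket expression with no quantifier attached. It assumes Python's re module; the anchors are added here only so that each test string has to be exactly one character.

```python
import re

# No quantifier: the bracket expression must account for exactly one character.
one_char = re.compile(r"^[a-z0-9_\.-]$")

checks = {s: bool(one_char.match(s)) for s in ["a", "7", "-", "@", "ab"]}
print(checks)  # {'a': True, '7': True, '-': True, '@': False, 'ab': False}
```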

Greedy and Lazy Match

Our regex doesn't use lazy matches, but it does use greedy matches. The + quantifier is greedy: it matches as many characters as possible, and gives characters back only when the rest of the pattern would otherwise fail.
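
The difference is easiest to see side by side. The sketch below assumes Python's re module, where +? is the lazy form of +. The lazy pattern is a hypothetical variant written for comparison and is not part of the tutorial's regex.

```python
import re

print(re.match(r"a+", "aaa").group())   # aaa (greedy: takes everything it can)
print(re.match(r"a+?", "aaa").group())  # a   (lazy: stops as soon as it may)

greedy = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")
# Hypothetical variant: the domain group uses the lazy quantifier +?.
lazy = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+?)\.([a-z\.]{2,6})$")

address = "jane@mail.co.uk"
print(greedy.match(address).groups())  # ('jane', 'mail.co', 'uk')
print(lazy.match(address).groups())    # ('jane', 'mail', 'co.uk')
```

Both versions accept the address; greediness only changes where the boundary between the domain group and the extension group falls.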

Breakdown of the regex

Now that we've gone through each type of component used in the regex, let's walk through the regex from beginning to end and break it down into its individual components:

As a reminder, here is our regex:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
^ This is known as the start anchor. It asserts the start of a line. The expression will match only when the pattern is at the beginning of the string.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
([a-z0-9_\.-]+) This is a group that matches one or more (+ is a quantifier that matches 1 or more of the preceding token) of the following characters:

a-z Any lowercase alphabetic character.
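0-9 Any digit. This is equivalent to \d.
_ An underscore.
\. A period. Outside a bracket expression, an unescaped period means "any character" and must be escaped with a backslash to match a literal period. Inside brackets a period is already literal, so the backslash here is optional but harmless.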
- A hyphen.
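
The sketch below tries this character set against a few candidate local parts, and also checks the set with and without the backslash before the period. It assumes Python's re module; the anchors and the sample strings are added for illustration only.

```python
import re

escaped = re.compile(r"^[a-z0-9_\.-]+$")
# Same set with the period left unescaped; inside brackets it is still literal.
unescaped = re.compile(r"^[a-z0-9_.-]+$")

samples = ["jane.doe", "jane_doe-1", "jane+tag", "Jane"]
accepted = [s for s in samples if escaped.match(s)]

print(accepted)  # ['jane.doe', 'jane_doe-1']

# Both spellings of the set agree on every sample.
print(all(bool(escaped.match(s)) == bool(unescaped.match(s)) for s in samples))  # True
```

The plus sign and uppercase letters are rejected because neither appears in the set.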

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
@ Now that we are done with the prefix of the email address, we match the literal '@' character before we move on to the email domain.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
([\da-z\.-]+) For the email domain, we capture another group that matches one or more of the following:

\d Any digit. This is equivalent to 0-9.
a-z Any lowercase alphabetic character.
\. A period (again, escaped).
- A hyphen.
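
A short sketch of what the domain group accepts, assuming Python's re module and made-up addresses. Unlike the first group, this set has no underscore.

```python
import re

EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

addresses = ["jane@my-site1.com", "jane@sub.example.com", "jane@my_site.com"]
verdicts = {a: EMAIL_RE.match(a) is not None for a in addresses}

for address, ok in verdicts.items():
    print(address, ok)
# jane@my-site1.com True
# jane@sub.example.com True
# jane@my_site.com False
```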

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
\. Now we are done with the domain. The next part of the regex is an escaped period (\.), which matches the literal '.' character. It separates the domain from the extension in the email address.
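
The backslash matters here because this period sits outside any brackets. The sketch below, assuming Python's re module, compares the tutorial's regex with a hypothetical variant in which the separator is left unescaped; the sample address is made up.

```python
import re

escaped_dot = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")
# Hypothetical variant: the separator is a bare period, which matches any character.
unescaped_dot = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+).([a-z\.]{2,6})$")

address = "jane@examplexcom"  # no period anywhere after the @

print(escaped_dot.match(address) is not None)    # False
print(unescaped_dot.match(address) is not None)  # True: the bare period matched a letter
```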

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
([a-z\.]{2,6}) This is the final group, which matches between 2 and 6 ({2,6} is a quantifier that matches between 2 and 6 of the preceding token) of the following:

a-z Any lowercase alphabetic character.
\. A period (again, escaped). This is to allow multiple extensions like .co.uk.
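
The length limits can be checked directly. This sketch assumes Python's re module; the addresses are illustrative and chosen only for the length of their extensions.

```python
import re

EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

addresses = [
    "jane@example.c",           # 1 character: too short
    "jane@example.io",          # 2 characters
    "jane@example.museum",      # 6 characters
    "jane@example.technology",  # 10 characters: too long
]
verdicts = [EMAIL_RE.match(a) is not None for a in addresses]

print(verdicts)  # [False, True, True, False]
```

One consequence of the upper bound is that valid extensions longer than six characters are rejected by this regex.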

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
$ This is the end anchor. It asserts the end of a line. The expression will match only when the pattern is at the very end of the string.
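
One detail worth knowing if you try this in Python, the language assumed for this sketch: in Python's re module, $ also matches just before a single trailing newline. The example below shows that behaviour and uses fullmatch, which requires the match to consume every character, as a stricter check.

```python
import re

EMAIL_RE = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

text = "jane@example.com\n"  # note the trailing newline

print(EMAIL_RE.match(text) is not None)      # True: $ tolerates one trailing newline
print(EMAIL_RE.fullmatch(text) is not None)  # False: the newline is left unconsumed
```

Other regex engines treat $ differently, so this caveat is specific to the Python assumption made here.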

Author

This tutorial was put together by Matt Turner

maaront/email-regex-tutorial.md