Brief Tutorial on Matching an Email with Regex

In this tutorial, I will dissect the logic behind a regex pattern for email validation, so that you can easily customize it and incorporate it into any future project that requires email validation.

Summary

Regular expressions (regex) enable programmers to identify and match patterns within a string. That makes them an invaluable tool that every programmer should become familiar with on the way to writing clean code.

Even though this tutorial focuses only on the following email validation pattern:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

by the end of the tutorial, you will have a clear understanding of how to customize the syntax for your own use.
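To see the pattern in action before we take it apart, here is a minimal sketch that assumes JavaScript, since the pattern is written as a JavaScript regex literal. The helper name isValidEmail and the sample addresses are hypothetical, chosen only for illustration.

```javascript
// The pattern discussed in this tutorial, stored as a JavaScript regex literal.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical helper: RegExp.prototype.test returns true or false.
function isValidEmail(email) {
  return emailRegex.test(email);
}

console.log(isValidEmail("john.doe@example.com")); // true
console.log(isValidEmail("john.doe@example"));     // false: no dot and extension after the domain
console.log(isValidEmail("john doe@example.com")); // false: a space is not an allowed character
```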

Grouping and Capturing
Anchors
Character Classes
Bracket Expressions
Quantifiers
Wanna Learn More?

Regex Components

Grouping and Capturing

Parentheses () are used to group characters and capture the text they match. Everything inside the parentheses is treated as a single unit. Furthermore, they provide visual clarity to the expression by helping us modularize our logic.

Let's take a closer look at our example /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/. Did you notice the modularization in the form of (something)@(something).(something)?
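A minimal JavaScript sketch shows what the three groups capture. String.prototype.match returns an array whose index 0 holds the whole match and whose indexes 1 to 3 hold the captured groups. The address john.doe@example.com is a made-up sample.

```javascript
// The email pattern from this tutorial, unchanged.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

const match = "john.doe@example.com".match(emailRegex);
// match[0] is the whole address; match[1], match[2], match[3] are the three groups.
console.log(match[1]); // john.doe
console.log(match[2]); // example
console.log(match[3]); // com
```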

Anchors

Anchors, such as ^ and $, match a position in the string rather than a character. The caret ^ matches the beginning of the string, and the dollar sign $ matches the end. In our example, the user's email must start with the ([a-z0-9_\.-]+) group of characters and end with the ([a-z\.]{2,6}) group of characters, so nothing else may appear before or after the address.
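To see why the anchors matter, this sketch (JavaScript assumed) compares our pattern with a hypothetical copy that has ^ and $ removed. The input string is an invented example.

```javascript
// Same pattern, with and without the ^ and $ anchors.
const anchored = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
const unanchored = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

const input = "Contact: john@example.com!";
console.log(unanchored.test(input)); // true: an email appears somewhere inside the string
console.log(anchored.test(input));   // false: the whole string must be the email
```

Without anchors, the pattern only checks that an email appears somewhere in the input, which is rarely what a validation form wants.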

Character Classes

While anchors help us specify where a match occurs, character classes enable us to specify the type of character that we want to match. Character classes are among the most commonly used features of regex, so let's examine the ones in our example.

The \d in our expression is a special character class that matches any ONE digit between 0 and 9.

Pay special attention to any character that follows a backslash \, because the backslash can give it special powers in regex. For instance, \w is a shortcut for [A-Za-z0-9_], which translates as follows:

any lowercase letter between a and z, OR any uppercase letter between A and Z, OR any digit between 0 and 9, OR the underscore _ character.

The opposite of \w is \W, which matches any ONE character that is not covered by \w.

In short, the backslash \ distinguishes the ordinary letter d from the special sequence \d, giving it the superpower to represent any ONE digit between 0 and 9. Given the relationship between \w and \W, what kind of powers do you think \D possesses? If you guessed that \D matches any ONE character that is not a digit, you are absolutely on point!

Lastly, \s matches any ONE whitespace character, such as a space, a tab, or a newline. To match anything except whitespace, you would again use the uppercase version, \S.
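The shorthand classes above can be compared side by side with a small JavaScript sketch. The classify function is a hypothetical helper that tests one character against each class; the sample characters are arbitrary.

```javascript
// Hypothetical helper: reports which shorthand classes match a single character.
function classify(ch) {
  return {
    d: /\d/.test(ch), // a digit 0-9
    D: /\D/.test(ch), // anything except a digit
    w: /\w/.test(ch), // [A-Za-z0-9_]
    W: /\W/.test(ch), // anything except [A-Za-z0-9_]
    s: /\s/.test(ch), // whitespace: space, tab, newline, ...
    S: /\S/.test(ch), // anything except whitespace
  };
}

console.log(classify("7")); // d, w and S are true
console.log(classify("_")); // D, w and S are true
console.log(classify("-")); // D, W and S are true
console.log(classify(" ")); // D, W and s are true

// Without the backslash, d is just the ordinary letter "d".
console.log(/d/.test("5"), /\d/.test("5")); // false true
```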

Another commonly used character class is the wildcard ., which looks like a regular dot (.) but has the hidden superpower of matching any ONE character except a line break. For instance, /ca./ matches "ca" followed by any single character, so it finds cat, car, cap, cal, and so on. Keep in mind that it is not limited to the start of a word: it also matches the "cab" inside "vocabulary".
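Here is a quick JavaScript sketch of the wildcard. The test strings are made up; the g flag in the last line asks match to return every occurrence instead of only the first.

```javascript
const wildcard = /ca./;

console.log(wildcard.test("cat"));        // true
console.log(wildcard.test("vocabulary")); // true: "cab" appears inside the word
console.log(wildcard.test("ca"));         // false: no character follows "ca"
console.log(wildcard.test("ca\n"));       // false: . does not match a newline by default

// With the g flag, match collects every occurrence.
console.log("cat car cap".match(/ca./g)); // [ 'cat', 'car', 'cap' ]
```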

Please note that the wildcard . is NOT present in our expression. Because an unescaped . matches any character, we must put a backslash in front of a dot (\.) to match a regular dot (.) literally. So every \. in our expression refers to a regular dot (.). Inside square brackets, a dot is already literal, so the backslash in [a-z0-9_\.-] is optional but harmless.
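To see what the escape protects against, this JavaScript sketch compares our pattern with a hypothetical "broken" copy in which the \. before the last group is written as a plain . instead. The address john@exampleXcom is an invented input with no dot in the domain.

```javascript
// Our pattern, and a hypothetical variant where \. before the last group became a plain .
const escaped = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
const unescaped = /^([a-z0-9_\.-]+)@([\da-z\.-]+).([a-z\.]{2,6})$/;

const noDot = "john@exampleXcom";
console.log(unescaped.test(noDot)); // true: the wildcard happily matches "X"
console.log(escaped.test(noDot));   // false: \. requires a literal dot
```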

Bracket Expressions

Characters between square brackets [] form a set of options, and the bracket expression matches any ONE character from that set. In its simplest form, for instance, [a-z] means that the user can enter any ONE lowercase letter between a and z. In other words, a bracket expression stands for a single character while defining the range of characters that the user is allowed to choose from.

In our example, we have the following three bracket expressions:

[a-z0-9_\.-], [\da-z\.-], and [a-z\.].

Let's examine the first one, [a-z0-9_\.-]. This bracket expression allows any ONE character as long as it is a lowercase letter between a and z, a digit between 0 and 9, an underscore _, a regular dot \., or a hyphen -. Because the hyphen is placed at the very end of the brackets, it is treated as a literal hyphen rather than as part of a range.
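A small JavaScript sketch makes the "exactly ONE character" rule visible. It wraps the first bracket expression in ^ and $ (our own addition for this demo) so that the whole test string must be a single allowed character. The sample characters are arbitrary.

```javascript
// The first bracket expression, anchored so the input must be exactly ONE character.
const localChar = /^[a-z0-9_\.-]$/;

for (const ch of ["a", "7", "_", ".", "-", "A", "+", "ab"]) {
  console.log(JSON.stringify(ch), localChar.test(ch));
}
// "a", "7", "_", "." and "-" print true.
// "A" prints false (uppercase letters are not in the set),
// "+" prints false, and "ab" prints false (two characters, not one).
```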

Quantifiers

Quantifiers, such as + and {}, determine how many times the preceding character, bracket expression, or group must appear.

The + quantifier denotes that the preceding character or character group must appear at least ONCE, with no upper limit. In our example, + applies to two bracket expressions: [a-z0-9_\.-] and [\da-z\.-]. To meet this requirement, the part of the email before the @ must contain at least one character from [a-z0-9_\.-], and the domain name between the @ and the dot before the extension must contain at least one character from [\da-z\.-].
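The "at least ONCE" rule can be checked with a few made-up addresses in a JavaScript sketch. The shortest input shown, a@b.co, has exactly one character in each + group.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

console.log(emailRegex.test("a@b.co"));       // true: one character is enough for each + group
console.log(emailRegex.test("@example.com")); // false: nothing before the @
console.log(emailRegex.test("john@.com"));    // false: no domain name between the @ and the dot
```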

On the other hand, the {2,6} quantifier sets a lower and an upper limit for the bracket expression that precedes it. In our example, characters from [a-z\.] must appear at least twice and no more than six times in a row. Can you guess which part of the email structure this quantifier is for? You guessed it! It is the domain extension that follows the dot, as in .com, .net, .io, .edu, .gov, and so on.
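This JavaScript sketch probes the {2,6} limits with invented addresses. The last two lines also show a side effect worth knowing when you customize the pattern: + is greedy, so the domain group keeps as many characters as it can and leaves only the shortest valid tail for {2,6}.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

console.log(emailRegex.test("john@example.c"));       // false: 1 character is below the minimum of 2
console.log(emailRegex.test("john@example.io"));      // true: 2 characters
console.log(emailRegex.test("john@example.museum"));  // true: 6 characters
console.log(emailRegex.test("john@example.abcdefg")); // false: 7 characters is above the maximum of 6

// The greedy + on the domain group keeps "mail.example.co", leaving "uk" for {2,6}.
const [, , domain, extension] = "jane@mail.example.co.uk".match(emailRegex);
console.log(domain, extension); // mail.example.co uk
```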

Wanna Learn More?

If you want to go beyond our syntax and create your own examples, check out this tool.

-HAPPY CODING!!! :)

Author

I am a full-stack web developer currently exploring the MERN stack. For questions and feedback, please contact me via my email or my GitHub.

benkaan001/myFirstGist.md