@marywebby
Last active April 3, 2023 22:05
In depth review on the email regex

Email Regular Expression

When you need to search a string for a certain phrase or pattern, some developers opt for string methods such as 'unsortedInputArray'.startsWith('u'). However, this can get confusing when many items start with 'u' and you need something more specific. So, to make this process simpler, other developers use regular expressions, also known as regexes. A regex describes a pattern, and any input string can be tested to see whether it fits the requirements the pattern lays out. Since a regex is just a series of characters (literals plus a handful of special symbols), it is supported in most programming languages, with small differences in syntax, making it very beneficial and widely accepted.

Summary

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

In this gist, I will be walking through the email regular expression, showing how each component helps it find email addresses based on the requirements it lays out. I will be covering everything from anchors to groupings, and flags to boundaries, so if you feel like learning how to implement something like this in your own code, keep on reading!
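Before breaking the regex into parts, it may help to see it run. The sketch below tests the regex from the summary against a few sample strings. The sample inputs are made up for illustration, and the apostrophe inside the first bracket expression is assumed to be a plain ASCII one.

```javascript
// The email regex from the summary, written as a JavaScript regex literal.
// Assumption: the apostrophe in the first bracket expression is a plain ASCII '.
const emailRegex = /^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;

// Hypothetical sample inputs, chosen only for illustration.
const samples = [
  "jane.doe@example.com",   // local part, @, and a dotted domain
  "no-at-sign.example.com", // missing the @
  "two@@example.com",       // @ is not allowed inside either half
  "user@localhost",         // a domain with no dots is still accepted
];

// test() returns true when the whole string fits the pattern.
const results = samples.map((sample) => emailRegex.test(sample));
console.log(results); // [ true, false, false, true ]
```

Keep in mind that this only checks the shape of the string. It says nothing about whether the address actually exists.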

Regex Components

Anchors

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

With our chosen regular expression, we will begin by pointing out the anchors ^ and $. These two characters mark the beginning and the end of the string: whatever follows ^ must sit at the very start, and whatever precedes $ must sit at the very end. In our email regex, /^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/, the ^ means that [a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+ must match the first characters of the string, since it immediately follows the ^. At the other end, the $ is the ending marker for the regex. It comes right after (?:\.[a-zA-Z0-9-]+)*, so the string must end with the last repetition of that group, or, if the group never repeats, with the [a-zA-Z0-9-]+ part that follows the @. Together, the two anchors mean the whole string has to be an email address, not just contain one.
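To see what the anchors change, here is a small sketch that compares an anchored pattern with the same pattern left unanchored. It uses only the domain bracket expression from our regex to keep things short, and the input string is a made-up example.

```javascript
// Same bracket expression, with and without anchors.
const anchored = /^[a-zA-Z0-9-]+$/; // the whole string must be letters, digits, or hyphens
const unanchored = /[a-zA-Z0-9-]+/; // any run of those characters, anywhere in the string

const input = "not valid!"; // hypothetical input containing a space and "!"

const anchoredResult = anchored.test(input);     // false: the space and "!" are not in the set
const unanchoredResult = unanchored.test(input); // true: "not" alone satisfies the pattern

console.log(anchoredResult, unanchoredResult); // false true
```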

Quantifiers

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

Quantifiers set how many times the piece of the pattern in front of them is allowed to repeat. Our email regex uses two of them. The + after each bracket expression means one or more characters from that set, and the * towards the end means that the group (?:\.[a-zA-Z0-9-]+) can appear zero or more times. Quantifiers are also greedy by default, wanting as many characters as possible. So when we use *, the regex grabs as many repetitions of the group as it can while still letting the rest of the pattern match.
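As a rough illustration of + and *, the sketch below pulls out just the part of our regex that comes after the @ and tests it on its own. The variable names and sample strings are made up for this example.

```javascript
// The domain half of the email regex, anchored so it can be tested by itself.
const domainPattern = /^[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;

const noDots = domainPattern.test("localhost");          // true: * allows zero ".label" groups
const twoDots = domainPattern.test("mail.example.com");  // true: * allows the group twice
const empty = domainPattern.test("");                    // false: + needs at least one character
const trailingDot = domainPattern.test("example.");      // false: the + after \. needs a character

console.log(noDots, twoDots, empty, trailingDot); // true true false false
```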

OR Operator

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

The OR operator, |, is used when there are multiple alternatives that can be accepted at the same spot in a string. An example of this is (a|b|c):(x|y|z). Here the regex will accept a:x, b:y, c:z, and any other pairing of one letter from each group. In our specific email regex, we can see {|} in the middle of the first bracket expression. Inside brackets, | loses its special meaning, so the regex takes | as a literal character. It is not being used as a 'this or that' situation.
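Here is a short sketch of both cases: the | used as OR inside parentheses, and the | treated as a plain character inside brackets. The anchors are added so each test checks the whole string, and the inputs are made up.

```javascript
// | as the OR operator: one letter from each group, separated by a colon.
const pairPattern = /^(a|b|c):(x|y|z)$/;
const matchesAX = pairPattern.test("a:x"); // true
const matchesCY = pairPattern.test("c:y"); // true: any pairing is accepted
const matchesDX = pairPattern.test("d:x"); // false: d is not one of the alternatives

// | inside a bracket expression: just another literal character.
const literalPipe = /^[{|}]+$/;
const pipeOnly = literalPipe.test("|");   // true: the pipe itself is matched
const braces = literalPipe.test("{|}");   // true: all three characters are in the set
const letterA = literalPipe.test("a");    // false: no OR logic is happening here

console.log(matchesAX, matchesCY, matchesDX, pipeOnly, braces, letterA);
// true true false true true false
```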

Character Classes

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

Character classes define a set of characters, any one of which can match at a single position in the string. Some common sets have shorthand forms that keep the regex readable. An example is \s, which matches any single whitespace character, including tabs and line breaks. The email regex we chose doesn't use any of these shorthand classes. It spells out every set it needs with bracket expressions instead, since its requirements are pretty baseline.
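A quick sketch of the \s shorthand in action, using a few made-up strings. The last lines also show that a shorthand like \d and a spelled-out set like [0-9] can give the same result.

```javascript
// \s matches a single whitespace character: space, tab, line break, and so on.
const whitespace = /\s/;
const hasSpace = whitespace.test("hello world");  // true: contains a space
const hasTab = whitespace.test("tab\there");      // true: \t is a tab
const hasNewline = whitespace.test("line\nbreak"); // true: \n is a line break
const hasNone = whitespace.test("nospace");       // false: no whitespace at all

// A shorthand class and the bracket expression that spells out the same set.
const shorthandDigits = /^\d+$/.test("2023");    // true
const spelledOutDigits = /^[0-9]+$/.test("2023"); // true

console.log(hasSpace, hasTab, hasNewline, hasNone); // true true true false
```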

Flags

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

Flags are placed after the closing slash of a regex to change how the search is carried out. We can have a few different flags in JavaScript, including g, i, m, u, s, and y. Each flag gives you a different way to search. For example, the g flag performs a global search, finding every match instead of stopping at the first one, and the i flag makes the search case-insensitive. Our email regex has nothing after its closing slash, so it uses no flags.
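The sketch below shows how the g and i flags change the result of the same search. The text being searched is a made-up example.

```javascript
const text = "Cat, cat, CAT"; // hypothetical input

// No flags: stops at the first match, which is the lowercase "cat" at index 5.
const firstOnly = text.match(/cat/);

// g flag: returns every match, but is still case-sensitive.
const globalOnly = text.match(/cat/g);

// g and i together: every match, ignoring case.
const globalIgnoreCase = text.match(/cat/gi);

console.log(firstOnly.index);  // 5
console.log(globalOnly);       // [ 'cat' ]
console.log(globalIgnoreCase); // [ 'Cat', 'cat', 'CAT' ]
```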

Grouping and Capturing

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

Grouping means placing part of the regex in parentheses, either to organize the requirements when the pattern grows long and complicated or to apply a quantifier to several pieces at once. We can see this in our chosen regex at (?:\.[a-zA-Z0-9-]+). This is a special type of group called a non-capturing group. The ?: means that we use the group to match text, but the text it matches is not saved as a separate capture in the final result.
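To make the difference concrete, this sketch runs the same small pattern twice, once with a capturing group and once with the ?: version. The patterns are simplified stand-ins for the group in our regex, and the input is made up.

```javascript
const input = "example.com"; // hypothetical input

// Capturing group: the matched text is saved as an extra entry in the result.
const withCapture = input.match(/(\.[a-z]+)$/);

// Non-capturing group: same match, but nothing extra is saved.
const withoutCapture = input.match(/(?:\.[a-z]+)$/);

console.log(withCapture[0], withCapture[1]); // .com .com
console.log(withCapture.length);             // 2 (full match plus one capture)
console.log(withoutCapture[0]);              // .com
console.log(withoutCapture.length);          // 1 (full match only)
```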

Bracket Expressions

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

Bracket expressions signify that any one character listed inside the [] is accepted at that position. For example, [abc] matches a single a, b, or c. A hyphen defines a range, so [a-z] matches any lowercase letter from a to z. Our regex uses these a lot because emails can contain a wide range of numbers, letters, and special characters. We can see this at [a-zA-Z0-9.!#$%&'*+/=?^_{|}~-], which accepts lowercase letters, uppercase letters, digits, and a list of special characters. We also see this a little later in our expression after the @ symbol, at [a-zA-Z0-9-], which accepts only letters, digits, and hyphens.
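Here is a small sketch of a list, a range, and the bracket expression our regex uses after the @. Anchors are added so each test looks at the whole string, and the inputs are made up.

```javascript
// A list of characters: exactly one of a, b, or c.
const listPattern = /^[abc]$/;
const listHit = listPattern.test("b");  // true
const listMiss = listPattern.test("d"); // false

// A range: one or more lowercase letters.
const rangePattern = /^[a-z]+$/;
const rangeHit = rangePattern.test("hello");  // true
const rangeMiss = rangePattern.test("Hello"); // false: uppercase H is outside a-z

// The bracket expression from the domain part of the email regex.
const domainChars = /^[a-zA-Z0-9-]+$/;
const domainHit = domainChars.test("my-site42"); // true: letters, digits, and a hyphen
const domainMiss = domainChars.test("my_site");  // false: underscore is not in the set

console.log(listHit, listMiss, rangeHit, rangeMiss, domainHit, domainMiss);
// true false true false true false
```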

Greedy and Lazy Match

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

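Greedy and lazy matching ties back to what we saw earlier in the quantifiers section, since it is the quantifier that decides how greedy a regex is. By default, quantifiers such as + and * are greedy and take as many characters as possible. To make a quantifier lazy, we add ? after it (for example +? or *?), meaning we want as few characters as possible. Our email regex uses only the greedy forms.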
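A minimal sketch of the difference, using a made-up string of repeated letters so the effect is easy to see.

```javascript
const input = "aaaa"; // hypothetical input

// Greedy: + takes as many a's as it can.
const greedy = input.match(/a+/)[0];

// Lazy: +? stops as soon as the minimum (one a) is satisfied.
const lazy = input.match(/a+?/)[0];

console.log(greedy); // aaaa
console.log(lazy);   // a
```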

Boundaries

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

Boundaries are very similar to anchors, except that they mark the beginning or the end of a word instead of the whole string. The word boundary is written \b. Our regex does not use this setup, but a good example of it would be /\bcat\b/, which matches cat as a whole word, as in black cat, but not the cat inside longer words such as concatenate or cats.
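The sketch below tries the word-boundary pattern against a few made-up strings, and compares it with the same pattern without boundaries.

```javascript
// \b on both sides: "cat" must stand alone as a word.
const wholeWord = /\bcat\b/;
const inPhrase = wholeWord.test("black cat");     // true: cat is its own word
const insideWord = wholeWord.test("concatenate"); // false: cat is buried inside a longer word
const plural = wholeWord.test("cats");            // false: no boundary between t and s

// Without boundaries, any occurrence of the letters c-a-t counts.
const anywhere = /cat/.test("concatenate"); // true

console.log(inPhrase, insideWord, plural, anywhere); // true false false true
```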

Back-references

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

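Back-references are placed in a regex to match the same text that an earlier capturing group already matched. They are written as a backslash and a number, such as \1, meaning go back to whatever the first set of capturing parentheses matched. Our email regex does not use any, and since its only group is non-capturing, there would be nothing for \1 to refer to.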
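Since our email regex has no back-references, here is a separate sketch of one. It looks for a word that is typed twice in a row, with made-up inputs.

```javascript
// (\w+) captures a word, and \1 requires that exact same word again after a space.
const doubledWord = /\b(\w+) \1\b/;

const repeated = "hello hello".match(doubledWord);   // match: the second word equals the first
const different = "hello world".match(doubledWord);  // null: "world" is not "hello"

console.log(repeated[0]); // hello hello
console.log(repeated[1]); // hello (the text that \1 had to repeat)
console.log(different);   // null
```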

Look-ahead and Look-behind

/^[a-zA-Z0-9.!#$%&'*+/=?^_{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

Look-ahead and look-behind, also known as lookaround assertions, check whether a certain pattern comes right after or right before the current position without making it part of the match. Sort of like a boolean, the check has only two outcomes, true or false. A good example is looking for a q that is followed by a u. We can use q(?=u) for this. A word like Iraq returns no match, because nothing follows its q, while a word like question returns a match on the q alone. The u has to be there, but it is not included in the match.
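The q(?=u) example can be sketched as follows. The negative look-ahead and the look-behind lines are extra illustrations that are not part of our email regex, and the look-behind syntax assumes a reasonably recent JavaScript engine.

```javascript
// Positive look-ahead: a q only counts if a u comes right after it.
const qBeforeU = /q(?=u)/;
const iraq = "Iraq".match(qBeforeU);         // null: nothing follows the q
const question = "question".match(qBeforeU); // matches "q" at index 0, without the u

// Negative look-ahead: a q that is NOT followed by a u.
const qWithoutU = "Iraq".match(/q(?!u)/); // matches the final "q"

// Look-behind: lowercase letters that come right after an @ (hypothetical address).
const afterAt = "user@example.com".match(/(?<=@)[a-z]+/);

console.log(iraq);         // null
console.log(question[0]);  // q
console.log(qWithoutU[0]); // q
console.log(afterAt[0]);   // example
```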

Credits

Website used: https://coding-boot-camp.github.io/full-stack/computer-science/regex-tutorial

Website used: https://www.regular-expressions.info/tutorial.html

Author

Github Username: marywebby

Github Link: https://github.com/marywebby

Email Address: [email protected]

Email me if you have additional questions.
