I am currently working through an online code camp. We have recently been learning about regex and how it's used to verify and find data in a given application. By the end of this tutorial you should have a basic understanding of what regex is and how to use it in your application.
Let's begin by defining regex. A regular expression, regex for short, is a series of characters that lays out a specific search pattern. A regex is applied to a string of characters to check whether the string matches the required pattern. At first it may look like nonsense, but each part has an exact meaning that helps in breaking down the string.
The regex I will be covering in this tutorial verifies the format of an email address. Primarily, it is used to check user input and validate that what the user entered is formatted correctly. The specific regex we will analyze is:
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
There are multiple parts of regex to understand, including anchors, quantifiers, bracket expressions, and more. Let's begin by breaking down the snippet of regex above to see how each part is being used.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Let's begin by defining our anchors. The two being used in this example are `^` and `$`. They are found at the beginning and end of our regex. Anchors are tokens that don't match any characters in the string; instead, they assert something about the position at which the match takes place. In this case `^` denotes the beginning of the string and `$` denotes the end of the string. Other anchors that can be explored, depending on the regex engine you use, include:
- `\A` (start of a string)
- `\G` (start of a match)
- `\Z` (end of a string)
- `\z` (absolute end of a string)
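To see what the two anchors contribute, here is a minimal JavaScript sketch that compares the tutorial's pattern with and without `^` and `$`. The variable names and the sample strings are made up for illustration.

```javascript
// The tutorial's pattern, anchored at both ends.
const anchored = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
// The same pattern with the anchors removed (for comparison only).
const unanchored = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

// Hypothetical input with extra text around an address.
const noisy = "contact: regex@gmail.com (work)";

console.log(unanchored.test(noisy)); // true: a match exists somewhere inside
console.log(anchored.test(noisy));   // false: the whole string has to fit
console.log(anchored.test("regex@gmail.com")); // true
```

Without the anchors the pattern only answers "does this string contain something email-like", which is usually not what you want when validating a form field.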
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Next we're going to look into grouping and capturing. A convenient way to verify your data is to break the pattern into groups and capture the data each group matches. We do that with parentheses `()`. In the example above you can see three distinct groups of information that we're collecting: group one checks the username, group two checks the domain name, and group three checks the extension. Grouping the email string into distinct parts allows you to verify each section accurately and individually.
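As a rough illustration of capturing, the sketch below runs the pattern against a sample address and reads back the three groups. It assumes JavaScript's built-in `String.prototype.match`; the variable names are made up for this example.

```javascript
const emailPattern = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// match() returns an array: index 0 is the full match, and indexes 1 to 3
// hold the text captured by each pair of parentheses, in order.
const result = "regex@gmail.com".match(emailPattern);

const [fullMatch, username, domain, extension] = result;
console.log(fullMatch); // "regex@gmail.com"
console.log(username);  // "regex"
console.log(domain);    // "gmail"
console.log(extension); // "com"
```

When the string does not fit the pattern, `match` returns `null` instead of an array, so real code should check for that before destructuring.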
Bracket expressions are another important part of regex. They denote the set of characters that are allowed at one position in the string. They are defined by using square brackets `[]` to surround the characters you are looking to match. For example, `[123]` matches any single character that is a `1`, `2`, or `3`.
You can also match a range of characters by using a hyphen `-`, as seen in the first section of our example regex:
[a-z0-9_\.-]
The `a-z` signifies that any lowercase letter between `a` and `z` is accepted. The `0-9`, like the `\d` used in the second group, matches any single digit between `0` and `9`. The set also allows underscores `_`, periods `.`, and hyphens/dashes `-`. Because the hyphen is the last character inside the brackets, it is read as a literal hyphen rather than as part of a range. In the case of the period as well as the `\d` you'll notice a backslash is involved, but we'll cover that in Character Escapes.
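The behaviour of a bracket expression is easy to check one character at a time. The following is a small sketch, with illustrative variable names, that tests single characters against `[123]` and against the username set from our regex.

```javascript
// A bracket expression matches exactly one character from its set.
const oneTwoThree = /^[123]$/;
console.log(oneTwoThree.test("2")); // true
console.log(oneTwoThree.test("7")); // false

// The username set from the tutorial's pattern, tested on single characters.
const allowed = /^[a-z0-9_\.-]$/;
console.log(allowed.test("q")); // true: inside a-z
console.log(allowed.test("7")); // true: inside 0-9
console.log(allowed.test("-")); // true: trailing hyphen is literal
console.log(allowed.test("Q")); // false: uppercase is outside a-z
console.log(allowed.test("+")); // false: not listed in the set

// In JavaScript, \d and [0-9] accept the same characters.
console.log(/^\d$/.test("5"), /^[0-9]$/.test("5")); // true true
```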
Quantifiers set limits, in the form of a minimum and a maximum, on how many times the preceding character, bracket expression, or group may repeat. By default they are greedy: they try to match as many repetitions as possible.
([a-z0-9_\.-]+)
In our first group we have a plus sign `+`. It requires the preceding bracket expression to match at least once and allows it to match any number of additional times. In this case, we can have any number of the letters, numbers, or symbols that the bracket expression allows. The same symbol is used after the bracket expression for our domain name.
Next we find curly brackets `{}` near the end of our regex.
([a-z\.]{2,6})
In our first example we used the `+` to allow the bracket expression to repeat an unlimited number of times. In this case we are using the curly brackets to limit the number of times the qualifying characters can repeat. See the examples below:
- `{2}` -- This allows you to match the data exactly 2 times.
- `{2,}` -- This allows you to match the data at least 2 times.
- `{2,6}` -- As in our example above, this lets you match the data at least 2 times, but no more than 6 times.
Other quantifiers include:
- `*` -- Like the `+`, this will allow you to match the data an unlimited number of times, but it also allows for no matches.
- `?` -- This allows you to match the pattern either once or not at all.
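The quantifiers described above can be tried out with a few short tests. This is only a sketch: the sample strings and the `extensionPattern` name are invented for illustration, and the extension check is the third group of our regex pulled out and anchored on its own.

```javascript
// + : one or more
console.log(/^a+$/.test("aaa")); // true
console.log(/^a+$/.test(""));    // false: at least one "a" is required

// * : zero or more
console.log(/^a*$/.test(""));    // true: no matches is acceptable

// ? : zero or one
console.log(/^colou?r$/.test("color"));  // true
console.log(/^colou?r$/.test("colour")); // true

// {2,6} : between 2 and 6 repetitions, as used for the extension
const extensionPattern = /^[a-z\.]{2,6}$/;
console.log(extensionPattern.test("c"));       // false: only 1 character
console.log(extensionPattern.test("com"));     // true
console.log(extensionPattern.test("website")); // false: 7 characters

// Greedy by default: + takes every "a" it can.
console.log("aaaa".match(/a+/)[0]); // "aaaa"
```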
Alright, now let's address the odd syntax that is included with our period.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
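Regex gives some characters a special meaning instead of reading them literally. An unescaped period, for example, matches any single character. We need something that lets the character escape from how regex would typically interpret it. In this case we need regex to read our period as part of our data, and not as a special character. To do this we put a backslash `\` before it, which acts as an escape. Any time you want to match a character that regex uses as a utility, you will need to put a backslash in front of it. The backslash also works in the other direction: `\d` turns an ordinary `d` into the shorthand for any digit. Inside a bracket expression the period is already literal, so the `\.` in `[a-z0-9_\.-]` is harmless but not strictly required.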
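The difference an escape makes shows up as soon as the period is replaced by another character. Here is a minimal sketch with made-up variable names and sample strings:

```javascript
const unescapedDot = /^gmail.com$/; // . matches any single character
const escapedDot = /^gmail\.com$/;  // \. matches only a literal period

console.log(unescapedDot.test("gmail.com")); // true
console.log(unescapedDot.test("gmailXcom")); // true: X satisfies the wildcard
console.log(escapedDot.test("gmail.com"));   // true
console.log(escapedDot.test("gmailXcom"));   // false: a real period is required

// Inside brackets the period is already literal, escaped or not.
console.log(/^[.]$/.test("x"));  // false
console.log(/^[\.]$/.test(".")); // true
```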
Now that we have a bit of an understanding of what the crazy jumble of regex is doing, let's analyze an email.
regex@gmail.com vs /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
- `^` - beginning of our string
- `([a-z0-9_\.-]+)`
  - `()` first section of our string
  - `[]` containing
  - `a-z0-9_\.-` letters, numbers, and symbols
  - `+` and finds as many there as possible
  - matched so far: `regex`, still to check: `@gmail.com`
- `@` - stops the first match and includes the literal `@` character
  - matched so far: `regex@`, still to check: `gmail.com`
- `([\da-z\.-]+)` - repeats checking for matches over the second section
  - matched so far: `regex@gmail`, still to check: `.com`
- `\.` - checks for a literal period, not the wildcard that matches any character
  - matched so far: `regex@gmail.`, still to check: `com`
- `([a-z\.]{2,6})` - looks to verify the extension and ensures it is
  - `[a-z\.]` a-z or a period, and
  - `{2,6}` between 2 and 6 characters long
  - matched so far: `regex@gmail.com`
- `$` - end of our string
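Putting the walk-through together, a validation helper might look something like the sketch below. `parseEmail` and `EMAIL_PATTERN` are hypothetical names chosen for this example, and the pattern is the teaching regex from this tutorial, so it will reject some real-world addresses (for instance ones with uppercase letters or a `+` in the username).

```javascript
const EMAIL_PATTERN = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical helper: returns the three captured parts, or null when the
// input does not fit the pattern.
function parseEmail(input) {
  const match = EMAIL_PATTERN.exec(input);
  if (match === null) {
    return null;
  }
  // Skip index 0 (the full match) and name the three groups.
  const [, username, domain, extension] = match;
  return { username, domain, extension };
}

const parsed = parseEmail("regex@gmail.com");
console.log(parsed.username, parsed.domain, parsed.extension); // regex gmail com

console.log(parseEmail("regex@gmail"));     // null: no period and extension
console.log(parseEmail("Regex@Gmail.com")); // null: uppercase is outside a-z
```

If uppercase input should be accepted, one option is to lowercase the string before testing it; another is the `i` flag, which is one of the topics listed below.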
While this definitely doesn't cover everything regarding regex, it will surely get you started. Below are some other regex categories that may help you.
- Greedy and Lazy Match
- Character Classes
- Flags
- Boundaries
- OR Operator
- Back-references
- Look-ahead and Look-behind