In this tutorial we will take a dive into regex. Regex, or regular expressions, are patterns used to search for character combinations in strings. They are used by string-searching algorithms and for input validation.
Today we are going to look at how to implement a regex that verifies that a phone number is valid:
/(?:(\+1)[ -]?)?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/
It may look daunting at first, so let's break down what's going on here to get a better understanding.
A character class, or character set, tells the regex engine to match exactly one character out of a set of characters, such as digits, word characters, or whitespace.
- \d matches a single character that is a digit
- \w matches a word character (any alphanumeric character, including underscore)
- \s matches a whitespace character (including tabs and line breaks)
- . matches any character (wildcard)
- the uppercase form of \d, \w, or \s inverts the match
- Examples:
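\d matches any single digit 0-9
\w matches any single character a-z, A-Z, 0-9, or _
\s matches a single whitespace character, such as ` `
. matches any single character (by default, except a line break)
\D matches any single non-digit character
\W matches any single character that is not a-z, A-Z, 0-9, or _
\S matches any single non-whitespace character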
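The character classes above can be checked directly with JavaScript's `RegExp.prototype.test`. This is a minimal sketch; the `classChecks` and `classResults` names are made up for illustration, and each row pairs a pattern with one sample input.

```javascript
// Each entry: [pattern, sample input]. The trailing comment gives the reason.
const classChecks = [
  [/\d/, "7"],   // true: 7 is a digit
  [/\d/, "x"],   // false: x is not a digit
  [/\w/, "_"],   // true: underscore counts as a word character
  [/\w/, "-"],   // false: hyphen is not a word character
  [/\s/, "\t"],  // true: a tab is whitespace
  [/./, "#"],    // true: the wildcard matches any ordinary character
  [/./, "\n"],   // false: by default the wildcard skips line breaks
  [/\D/, "x"],   // true: x is a non-digit
  [/\W/, "-"],   // true: hyphen is a non-word character
  [/\S/, " "],   // false: a space is whitespace
];

// Run every pattern against its sample and collect the booleans.
const classResults = classChecks.map(([pattern, input]) => pattern.test(input));
console.log(classResults);
```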
So, to search for any single digit, we would create a regex like so:
/\d/
Quantifiers are characters within the regex that specify how many times the character, group, or character class that precedes them must appear in the input to be matched.
- * matches the pattern zero or more times
- + matches the pattern one or more times
- ? matches the pattern zero or one time
- {n} matches the pattern exactly n times
- {n,} matches the pattern at least n times
- {n,x} matches the pattern at least n times and at most x times
- ()* matches zero or more copies of the sequence within the parentheses
Examples:
xyz* matches a string that has xy followed by zero or more z
xyz+ matches a string that has xy followed by one or more z
xyz? matches a string that has xy followed by zero or one z
xyz{2} matches a string that has xy followed by 2 z
xyz{2,} matches a string that has xy followed by 2 or more z
xyz{2,5} matches a string that has xy followed by 2 up to 5 z
x(yz)* matches a string that has x followed by zero or more copies of the sequence yz
x(yz){2,5} matches a string that has x followed by 2 up to 5 copies of the sequence yz
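The quantifier examples can be tried out as follows. This is a sketch with one assumption: `matchesWhole` is a hypothetical helper that wraps each pattern in the `^` and `$` anchors (not covered in this tutorial) so that only a match of the entire string counts. Without them, `test` would also accept a match found anywhere inside a longer string.

```javascript
// Hypothetical helper for this sketch: true only when the ENTIRE string matches.
// The ^ and $ anchors are an addition here; the tutorial does not introduce them.
function matchesWhole(pattern, text) {
  return new RegExp(`^(?:${pattern.source})$`).test(text);
}

console.log(matchesWhole(/xyz*/, "xy"));            // true: zero z is allowed
console.log(matchesWhole(/xyz+/, "xy"));            // false: needs at least one z
console.log(matchesWhole(/xyz?/, "xyzz"));          // false: at most one z
console.log(matchesWhole(/xyz{2}/, "xyzz"));        // true: exactly two z
console.log(matchesWhole(/xyz{2,}/, "xyzzzz"));     // true: two or more z
console.log(matchesWhole(/xyz{2,5}/, "xyzzzzzz"));  // false: six z is too many
console.log(matchesWhole(/x(yz)*/, "xyzyz"));       // true: two copies of yz
console.log(matchesWhole(/x(yz){2,5}/, "xyz"));     // false: only one copy of yz
```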
Let's take a look at a simple phone number:
1234567890
If we want to search for this, we would just specify that we want ten single digits, like this:
/\d{10}/
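A quick check of this pattern, as a sketch. One thing worth noticing is that the pattern is unanchored, so it also succeeds on a longer run of digits. The `exactlyTenDigits` variant uses the `^` and `$` anchors, which this tutorial does not cover; it is shown only as one possible way to require that the whole input is ten digits.

```javascript
const tenDigits = /\d{10}/;

console.log(tenDigits.test("1234567890"));    // true
console.log(tenDigits.test("12345"));         // false: too few digits
console.log(tenDigits.test("123456789012"));  // true: the first ten digits match

// Assumption for this sketch: anchor the pattern to reject extra digits.
const exactlyTenDigits = /^\d{10}$/;
console.log(exactlyTenDigits.test("123456789012"));  // false
console.log(exactlyTenDigits.test("1234567890"));    // true
```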
But phone numbers can be valid in many ways. What if we wanted to make hyphens valid?
123-456-7890
We would just add a hyphen - followed by the quantifier ? to make it optional, and split the digits into runs of 3, 3, and 4:
/\d{3}-?\d{3}-?\d{4}/
But sometimes phone numbers are written with spaces, like this:
123 456 7890
We can also account for that space, but for that we need to learn about bracket expressions first!
A bracket expression is a set of characters enclosed in brackets [] that matches any single character within the brackets.
*note: if the first character within the brackets is a ^ then it matches any character not in the list; it is unspecified whether it matches an encoding error.
Examples of bracket expressions are as follows:
- [] matches any single character within the brackets
- []% matches a single character from within the brackets followed by a %
- [^] matches any single character that is not within the brackets (negation of the expression)
- Examples:
[xyz] matches a string that has either an x, a y, or a z (same as x|y|z)
[x-z] same as the case above, written as a range
[a-fA-F0-9] a string that represents a single hexadecimal digit, case insensitively
[0-9]% a string that has a character from 0-9 before a %
[^a-zA-Z] a string that has not a letter from a to z or from A to Z
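Bracket expressions can be tried out the same way. In this sketch, a hexadecimal digit is taken to be one of 0-9, a-f, or A-F, so the hex check uses `[a-fA-F0-9]`. The `^` and `$` anchors around it are an addition for illustration so that exactly one character is tested.

```javascript
console.log(/[xyz]/.test("y"));  // true: y is in the set
console.log(/[x-z]/.test("w"));  // false: w is outside the range x to z

// Assumption: hex digits are 0-9, a-f, A-F. Anchors limit the input to one character.
const singleHexDigit = /^[a-fA-F0-9]$/;
console.log(singleHexDigit.test("F"));  // true
console.log(singleHexDigit.test("g"));  // false

// A digit followed by a percent sign: only the last digit is part of the match.
const percentMatch = "50%".match(/[0-9]%/);
console.log(percentMatch[0]);  // "0%"

// Negated set: matches any one character that is not a letter.
console.log(/[^a-zA-Z]/.test("abc"));   // false: every character is a letter
console.log(/[^a-zA-Z]/.test("abc1"));  // true: 1 is not a letter
```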
So having that in mind, we would enclose the space and hyphen within the bracket like so
[ -]?
And then add that to what we have so far
/\d{3}[ -]?\d{3}[ -]?\d{4}/
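As a sketch, here is that pattern run against the formats seen so far. The sample strings, including the dotted one that is expected to fail, are made up for illustration.

```javascript
const separatedNumber = /\d{3}[ -]?\d{3}[ -]?\d{4}/;

console.log(separatedNumber.test("1234567890"));    // true: no separators
console.log(separatedNumber.test("123-456-7890"));  // true: hyphens
console.log(separatedNumber.test("123 456 7890"));  // true: spaces
console.log(separatedNumber.test("123 456-7890"));  // true: separators can be mixed
console.log(separatedNumber.test("123.456.7890"));  // false: a dot is not in [ -]
```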
Grouping unifies a pattern or string so that it is matched as a complete block.
Examples of grouping are as follows:
- () parentheses create a capture group
- (?:) using ?: disables the capturing group
- (?<>) using ?<> puts a name to the group
- Examples:
x(yz) parentheses create a capturing group with value yz
x(?:yz)* using ?: we disable the capturing group
x(?<bar>yz) using ?<bar> we put a name to the group
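The difference between the three kinds of group shows up in what `String.prototype.match` returns. A short sketch, with variable names chosen only for illustration:

```javascript
// Capturing group: index 0 is the whole match, index 1 is the group.
const captured = "xyz".match(/x(yz)/);
console.log(captured[0]);  // "xyz"
console.log(captured[1]);  // "yz"

// Non-capturing group: the group still repeats as a unit, but nothing is stored.
const nonCapturing = "xyzyz".match(/x(?:yz)*/);
console.log(nonCapturing[0]);      // "xyzyz"
console.log(nonCapturing.length);  // 1: only the whole match

// Named group: the value is available by name under .groups.
const named = "xyz".match(/x(?<bar>yz)/);
console.log(named.groups.bar);  // "yz"
```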
Say we wanted to match not only the whole phone number entered, but also the individual groups of digits, so that we can format it exactly the way we want. That's where we place parentheses around each group of digits:
/(\d{3})[ -]?(\d{3})[ -]?(\d{4})/
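One way the captured groups could be used for formatting is sketched below. `formatPhone` is a hypothetical helper, and the `(123) 456-7890` output style is just one possible choice.

```javascript
const groupedNumber = /(\d{3})[ -]?(\d{3})[ -]?(\d{4})/;

// Hypothetical helper: returns the number in one fixed format, or null if no match.
function formatPhone(input) {
  const match = input.match(groupedNumber);
  if (match === null) return null;
  // Skip index 0 (the whole match) and name the three captured groups.
  const [, area, prefix, line] = match;
  return `(${area}) ${prefix}-${line}`;
}

console.log(formatPhone("123 456-7890"));  // "(123) 456-7890"
console.log(formatPhone("1234567890"));    // "(123) 456-7890"
console.log(formatPhone("12345"));         // null
```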
But what if the area code has parentheses around it like this?
(123) 456-7890
To match special characters, we need to use character escapes.
A backslash \ is used in regular expressions to escape the next character so that it is interpreted literally instead of as a special character. This allows us to include reserved characters such as { } [ ] ( ) / \ + * . $ ^ | ? as matching characters. To use one of these special characters as a matching character, prepend it with \.
So in our example, we put a backslash before each parenthesis around the area code, and follow each one with ? to make it optional:
/\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/
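The effect of escaping can be seen in a small sketch. The first two lines use a made-up `1+1` example to contrast an unescaped and an escaped `+`; the rest runs the pattern above. Because each parenthesis is optional on its own, an unbalanced one still matches, which may or may not be acceptable depending on how strict the validation needs to be.

```javascript
// Without the backslash, + is a quantifier; with it, + is a literal plus sign.
console.log(/1+1/.test("1+1"));   // false: means "one or more 1, then another 1"
console.log(/1\+1/.test("1+1"));  // true: matches the three characters 1+1

const areaCodePattern = /\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/;
const parenMatch = "(123) 456-7890".match(areaCodePattern);
console.log(parenMatch[0]);  // "(123) 456-7890"
console.log(parenMatch[1]);  // "123": the escaped parentheses are not captured

// Each parenthesis is optional independently, so an unbalanced one still matches.
console.log(areaCodePattern.test("(123 456-7890"));  // true
```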
There's one more thing to account for. Sometimes the international calling code appears like this
+1 123 456 7890
In this case we would add the +1 with an escape, \+1, then capture it, (\+1), and account for an optional space or hyphen after it, making the whole thing optional: (\+1[ -]?)? After that we put another group around just the +1, ((\+1)[ -]?)? and then disable capturing for the outer group, which includes the separator, with ?: like so
(?:(\+1)[ -]?)?
After that, we're all done and can bring it all together!
/(?:(\+1)[ -]?)?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/
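To use the finished pattern for validation, one option is sketched below. Two things here are assumptions rather than part of the tutorial: `parsePhone` is a hypothetical helper, and the pattern is wrapped in the `^` and `$` anchors so that the whole input must be a phone number. Without anchors, the pattern also matches a number embedded in a longer string.

```javascript
const phonePattern = /(?:(\+1)[ -]?)?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/;

// Assumption for this sketch: anchor the pattern so the entire input must match.
const anchoredPhonePattern = new RegExp(`^${phonePattern.source}$`);

// Hypothetical helper: returns the captured parts, or null for invalid input.
function parsePhone(input) {
  const match = anchoredPhonePattern.exec(input);
  if (match === null) return null;
  const [, countryCode, area, prefix, line] = match;
  // The country code group is undefined when +1 is absent; report it as null.
  return { countryCode: countryCode ?? null, area, prefix, line };
}

console.log(parsePhone("+1 123 456 7890"));
// { countryCode: '+1', area: '123', prefix: '456', line: '7890' }
console.log(parsePhone("(123) 456-7890"));
// { countryCode: null, area: '123', prefix: '456', line: '7890' }
console.log(parsePhone("call 1234567890 now"));        // null: extra text around it
console.log(phonePattern.test("call 1234567890 now")); // true: unanchored match
```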
This was a simple example, and there are many other regex components to learn. So check out this website for more:
You can also test and build on what you've learned with this regex testing website:
Kevin Shank is a web developer enrolled in Michigan State University's full stack coding bootcamp.
Feel free to check out his GitHub repo for all of his projects: