I'm going to give a quick overview of Regular Expressions. I'll try to get through the syntax fast, because using them is the fun part.
We'll be working with https://regex101.com
RegEx stands for Regular Expression. A Regular Expression is a pattern-matching scheme. We use it to determine whether a string follows a certain pattern. We'll often see these used to validate form input, and Single Page Applications may use them to compare the URL with patterns that tell them which pages to serve.
match for literal characters/words:
/you/
match for any single character (except a newline, by default):
/./
match a character in this list (whitelist):
/[aeiou]/
match a character not in this list (blacklist):
/[^aeiou]/
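To try these first patterns outside regex101, here is a minimal sketch. It assumes Python's re module, which the talk itself doesn't use, so the patterns are written as raw strings without the surrounding slashes. The sample strings are made up for illustration.

```python
import re

text = "Thank you!"  # made-up sample string

# A literal pattern matches exactly those characters.
literal = re.findall(r"you", text)             # ['you']

# A dot matches any single character (except a newline, by default).
any_char = re.findall(r".", "a b")             # ['a', ' ', 'b']

# A character class matches one character from the list.
vowels = re.findall(r"[aeiou]", text)          # ['a', 'o', 'u']

# A negated class matches one character that is NOT in the list.
non_vowels = re.findall(r"[^aeiou]", "you!")   # ['y', '!']
```

Note that the negated class happily matches punctuation and spaces too: anything that isn't in the list counts.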
match numbers:
/[0123456789]/
This isn't very efficient to write. We can define a range of characters to match:
/[0-9]/
This is much better. There are more ranges we can match:
/[a-z]/
/[A-Z]/
/[a-zA-Z0-9]/
/[0-4]/
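The ranges above can be checked with a quick sketch like this one. It assumes Python's re module (an assumption, not something the talk uses), and the sample strings are invented.

```python
import re

sample = "Room 42B, floor 7"  # made-up sample string

# Each range still matches ONE character at a time.
digits = re.findall(r"[0-9]", sample)            # ['4', '2', '7']
lower = re.findall(r"[a-z]", "Hi Bob")           # ['i', 'o', 'b']
upper = re.findall(r"[A-Z]", "Hi Bob")           # ['H', 'B']

# Several ranges can share one pair of brackets.
alnum = re.findall(r"[a-zA-Z0-9]", "a-1!")       # ['a', '1']

# A range doesn't have to cover every digit.
low_digits = re.findall(r"[0-4]", "0459")        # ['0', '4']
```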
When we used /[0-9]/ it only matched a single digit. Suppose we want to match 1, 10, and 4240. We need some way to let our match span multiple characters.
+ and *
/[0-9]+/ matches one or more digits
/[0-9]*/ matches zero or more digits
Note how using * returns more results than +. Can anybody tell me why?
We can also set a number of characters we should match with {}
/[0-9]{3}/ matches exactly three digits in a row
/[0-9]{2,3}/ matches two or three digits in a row
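Here is a small sketch of the quantifiers, assuming Python's re module (an assumption, the talk uses regex101) and made-up sample strings. It also shows why * reports more results than +: * is allowed to match zero digits, so it produces empty matches wherever there is no digit.

```python
import re

text = "1, 10, and 4240"

# + needs at least one digit, so we get only the numbers.
plus = re.findall(r"[0-9]+", text)                    # ['1', '10', '4240']

# * also accepts zero digits, so the result includes empty strings.
star = re.findall(r"[0-9]*", text)
star_non_empty = [m for m in star if m]               # same as `plus`

# {3} means exactly three; {2,3} means two or three (as many as possible).
three = re.findall(r"[0-9]{3}", "12 345 6789")        # ['345', '678']
two_three = re.findall(r"[0-9]{2,3}", "12 345 6789")  # ['12', '345', '678']
```

In the last two lines, 6789 only contributes 678: the leftover 9 is too short to match on its own.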
Suppose we want to match one of two options: perhaps either "your" or "you", or either "nor" or "or".
The | character is an OR operator:
/your|you/
/nor|or/
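In each example, one of the words is the same as the other with an extra letter. For /your|you/, the order we chose is important: the OR operator tries the alternatives from left to right and stops at the first one that matches, so if "you" came first, the match would stop there and never pick up the "r" in "your".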
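To see the effect of the order, here is a minimal sketch assuming Python's re module (an assumption, not part of the talk) and an invented sample string.

```python
import re

text = "you and your"  # made-up sample string

# Longer alternative first: "your" is tried before "you".
longer_first = re.findall(r"your|you", text)    # ['you', 'your']

# Shorter alternative first: "you" wins, and the trailing "r" is left out.
shorter_first = re.findall(r"you|your", text)   # ['you', 'you']

nor_or = re.findall(r"nor|or", "nor or")        # ['nor', 'or']
```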
It turns out we don't need the OR operator for a match like this. We can use ? instead to mark optional values.
/your|you/ can be represented as /your?/
/nor|or/ can be represented as /n?or/
In each of these cases, the ? operator makes the previous character optional.
We can make a group of characters optional by surrounding the characters with ().
/function(al)?/ matches function or functional
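The optional patterns can be sketched like this, assuming Python's re module (an assumption) and made-up sample strings. One Python-specific detail: when a pattern contains a group, re.findall returns the group contents instead of the whole match, so the last example uses re.finditer and group(0) to get the full matched text.

```python
import re

# ? makes the single preceding character optional.
your = re.findall(r"your?", "you and your")     # ['you', 'your']
nor = re.findall(r"n?or", "nor or")             # ['nor', 'or']

# Parentheses make the whole group "al" optional.
# group(0) is the entire match, regardless of what the group captured.
func = [m.group(0)
        for m in re.finditer(r"function(al)?", "function functional")]
# ['function', 'functional']
```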
We've seen a lot of characters that have special meanings in Regular Expressions:
/ goes on each side of a regex to identify it
. matches any character
[] defines a whitelist of values to match ([^] defines a blacklist of values to exclude)
+, * and {} can be used to set the number of characters we should match
() marks a group of characters
| matches the pattern to the left or the right of the pipe (OR operator)
? makes the previous character optional (zero or one times)
When we need to match one of these characters literally, we can precede it with a \
\. matches a literal period
\+ matches a literal plus sign
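/[A-Za-z]+[!\.\?]/ matches the last word in a sentence, together with its closing punctuation. We don't escape ! because it has no special meaning to RegEx. (Inside [], . and ? are already literal, so escaping them there is optional.)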
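A short sketch of escaping, assuming Python's re module (an assumption, the talk uses regex101) and invented sample strings:

```python
import re

# Unescaped, the dot matches every character.
any_char = re.findall(r".", "a.b")       # ['a', '.', 'b']

# Escaped, it matches only a literal period.
period = re.findall(r"\.", "a.b")        # ['.']

# The same goes for the plus sign.
plus_sign = re.findall(r"\+", "1+1=2")   # ['+']

# The last word of each sentence, including its punctuation mark.
last_words = re.findall(r"[A-Za-z]+[!\.\?]", "Hello world. Is it fun? Yes!")
# ['world.', 'fun?', 'Yes!']
```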
Groups give us an additional feature. Every time we use (), the value matched inside is saved and can be referred to later in the pattern: \1 is the first group, \2 the second, and so on.
Here's a simple example that matches any character that appears twice in a row. Think double letters in words.
/(.)\1/
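As a minimal sketch of the double-letter pattern, assuming Python's re module (an assumption) and a made-up sample word:

```python
import re

word = "bookkeeper"  # made-up sample word

# group(0) is the full match: the captured character plus its repeat.
doubles = [m.group(0) for m in re.finditer(r"(.)\1", word)]
# ['oo', 'kk', 'ee']

# group(1) is only what the parentheses captured.
captured = [m.group(1) for m in re.finditer(r"(.)\1", word)]
# ['o', 'k', 'e']
```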
In many cases, you are trying to match an exact string, rather than small pieces of it, like we've been doing so far.
We can use the ^ to match the start of a string and the $ character to mark the end of it. For example, if we're matching a string that starts with a capital letter and ends in a period, we might have:
/^[A-Z].*\.$/
We'll use the ^ and $ characters most times when we validate input fields.
It's good practice to include them any time you're matching the entire string, but you don't always need them. Patterns such as .* and .+ are greedy: they match as many characters as they can. If your pattern starts or ends with one of these, the ^ or $ is usually redundant when all you need to know is whether a single-line string matches.
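Here is a sketch of the difference the anchors make, assuming Python's re module (an assumption, not part of the talk). The helper name is_sentence and the sample strings are hypothetical.

```python
import re

# Hypothetical helper: True when the WHOLE string starts with a capital
# letter and ends in a period.
def is_sentence(value: str) -> bool:
    return re.search(r"^[A-Z].*\.$", value) is not None

ok = is_sentence("Hello there.")            # True
no_capital = is_sentence("hello there.")    # False
no_period = is_sentence("Hello there")      # False

# Without ^ and $, the pattern only needs to match somewhere inside.
embedded = "say Hello there. ok"
unanchored = re.search(r"[A-Z].*\.", embedded) is not None   # True
anchored = is_sentence(embedded)                              # False
```

This is why the anchors matter for input validation: the unanchored pattern accepts a string that merely contains something sentence-like.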
RegEx also has a bunch of shortcuts we can match instead of needing to always rely on []
\s matches whitespace characters, including spaces, tabs (\t), and newlines (\n and \r)
\d matches digits. This is the literal equivalent of [0-9]
\w matches "word characters". This is a tricky one. "Word characters" sounds at first like it's just alphabet characters, but it also matches digits and underscores. It's the literal equivalent of [A-Za-z0-9_]
For each of these cases, capitalizing the letter used negates it. That is, while \s matches all whitespace characters, \S matches all characters that aren't whitespace.
While \d is equivalent to [0-9], \D is equivalent to [^0-9]
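The shortcuts and their negations can be sketched as follows, assuming Python's re module (an assumption) and invented sample strings. In Python, \d and \w are Unicode-aware by default, so the sketch passes re.ASCII to get the exact [0-9] and [A-Za-z0-9_] behavior described above.

```python
import re

# re.ASCII limits \d, \w and \s to their ASCII definitions.
spaces = re.findall(r"\s", "a b\tc\n", flags=re.ASCII)       # [' ', '\t', '\n']
digits = re.findall(r"\d", "a1b22", flags=re.ASCII)          # ['1', '2', '2']
word_chars = re.findall(r"\w", "a_1-!", flags=re.ASCII)      # ['a', '_', '1']

# Capitalizing the letter negates the shortcut.
non_spaces = re.findall(r"\S", "a b", flags=re.ASCII)        # ['a', 'b']
non_digits = re.findall(r"\D", "a1b", flags=re.ASCII)        # ['a', 'b']
non_word_chars = re.findall(r"\W", "a_1-!", flags=re.ASCII)  # ['-', '!']
```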