Check out this Gist if you're interested in learning more about Regex, aka Regular Expressions.
It's not a complete guide, but I did cover the basics to help you get started. I've also included my references and contact information at the bottom if you want to learn more.
This is a general reference guide on how to understand and use some basic regular expressions. I've broken down specific functions below with simple examples to help you learn the concepts quickly.
Publish Notes The content on this Gist was created on this GitHub Repo, published to this GitHub Website, and published on this GitHub Gist.
- CS Regex Reference Guide for JavaScript
- Summary
- Repo Stats
- Table of Contents
- 1. What is Regex?
- 2. Starting with Examples
- The Syntax
- Author
- Contact Me
- Resources and References
... aka regular expression, is a universal syntax language used to simplify advanced searching/filtering of content based on a user-specified search pattern.
You define what you are searching at the level of precision you need.
What makes a regex search/filter different from others is that it searches for patterns in ASCII or Unicode characters.
You're not just looking for a specific character value, you're looking for all instances of a pattern within the content. For example, all phone-numbers, email-addresses, websites, or really any type of content that follows a universal pattern.
Great questions!
The MDN team said,
"Regular expressions are patterns used to match character combinations in strings."
"A regular expression is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation."
To understand a regex pattern, the search / filter you're creating, you'll need to learn some syntax. But first, let's start with some examples.
Including the area code, US-style phone numbers are 10 digits long, usually grouped 3-3-4 and separated by a space or a hyphen.
We can look for 10 digits like this: `/\d\d\d-\d\d\d-\d\d\d\d/`
Here, we're using the meta character `\d` once for each digit we're searching for, with the groups of digits separated by hyphens.
BUT what IF the same phone number is formatted differently throughout the data?
For example, our regex run on the data below would return only 1 result, `123-456-7890`, even though ALL of them are the same phone number:
`(123)-456-7890`, `123-456-7890`, `123.456.7890`, and `123 456 7890`
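
A minimal sketch of that limitation in JavaScript. The `samples` array and the variable names are illustrative assumptions, not part of the original guide:

```javascript
// The same phone number written four ways, as listed above.
const samples = ["(123)-456-7890", "123-456-7890", "123.456.7890", "123 456 7890"];

// Hyphen-only pattern: 3 digits, hyphen, 3 digits, hyphen, 4 digits.
const hyphenOnly = /\d\d\d-\d\d\d-\d\d\d\d/;

// Keep only the samples in which the pattern finds a match.
const hyphenMatches = samples.filter((sample) => hyphenOnly.test(sample));

console.log(hyphenMatches); // [ '123-456-7890' ]
```

Only the hyphen-separated format survives the filter, which is why the pattern needs to be loosened.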
So how could we improve our regex?
Well, looking at the same 4 phone numbers again, we want to account for `(`, `)`, a space, `-`, and the `.` used in the third format.
`(123)-456-7890`, `123-456-7890`, `123.456.7890`, and `123 456 7890`
1. Add a FEW OR operators to account for spaces vs hyphens.
> To do this we'll use `[]`, the character class syntax, where each literal character inside the square brackets is considered one allowed alternative.
2. Add the ability to match `(` and `)` if they exist.
> For optional characters, we'll use the `?` quantifier, which matches whether the preceding character exists or not.
What does this fully fleshed out syntactically accurate regex argument look like?
/(\(?)+(\d{3})+[-.) ]+(\s?)(\d{3})+[-. ]+(\s?)+(\d{4})/
Let's break it down 👇🏼
Syntax | Description |
---|---|
`/` | Starts the regex |
`(\(?)+` | A left parenthesis `(`, if it exists. Note that `+` is the "1 or more" quantifier applied to the group before it, not a "followed by" operator; tokens written next to each other are already matched in sequence |
`(\d{3})+` | A group of 3 digits, repeated 1 or more times |
`[-.) ]+` | 1 or more of: a hyphen `-`, a period `.`, a right parenthesis `)`, OR a space |
`(\s?)` | A white-space character, if it exists |
`(\d{3})+` | A group of 3 digits, repeated 1 or more times |
`[-. ]+` | 1 or more of: a hyphen, a period, OR a space |
`(\s?)+` | A white-space character, if it exists |
`(\d{4})` | A group of 4 digits |
`/` | Ends the regex |
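
To check the pattern against the four formats, something like the following could be run in Node.js or a browser console. The `simplerPattern` variant is an assumption added for comparison (a tighter pattern without the extra `+` quantifiers) and is not from the original guide:

```javascript
// The four formats of the same phone number used in this section.
const phoneFormats = ["(123)-456-7890", "123-456-7890", "123.456.7890", "123 456 7890"];

// The pattern exactly as written above.
const phonePattern = /(\(?)+(\d{3})+[-.) ]+(\s?)(\d{3})+[-. ]+(\s?)+(\d{4})/;
const allMatchOriginal = phoneFormats.every((phone) => phonePattern.test(phone));

// Hypothetical simpler variant: optional "(", 3 digits, optional ")",
// optional separator, 3 digits, one separator, 4 digits.
const simplerPattern = /\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}/;
const allMatchSimpler = phoneFormats.every((phone) => simplerPattern.test(phone));

console.log(allMatchOriginal); // true
console.log(allMatchSimpler);  // true
```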
Now that we've covered the basics, let's look at a regex search pattern built to search for email addresses.
You'll notice I've covered fewer details here.
Do you see a pattern in this regex search pattern?
/([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/
Let's break it apart into smaller chunks based on the high-level patterns we see!
Symbol | Description |
---|---|
`/` | Starts the regex |
`(` | Opens a sub-expression (a capture group) |
`[a-z0-9_\.-]` | Any lower-case letter, digit, `_`, `.`, or `-` |
`+)` | `+` requires 1 or more of those characters, and `)` closes the sub-expression |
`@` | MUST include the `@` character |
`([\da-z\.-]+)` | 1 or more digits, lower-case letters, `.`, or `-` |
`\.` | A literal period |
`([a-z\.]{2,6})` | Any combination of lower-case letters or periods, between 2 and 6 characters in length |
`/` | Ends the regex |
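
As a rough illustration, here is how the three groups in that pattern could be read back in JavaScript. The sample text and variable names are made up for this sketch:

```javascript
// The email pattern exactly as written above.
const emailPattern = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

// match() returns the full match at index 0 and each group after it.
const emailMatch = "Contact: jane.doe@example.com".match(emailPattern);

console.log(emailMatch[0]); // jane.doe@example.com
console.log(emailMatch[1]); // jane.doe
console.log(emailMatch[2]); // example
console.log(emailMatch[3]); // com

// The classes only list lower-case letters, so upper-case input needs the i flag.
const upperWithoutFlag = emailPattern.test("JANE@EXAMPLE.COM");
const upperWithFlag = new RegExp(emailPattern.source, "i").test("JANE@EXAMPLE.COM");

console.log(upperWithoutFlag); // false
console.log(upperWithFlag);    // true
```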
In this section, I've included the syntax used in the above examples. The goal is to have it serve as a point-of-reference for the above, along with helping you develop the confidence and knowledge to create your own regex patterns.
Not all sections contain examples by design.
Note: If you're looking for a resource to make it easy to learn regex while following along with this guide, check out this website https://regexr.com/.
It allows you to create regex search patterns and get live-feedback.
Any/all ASCII or Unicode characters you're wanting to search for or filter out.
This will include single characters.
Syntax | Description |
---|---|
`a-z` | Any lower-case letters |
`A-Z` | Any upper-case letters |
`0-9` | Any digits |
`\.` | A period character |
Unicode Characters | There are a lot. Here's an index -> Microsoft - Insert ASCII or Unicode Latin-based symbols and characters |
Regex operators that represent specific types of characters within the ASCII or Unicode character sets.
Syntax | Description | Note | Example |
---|---|---|---|
`\` | Escape character. Before a letter such as `d`, `w`, or `s` it creates a meta character; before a meta character such as `.` or `(` it makes that character literal. | WARNING: If you don't use this, a letter such as `d` will be considered a literal character! | |
`^` | The start of the input, or the start of each line when the `m` flag is set | | |
`.` | Any ASCII or Unicode character except a line break (unless the `s` flag is set). | WARNING: Within a character class, a `.` does not need to be escaped to be read as a literal character | |
`\d` | Any digit 0-9 | | |
`\w` | Anything that is a word-character | `A-Z`, `a-z`, `0-9`, and `_` | `\w\w` -> Returns any sequence of two word-characters. |
`\W` | Anything that is NOT a word-character | | |
`\s` | Any white-space character. | Space, tab, and new-line | `\s\s` -> Returns any sequence of two white-space characters. |
`\S` | Anything that is NOT white-space. | Everything except space, tab, and new-line | |
`[a-z]` | Any one character a-z. | When in a class, the `-` acts as a range operator, so `a-z` covers every character from a to z. See the character classes section for more details | |
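
A small sketch of a few of these meta characters using JavaScript's `String.prototype.match` and `RegExp.prototype.test`. The sample string and variable names are illustrative:

```javascript
const roomText = "Room 42, Floor 7";

const digits = roomText.match(/\d/g);      // every single digit
const wordPairs = roomText.match(/\w\w/g); // every pair of adjacent word-characters
const spaces = roomText.match(/\s/g);      // every white-space character

console.log(digits);        // [ '4', '2', '7' ]
console.log(wordPairs);     // [ 'Ro', 'om', '42', 'Fl', 'oo' ]
console.log(spaces.length); // 3

// \w also covers the underscore, and . skips line breaks by default.
const underscoreIsWord = /\w/.test("_");
const dotMatchesNewline = /./.test("\n");

console.log(underscoreIsWord);  // true
console.log(dotMatchesNewline); // false
```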
... are used to match the location of a literal character within your defined search parameters.
Syntax | Description | Example |
---|---|---|
`^` | Used to look for the start of a string value | `/^test/` looks for all strings that start with the literal characters `t`, `e`, `s`, and `t`. |
`$` | Used to look for literal characters at the end of a string value | `/test$/` looks for all strings that end with the literal characters `t`, `e`, `s`, and `t`. |
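
For instance, the two anchors could be tried out like this (the sample strings are made up for the sketch):

```javascript
const startsWithTest = /^test/; // "test" must sit at the very start
const endsWithTest = /test$/;   // "test" must sit at the very end

const startHit = startsWithTest.test("testing 123"); // starts with "test"
const startMiss = startsWithTest.test("a test");     // "test" is not at the start
const endHit = endsWithTest.test("a test");          // ends with "test"
const endMiss = endsWithTest.test("testing");        // "test" is not at the end

console.log(startHit, startMiss); // true false
console.log(endHit, endMiss);     // true false
```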
... are meta characters that modify the previous character, class, or group in a regular expression.
Based on your regex search parameters, how many times do you want it to match in a row?
Syntax | Description | Example |
---|---|---|
`*` | 0 or more | `/\d*/` -> matches any run of digits, including an empty run of 0 digits. |
`?` | 0 or 1 | `/test?t/` -> matches both `test` and `testt`, because the `t` before the `?` is optional. |
`+` | 1 or more | `/\d+/` -> matches runs of digits of length 1 or more. |
`{min,max}` | Range of number of times the former argument must exist to qualify as a result. | `\w{1,5}\s` -> All word-character combinations with 1-5 characters followed by white-space. |
`{n}` | Exact number of times the former argument must exist to qualify as a result. | `\w{5}\s` -> All word-character combinations with 5 characters followed by white-space. |
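
A short sketch of each quantifier in JavaScript. The sample strings and variable names are assumptions for illustration, and the `^`/`$` anchors are added so the length checks apply to the whole string:

```javascript
const digitRuns = "a1b22c333".match(/\d+/g);         // 1 or more digits per match
const emptyStar = "abc".match(/\d*/)[0];             // 0 digits still counts as a match
const optionalT = "test testt tes".match(/test?t/g); // the t before ? is optional
const exactFive = /^\w{5}$/.test("hello");           // exactly 5 word-characters
const oneToFive = /^\w{1,5}$/.test("toolong");       // 7 characters is out of range

console.log(digitRuns); // [ '1', '22', '333' ]
console.log(emptyStar === ""); // true
console.log(optionalT); // [ 'test', 'testt' ]
console.log(exactFive); // true
console.log(oneToFive); // false
```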
How to use OR arguments within a regex statement.
... is one of the two OR operators, where arguments are placed inside of square brackets `[ ]`.
Syntax | Description | Notes | Example |
---|---|---|---|
`[^argument]` | NOT OR Operator, returns anything except for the arguments in the class | A caret, `^`, becomes a meta character when used at the start of a class. Anywhere else in the class it is a literal character | `[^0-5]` -> anything not 0-5. `[^a-c]` -> Anything that is not between the letters a-c. |
`[.]` | All literal character instances of a period, `.` | Within a class, `.` does not need to be escaped to be read as a literal character. | `[-.]` -> looks for the literal characters `-` OR `.` |
`a[bc]de` | All cases of `abde` AND/OR `acde`. | | |
`[letter-letter]` | Any letter within the given range; letter case matters. | A hyphen, `-`, becomes a meta character when used between two literal characters of the same family within a class. | `[a-c]` -> returns all characters between `a` and `c`. `/\b[A-Za-z]{4}\b/` -> matches any 4-letter word made up of letters only. `/\b[A-Z][a-z]*\b/` -> matches any word that starts with a capital letter followed by 0 or more lower-case letters. `\b[\w]{4}\b` -> All 4-character words made up of word-characters. |
`[number-number]` | A range between two digits | A hyphen, `-`, becomes a meta character when used between two literal characters of the same family within a class. | `[0-5]{3}` -> All combinations of 3 digits where each digit is between 0 and 5 |
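
The class examples above could be exercised roughly like this (sample strings and variable names are illustrative):

```javascript
const eitherLetter = "abde acde aede".match(/a[bc]de/g);        // b OR c in second position
const notZeroToFive = "0123456789".match(/[^0-5]/g);            // anything except 0-5
const capitalized = "The Quick brown Fox".match(/\b[A-Z][a-z]*\b/g); // capital + lower-case letters
const lowDigits = "502 999 123".match(/[0-5]{3}/g);             // 3 digits, each between 0 and 5

console.log(eitherLetter);  // [ 'abde', 'acde' ]
console.log(notZeroToFive); // [ '6', '7', '8', '9' ]
console.log(capitalized);   // [ 'The', 'Quick', 'Fox' ]
console.log(lowDigits);     // [ '502', '123' ]
```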
... is the second OR operator. It is used to look for grouped literal characters placed within parentheses and separated by a vertical bar, `( arg1 | arg2 )`.
Syntax | Description | Notes | Example |
---|---|---|---|
`(arg1\|arg2)` | Return all instances where arg1 or arg2 exist. | This is how you search for very specific groups of literal characters. | `/[\w.]+@\w+\.+(com\|net\|edu)/` -> Returns all email addresses that end with .com, .net, or .edu |
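
A quick sketch of that grouped OR in JavaScript, using made-up addresses:

```javascript
// Pattern from the table: the group accepts com, net, or edu only.
const tldPattern = /[\w.]+@\w+\.+(com|net|edu)/;

const eduMatch = "jane@school.edu".match(tldPattern);
const ioAccepted = tldPattern.test("jane@startup.io");

console.log(eduMatch[1]); // edu  (the alternative that matched)
console.log(ioAccepted);  // false
```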
Syntax | Description | Example |
---|---|---|
`\b` | A word boundary | All 4 letter words -> `/\b\w{4}\b/` -> Looking at each word, returns every whole word made of exactly 4 word-characters. `/\btest\b/` -> Returns a whole word search for `test`. |
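
To see why the closing `\b` matters, here is a small comparison in JavaScript. The sentence and variable names are assumptions for this sketch:

```javascript
const sentence = "this test contains tested words";

const wholeWord = sentence.match(/\btest\b/g);     // "test" as a whole word only
const fourLetters = sentence.match(/\b\w{4}\b/g);  // whole words of exactly 4 characters
const noClosing = sentence.match(/\b\w{4}/g);      // first 4 characters of any longer word too

console.log(wholeWord);   // [ 'test' ]
console.log(fourLetters); // [ 'this', 'test' ]
console.log(noClosing);   // [ 'this', 'test', 'cont', 'test', 'word' ]
```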
... are used to apply specific search-case scenarios to your regex.
They can be combined or used individually as needed, and are added to the end
of your regex: `/regex-pattern/flag`
Syntax | Description | Example |
---|---|---|
`g` | Globally searching. | `/[a-z]/g` -> returns all lower-case letters within all content. |
`i` | Case-insensitive searching | `/[a-z]/gi` -> returns ALL letters regardless of case, from A-Z and a-z |
`m` | Multi-line searching. | `/^\d/gm` -> returns the initial digit of every line |
`s` | Dotall mode lets `.` also match line breaks. | |
`u` | Enable Unicode support | |
`y` | Sticky mode only matches at the exact position stored in the regex's `lastIndex` property | |
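
A rough sketch of each flag in JavaScript. The sample strings and variable names are illustrative, and the `s` flag assumes an ES2018-capable engine such as a current Node.js or browser:

```javascript
const lines = "1 apple\n2 Bananas\n3 cherries";

const firstOfEachLine = lines.match(/^\d/gm); // m: ^ matches at every line start
const firstOnly = lines.match(/^\d/g);        // without m: only the start of the input

const lowerOnly = "aBc".match(/[a-z]/g);      // g: all matches, case-sensitive
const anyCase = "aBc".match(/[a-z]/gi);       // i: ignore case

const dotDefault = /a.c/.test("a\nc");        // . does not match a line break
const dotAll = /a.c/s.test("a\nc");           // s: . matches line breaks too

const emoji = "\u{1F600}";                    // one character, two UTF-16 code units
const oneCharNoU = /^.$/.test(emoji);         // without u: . sees a single code unit
const oneCharWithU = /^.$/u.test(emoji);      // u: . sees the whole code point

const stickyDigit = /\d/y;                    // y: match only at lastIndex
stickyDigit.lastIndex = 2;
const stickyHit = stickyDigit.test("ab3");    // index 2 is "3"
stickyDigit.lastIndex = 0;
const stickyMiss = stickyDigit.test("ab3");   // index 0 is "a"

console.log(firstOfEachLine, firstOnly); // [ '1', '2', '3' ] [ '1' ]
console.log(lowerOnly, anyCase);         // [ 'a', 'c' ] [ 'a', 'B', 'c' ]
console.log(dotDefault, dotAll);         // false true
console.log(oneCharNoU, oneCharWithU);   // false true
console.log(stickyHit, stickyMiss);      // true false
```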
… are used to match the same text again that an earlier pattern group in the regex has already matched, based on the pattern group you specify.
Syntax | Description | Example |
---|---|---|
`\n` | References pattern group number `n` and matches the same text that group matched, for example `\1` for the first group. | `(['"])(.*?)\1` run on the content `Testing: "my regex expression..."` -> returns `"my regex expression..."`. |
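
The example from the table could be run in JavaScript along these lines (variable names are illustrative):

```javascript
// Group 1 captures the opening quote; \1 requires the same quote to close it.
const quoted = /(['"])(.*?)\1/;

const quotedMatch = 'Testing: "my regex expression..."'.match(quoted);

console.log(quotedMatch[0]); // "my regex expression..."
console.log(quotedMatch[2]); // my regex expression...

// Mismatched quotes do not match, because \1 must repeat the opening quote.
const mismatched = quoted.test("'oops\"");
console.log(mismatched); // false
```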
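Syntax | Description |
---|---|
`(?=arg)` | Lookahead: matches only if `arg` comes immediately after the current position. `arg` itself is not included in the result |
`(?<=arg)` | Lookbehind: matches only if `arg` comes immediately before the current position. `arg` itself is not included in the result |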
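
A minimal sketch of both in JavaScript. The sample strings are made up, and lookbehind assumes an ES2018-capable engine (current Node.js and modern browsers):

```javascript
// Lookahead: digits that are immediately followed by "USD".
const usdAmounts = "price: 100USD, 200EUR".match(/\d+(?=USD)/g);

// Lookbehind: digits that are immediately preceded by "$".
const dollarAmounts = "$40 and 50".match(/(?<=\$)\d+/g);

console.log(usdAmounts);    // [ '100' ]
console.log(dollarAmounts); // [ '40' ]
```

In both cases the surrounding text (`USD` or `$`) is checked but left out of the returned match.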
Thanks for taking the time to read this!
If you want to check out more of my work, head on over to my GitHub Page.
Do you want to get in touch?
Feel free to connect with me on my Twitter @ErikPlachta or LinkedIn @ErikPlachta
A collection of resources I used to learn about Regex.
- Wikipedia - Regular Expressions
- MDN - Regular Expressions
- Microsoft - Regular Expression Language - Quick Reference
- MDN - Regular expression syntax cheat sheet
- Regular-Expressions.info
- RexEgg.com
- YouTube - The Coding Train - Introduction to Regular Expressions - Programming with Text
- Regex tutorial — A quick cheat sheet by examples
- zone.ni.com - Regular Expressions Components
- javascript.info - Backreferences in pattern
- rexegg.com - Mastering Lookahead and Lookbehind
I built out this Gist as I was learning more about Regex as a reference resource.