A regular expression (shortened as 'regex' or 'regexp') is a sequence of characters that defines a search pattern in text. Most general-purpose programming languages support regular expressions, either natively or through libraries; examples include JavaScript, Python, C, C++, and Java.
In this Gist, let's break down a regex for matching a URL and take a look at each of its components.
- Matching a URL:
`/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/`
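As a quick sanity check, the sketch below runs the pattern against a few sample strings with `RegExp.prototype.test()`. The regex is copied from above; the sample inputs are made up for illustration.

```javascript
// The URL pattern from this Gist, written as a JavaScript regex literal.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Sample inputs, chosen only for illustration.
const samples = [
  "https://github.com",                 // true
  "http://example.com/docs/index.html", // true
  "example.org",                        // true: the protocol is optional
  "ftp://example.com",                  // false: only http and https are accepted
  "not a url",                          // false
];

for (const sample of samples) {
  console.log(`${sample} -> ${urlRegex.test(sample)}`);
}
```

The pattern checks the overall shape of a URL rather than validating it against the full URL specification. Also note that the nested quantifier `([\/\w \.-]*)*` may backtrack heavily on long strings that almost match, so it is safer to apply it only to inputs of reasonable length.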
- Anchors
- Quantifiers
- Grouping Constructs
- Bracket Expressions
- Character Classes
- Character Escapes
- Flags
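Here is the full pattern again; the sections below take it apart one component at a time:
`/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/`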
Anchors match a position before or after characters rather than the characters themselves.
- `^`: The caret anchor matches the beginning of the text.
- `$`: The dollar anchor matches the end of the text.
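To see what the anchors contribute, the sketch below compares the URL pattern with a copy that has `^` and `$` removed. The unanchored variant and the sample sentence are illustrative additions, not part of the original Gist.

```javascript
// The URL pattern from this Gist, with and without the ^ and $ anchors.
const anchored = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;
const unanchored = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/;

// An illustrative sentence that contains a URL but is not one.
const sentence = "Visit https://github.com today";

// Without anchors, finding a URL anywhere in the string is enough.
console.log(unanchored.test(sentence)); // true
// With anchors, the whole string must be a URL, so the extra words cause a failure.
console.log(anchored.test(sentence)); // false
console.log(anchored.test("https://github.com")); // true
```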
Examples (JavaScript)
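`/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/`
Here `^` at the start and `$` at the end force the entire string to be a URL, rather than merely contain one.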
Quantifiers specify how many instances of a character, group, or character class must be present in a string for a match.
| Quantifier | Description |
|---|---|
| `*` | Match zero or more times, same as `{0,}` |
| `+` | Match one or more times, same as `{1,}` |
| `?` | Match zero or one time, same as `{0,1}` |
| `{n}` | Match exactly n times |
| `{n,}` | Match at least n times |
| `{n,m}` | Match from n to m times |
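The following sketch isolates the quantifiers used in the URL pattern into small test patterns. These smaller patterns are illustrative fragments, not part of the original Gist.

```javascript
// ?: the "s" is optional, so both http:// and https:// match.
const protocol = /^https?:\/\/$/;
console.log(protocol.test("http://"));  // true
console.log(protocol.test("https://")); // true

// {2,6}: between two and six characters from [a-z.].
const topLevelDomain = /^[a-z\.]{2,6}$/;
console.log(topLevelDomain.test("c"));       // false: too short
console.log(topLevelDomain.test("co.uk"));   // true
console.log(topLevelDomain.test("abcdefg")); // false: seven characters

// +: at least one character is required; *: zero characters are allowed.
console.log(/^[\da-z\.-]+$/.test(""));  // false
console.log(/^[\/\w \.-]*$/.test("")); // true
```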
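Examples (JavaScript)
`/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/`
The quantifiers in this pattern are `?` (the optional `s` in `https?`, the optional protocol group, and the optional trailing `\/`), `+` (one or more domain-name characters), `{2,6}` (a top-level domain of two to six characters), and `*` (zero or more path characters, with the path group itself repeated zero or more times).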
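Grouping constructs use parentheses `( )`. They turn part of a pattern into a block, so you can apply a quantifier or other modifier to the block as a whole. In JavaScript, a plain group is also a capturing group: the text it matches is returned as a separate element of the match result.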
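Because the groups in the URL pattern are capturing groups, `RegExp.prototype.exec()` returns the text matched by each one. The sketch below labels the four groups protocol, domain, top-level domain, and path; these names are only descriptive assumptions, and the sample URL is illustrative.

```javascript
// The URL pattern from this Gist; each pair of parentheses is a capturing group.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// exec() returns null on failure, or an array whose index 0 is the whole
// match and whose indexes 1 to 4 hold the text captured by each group.
const match = urlRegex.exec("https://github.com/wonjong2");

if (match !== null) {
  // The variable names are descriptive labels, not part of the pattern.
  const [, protocol, domain, topLevelDomain, path] = match;
  console.log(protocol);       // https://
  console.log(domain);         // github
  console.log(topLevelDomain); // com
  console.log(path);           // /wonjong2
}
```

Two details of capturing are worth knowing here. Because `+` is greedy but backtracks only as far as needed, a host such as `docs.github.com` puts `docs.github` in group 2 and `com` in group 3. And because group 4 is repeated with `*`, it holds only the text captured by its last iteration.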
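Example (JavaScript)
`/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/`
This pattern contains four groups: the protocol `(https?:\/\/)`, the domain name `([\da-z\.-]+)`, the top-level domain `([a-z\.]{2,6})`, and the path `([\/\w \.-]*)`. Applying `?` to the first group makes the whole protocol optional, and applying `*` to the last group lets the path group repeat.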
A bracket expression matches a single character out of a set of characters. The square brackets can contain individual characters as well as character ranges such as `[a-z]`, `[0-9]`, or `[a-zA-Z0-9]`.
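The sketch below tests two bracket expressions from the URL pattern on their own, each anchored so that the whole test string must match. The test strings are illustrative.

```javascript
// [a-z\.] matches exactly one character: a lowercase letter or a dot.
const letterOrDot = /^[a-z\.]$/;
console.log(letterOrDot.test("q"));  // true
console.log(letterOrDot.test("."));  // true
console.log(letterOrDot.test("Q"));  // false: uppercase letters are outside a-z
console.log(letterOrDot.test("ab")); // false: one bracket expression matches one character

// [\da-z\.-] combines a shorthand class (\d), a range (a-z), and literal
// characters. The hyphen is literal because it is the last character in the set.
const hostChars = /^[\da-z\.-]+$/;
console.log(hostChars.test("my-site2.example")); // true
console.log(hostChars.test("my_site"));          // false: "_" is not in the set
```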
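Examples (JavaScript)
`/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/`
The bracket expressions here are `[\da-z\.-]` (a digit, lowercase letter, dot, or hyphen), `[a-z\.]` (a lowercase letter or dot), and `[\/\w \.-]` (a slash, word character, space, dot, or hyphen).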
A character class matches any single character from a certain set of characters; it is also called a character set. The shorthand classes below stand for commonly used sets.
| Characters | Meaning |
|---|---|
| `\d` | Matches any digit (Arabic numeral), same as `[0-9]` |
| `\D` | Matches any character that is not a digit (Arabic numeral), same as `[^0-9]` |
| `\w` | Matches any alphanumeric character from the basic Latin alphabet, including the underscore, same as `[A-Za-z0-9_]` |
| `\W` | Matches any character that is not a word character from the basic Latin alphabet, same as `[^A-Za-z0-9_]` |
| `\s` | Matches a single whitespace character, including space, tab, form feed, line feed, and other Unicode spaces, same as `[ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]` |
| `\S` | Matches a single character other than whitespace, same as `[^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]` |
| `\t` | Matches a horizontal tab |
| `\r` | Matches a carriage return |
| `\n` | Matches a line feed |
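A short sketch of the shorthand classes follows, ending with the full URL pattern applied to an illustrative URL that has a digit in the domain name and an underscore in the path. All test strings are made up.

```javascript
// \d is shorthand for [0-9], and \w for [A-Za-z0-9_].
console.log(/^\d+$/.test("2024"));      // true
console.log(/^\d+$/.test("20x4"));      // false
console.log(/^\w+$/.test("wonjong_2")); // true
console.log(/^\w+$/.test("wonjong-2")); // false: "-" is not a word character

// \s matches one whitespace character, and \S matches any other character.
console.log(/\s/.test("a\tb"));   // true: a tab counts as whitespace
console.log(/^\S+$/.test("a b")); // false: the string contains a space

// The full URL pattern on a URL with a digit in the domain and "_" in the path.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;
console.log(urlRegex.test("http://web2.example.com/my_page")); // true
```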
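Examples (JavaScript)
`/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/`
This pattern uses `\d` inside `[\da-z\.-]` to allow digits in the domain name and `\w` inside `[\/\w \.-]` to allow letters, digits, and underscores in the path.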
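Some characters have a special meaning in a regular expression: `[ ] { } ( ) \ ^ $ . | ? * +`. To match one of them literally, prefix it with a backslash (`\`); for example, `\.` matches a literal dot and `\?` matches a literal question mark. In a JavaScript regex literal, the forward slash must be escaped as `\/` as well, because an unescaped `/` ends the pattern.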
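The sketch below contrasts an escaped and an unescaped dot, and shows how escaping differs between a regex literal and a pattern string passed to the `RegExp` constructor. The test strings are illustrative.

```javascript
// Unescaped, "." matches any single character except a line terminator.
console.log(/^a.c$/.test("abc"));  // true
// Escaped, "\." matches only a literal dot.
console.log(/^a\.c$/.test("abc")); // false
console.log(/^a\.c$/.test("a.c")); // true

// In a regex literal, each "/" of "://" is escaped as "\/".
console.log(/^https?:\/\/$/.test("https://")); // true

// With the RegExp constructor the pattern is a string, so the backslash
// itself must be escaped in the string literal ("\\." becomes \. in the regex).
const literalDot = new RegExp("^a\\.c$");
console.log(literalDot.test("a.c")); // true
console.log(literalDot.test("abc")); // false
```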
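Examples (JavaScript)
`/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/`
Here `\/\/` matches the two literal slashes after the protocol, `\.` between the domain name and the top-level domain matches a literal dot, and the final `\/?` matches an optional trailing slash.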
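A flag changes the default search behavior of a regular expression. In JavaScript, flags are written after the closing slash of a regex literal (for example, `/abc/gi`) or passed as the second argument to the `RegExp` constructor (`new RegExp("abc", "gi")`). The URL pattern in this Gist does not use any flags.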
| Flag | Name | Modification |
|---|---|---|
| `i` | Ignore Case | The search is case-insensitive: there is no difference between `A` and `a` |
| `g` | Global | The search finds all matches; without it, only the first match is returned |
| `s` | Dot All | Enables "dotAll" mode, which allows a dot `.` to match line terminators such as the newline character `\n` |
| `m` | Multiline | Makes the anchors `^` and `$` match at the beginning and end of every line instead of only at the beginning and end of the whole string |
| `u` | Unicode | Enables full Unicode support, treating the pattern as a sequence of Unicode code points |
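The sketch below demonstrates the `i`, `g`, `m`, and `s` flags on made-up sample text, and shows that adding `i` to the URL pattern from this Gist would also accept an uppercase host name. The simplified pattern `/https?:\/\/[a-z.]+/` is a hypothetical helper used only for this demonstration.

```javascript
const text = "Visit HTTPS://GITHUB.COM or https://example.com";

// g: return every match. Without i, the uppercase URL does not match.
console.log(text.match(/https?:\/\/[a-z.]+/g).join(", "));
// https://example.com

// gi: every match, ignoring case.
console.log(text.match(/https?:\/\/[a-z.]+/gi).join(", "));
// HTTPS://GITHUB.COM, https://example.com

const lines = "https://github.com\nnot a url\nhttps://example.com";
// Without m, ^ and $ refer to the whole string, so no single line matches.
console.log(lines.match(/^https?:\/\/[a-z.]+$/g)); // null
// m: ^ and $ also match at the start and end of each line.
console.log(lines.match(/^https?:\/\/[a-z.]+$/gm).join(", "));
// https://github.com, https://example.com

// s: the dot also matches a newline.
console.log(/a.b/.test("a\nb"));  // false
console.log(/a.b/s.test("a\nb")); // true

// i on the URL pattern from this Gist accepts an uppercase host name.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;
const urlRegexIgnoreCase = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/i;
console.log(urlRegex.test("https://GitHub.com"));           // false
console.log(urlRegexIgnoreCase.test("https://GitHub.com")); // true
```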
Wonjong Park: https://github.com/wonjong2





