Created
November 9, 2015 23:07
RFC 3986 URL Parsing Regular Expression (JavaScript)
/* ***********************************************************************************
   Hero authors of RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt) gave us this regex
   for parsing (well-formed) URLs into their constituent pieces:
       ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   Which for the following URL:
       http://www.ics.uci.edu/pub/ietf/uri/#Related
   Yields the following subexpression matches:
       $1 = http:
       $2 = http
       $3 = //www.ics.uci.edu
       $4 = www.ics.uci.edu
       $5 = /pub/ietf/uri/
       $6 = <undefined>
       $7 = <undefined>
       $8 = #Related
       $9 = Related
   where <undefined> indicates that the component is not present, as is
   the case for the query component in the above example. Therefore, we
   can determine the value of the five components as
       scheme    = $2
       authority = $4
       path      = $5
       query     = $7
       fragment  = $9
*********************************************************************************** */
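var parseURL = function( url ) {
    // Same pattern as in the comment above; "\\?" is the string-escaped form of "\?".
    var regex = new RegExp("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");
    // Every group is optional, so the pattern matches any string and
    // match() does not return null here; absent components come back undefined.
    var matches = url.match(regex);
    return {
        scheme: matches[2],
        authority: matches[4],
        path: matches[5],
        query: matches[7],
        fragment: matches[9]
    };
};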
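
A short usage sketch may help show what the five components look like in practice. The parser is repeated here (as a function declaration, which is an editorial choice) so the snippet runs on its own. The first input is the URL from the comment above, and the `mailto:` input is a hypothetical example chosen to show a URL with no authority, query, or fragment.

```javascript
// Standalone copy of the parser so this snippet runs by itself.
function parseURL(url) {
    var regex = new RegExp("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");
    var matches = url.match(regex);
    return {
        scheme: matches[2],
        authority: matches[4],
        path: matches[5],
        query: matches[7],
        fragment: matches[9]
    };
}

// The URL from the RFC 3986 example: no query, so query is undefined.
var related = parseURL("http://www.ics.uci.edu/pub/ietf/uri/#Related");
console.log(related.scheme);     // "http"
console.log(related.authority);  // "www.ics.uci.edu"
console.log(related.path);       // "/pub/ietf/uri/"
console.log(related.query);      // undefined
console.log(related.fragment);   // "Related"

// Hypothetical input without "//": the authority is undefined and
// everything after the scheme lands in the path.
var mail = parseURL("mailto:John.Doe@example.com");
console.log(mail.scheme);        // "mailto"
console.log(mail.authority);     // undefined
console.log(mail.path);          // "John.Doe@example.com"
```

Callers that need strings rather than `undefined` for missing components can default them at the call site, for example `parseURL(u).query || ""`.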