@metafeather
Created October 6, 2009 12:35
URL parsing regex.js
/*
A single regex to parse and break up a full URL, including query parameters and anchors, e.g.
https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
*/
Url.regex = /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$/;
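// After Url.regex.test(url) succeeds, the legacy static RegExp properties hold the captures.
// Note: the greedy (.*) of group 7 also consumes any "#hash", so $8 is always empty.
// `parts` is only an illustrative name for the resulting object.
var parts = {
  url: RegExp['$&'],
  protocol: RegExp.$2,
  host: RegExp.$3,
  path: RegExp.$4,
  file: RegExp.$6,
  query: RegExp.$7,
  hash: RegExp.$8
};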
/*
Alternate from Reverse HTTP javascript server http://www.reversehttp.net/demos/httpd.js
*/
Url.regex =
/* Capture groups: 2 = proto, 5 = user, 7 = pass, 8 = host, 10 = port, 11 = path, 14 = query, 16 = frag */
/^((\w+):)?(\/\/((\w+)?(:(\w+))?@)?([^\/\?:]+)(:(\d+))?)?(\/?([^\/\?#][^\?#]*)?)?(\?([^#]+))?(#(\w*))?/;
// r is assumed to be the result of Url.regex.exec(url); `this` is the object being populated.
this.url = r[0];
this.protocol = r[2];
this.username = r[5];
this.password = r[7];
this.host = r[8] || "";
this.port = r[10];
this.pathname = r[11] || "";
this.querystring = r[14] || "";
this.fragment = r[16] || "";
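
As a minimal sketch of how the second pattern might be used on its own, the function below wraps it with `exec` and returns a plain object with the same field names as above. The names `URL_RE` and `parseUrl` and the sample URL are hypothetical; they are not part of the gist or of httpd.js.

```javascript
// Hypothetical helper (not from the gist): applies the second regex above.
var URL_RE = /^((\w+):)?(\/\/((\w+)?(:(\w+))?@)?([^\/\?:]+)(:(\d+))?)?(\/?([^\/\?#][^\?#]*)?)?(\?([^#]+))?(#(\w*))?/;

function parseUrl(str) {
  // Every part of the pattern is optional, so exec() returns an array for any string.
  var r = URL_RE.exec(str);
  return {
    url: r[0],
    protocol: r[2],
    username: r[5],
    password: r[7],
    host: r[8] || "",
    port: r[10],
    pathname: r[11] || "",
    querystring: r[14] || "",
    fragment: r[16] || ""
  };
}

// Sample URL chosen for illustration only.
var parts = parseUrl("http://user:pass@example.com:8080/path/to/file.html?a=1&b=2#frag");
console.log(parts);
```

Groups that do not take part in the match come back as `undefined`, which is why the original falls back to `""` for some fields.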
@skounis

skounis commented Aug 25, 2013

Hi,

I stumbled upon this gist while searching for a URL-parsing regexp. Very helpful. However, I noticed that the first regexp does not match the following cases:

http://www.domain.org
http://www.domain.org/
http://www.domain.org/?foo=bar
http://www.domain.org/a
http://www.domain.org/a?foo=bar

In order to fix this I had to adjust it slightly:

^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)?([\w\-\.]*[^#?\s]+)?(.*)?(#[\w\-]+)?$
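
The difference can be checked with a short sketch that runs both patterns against the five URLs listed above. The variable names `original`, `adjusted`, `cases`, and `results` are made up for this illustration.

```javascript
// Original pattern from the gist and the adjusted pattern from this comment.
var original = /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$/;
var adjusted = /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)?([\w\-\.]*[^#?\s]+)?(.*)?(#[\w\-]+)?$/;

// The five URLs reported as not matching the original pattern.
var cases = [
  "http://www.domain.org",
  "http://www.domain.org/",
  "http://www.domain.org/?foo=bar",
  "http://www.domain.org/a",
  "http://www.domain.org/a?foo=bar"
];

// Record, for each URL, whether each pattern matches.
var results = cases.map(function (u) {
  return { url: u, original: original.test(u), adjusted: adjusted.test(u) };
});

results.forEach(function (r) {
  console.log(r.url, "original:", r.original, "adjusted:", r.adjusted);
});
```

The adjustment makes the path and file groups optional, so it also accepts many strings that are not URLs; it is a parser for well-formed input rather than a validator.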

@wehrstedt

If anybody wants to match the port:
^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(?::([0-9]+))?((\/\w+)*\/)?([\w\-\.]*[^#?\s]+)?(.*)?(#[\w\-]+)?$
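
A small sketch of reading the port with this variant follows. Because the port sits in a new capturing group, the later groups shift by one (path becomes group 5, file group 7, query group 8). The names `withPort`, `portInfo`, and `noPort` and the sample URLs are assumptions for illustration.

```javascript
// Port-aware pattern from this comment.
var withPort = /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(?::([0-9]+))?((\/\w+)*\/)?([\w\-\.]*[^#?\s]+)?(.*)?(#[\w\-]+)?$/;

// Groups: 2 = protocol, 3 = host, 4 = port, 5 = path, 7 = file, 8 = query.
var m = withPort.exec("http://www.domain.org:8080/a/b/file.html?x=1");
var portInfo = {
  protocol: m[2],
  host: m[3],
  port: m[4],
  path: m[5],
  file: m[7],
  query: m[8]
};
console.log(portInfo);

// Without a port, group 4 does not participate and is undefined.
var noPort = withPort.exec("http://www.domain.org/a");
console.log(noPort[3], noPort[4]);
```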

@Hydrocarbure-H

Hydrocarbure-H commented Jun 7, 2022

If you want something a little more advanced... (a 341-character regex; it works for most cases)
#WeLoveRegex

var url_regex = /^((([a-z0-9-]+):)*((([a-z0-9-]+):)\/\/))((((([a-z0-9-]+)(:([a-z0-9-]+))?)@)?(([a-z0-9-]+(\.[a-z0-9-]+)*)(:((6553[0-5])|(655[0-2][0-9])|(65[0-4][0-9]{2})|(6[0-4][0-9]{3})|([1-5][0-9]{4})|([0-5]{0,5})|([0-9]{1,4})))?)))?((((\/[a-z0-9_().-]*)+)?)(\?([a-z0-9._()-]+(=[a-z0-9_()-]+)?)(&[a-z0-9()_-]+(=[a-z0-9_()-]+)?)*)?((#[a-z0-9()&-]+)?)*)*$/;

// `url` is assumed to hold the string to parse; exec() returns null if it does not match.
var result = url_regex.exec(url);

var data = {
    hash: result[36],
    host: result[14],
    hostname: result[15],
    href: result[0],
    origin: result[4] + result[14],
    password: result[13],
    pathname: result[28],
    port: result[18],
    protocol: result[5],
    search: result[30],
    username: result[11],
};
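
The property names in `data` are the same as those of the standard `URL` interface, so the built-in class can serve as a reference to compare the regex output against. This is a different technique from the regex above, not part of the original comment; the sample URL and the name `reference` are assumptions.

```javascript
// Cross-check using the built-in URL class (available in browsers and Node.js).
var u = new URL("https://user:pass@example.com:8080/a/b?x=1#h");

// Same property names as the `data` object in the comment above.
var reference = {
  hash: u.hash,
  host: u.host,
  hostname: u.hostname,
  href: u.href,
  origin: u.origin,
  password: u.password,
  pathname: u.pathname,
  port: u.port,
  protocol: u.protocol,
  search: u.search,
  username: u.username
};
console.log(reference);
```

Unlike `exec`, which returns `null` on a non-match, the `URL` constructor throws a `TypeError` for input it cannot parse.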
