@kpobococ
Last active July 27, 2021 13:13
Regular Expression to validate URI (RFC 3986 http://tools.ietf.org/html/rfc3986)
<?php
function rfc3986_validate_uri($uri)
{
// Play around with this regexp online:
// http://regex101.com/r/hZ5gU9/1
// Links to relevant RFC documents:
// RFC 3986: http://tools.ietf.org/html/rfc3986 (URI scheme)
// RFC 2234: http://tools.ietf.org/html/rfc2234#section-6.1 (ABNF notation)
$regex = '/
# URI scheme RFC 3986
(?(DEFINE)
# ABNF notation of RFC 2234
(?<ALPHA> [\x41-\x5A\x61-\x7A] ) # Latin character (A-Z, a-z)
(?<CR> \x0D ) # Carriage return (\r)
(?<DIGIT> [\x30-\x39] ) # Decimal number (0-9)
(?<DQUOTE> \x22 ) # Double quote (")
(?<HEXDIG> (?&DIGIT) | [\x41-\x46\x61-\x66] ) # Hexadecimal digit (0-9, A-F, a-f; ABNF literals are case-insensitive)
(?<LF> \x0A ) # Line feed (\n)
(?<SP> \x20 ) # Space
# RFC 3986 body
(?<uri> (?&scheme) \: (?&hier_part) (?: \? (?&query) )? (?: \# (?&fragment) )? )
(?<hier_part> \/\/ (?&authority) (?&path_abempty)
| (?&path_absolute)
| (?&path_rootless)
| (?&path_empty) )
(?<uri_reference> (?&uri) | (?&relative_ref) )
(?<absolute_uri> (?&scheme) \: (?&hier_part) (?: \? (?&query) )? )
(?<relative_ref> (?&relative_part) (?: \? (?&query) )? (?: \# (?&fragment) )? )
(?<relative_part> \/\/ (?&authority) (?&path_abempty)
| (?&path_absolute)
| (?&path_noscheme)
| (?&path_empty) )
(?<scheme> (?&ALPHA) (?: (?&ALPHA) | (?&DIGIT) | \+ | \- | \. )* )
(?<authority> (?: (?&userinfo) \@ )? (?&host) (?: \: (?&port) )? )
(?<userinfo> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \: )* )
(?<host> (?&ip_literal) | (?&ipv4_address) | (?&reg_name) )
(?<port> (?&DIGIT)* )
(?<ip_literal> \[ (?: (?&ipv6_address) | (?&ipv_future) ) \] )
(?<ipv_future> \x76 (?&HEXDIG)+ \. (?: (?&unreserved) | (?&sub_delims) | \: )+ )
(?<ipv6_address> (?: (?&h16) \: ){6} (?&ls32)
| \:\: (?: (?&h16) \: ){5} (?&ls32)
| (?&h16)? \:\: (?: (?&h16) \: ){4} (?&ls32)
| (?: (?: (?&h16) \: ){0,1} (?&h16) )? \:\: (?: (?&h16) \: ){3} (?&ls32)
| (?: (?: (?&h16) \: ){0,2} (?&h16) )? \:\: (?: (?&h16) \: ){2} (?&ls32)
| (?: (?: (?&h16) \: ){0,3} (?&h16) )? \:\: (?&h16) \: (?&ls32)
| (?: (?: (?&h16) \: ){0,4} (?&h16) )? \:\: (?&ls32)
| (?: (?: (?&h16) \: ){0,5} (?&h16) )? \:\: (?&h16)
| (?: (?: (?&h16) \: ){0,6} (?&h16) )? \:\: )
(?<h16> (?&HEXDIG){1,4} )
(?<ls32> (?: (?&h16) \: (?&h16) ) | (?&ipv4_address) )
(?<ipv4_address> (?&dec_octet) \. (?&dec_octet) \. (?&dec_octet) \. (?&dec_octet) )
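# Longest alternatives first, so matching does not rely on backtracking into this group
(?<dec_octet> \x32\x35 [\x30-\x35] # 250-255
| \x32 [\x30-\x34] (?&DIGIT) # 200-249
| \x31 (?&DIGIT){2} # 100-199
| [\x31-\x39] (?&DIGIT) # 10-99
| (?&DIGIT) ) # 0-9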
(?<reg_name> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) )* )
(?<path> (?&path_abempty)
| (?&path_absolute)
| (?&path_noscheme)
| (?&path_rootless)
| (?&path_empty) )
(?<path_abempty> (?: \/ (?&segment) )* )
(?<path_absolute> \/ (?: (?&segment_nz) (?: \/ (?&segment) )* )? )
(?<path_noscheme> (?&segment_nz_nc) (?: \/ (?&segment) )* )
(?<path_rootless> (?&segment_nz) (?: \/ (?&segment) )* )
(?<path_empty> (?&pchar){0} ) # For explicitness only
(?<segment> (?&pchar)* )
(?<segment_nz> (?&pchar)+ )
(?<segment_nz_nc> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \@ )+ )
(?<pchar> (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \: | \@ )
(?<query> (?: (?&pchar) | \/ | \? )* )
(?<fragment> (?: (?&pchar) | \/ | \? )* )
(?<pct_encoded> \% (?&HEXDIG) (?&HEXDIG) )
(?<unreserved> (?&ALPHA) | (?&DIGIT) | \- | \. | \_ | \~ )
(?<reserved> (?&gen_delims) | (?&sub_delims) )
(?<gen_delims> \: | \/ | \? | \# | \[ | \] | \@ )
(?<sub_delims> \! | \$ | \& | \' | \( | \)
| \* | \+ | \, | \; | \= )
)
^(?&uri)\z # \z rather than $, so a trailing newline is rejected
/x';
return preg_match($regex, $uri) === 1;
}
@sneakyimp

The regex throws an unknown modifier '/' error. Here's my test script:

<?php
/**
 * script to test url validator
 */
require_once "function.rfc3986_validate_uri.php";

// urls to test
$urls = array(
        "",
        "Buy It Now",
        "localhost/foo/bar",
        "blarg",
        "blarg/",
        "blarg/some/path/file.ext",
        "http://google.com",
        "http://google.com/",
        "http://google.com/some/path.ext",
        "http://google.com/some/path.ext?foo=bar",
        "example.com",
        "example.com/",
        "example.com/some/path/file.ext",
        "example.com/some/path/file.ext?foo=bar",
        "example.com:1234",
        "example.com:1234/",
        "example.com:1234/some/path/file.ext",
        "example.com:1234/some/path/file.ext?foo=bar",
        "//foobar.com",
        "//foobar.com/",
        "//foobar.com/path/file.txt",
        "//cdn.example.com/js_file.js",
        "http://example.com?id=some-file-id"
);


// the testing
foreach ($urls as $url) {
        echo "url: $url\n";
        $result = rfc3986_validate_uri($url);
        echo $result ? "PASS" : "FAIL";
        echo "\n\n";
}

The error specifically is:

PHP Warning:  preg_match(): Unknown modifier '/' in /home/jaith/biz/erep/2017/05-12-url/function.rfc3986_validate_uri.php on line 112
PHP Stack trace:
PHP   1. {main}() /home/jaith/biz/erep/2017/05-12-url/test.php:0
PHP   2. rfc3986_validate_uri() /home/jaith/biz/erep/2017/05-12-url/test.php:38
PHP   3. preg_match() /home/jaith/biz/erep/2017/05-12-url/function.rfc3986_validate_uri.php:112

@Spomky

Spomky commented Jan 10, 2018

Excellent! Works great.

Is there any way to get each part of the URI using the $matches parameter of preg_match?
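
A possible approach to the question above, offered as a sketch rather than as part of the gist: as far as I know, PCRE restores capture groups to their previous values once a subroutine call such as (?&scheme) returns, so the named groups inside (?(DEFINE) ...) are not expected to show up in $matches. One alternative is to split the URI with the component regular expression given in RFC 3986, Appendix B. The function name rfc3986_split_uri and the named groups are hypothetical, chosen here for illustration; this step only splits the string and does not validate it.

```php
<?php
// Hypothetical helper (not part of the gist): split a URI reference into its
// five components using the regular expression from RFC 3986, Appendix B.
function rfc3986_split_uri(string $uri): array
{
    // "~" is the delimiter, so the slashes in the pattern need no escaping.
    $regex = '~^(?:(?<scheme>[^:/?#]+):)?(?://(?<authority>[^/?#]*))?'
           . '(?<path>[^?#]*)(?:\?(?<query>[^#]*))?(?:\#(?<fragment>.*))?$~s';

    // Every part of the pattern is optional, so it matches any string.
    // PREG_UNMATCHED_AS_NULL (PHP 7.2+) reports absent components as null,
    // which keeps "no query" distinct from "empty query".
    preg_match($regex, $uri, $m, PREG_UNMATCHED_AS_NULL);

    return [
        'scheme'    => $m['scheme'] ?? null,
        'authority' => $m['authority'] ?? null,
        'path'      => $m['path'] ?? '',
        'query'     => $m['query'] ?? null,
        'fragment'  => $m['fragment'] ?? null,
    ];
}

$parts = rfc3986_split_uri('http://example.com/some/path.ext?foo=bar#top');
$relative = rfc3986_split_uri('//cdn.example.com/js_file.js');
```

A caller could first check the string with rfc3986_validate_uri() and only then split it, so that the components are known to come from a well-formed URI.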

@bradjones1

Would you be able to share this under a GPL-2 compatible license? Something like MIT or even explicitly state it's public domain? Thanks!

@kpobococ
Author

kpobococ commented Dec 5, 2020

Would you be able to share this under a GPL-2 compatible license? Something like MIT or even explicitly state it's public domain? Thanks!

Sure, consider this code public domain. Or do you need me to add it as a comment in the source or something?

@kpobococ
Author

kpobococ commented Dec 5, 2020

The regex throws an unknown modifier '/' error. Here's my test script:

The error specifically is:

PHP Warning:  preg_match(): Unknown modifier '/' in /home/jaith/biz/erep/2017/05-12-url/function.rfc3986_validate_uri.php on line 112
PHP Stack trace:
PHP   1. {main}() /home/jaith/biz/erep/2017/05-12-url/test.php:0
PHP   2. rfc3986_validate_uri() /home/jaith/biz/erep/2017/05-12-url/test.php:38
PHP   3. preg_match() /home/jaith/biz/erep/2017/05-12-url/function.rfc3986_validate_uri.php:112

GitHub did not send me a notification about your comment :(

Apparently, newer versions of PHP do not ignore slashes within regex comments, and I had a couple of URLs there. So I've updated the code.
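
A minimal sketch of the failure mode described above, assuming the cause was an unescaped "/" inside an x-mode comment: PHP locates the closing delimiter before the pattern reaches PCRE, so the first bare slash ends the pattern and the rest is read as modifiers. The two patterns and the variable names are made up for illustration and are not taken from the gist.

```php
<?php
// "/" is the delimiter here, so the slash in "http://" inside the comment
// closes the pattern early; the remainder is then parsed as modifiers and
// preg_match() warns about an unknown modifier '/' and returns false.
$broken = '/
    # see http://example.com for details
    ^ [a-z]+ $
/x';

// Same pattern with "~" as the delimiter: slashes in comments are harmless.
$fixed = '~
    # see http://example.com for details
    ^ [a-z]+ $
~x';

$brokenResult = @preg_match($broken, 'abc'); // warning suppressed for the demo
$fixedResult  = preg_match($fixed, 'abc');

var_dump($brokenResult, $fixedResult);
```

Either removing slashes from the comments (as was done in the gist) or switching to a delimiter that never appears in the pattern avoids the warning.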

@bradjones1

I think since this is a Gist your comment is fine. Thanks.
