<?php
function rfc3986_validate_uri($uri)
{
    // Play around with this regexp online:
    // http://regex101.com/r/hZ5gU9/1
    // Links to relevant RFC documents:
    // RFC 3986: http://tools.ietf.org/html/rfc3986 (URI scheme)
    // RFC 2234: http://tools.ietf.org/html/rfc2234#section-6.1 (ABNF notation)
    $regex = '/
        # URI scheme RFC 3986
        (?(DEFINE)
            # ABNF notation of RFC 2234
            (?<ALPHA> [\x41-\x5A\x61-\x7A] ) # Latin character (A-Z, a-z)
            (?<CR> \x0D ) # Carriage return (\r)
            (?<DIGIT> [\x30-\x39] ) # Decimal number (0-9)
            (?<DQUOTE> \x22 ) # Double quote (")
            (?<HEXDIG> (?&DIGIT) | [\x41-\x46\x61-\x66] ) # Hexadecimal digit (0-9, A-F, a-f; ABNF literals are case-insensitive)
            (?<LF> \x0A ) # Line feed (\n)
            (?<SP> \x20 ) # Space
            # RFC 3986 body
            (?<uri> (?&scheme) \: (?&hier_part) (?: \? (?&query) )? (?: \# (?&fragment) )? )
            (?<hier_part> \/\/ (?&authority) (?&path_abempty)
                | (?&path_absolute)
                | (?&path_rootless)
                | (?&path_empty) )
            (?<uri_reference> (?&uri) | (?&relative_ref) )
            (?<absolute_uri> (?&scheme) \: (?&hier_part) (?: \? (?&query) )? )
            (?<relative_ref> (?&relative_part) (?: \? (?&query) )? (?: \# (?&fragment) )? )
            (?<relative_part> \/\/ (?&authority) (?&path_abempty)
                | (?&path_absolute)
                | (?&path_noscheme)
                | (?&path_empty) )
            (?<scheme> (?&ALPHA) (?: (?&ALPHA) | (?&DIGIT) | \+ | \- | \. )* )
            (?<authority> (?: (?&userinfo) \@ )? (?&host) (?: \: (?&port) )? )
            (?<userinfo> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \: )* )
            (?<host> (?&ip_literal) | (?&ipv4_address) | (?&reg_name) )
            (?<port> (?&DIGIT)* )
            (?<ip_literal> \[ (?: (?&ipv6_address) | (?&ipv_future) ) \] )
            (?<ipv_future> \x76 (?&HEXDIG)+ \. (?: (?&unreserved) | (?&sub_delims) | \: )+ )
            (?<ipv6_address> (?: (?&h16) \: ){6} (?&ls32)
                | \:\: (?: (?&h16) \: ){5} (?&ls32)
                | (?&h16)? \:\: (?: (?&h16) \: ){4} (?&ls32)
                | (?: (?: (?&h16) \: ){0,1} (?&h16) )? \:\: (?: (?&h16) \: ){3} (?&ls32)
                | (?: (?: (?&h16) \: ){0,2} (?&h16) )? \:\: (?: (?&h16) \: ){2} (?&ls32)
                | (?: (?: (?&h16) \: ){0,3} (?&h16) )? \:\: (?&h16) \: (?&ls32)
                | (?: (?: (?&h16) \: ){0,4} (?&h16) )? \:\: (?&ls32)
                | (?: (?: (?&h16) \: ){0,5} (?&h16) )? \:\: (?&h16)
                | (?: (?: (?&h16) \: ){0,6} (?&h16) )? \:\: )
            (?<h16> (?&HEXDIG){1,4} )
            (?<ls32> (?: (?&h16) \: (?&h16) ) | (?&ipv4_address) )
            (?<ipv4_address> (?&dec_octet) \. (?&dec_octet) \. (?&dec_octet) \. (?&dec_octet) )
            (?<dec_octet> (?&DIGIT)
                | [\x31-\x39] (?&DIGIT)
                | \x31 (?&DIGIT){2}
                | \x32 [\x30-\x34] (?&DIGIT)
                | \x32\x35 [\x30-\x35] )
            (?<reg_name> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) )* )
            (?<path> (?&path_abempty)
                | (?&path_absolute)
                | (?&path_noscheme)
                | (?&path_rootless)
                | (?&path_empty) )
            (?<path_abempty> (?: \/ (?&segment) )* )
            (?<path_absolute> \/ (?: (?&segment_nz) (?: \/ (?&segment) )* )? )
            (?<path_noscheme> (?&segment_nz_nc) (?: \/ (?&segment) )* )
            (?<path_rootless> (?&segment_nz) (?: \/ (?&segment) )* )
            (?<path_empty> (?&pchar){0} ) # For explicitness only
            (?<segment> (?&pchar)* )
            (?<segment_nz> (?&pchar)+ )
            (?<segment_nz_nc> (?: (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \@ )+ )
            (?<pchar> (?&unreserved) | (?&pct_encoded) | (?&sub_delims) | \: | \@ )
            (?<query> (?: (?&pchar) | \/ | \? )* )
            (?<fragment> (?: (?&pchar) | \/ | \? )* )
            (?<pct_encoded> \% (?&HEXDIG) (?&HEXDIG) )
            (?<unreserved> (?&ALPHA) | (?&DIGIT) | \- | \. | \_ | \~ )
            (?<reserved> (?&gen_delims) | (?&sub_delims) )
            (?<gen_delims> \: | \/ | \? | \# | \[ | \] | \@ )
            (?<sub_delims> \! | \$ | \& | \' | \( | \)
                | \* | \+ | \, | \; | \= )
        )
        ^(?&uri)$
    /x';
    return preg_match($regex, $uri) === 1;
}
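
For readers new to the `(?(DEFINE) ...)` technique used above, here is a minimal sketch of the same idea on a much smaller grammar: named groups are declared once inside `DEFINE` and then invoked with `(?&name)`. The function name `demo_validate_ipv4` is hypothetical and not part of the gist. The alternatives in `dec_octet` are ordered longest-first as a precaution, on the assumption that some older PCRE versions treat subroutine calls as atomic and would not backtrack into them.

```php
<?php
// Hypothetical helper, not part of the gist: validates a dotted-quad IPv4
// address with the same DEFINE + (?&name) approach on a tiny grammar.
function demo_validate_ipv4($address)
{
    // "~" is used as the delimiter so that no character of the pattern
    // (comments included) can end it early.
    $regex = '~
        (?(DEFINE)
            (?<DIGIT>        [0-9] )
            (?<dec_octet>    25 [0-5]              # 250-255
                           | 2 [0-4] (?&DIGIT)     # 200-249
                           | 1 (?&DIGIT){2}        # 100-199
                           | [1-9] (?&DIGIT)       # 10-99
                           | (?&DIGIT) )           # 0-9
            (?<ipv4_address> (?&dec_octet) (?: \. (?&dec_octet) ){3} )
        )
        ^(?&ipv4_address)$
    ~x';

    return preg_match($regex, $address) === 1;
}

var_dump(demo_validate_ipv4('192.168.0.1')); // bool(true)
var_dump(demo_validate_ipv4('256.1.1.1'));   // bool(false)
```

Nothing inside `DEFINE` matches on its own; only the final `^(?&ipv4_address)$` line consumes input, which is what keeps the grammar readable as a list of rules.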
Excellent! Works great.
Is there any way to get each part of the URI using the $matches parameter of preg_match?
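
One possible approach to the question above, as a sketch rather than a tested extension of the gist: groups that are only invoked through `(?&name)` do not keep their captures after the call returns, so the parts have to be wrapped in additional named groups in the main pattern. The function `demo_split_uri`, the `part_*` group names, and the grammar itself are hypothetical and deliberately simplified; this is not RFC 3986.

```php
<?php
// Hypothetical helper, not part of the gist. Simplified grammar:
// scheme, "://", host, optional path. It is NOT a full RFC 3986 parser.
function demo_split_uri($uri)
{
    $regex = '~
        (?(DEFINE)
            (?<ALPHA>  [A-Za-z] )
            (?<DIGIT>  [0-9] )
            (?<scheme> (?&ALPHA) (?: (?&ALPHA) | (?&DIGIT) | [+.-] )* )
            (?<host>   (?: (?&ALPHA) | (?&DIGIT) | [.-] )+ )
            (?<path>   (?: / (?: (?&ALPHA) | (?&DIGIT) | [._-] )* )* )
        )
        # Wrapper groups need names that differ from the DEFINE rules.
        ^ (?<part_scheme> (?&scheme) ) : //
          (?<part_host>   (?&host) )
          (?<part_path>   (?&path) ) $
    ~x';

    if (preg_match($regex, $uri, $matches) !== 1) {
        return null; // no match (or a regex error)
    }

    return array(
        'scheme' => $matches['part_scheme'],
        'host'   => $matches['part_host'],
        'path'   => $matches['part_path'],
    );
}

// Yields scheme "http", host "example.com", path "/a".
print_r(demo_split_uri('http://example.com/a'));
```

Applied to the full regex, the same idea would mean rewriting the final `^(?&uri)$` line so that each component of interest sits inside its own wrapper group.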
Would you be able to share this under a GPL-2 compatible license? Something like MIT or even explicitly state it's public domain? Thanks!
Sure, consider this code public domain. Or do you need me to add it as a comment in the source or something?
The regex throws an unknown modifier '/' error. Here's my test script:
The error specifically is:
PHP Warning: preg_match(): Unknown modifier '/' in /home/jaith/biz/erep/2017/05-12-url/function.rfc3986_validate_uri.php on line 112
PHP Stack trace:
PHP 1. {main}() /home/jaith/biz/erep/2017/05-12-url/test.php:0
PHP 2. rfc3986_validate_uri() /home/jaith/biz/erep/2017/05-12-url/test.php:38
PHP 3. preg_match() /home/jaith/biz/erep/2017/05-12-url/function.rfc3986_validate_uri.php:112
GitHub did not send me a notification about your comment :(
Apparently, newer versions of PHP do not ignore slashes within regex comments, and I had a couple of URLs there, so I've updated the code.
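
A small sketch of how this kind of failure can arise, assuming the URLs sat inside `#` comments of the `/x` pattern: PHP looks for the closing delimiter before the regex engine ever sees the comments, so a `/` inside a comment ends a `/`-delimited pattern early and the remainder is parsed as modifiers. The pattern and the `example.com` URL below are made up for illustration.

```php
<?php
// Broken: the first "/" after the opening delimiter ends the pattern, even
// though it sits inside an x-mode comment. The next "/" is then read as a
// modifier, which PHP rejects.
$broken = '/
    ^ [a-z]+ $   # see http://example.com/
/x';

// Working alternative: use a delimiter that never occurs in the pattern.
$working = '~
    ^ [a-z]+ $   # see http://example.com/
~x';

$brokenResult  = @preg_match($broken, 'abc'); // @ hides the expected warning
$workingResult = preg_match($working, 'abc');

var_dump($brokenResult);  // bool(false)
var_dump($workingResult); // int(1)
```

Moving the URLs out of the regex string into ordinary PHP comments also avoids the problem.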
I think since this is a Gist your comment is fine. Thanks.