Yet another regex to check URLs
Gist davidcaste/3294320, created August 8, 2012 11:13
# Capture entire matched URL
(
  # Optional: only allow some network protocols
  # (URL protocol, a colon and two slashes)
  (?:
    (?:
      http|https|ftp
    ):\\/\\/
  )?
  # The URL must start at the beginning of a word
  (?<=\\b)
  # The domain must not directly follow the character '@'
  # (so the domain part of an e-mail address is not matched)
  (?<!\\@)
  # The domain name must begin with a valid character
  (?:[\w\d]
    # Other characters allowed in the domain
    (?:[\w\dñÑ()+,-.:=;$_!*'%?#])*
  )
  # A recognized top-level domain is required
  \\.
  (?:
    aero|arpa|asia|biz|cat|com|coop|edu|gov|inet|info|int|jobs|mil|
    mobi|museum|name|net|org|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|
    al|am|an|ao|aq|ar|as|at|au|aw|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|
    bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|
    cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|
    fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|
    hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|
    km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|
    mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|
    ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|
    py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|
    st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|
    ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|za|zm|zw
  )
  # Optional: port number
  (?::[0-9]+)?
  # Optional path: characters allowed in a URL according to RFC 1738
  (?:
    \\/[\w\d()+,-.:=@;$_!*'%?#&|\\\\]*
  )*
  # Make sure no characters allowed in a URL are left unconsumed
  (?![\w\d()+,-./:=@;$_!*'%?#&|\\\\])
)
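The following is a minimal sketch of how the pattern might be used. It assumes Python's `re` module with `re.VERBOSE`, because the gist does not say which regex engine or host language it targets. The doubled backslashes in the gist (`\\/`, `\\b`, `\\.`) look like string-literal escaping, so this sketch writes them as single backslashes inside a raw string. It also replaces `(?<=\\b)` with the equivalent plain `\b` and drops the outer non-capturing group around the domain, since neither change affects what matches. `TLDS`, `URL_PATTERN` and `find_urls` are hypothetical names.

```python
import re

# TLD list copied from the gist in the same order. The gist dates from 2012,
# so top-level domains introduced later are not included.
TLDS = """
aero arpa asia biz cat com coop edu gov inet info int jobs mil
mobi museum name net org pro tel travel xxx ac ad ae af ag ai
al am an ao aq ar as at au aw az ba bb bd be bf bg bh bi bj bm bn
bo br bs bt bv bw by bz ca cc cd cf cg ch ci ck cl cm cn co cr cu
cv cx cy cz de dj dk dm do dz ec ee eg er es et eu fi fj fk fm fo
fr ga gb gd ge gf gg gh gi gl gm gn gp gq gr gs gt gu gw gy hk hm
hn hr ht hu id ie il im in io iq ir is it je jm jo jp ke kg kh ki
km kn kp kr kw ky kz la lb lc li lk lr ls lt lu lv ly ma mc md me
mg mh mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nc ne nf
ng ni nl no np nr nu nz om pa pe pf pg ph pk pl pm pn pr ps pt pw
py qa re ro rs ru rw sa sb sc sd se sg sh si sj sk sl sm sn so sr
st su sv sy sz tc td tf tg th tj tk tl tm tn to tp tr tt tv tw tz
ua ug uk us uy uz va vc ve vg vi vn vu wf ws ye yt za zm zw
""".split()

# Assumed transcription for Python's re module. Inside a character class,
# '#' and whitespace are literal even in verbose mode.
URL_PATTERN = re.compile(
    r"""
    (                                         # capture the entire matched URL
      (?:(?:http|https|ftp)://)?              # optional protocol
      \b                                      # word boundary, like (?<=\b)
      (?<!@)                                  # not directly after '@'
      [\w\d](?:[\w\dñÑ()+,\-.:=;$_!*'%?#])*   # domain name
      \.(?:"""
    + "|".join(TLDS)
    + r""")                                   # recognized top-level domain
      (?::[0-9]+)?                            # optional port number
      (?:/[\w\d()+,\-.:=@;$_!*'%?#&|\\]*)*    # optional path segments
      (?![\w\d()+,\-./:=@;$_!*'%?#&|\\])      # no URL characters left over
    )
    """,
    re.VERBOSE,
)


def find_urls(text):
    """Return every substring of text that the pattern accepts as a URL."""
    # The pattern has exactly one capturing group, so findall returns its text.
    return URL_PATTERN.findall(text)


print(find_urls("Visit http://example.com/path now"))  # ['http://example.com/path']
print(find_urls("write to user@example.com"))          # []
```

Running a few inputs like these is a quick way to check edge cases before reusing the pattern. For example, the final lookahead counts `.` and `,` as URL characters, so `find_urls("Visit example.com.")` returns an empty list in this transcription. In other words, a URL followed directly by sentence punctuation is not matched.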