@lzlrd
Last active February 22, 2023 21:37
A RegEx to match publicly accessible HTTP and HTTPS URLs. I can't guarantee it matches every case, but it matches everything I could think of.
// This snippet is based on https://gist.github.com/dperini/729294 by Diego Perini
// and is licensed under the MIT license. My additions (notably the IPv6 regex)
// are released under the same license, with me, Diab Neiroukh, added to the
// copyright notice. I'd also appreciate it if you could link back to this Gist.
// As usual, here's the copyright notice in full:
// Copyright (c) 2010-2018 Diego Perini
// Copyright (c) 2021 Diab Neiroukh
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
// Single-line form of regexLong below.
const regexShort = new RegExp("^(?:(?:(?:https?):)\\/\\/)(?:\\S+(?::\\S*)?@)?(?:(?!(?:10|127)(?:\\.\\d{1,3}){3})(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|\\[(((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){6,6}[0-9a-fA-F]{1,4}|((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,6}:|((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,5}:[0-9a-fA-F]{1,4}|((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,4}(:[0-9a-fA-F]{1,4}){1,2}|((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,3}(:[0-9a-fA-F]{1,4}){1,3}|((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,2}(:[0-9a-fA-F]{1,4}){1,4}|((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:)?(:[0-9a-fA-F]{1,4}){1,5}|((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))\\]|(?:xn--[a-z0-9\\-]{1,59}|(?:(?:[a-z\\u00a1-\\uffff0-9]-*){0,62}[a-z\\u00a1-\\uffff0-9]{1,63}))(?:\\.(?:xn--[a-z0-9\\-]{1,59}|(?:[a-z\\u00a1-\\uffff0-9]-*){0,62}[a-z\\u00a1-\\uffff0-9]{1,63}))*(?:\\.(?:xn--[a-z0-9\\-]{1,59}|(?:[a-z\\u00a1-\\uffff]{2,63})))\\.?)(?::\\d{2,5})?(?:[/?#]\\S*)?$", "i");
const regexLong = new RegExp(
"^" +
// Protocol Identifier
"(?:(?:(?:https?):)\\/\\/)" +
// HTTP Basic Auth
"(?:\\S+(?::\\S*)?@)?" +
"(?:" +
// IPv4 (excl. Private Addresses)
"(?!(?:10|127)(?:\\.\\d{1,3}){3})" +
"(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})" +
"(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})" +
"(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])" +
"(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}" +
"(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))" +
"|" +
// IPv6 (the leading-hextet check excludes fd00::/8 and fe80::/10)
"\\[(" +
"((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){6,6}[0-9a-fA-F]{1,4}|" +
"((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,6}:|" +
"((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,5}:[0-9a-fA-F]{1,4}|" +
"((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,4}(:[0-9a-fA-F]{1,4}){1,2}|" +
"((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,3}(:[0-9a-fA-F]{1,4}){1,3}|" +
"((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:){0,2}(:[0-9a-fA-F]{1,4}){1,4}|" +
"((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)([0-9a-fA-F]{1,4}:)?(:[0-9a-fA-F]{1,4}){1,5}|" +
"((([0-9a-eA-E][0-9a-fA-F]{0,3})|([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)):)((:[0-9a-fA-F]{1,4}){1,6})|" +
":((:[0-9a-fA-F]{1,4}){1,7}|:)|" +
"::(ffff(:0{1,4}){0,1}:){0,1}" +
"((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}" +
"(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|" +
"([0-9a-fA-F]{1,4}:){1,4}:" +
"((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\\.){3,3}" +
"(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])" +
")\\]" +
"|" +
// Hostname
"(?:xn--[a-z0-9\\-]{1,59}|(?:(?:[a-z\\u00a1-\\uffff0-9]-*){0,62}[a-z\\u00a1-\\uffff0-9]{1,63}))" +
// Domain name
"(?:\\.(?:xn--[a-z0-9\\-]{1,59}|(?:[a-z\\u00a1-\\uffff0-9]-*){0,62}[a-z\\u00a1-\\uffff0-9]{1,63}))*" +
// TLD
"(?:\\.(?:xn--[a-z0-9\\-]{1,59}|(?:[a-z\\u00a1-\\uffff]{2,63})))" +
"\\.?" +
")" +
// Port Number
"(?::\\d{2,5})?" +
// Pathname
"(?:[/?#]\\S*)?" +
"$", "i"
);
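To see exactly what the IPv4 branch accepts, its pattern strings can be compiled on their own. The sketch below copies the lines under `// IPv4 (excl. Private Addresses)` verbatim from `regexLong` and wraps them in start and end anchors so they can be tested against bare addresses. `ipv4Public` and `isPublicIPv4` are hypothetical names used only for illustration.

```javascript
// Illustration only: the IPv4 branch of regexLong, copied verbatim and
// anchored so it can be tested against bare addresses.
const ipv4Public = new RegExp(
  "^(?:" +
    // Reject 10.x.x.x and 127.x.x.x
    "(?!(?:10|127)(?:\\.\\d{1,3}){3})" +
    // Reject 169.254.x.x and 192.168.x.x
    "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})" +
    // Reject 172.16.x.x through 172.31.x.x
    "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})" +
    // First octet 1-223 (0 and 224-255 are rejected)
    "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])" +
    // Second and third octets 0-255
    "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}" +
    // Last octet 1-254 (.0 and .255 are rejected)
    "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))" +
    ")$"
);

// Hypothetical helper name, not part of the original gist.
function isPublicIPv4(address) {
  return ipv4Public.test(address);
}

for (const ip of ["8.8.8.8", "100.64.0.1", "10.0.0.1", "192.168.1.1", "172.16.5.4", "224.0.0.1"]) {
  console.log(ip, isPublicIPv4(ip));
}
```

Only the ranges named in the lookaheads are excluded, so other special-use blocks such as 100.64.0.0/10 still match.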
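The IPv6 exclusions come from the leading-hextet group that opens most of the IPv6 alternatives. Isolating that group (without its trailing colon) shows what it filters: hextets beginning `fd` (the fd00::/8 unique-local range) and `fe8` to `feb` (fe80::/10, link-local) are rejected, while `fc00` and `fec0` still pass. The `::`-prefixed and IPv4-embedded alternatives do not apply this group at all. `leadingHextet` and `leadingHextetAllowed` are hypothetical names for this sketch.

```javascript
// Illustration only: the leading-hextet group from the IPv6 branch of
// regexLong (without its trailing ":"), anchored to test single hextets.
const leadingHextet = new RegExp(
  "^(?:" +
    // Hextets starting with 0-9 or a-e
    "([0-9a-eA-E][0-9a-fA-F]{0,3})|" +
    // "f", or "f" followed by 0-9, a-c or f, so no hextet starting "fd" matches
    "([fF]([0-9a-cfA-CF][0-9a-fA-F]{0,2})?)|" +
    // "fe", or "fe" followed by 0-7 or c-f, so hextets starting "fe8" to "feb" do not match
    "([fF][eE]([0-7c-fC-F][0-9a-fA-F]?)?)" +
    ")$"
);

// Hypothetical helper name, not part of the original gist.
function leadingHextetAllowed(hextet) {
  return leadingHextet.test(hextet);
}

for (const h of ["2001", "fc00", "fec0", "fd00", "fe80", "febf"]) {
  console.log(h, leadingHextetAllowed(h));
}
```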
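The hostname, domain, TLD, port and path pieces can be exercised the same way. This sketch copies those lines verbatim from `regexLong` and drops the protocol, user info and IP-literal branches; `hostAndPath` and `matchesHostAndPath` are hypothetical names.

```javascript
// Illustration only: the hostname, domain, TLD, port and path pieces of
// regexLong, copied verbatim. Protocol, user info and IP literals are omitted.
const hostAndPath = new RegExp(
  "^" +
    // First label: punycode (xn--) or letters, digits, U+00A1-U+FFFF with inner hyphens
    "(?:xn--[a-z0-9\\-]{1,59}|(?:(?:[a-z\\u00a1-\\uffff0-9]-*){0,62}[a-z\\u00a1-\\uffff0-9]{1,63}))" +
    // Any number of further labels
    "(?:\\.(?:xn--[a-z0-9\\-]{1,59}|(?:[a-z\\u00a1-\\uffff0-9]-*){0,62}[a-z\\u00a1-\\uffff0-9]{1,63}))*" +
    // Required TLD: punycode, or 2-63 characters with no ASCII digits or hyphens
    "(?:\\.(?:xn--[a-z0-9\\-]{1,59}|(?:[a-z\\u00a1-\\uffff]{2,63})))" +
    // Optional trailing dot (fully qualified name)
    "\\.?" +
    // Optional port: 2-5 digits, no numeric range check
    "(?::\\d{2,5})?" +
    // Optional path, query or fragment without whitespace
    "(?:[/?#]\\S*)?" +
    "$",
  "i"
);

// Hypothetical helper name, not part of the original gist.
function matchesHostAndPath(text) {
  return hostAndPath.test(text);
}

for (const s of ["example.com", "sub.example.co.uk:8080/path?q=1#top", "localhost:3000", "example.com:8"]) {
  console.log(s, matchesHostAndPath(s));
}
```

Single-label hosts such as `localhost` are rejected because a dotted TLD is required, single-digit ports are rejected, and any 2 to 5 digit port (up to 99999) is accepted.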

lzlrd commented Sep 11, 2021

TODO: Exclude every form of the unspecified (::) and loopback (::1) addresses from the IPv6 RegEx.
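One possible way to handle the TODO above without making the RegEx longer is a second check after a match: let the WHATWG `URL` parser (global in browsers and Node.js) canonicalize the host, then compare it against the compressed forms. This is a sketch under that assumption, not the author's plan, and `hasUnspecifiedOrLoopbackIPv6Host` is a hypothetical name. IPv4-mapped forms such as `[::ffff:127.0.0.1]` serialize differently and are not caught by it.

```javascript
// Hypothetical post-check, not part of the original gist. It relies on the
// WHATWG URL parser, which rewrites IPv6 hosts into their compressed form,
// e.g. [0:0:0:0:0:0:0:1] becomes [::1].
function hasUnspecifiedOrLoopbackIPv6Host(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    // Not parseable as a URL at all; leave that to the RegEx.
    return false;
  }
  // For IPv6 hosts, `hostname` includes the square brackets.
  return url.hostname === "[::]" || url.hostname === "[::1]";
}

console.log(hasUnspecifiedOrLoopbackIPv6Host("http://[0:0:0:0:0:0:0:1]:8080/"));
console.log(hasUnspecifiedOrLoopbackIPv6Host("http://[2001:db8::1]/"));
```

A caller could then accept a URL only if it matches `regexLong` and this check returns `false`.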

lzlrd commented Feb 22, 2023

TODO: Support full and compressed IPv6, including IPv6 addresses with an embedded IPv4 address.
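For the TODO above, one option is to normalize first and then match a single shape. The hypothetical `expandIPv6` below (not part of the gist) expands `::`, converts a trailing dotted IPv4 quad into two hextets, and returns `null` for input it cannot expand. It is a sketch with only basic validation; zone IDs such as `%eth0` are not supported and yield `null`.

```javascript
// Hypothetical helper, not part of the original gist: expands an IPv6 address
// (full, compressed with "::", or ending in a dotted IPv4 quad) into eight
// lowercase, zero-padded hextets. Returns null if it cannot expand the input.
function expandIPv6(address) {
  let text = address;

  // IPv4-embedded form: turn a trailing a.b.c.d into two hextets.
  const v4 = /^(.*:)(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(text);
  if (v4) {
    const [a, b, c, d] = v4.slice(2).map(Number);
    if ([a, b, c, d].some((n) => n > 255)) return null;
    text = v4[1] + ((a << 8) | b).toString(16) + ":" + ((c << 8) | d).toString(16);
  }

  // At most one "::", standing for one or more all-zero hextets.
  const halves = text.split("::");
  if (halves.length > 2) return null;
  const left = halves[0] === "" ? [] : halves[0].split(":");
  const right = halves.length === 2 && halves[1] !== "" ? halves[1].split(":") : [];
  const missing = 8 - left.length - right.length;
  if (halves.length === 2 ? missing < 1 : missing !== 0) return null;

  // Fill the gap, then check every piece is 1-4 hex digits.
  const hextets = [...left, ...Array(missing).fill("0"), ...right];
  if (!hextets.every((h) => /^[0-9a-f]{1,4}$/i.test(h))) return null;
  return hextets.map((h) => h.toLowerCase().padStart(4, "0")).join(":");
}

console.log(expandIPv6("2001:db8::1")); // 2001:0db8:0000:0000:0000:0000:0000:0001
console.log(expandIPv6("::ffff:192.0.2.1")); // 0000:0000:0000:0000:0000:ffff:c000:0201
```

With every address in this one shape, a single full-form pattern (or a numeric range check) could replace the many compressed-form alternatives.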
