@matthiasott
Last active July 27, 2024 05:55
Extract valid URLs from a given string
<?php
# Extract valid URLs from a given string
# Licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
# http://creativecommons.org/publicdomain/zero/1.0/
# Based on WordPress' wp_extract_urls function (https://github.com/WordPress/WordPress/blob/master/wp-includes/functions.php),
# but using the regular expression by @diegoperini (https://gist.github.com/dperini/729294) – which is close to the perfect URL validation regex (https://mathiasbynens.be/demo/url-regex)
# See it in action here: https://regex101.com/r/LHqKuO/1
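# Returns an array of the unique http(s) and ftp URLs found in $string.
# The pattern rejects private, loopback, and link-local IPv4 hosts (10.x, 127.x, 169.254.x, 192.168.x, 172.16-31.x).
function extractUrls( $string ) {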
preg_match_all("/(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\"\'\s]*)?/uix", $string, $post_links);
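# $post_links[0] holds the full-pattern matches. Decode HTML entities (e.g. &amp; becomes &) and drop duplicates.
$post_links = array_unique( array_map( 'html_entity_decode', $post_links[0] ) );
# array_unique() preserves the original keys, so reindex from 0 before returning.
return array_values( $post_links );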
}
?>
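
The post-processing in the last two statements of the function can be illustrated in isolation. The following is a minimal sketch, not part of the gist itself: the `$matches` array is a hypothetical stand-in for `$post_links[0]` (the full matches that `preg_match_all()` would return), and the example.com, example.org, and example.net URLs are placeholders.

```php
<?php
// Hypothetical stand-in for $post_links[0]: the full-pattern matches.
$matches = array(
    'https://example.com/?a=1&amp;b=2',
    'https://example.org/',
    'https://example.com/?a=1&amp;b=2', // duplicate of the first entry
    'https://example.net/',
);

// Decode HTML entities such as &amp;, then drop duplicate URLs.
// array_unique() keeps the first occurrence and preserves its key,
// so the remaining keys here are 0, 1 and 3.
$unique = array_unique( array_map( 'html_entity_decode', $matches ) );

// array_values() reindexes the result so the keys run 0, 1, 2.
$urls = array_values( $unique );

print_r( $urls );
```

Without the final `array_values()` call, a caller iterating with a `for` loop over `0 .. count() - 1` would hit a missing key wherever a duplicate had been removed.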