Regex to get the Facebook Page ID from a given URL
<?php
// Each key is an input URL (or path fragment); each value is the page ID expected from it.
$tests = array(
    'http://www.facebook.com/page_id' => 'page_id',
    'https://www.facebook.com/page_id' => 'page_id',
    'http://www.facebook.com/#!/page_id' => 'page_id',
    'http://www.facebook.com/pages/Parisé-France/Vanity-Url/123456?v=app_555' => '123456',
    'http://www.facebook.com/pages/Vanity-Url/45678' => '45678',
    'http://www.facebook.com/#!/page_with_1_number' => 'page_with_1_number',
    'http://www.facebook.com/bounce_page#!/pages/Vanity-Url/45678' => '45678',
    'http://www.facebook.com/bounce_page#!/my_page_id?v=app_166292090072334' => 'my_page_id',
    'http://www.facebook.com/pages/some-café-or-èàù-url/123456' => '123456',
    'http://www.facebook.com/some-vanity-url/123456' => '123456',
    'http://www.facebook.com/some.page.9' => 'some.page.9',
    'http://www.facebook.com/a_page_with_id/123456789?ref=hl' => '123456789',
    'vanityurl/123456789?ref=hl' => '123456789',
    'pages/really-long-vanity-url-page/123456789?ref=hl' => '123456789',
    'http://www.facebook.com/profile.php?id=123456789' => '123456789'
);

// Print one table row per test case: the input URL and the extracted ID.
echo '<table>';
foreach ($tests as $url => $result) {
    echo '<tr><td>';
    echo $url;
    echo '</td><td>';
    // Every group is optional: the pattern strips the known prefixes and the
    // query string, leaving either the numeric ID ($1) or the vanity name.
    echo preg_replace(
        '#'
        . '(?:https?://)?'                      // scheme
        . '(?:www\.)?'                          // "www." (dot escaped to match literally)
        . '(?:facebook\.com/)?'                 // host
        . '(?:(?:\w)*\#!/)?'                    // optional bounce page and "#!/"
        . '(?:pages/)?'                         // "pages/" prefix
        . '(?:[?\p{L}\-_]*/)?'                  // first vanity segment, accents allowed
        . '(?:[?\w\-_]*/)?'                     // second vanity segment
        . '(?:profile\.php\?id=(?=\d.*))?'      // "profile.php?id=" followed by a digit
        . '([\d\-]*)?'                          // numeric ID, captured as $1
        . '(?:\?.*)?'                           // trailing query string
        . '#u',
        '$1',
        $url
    );
    echo '</td></tr>';
}
echo '</table>';
?>
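The script above prints each extracted ID but never compares it with the expected value stored in `$tests`. A minimal sketch of a self-checking variant could look like the following. The function name `extractFacebookPageId` and the PASS/FAIL output format are assumptions made for illustration, not part of the gist; the pattern is the gist's, with the literal dots escaped, and only a few of the test cases are repeated here.

```php
<?php
// Hypothetical helper (not in the original gist): wraps the pattern so it can be reused.
// Returns null if preg_replace() fails, e.g. on input that is not valid UTF-8.
function extractFacebookPageId(string $url): ?string
{
    $pattern = '#'
        . '(?:https?://)?'
        . '(?:www\.)?'
        . '(?:facebook\.com/)?'
        . '(?:(?:\w)*\#!/)?'
        . '(?:pages/)?'
        . '(?:[?\p{L}\-_]*/)?'
        . '(?:[?\w\-_]*/)?'
        . '(?:profile\.php\?id=(?=\d.*))?'
        . '([\d\-]*)?'
        . '(?:\?.*)?'
        . '#u';

    return preg_replace($pattern, '$1', $url);
}

// A subset of the gist's cases: input => expected ID.
$tests = [
    'http://www.facebook.com/page_id' => 'page_id',
    'http://www.facebook.com/pages/Vanity-Url/45678' => '45678',
    'http://www.facebook.com/some.page.9' => 'some.page.9',
    'vanityurl/123456789?ref=hl' => '123456789',
    'http://www.facebook.com/profile.php?id=123456789' => '123456789',
];

// Compare each result with its expected value instead of only printing it.
$failures = 0;
foreach ($tests as $url => $expected) {
    $actual = extractFacebookPageId($url);
    $ok = ($actual === $expected);
    if (!$ok) {
        $failures++;
    }
    printf("%s %s => %s\n", $ok ? 'PASS' : 'FAIL', $url, var_export($actual, true));
}
echo $failures === 0 ? "all cases passed\n" : "$failures case(s) failed\n";
```

Adding a use case is then a matter of appending one `input => expected` pair to `$tests`.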
Based on a small sample of real-world data, I added a few use cases and improved the regex:

- changed the delimiter from `/` to `#` to make the pattern more readable (based on a comment on this SO question: http://stackoverflow.com/q/11907178/233404)
- added the `u` modifier and replaced `\w` by `\p{L}` to account for possible accents

I also wrapped the whole thing in a small test so that more use cases are easy to add. Hope this helps.
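To illustrate the point about the `u` modifier and `\p{L}`, here is a small sketch. It assumes the file is saved as UTF-8; the helper name `matchesWhole` is hypothetical, and the sample segments are taken from the test URLs. How `\w` treats accented bytes without `u` can depend on the PCRE build and locale, so the code prints that result rather than assuming it.

```php
<?php
// Hypothetical helper: true when the pattern matches the whole string.
function matchesWhole(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$withoutU = '#^[\w\-]+$#';        // byte-oriented \w, no Unicode awareness requested
$withU    = '#^[\p{L}\-_]+$#u';   // Unicode letters plus "-" and "_", UTF-8 mode

$segments = ['Vanity-Url', 'some-café-or-èàù-url', '123456'];

foreach ($segments as $segment) {
    printf(
        "%s  \\w without u: %s  \\p{L} with u: %s\n",
        $segment,
        var_export(matchesWhole($withoutU, $segment), true),
        var_export(matchesWhole($withU, $segment), true)
    );
}
```

Note that `\p{L}` matches letters only, so a purely numeric segment such as `123456` does not match the second pattern; digits are handled elsewhere in the regex by `\w` and `\d`.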