# Matches patterns such as:
# https://www.facebook.com/my_page_id => my_page_id
# http://www.facebook.com/my_page_id => my_page_id
# http://www.facebook.com/#!/my_page_id => my_page_id
# http://www.facebook.com/pages/Paris-France/Vanity-Url/123456?v=app_555 => 123456
# http://www.facebook.com/pages/Vanity-Url/45678 => 45678
# http://www.facebook.com/#!/page_with_1_number => page_with_1_number
# http://www.facebook.com/bounce_page#!/pages/Vanity-Url/45678 => 45678
# http://www.facebook.com/bounce_page#!/my_page_id?v=app_166292090072334 => my_page_id
# http://www.facebook.com/my.page.is.great => my.page.is.great
/(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/
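To make the mapping above easy to check, here is a minimal Python sketch that runs the gist's pattern against the listed URLs. The pattern is a direct translation (slashes need no escaping in Python); the helper name `page_id` is hypothetical and not part of the gist.

```python
import re

# Python translation of the gist's pattern; "/" needs no escaping here.
PAGE_ID = re.compile(
    r"(?:https?://)?(?:www\.)?facebook\.com/"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:[\w\-]*/)*([\w\-.]*)"
)

def page_id(url):
    """Return the captured page id, or None when the URL does not match."""
    m = PAGE_ID.search(url)
    return m.group(1) if m else None

# URL => expected id, copied from the comment block of the gist.
EXAMPLES = {
    "https://www.facebook.com/my_page_id": "my_page_id",
    "http://www.facebook.com/#!/my_page_id": "my_page_id",
    "http://www.facebook.com/pages/Paris-France/Vanity-Url/123456?v=app_555": "123456",
    "http://www.facebook.com/pages/Vanity-Url/45678": "45678",
    "http://www.facebook.com/bounce_page#!/pages/Vanity-Url/45678": "45678",
    "http://www.facebook.com/bounce_page#!/my_page_id?v=app_166292090072334": "my_page_id",
    "http://www.facebook.com/my.page.is.great": "my.page.is.great",
}

for url, expected in EXAMPLES.items():
    print(url, "=>", page_id(url), "(expected", expected + ")")
```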
For anyone who wants this in Python:
https?://(www.)?facebook.com/(\w*#!/)?(pages/)?(([\w-]*/)*)?(?P<page_id>[\w.-]+)
Matching fails if the URL ends with a trailing slash, like https://www.facebook.com/my_page_id/
The regex below is a simpler working solution:
(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:.+\/)*([\w\.\-]+)
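A small Python sketch of the trailing-slash problem, comparing the gist's pattern with the simpler one. The names `ORIGINAL` and `SIMPLER` are just labels chosen for this illustration.

```python
import re

# The gist's pattern: the final group may be empty, so a trailing slash
# lets the preceding "segment/" loop swallow the page id.
ORIGINAL = re.compile(
    r"(?:https?://)?(?:www\.)?facebook\.com/"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:[\w\-]*/)*([\w\-.]*)"
)

# The simpler pattern: the final group needs at least one character,
# which forces the engine to backtrack and keep the id in the group.
SIMPLER = re.compile(
    r"(?:https?://)?(?:www\.)?facebook\.com/(?:.+/)*([\w.\-]+)"
)

url = "https://www.facebook.com/my_page_id/"
print(repr(ORIGINAL.search(url).group(1)))  # empty string
print(repr(SIMPLER.search(url).group(1)))   # 'my_page_id'
```

Note that, as far as this sketch shows, the simpler pattern captures the last non-empty path segment, so a URL such as https://www.facebook.com/TrollsdeGeek/videos/1612020065488125/ yields the video number rather than the page name.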
^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(\/)?([^/?\s]*)(?:/|&|\?)?.*$
It also supports m.facebook.com, touch.facebook.com, fb.me, and fb.com.
Examples:
https://www.facebook.com/TrollsdeGeek/videos/1612020065488125/
https://www.facebook.com/Φc?ref=hl
http://m.facebook.com/Φc?ref=hl
http://www.facebook.com/Φc/?ref=hl
http://www.facebook.com/pages/Φc/1234?ref=hl
http://www.facebook.com/pages/Φc/1234/?ref=hl
http://www.facebook.com/mypageusername?ref=hl
http://touch.facebook.com/mypageusername?ref=hl
https://www.facebook.com/mypageusername?ref=hl
http://www.facebook.com/pages/Vanity-Url/12345678?ref=hl
https://www.facebook.com/pages/Vanity-Url/12345678?ref=hl
http://www.facebook.com/pages/%CE%A4%CE%91-%CE%A6%CE%A1%CE%9F%CE%A5%CE%A4%CE%91-%CE%A4%CE%9F%CE%A5-%CE%94%CE%91%CE%A3%CE%9F%CE%A5%CE%A3/145298928829093
http://www.facebook.com/pages/ΤΑ-ΦΡΟΥΤΑ-ΤΟΥ-ΔΑΣΟΥΣ/145298928829093
https://www.facebook.com/Babies.Fan.Page
https://www.facebook.com/pages/Babies.Fan.Page/121166161229757
https://www.facebook.com/pages/GHOST-Caf%C3%A9/627191887397533?fref=ts
@BastienMottier
Just a little mistake: an unescaped slash at the end.
The one below works better:
^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(\/)?([^/?\s]*)(?:\/|&|\?)?.*$
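A hedged Python sketch of the corrected pattern, run against a few of the example URLs above. This pattern has two capture groups, so the page name ends up in group 2; the helper name `page_name` is hypothetical.

```python
import re

# Python translation of the corrected pattern (slashes unescaped).
# Group 1 is the optional stray "/", group 2 is the page name.
PAGE = re.compile(
    r"^(?:https?://)?(?:www\.|m\.|touch\.)?"
    r"(?:facebook\.com|fb(?:\.me|\.com))/"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:[\w\-]*/)*?(/)?"
    r"([^/?\s]*)(?:/|&|\?)?.*$"
)

def page_name(url):
    """Return group 2 of the match, or None if the URL does not match."""
    m = PAGE.match(url)
    return m.group(2) if m else None

for url in [
    "http://m.facebook.com/Φc?ref=hl",
    "http://touch.facebook.com/mypageusername?ref=hl",
    "https://www.facebook.com/Babies.Fan.Page",
    "https://www.facebook.com/TrollsdeGeek/videos/1612020065488125/",
    "http://www.facebook.com/pages/Vanity-Url/12345678?ref=hl",
]:
    print(url, "=>", page_name(url))
```

In this sketch the lazy `(?:[\w\-]*/)*?` stops as early as it can, so the /pages/Vanity-Url/12345678 URL yields Vanity-Url, not the numeric id.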
@Raphhh
This one below also works for profile.php:
^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^/?\s]*)(?:\/|&|\?)?.*$
@lekiend There was one more missing slash.
Also, the one below rejects bare host URLs with no profile after them, e.g. https://www.facebook.com, http://fb.me, https://m.facebook.com/
^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?\s]*)(?:\/|&|\?)?.*$
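The effect of the `(?!$)` negative lookahead can be sketched in Python by building the pattern with and without it. The names `HEAD`, `TAIL`, `WITH_LOOKAHEAD` and `WITHOUT_LOOKAHEAD` are only labels for this illustration, and the pattern is written with the escapes restored.

```python
import re

HEAD = (
    r"^(?:https?://)?(?:www\.|m\.|touch\.)?"
    r"(?:facebook\.com|fb(?:\.me|\.com))/"
)
TAIL = (
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:[\w\-]*/)*?(?:/)?"
    r"(?:profile\.php\?id=)?([^/?\s]*)(?:/|&|\?)?.*$"
)

# (?!$) right after the host's slash refuses URLs that end there.
WITH_LOOKAHEAD = re.compile(HEAD + r"(?!$)" + TAIL)
WITHOUT_LOOKAHEAD = re.compile(HEAD + TAIL)

bare = "https://m.facebook.com/"
print(WITHOUT_LOOKAHEAD.match(bare) is not None)  # True, with an empty capture
print(WITH_LOOKAHEAD.match(bare) is not None)     # False

# Hosts without any slash never matched, since the "/" is mandatory.
print(WITH_LOOKAHEAD.match("http://fb.me") is not None)  # False

# Ordinary profile URLs are unaffected.
print(WITH_LOOKAHEAD.match("https://www.facebook.com/profile.php?id=12345").group(1))
```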
Where did you guys learn how to write all these?
It doesn't work for pages whose names contain Unicode characters, like this:
https://www.facebook.com/საწარმო-SabaDesign-927047470710565/?ref=safrghbeდფწერგ
Props to all contributors!
Everything incorporated above, with just one more forgotten escape character added, gives me this:
/^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?\s]*)(?:\/|&|\?)?.*$/
Which works GREAT (yay), except when the url has arguments after the profile.php?id= or fbid= part like these urls:
https://www.facebook.com/profile.php?id=114376375296751&fref=pb&hc_location=friends_tab
returns 114376375296751&fref=pb&hc_location=friends_tab
instead of 114376375296751
and
https://www.facebook.com/photo.php?fbid=114376375296751&set=a.114376371963418.13845.114375165296872&type=1&theater
returns 114376375296751&set=a.114376371963418.13845.114375165296872&type=1&theater
Someone care to snip everything off after the first &?
/^(?:https?://)?(?:www.|m.|touch.)?(?:facebook.com|fb(?:.me|.com))/(?!$)(?:(?:\w)#!/)?(?:pages/)?(?:photo.php?fbid=)?(?:[\w-]/)?(?:/)?(?:profile.php?id=)?([^\/?\&\s])(?:/|&|?)?.*?$/
This should exclude the & as well.
Ayal's alternative didn't work for me. It worked when I took hoofdletterj's answer and added & before \s
(Ayal's partial answer):
/^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?&\s]*)(?:\/|&|\?)?.*$/
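The difference the extra `&` makes can be sketched in Python by swapping only the capture group. `HEAD`, `TAIL`, `OLD` and `NEW` are labels made up for this comparison; the URLs are the two from the comments above.

```python
import re

HEAD = (
    r"^(?:https?://)?(?:www\.|m\.|touch\.)?"
    r"(?:facebook\.com|fb(?:\.me|\.com))/(?!$)"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:photo\.php\?fbid=)?"
    r"(?:[\w\-]*/)*?(?:/)?(?:profile\.php\?id=)?"
)
TAIL = r"(?:/|&|\?)?.*$"

OLD = re.compile(HEAD + r"([^/?\s]*)" + TAIL)   # capture runs through "&"
NEW = re.compile(HEAD + r"([^/?&\s]*)" + TAIL)  # capture stops at the first "&"

profile = ("https://www.facebook.com/profile.php"
           "?id=114376375296751&fref=pb&hc_location=friends_tab")
photo = ("https://www.facebook.com/photo.php?fbid=114376375296751"
         "&set=a.114376371963418.13845.114375165296872&type=1&theater")

for url in (profile, photo):
    print("old:", OLD.match(url).group(1))
    print("new:", NEW.match(url).group(1))
```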
I want to display images from Facebook posts on my website just by copying the image address. Is there a regex for that? I have tried the regex above, but it doesn't help.
preg_match_all('/(https?:\/\/\S+\.(?:jpg|png|gif))+/', $string, $match);
I am using this regex, but it matches all the other images except Facebook's images.
I updated the regex to match even mbasic:
^(?:https?:\/\/)?(?:www\.|m\.|touch\.|mbasic\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?&\s]*)(?:\/|&|\?)?.*$
It works for urls like:
https://mbasic.facebook.com/BMW/?refid=46&__xts__%5B0%5D=12.%7B%22unit_id_click_type%22%3A%22graph_search_results_item_tapped%22%2C%22click_type%22%3A%22result%22%2C%22module_id%22%3A2%2C%22result_id%22%3A22893372268%2C%22session_id%22%3A%22e4709b011e94ec8207a44ffedd1d2901%22%2C%22module_role%22%3A%22ENTITY_PAGES%22%2C%22unit_id%22%3A%22browse_rl%3Ab2718be4-bbd0-4764-9c31-6908c431daa2%22%2C%22browse_result_type%22%3A%22browse_type_page%22%2C%22unit_id_result_id%22%3A22893372268%2C%22module_result_position%22%3A0%7D
There's also mobile.facebook.com, so here's the new regex:
^(?:https?:\/\/)?(?:www\.|m\.|mobile\.|touch\.|mbasic\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?&\s]*)(?:\/|&|\?)?.*$
Which can match stuff like:
https://mobile.facebook.com/BMW/
(?:https?:\/\/)?(?:www\.|m\.|mobile\.|touch\.|mbasic\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/|pg\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?&\s]*)(?:\/|&|\?)?.*
This will match /pg/ URLs as well:
https://m.facebook.com/pg/DwayneTheRockJohnsonFanClub/photos/
...................................................^
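A quick Python check of the extended pattern against the mbasic, mobile and /pg/ URLs mentioned above. This is a sketch only: the helper name `page_name` is hypothetical, and the long mbasic query string is shortened to its first parameter.

```python
import re

# Python translation of the pattern with the extra subdomains and "pg/".
PAGE = re.compile(
    r"(?:https?://)?(?:www\.|m\.|mobile\.|touch\.|mbasic\.)?"
    r"(?:facebook\.com|fb(?:\.me|\.com))/(?!$)"
    r"(?:(?:\w)*#!/)?(?:pages/|pg/)?(?:photo\.php\?fbid=)?"
    r"(?:[\w\-]*/)*?(?:/)?(?:profile\.php\?id=)?"
    r"([^/?&\s]*)(?:/|&|\?)?.*"
)

def page_name(url):
    """Return the captured page name, or None if the URL does not match."""
    m = PAGE.match(url)  # match() anchors at the start of the string
    return m.group(1) if m else None

for url in [
    "https://mbasic.facebook.com/BMW/?refid=46",
    "https://mobile.facebook.com/BMW/",
    "https://m.facebook.com/pg/DwayneTheRockJohnsonFanClub/photos/",
]:
    print(url, "=>", page_name(url))
```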
@neilsh Yours doesn't capture patterns like:
https://www.google.com/search?q=https+%2F%2Fwww.facebook.com%2Fprofile.php+id%3D10005&oq=&gs_lcrp=EgZjaHJvbWUqCQgBECMYJxjqAjIHCAAQRRiwATIJCAEQIxgnGOoCMgkIAhAjGCcY6gIyCQgDECMYJxjqAjIJCAQQIxgnGOoCMg8IBRAuGCcYxwEY6gIY0QMyCQgGECMYJxjqAjIJCAcQIxgnGOoCMgkICBAjGCcY6gIyCQgJEEUYOxjCA9IBBi0xajBqN6gCCrACAQ&client=ms-android-americamovil-mx-revc&sourceid=chrome-mobile&ie=UTF-8
Something you can do is use two separate regexes: try to find a numeric ID first, then fall back to the vanity ID on failure.
I've used this pretty basic regex to look for numbers that are 10 or more digits long:
/(\d{10,})/
If a result is found, take it; if not, use the original regex from here. Obviously this might fail if the name of the page is number1234567890, but that is a pretty special case.
I have found this to work pretty well for me, but criticism is welcome.
Example URL:
https://www.facebook.com/pages/GHOST-Caf%C3%A9/627191887397533?fref=ts
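The two-step idea can be sketched in Python as follows. The function name `facebook_id` is hypothetical, and the choice of fallback pattern (the mobile/mbasic variant posted earlier in the thread) is an assumption, since the comment only says "the original regex from here".

```python
import re

# Step 1: any run of 10 or more digits anywhere in the URL.
NUMERIC_ID = re.compile(r"(\d{10,})")

# Step 2 (fallback): one of the vanity-name patterns from this thread.
VANITY = re.compile(
    r"^(?:https?://)?(?:www\.|m\.|mobile\.|touch\.|mbasic\.)?"
    r"(?:facebook\.com|fb(?:\.me|\.com))/(?!$)"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:photo\.php\?fbid=)?"
    r"(?:[\w\-]*/)*?(?:/)?(?:profile\.php\?id=)?"
    r"([^/?&\s]*)(?:/|&|\?)?.*$"
)

def facebook_id(url):
    """Prefer a long numeric id; otherwise fall back to the vanity name."""
    numeric = NUMERIC_ID.search(url)
    if numeric:
        return numeric.group(1)
    vanity = VANITY.match(url)
    # An empty capture is treated the same as no match.
    return vanity.group(1) if vanity and vanity.group(1) else None

print(facebook_id("https://www.facebook.com/pages/GHOST-Caf%C3%A9/627191887397533?fref=ts"))
print(facebook_id("https://www.facebook.com/mypageusername?ref=hl"))
print(facebook_id("https://www.facebook.com/number1234567890"))
```

Because the numeric search runs over the whole URL, a long number in the query string (for example v=app_166292090072334) would also be picked up first, in addition to the number1234567890 case mentioned above.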