# Matches patterns such as:
# https://www.facebook.com/my_page_id => my_page_id
# http://www.facebook.com/my_page_id => my_page_id
# http://www.facebook.com/#!/my_page_id => my_page_id
# http://www.facebook.com/pages/Paris-France/Vanity-Url/123456?v=app_555 => 123456
# http://www.facebook.com/pages/Vanity-Url/45678 => 45678
# http://www.facebook.com/#!/page_with_1_number => page_with_1_number
# http://www.facebook.com/bounce_page#!/pages/Vanity-Url/45678 => 45678
# http://www.facebook.com/bounce_page#!/my_page_id?v=app_166292090072334 => my_page_id
# http://www.facebook.com/my.page.is.great => my.page.is.great
/(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/
updated to use this:
/^(https?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/
Without the ^ (and with the ?:, as in the original), it would also allow ftp:// URLs.
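To see why the leading ^ matters, here is a small Python sketch. The helper names are hypothetical and the pattern body is the original one with the slashes left unescaped, which Python allows. Without the anchor, a search is free to start at `www.` in the middle of the string, so an `ftp://` URL still matches.

```python
import re

# Original pattern body; in Python "/" needs no escaping.
BODY = (
    r"(?:https?://)?(?:www\.)?facebook\.com/"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:[\w\-]*/)*([\w\-.]*)"
)

UNANCHORED = re.compile(BODY)
ANCHORED = re.compile("^" + BODY)


def matches_unanchored(url):
    """True if the pattern matches anywhere inside the string."""
    return UNANCHORED.search(url) is not None


def matches_anchored(url):
    """True only if the pattern matches from the very start of the string."""
    return ANCHORED.search(url) is not None
```

With this sketch, `matches_unanchored("ftp://www.facebook.com/my_page_id")` is true, while `matches_anchored` rejects the same string and still accepts the `http://` and `https://` forms.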
This regex works great for English characters, but not for non-English ones.
For example, it doesn't work for FB pages such as
http://www.facebook.com/pages/ΤΑ-ΦΡΟΥΤΑ-ΤΟΥ-ΔΑΣΟΥΣ/145298928829093
which when copied and pasted in a textbox from the browser url box returns as
@Manalof Do you know how to update it to match those pages?
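One thing that may help with pasted URLs: browsers usually hand back non-ASCII path segments percent-encoded, and they can be decoded before matching. A minimal sketch using Python's standard `urllib.parse`; the function names are hypothetical, and UTF-8 encoding is assumed.

```python
from urllib.parse import quote, unquote


def decode_url(url):
    """Turn %XX escapes back into characters (UTF-8 is the default)."""
    return unquote(url)


def encode_segment(segment):
    """Percent-encode a single path segment, the reverse operation."""
    return quote(segment, safe="-")
```

Decoding first means the regex sees the page name as the user sees it; matching the encoded form directly also works as long as the character class allows `%`.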
Doesn't work for paths with a trailing slash. Try adding a check for that (added (\/)?).
/(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(\/)?([\w\-\.]*)/
To address the UTF-8 comment: in many regex flavors \w and \b only handle ASCII. To work with UTF-8, you need to define your own word boundaries.
Probably the best solution here is to use a negated character class ("anything that is not a slash or question mark") to find the usernames. This works in this situation because we know the only place special characters would appear is in the username.
Here's my attempt at filtering out other languages.
/(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(\/)?([^/?]*)/
Tested it against:
http://www.facebook.com/Φc?ref=hl => [2] = Φc
http://www.facebook.com/Φc/?ref=hl => [2] = Φc
http://www.facebook.com/pages/Φc/1234?ref=hl => [2] = Φc
http://www.facebook.com/pages/Φc/1234/?ref=hl => [2] = Φc
This also includes my forward slash escape code in my previous comment.
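For anyone who wants to try the negated character class outside a JavaScript or Ruby console, here is a rough Python port of the regex above. The function name is hypothetical. Note that Python 3 already treats `\w` as Unicode-aware for `str` patterns, so this sketch mainly shows that `[^/?]*` does not depend on that.

```python
import re

# Port of the regex above; group 2 is the negated class [^/?]*,
# i.e. "anything that is not a slash or a question mark".
PAGE_NAME = re.compile(
    r"(?:https?://)?(?:www\.)?facebook\.com/"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:[\w\-]*/)*?(/)?([^/?]*)"
)


def page_name(url):
    """Return capture group 2, or None if the URL does not match."""
    m = PAGE_NAME.search(url)
    return m.group(2) if m else None
```

Because the class excludes only `/` and `?`, the capture stops at a trailing slash or at the start of the query string, whatever script the name is written in.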
Something you can do is use two separate regexes: try to find a numeric ID first, then fall back to the vanity ID on failure.
I've used this pretty basic regex to look for numbers of length 10 or greater:
/(\d{10,})/
Take the result if one is found; if not, use the original regex from here. Obviously this might fail if the name of the page is number1234567890, but that is a pretty special case.
I have found this to work pretty well for me, but criticism is welcome.
Example URL:
https://www.facebook.com/pages/GHOST-Caf%C3%A9/627191887397533?fref=ts
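The two-step idea could be sketched in Python roughly as follows. The function name is hypothetical, and the fallback is the original regex from the top of this gist with the slashes left unescaped.

```python
import re

# Step 1: any run of ten or more digits is treated as a numeric page ID.
NUMERIC_ID = re.compile(r"(\d{10,})")

# Step 2: the original vanity-name regex, used only as a fallback.
VANITY = re.compile(
    r"(?:https?://)?(?:www\.)?facebook\.com/"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:[\w\-]*/)*([\w\-.]*)"
)


def extract_page_id(url):
    """Prefer a long numeric ID; otherwise fall back to the vanity name."""
    m = NUMERIC_ID.search(url)
    if m:
        return m.group(1)
    m = VANITY.search(url)
    if m and m.group(1):
        return m.group(1)
    return None
```

For the example URL above this returns `627191887397533`, and for a plain `https://www.facebook.com/my_page_id` it falls through to the vanity name.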
For anyone who wants this for Python:
https?://(www\.)?facebook\.com/(\w*#!/)?(pages/)?(([\w-]*/)*)?(?P<page_id>[\w.-]+)
Matching fails if the URL contains a closing slash, like https://www.facebook.com/my_page_id/
The regex below is a simpler working solution:
(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:.+\/)*([\w\.\-]+)
^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(\/)?([^/?\s]*)(?:/|&|\?)?.*$
Also supports m.facebook.com, touch.facebook.com, fb.me, and fb.com.
Examples:
https://www.facebook.com/TrollsdeGeek/videos/1612020065488125/
https://www.facebook.com/Φc?ref=hl
http://m.facebook.com/Φc?ref=hl
http://www.facebook.com/Φc/?ref=hl
http://www.facebook.com/pages/Φc/1234?ref=hl
http://www.facebook.com/pages/Φc/1234/?ref=hl
http://www.facebook.com/mypageusername?ref=hl
http://touch.facebook.com/mypageusername?ref=hl
https://www.facebook.com/mypageusername?ref=hl
http://www.facebook.com/pages/Vanity-Url/12345678?ref=hl
https://www.facebook.com/pages/Vanity-Url/12345678?ref=hl
http://www.facebook.com/pages/%CE%A4%CE%91-%CE%A6%CE%A1%CE%9F%CE%A5%CE%A4%CE%91-%CE%A4%CE%9F%CE%A5-%CE%94%CE%91%CE%A3%CE%9F%CE%A5%CE%A3/145298928829093
http://www.facebook.com/pages/ΤΑ-ΦΡΟΥΤΑ-ΤΟΥ-ΔΑΣΟΥΣ/145298928829093
https://www.facebook.com/Babies.Fan.Page
https://www.facebook.com/pages/Babies.Fan.Page/121166161229757
https://www.facebook.com/pages/GHOST-Caf%C3%A9/627191887397533?fref=ts
@BastienMottier
Just a little mistake: an unescaped slash at the end.
The one below works better:
^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(\/)?([^/?\s]*)(?:\/|&|\?)?.*$
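A quick way to check this version against a few of the example URLs is a small Python harness like the one below. The function name is hypothetical, and the slashes are left unescaped because Python does not need the escapes.

```python
import re

# Port of the regex above; group 2 holds the first captured path segment.
FB_URL = re.compile(
    r"^(?:https?://)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))/"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:[\w\-]*/)*?(/)?([^/?\s]*)(?:/|&|\?)?.*$"
)


def first_segment(url):
    """Return capture group 2, or None when the URL is not a Facebook URL."""
    m = FB_URL.match(url)
    return m.group(2) if m else None
```

One thing this sketch makes visible: because `(?:[\w\-]*/)*?` is lazy, a `/pages/Vanity-Url/12345678` URL yields `Vanity-Url` rather than the numeric ID, so callers who want the ID may still need a separate numeric check.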
@Raphhh
This one also works for profile.php:
^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^/?\s]*)(?:\/|&|\?)?.*$
@lekiend There was one more missing slash.
Also, this below disallows host URLs ending in a slash with no profile e.g. https://www.facebook.com, http://fb.me, https://m.facebook.com/
^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?\s]*)(?:\/|&|\?)?.*$
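The part doing the work here is the negative lookahead `(?!$)` placed right after the slash that follows the host: it refuses to match when nothing comes after that slash. A deliberately simplified Python sketch (not the full regex, and with a hypothetical function name) isolates that behavior:

```python
import re

# Simplified pattern: host, slash, then (?!$) to require at least one
# more character before the end of the string.
HAS_PROFILE = re.compile(
    r"^(?:https?://)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))/"
    r"(?!$)([^/?\s]*)"
)


def has_profile(url):
    """True when something follows the host's trailing slash."""
    return HAS_PROFILE.match(url) is not None
```

Bare hosts without a trailing slash, such as `http://fb.me`, are already rejected by the mandatory `/`; the lookahead covers the remaining `https://m.facebook.com/` case.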
Where did you guys learn how to write all these?
Doesn't work for pages containing Unicode, like this:
https://www.facebook.com/საწარმო-SabaDesign-927047470710565/?ref=safrghbeდფწერგ
Props to all contributors!
Everything incorporated above, with just one more forgotten escape character added, gives me this:
/^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?\s]*)(?:\/|&|\?)?.*$/
Which works GREAT (yay), except when the url has arguments after the profile.php?id= or fbid= part like these urls:
https://www.facebook.com/profile.php?id=114376375296751&fref=pb&hc_location=friends_tab
returns 114376375296751&fref=pb&hc_location=friends_tab
instead of 114376375296751
and
https://www.facebook.com/photo.php?fbid=114376375296751&set=a.114376371963418.13845.114375165296872&type=1&theater
returns 114376375296751&set=a.114376371963418.13845.114375165296872&type=1&theater
Someone care to snip everything off after the first &?
/^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?\&\s]*)(?:\/|&|\?)?.*?$/
This should exclude the & as well.
Ayal's alternative didn't work for me. It worked when I took hoofdletterj's answer and added & before \s (Ayal's partial answer):
/^(?:https?:\/\/)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?&\s]*)(?:\/|&|\?)?.*$/
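To confirm that adding `&` to the negated class cuts off the extra query arguments, the regex above can be ported to Python along these lines. The function name is hypothetical, and the surrounding `/.../` delimiters are dropped because Python does not use them.

```python
import re

# Port of the regex above; the capture class [^/?&\s]* stops at the first "&".
FB_ID = re.compile(
    r"^(?:https?://)?(?:www\.|m\.|touch\.)?(?:facebook\.com|fb(?:\.me|\.com))/(?!$)"
    r"(?:(?:\w)*#!/)?(?:pages/)?(?:photo\.php\?fbid=)?(?:[\w\-]*/)*?(?:/)?"
    r"(?:profile\.php\?id=)?([^/?&\s]*)(?:/|&|\?)?.*$"
)


def extract_id(url):
    """Return the captured ID or name, or None if the URL does not match."""
    m = FB_ID.match(url)
    return m.group(1) if m else None
```

Both problem URLs from the earlier comment now come back as `114376375296751`, without the `&fref=...` or `&set=...` tail.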
I want to display the images from a Facebook post on my website just by copying the image address. Is there a regex for that? I have used the regex above, but it doesn't help.
preg_match_all('/(https?:\/\/\S+\.(?:jpg|png|gif))+/', $string, $match);
I am using this regex, but it picks up all other images except Facebook's.
I updated the regex to also match mbasic:
^(?:https?:\/\/)?(?:www\.|m\.|touch\.|mbasic\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?&\s]*)(?:\/|&|\?)?.*$
It works for urls like:
https://mbasic.facebook.com/BMW/?refid=46&__xts__%5B0%5D=12.%7B%22unit_id_click_type%22%3A%22graph_search_results_item_tapped%22%2C%22click_type%22%3A%22result%22%2C%22module_id%22%3A2%2C%22result_id%22%3A22893372268%2C%22session_id%22%3A%22e4709b011e94ec8207a44ffedd1d2901%22%2C%22module_role%22%3A%22ENTITY_PAGES%22%2C%22unit_id%22%3A%22browse_rl%3Ab2718be4-bbd0-4764-9c31-6908c431daa2%22%2C%22browse_result_type%22%3A%22browse_type_page%22%2C%22unit_id_result_id%22%3A22893372268%2C%22module_result_position%22%3A0%7D
There's also mobile.facebook.com, so here's the new regex:
^(?:https?:\/\/)?(?:www\.|m\.|mobile\.|touch\.|mbasic\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?&\s]*)(?:\/|&|\?)?.*$
Which can match stuff like:
https://mobile.facebook.com/BMW/
(?:https?:\/\/)?(?:www\.|m\.|mobile\.|touch\.|mbasic\.)?(?:facebook\.com|fb(?:\.me|\.com))\/(?!$)(?:(?:\w)*#!\/)?(?:pages\/|pg\/)?(?:photo\.php\?fbid=)?(?:[\w\-]*\/)*?(?:\/)?(?:profile\.php\?id=)?([^\/?&\s]*)(?:\/|&|\?)?.*
This will match /pg/ URLs as well:
https://m.facebook.com/pg/DwayneTheRockJohnsonFanClub/photos/
...................................................^
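Since each new subdomain or path prefix has meant another hand edit to the regex, one option is to assemble the pattern from lists. The sketch below is only an illustration of that idea in Python: the list and function names are hypothetical, and the rest of the pattern is copied from the version above.

```python
import re

# Hypothetical lists; extend them instead of editing the regex by hand.
SUBDOMAINS = ["www", "m", "mobile", "touch", "mbasic"]
PREFIXES = ["pages", "pg"]


def build_pattern(subdomains=SUBDOMAINS, prefixes=PREFIXES):
    """Build the unanchored regex above from the two lists."""
    subs = "|".join(re.escape(s) + r"\." for s in subdomains)
    pre = "|".join(re.escape(p) + "/" for p in prefixes)
    return re.compile(
        r"(?:https?://)?(?:" + subs + r")?"
        r"(?:facebook\.com|fb(?:\.me|\.com))/(?!$)"
        r"(?:(?:\w)*#!/)?(?:" + pre + r")?(?:photo\.php\?fbid=)?"
        r"(?:[\w\-]*/)*?(?:/)?(?:profile\.php\?id=)?"
        r"([^/?&\s]*)(?:/|&|\?)?.*"
    )


PATTERN = build_pattern()


def extract(url):
    """Return capture group 1, or None if nothing matches."""
    m = PATTERN.search(url)
    return m.group(1) if m else None
```

Like the regex it copies, this version has no `^` anchor, so it will also match a Facebook URL embedded in a longer string.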
@neilsh Yours doesn't capture patterns like
https://www.google.com/search?q=https+%2F%2Fwww.facebook.com%2Fprofile.php+id%3D10005&oq=&gs_lcrp=EgZjaHJvbWUqCQgBECMYJxjqAjIHCAAQRRiwATIJCAEQIxgnGOoCMgkIAhAjGCcY6gIyCQgDECMYJxjqAjIJCAQQIxgnGOoCMg8IBRAuGCcYxwEY6gIY0QMyCQgGECMYJxjqAjIJCAcQIxgnGOoCMgkICBAjGCcY6gIyCQgJEEUYOxjCA9IBBi0xajBqN6gCCrACAQ&client=ms-android-americamovil-mx-revc&sourceid=chrome-mobile&ie=UTF-8
@damusnet Fixed for handling https