<?php
// Returns a big old hunk of JSON from a non-private IG account page.
// Returns NULL if the page can't be fetched or no longer embeds window._sharedData.
function scrape_insta($username) {
    $insta_source = file_get_contents('http://instagram.com/'.$username);
    if ($insta_source === FALSE) {
        return NULL;
    }
    $shards = explode('window._sharedData = ', $insta_source);
    if (count($shards) < 2) {
        return NULL;
    }
    $insta_json = explode(';</script>', $shards[1]);
    $insta_array = json_decode($insta_json[0], TRUE);
    return $insta_array;
}
// Supply a username
$my_account = 'cosmocatalano';
// Do the deed
$results_array = scrape_insta($my_account);
if ($results_array === NULL) {
    exit('Could not read profile data for '.$my_account);
}
// An example of where to go from there
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
/* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
echo 'Taken at '.$latest_array['location']['name'].'<br/>';
//Heck, lets compare it to a useful API, just for kicks.
echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
*/
?>
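The parsing step in the gist can also be tried without touching the network. Below is a minimal sketch, assuming the page still embeds a `window._sharedData = {...};</script>` blob. The function name `extract_shared_data` and the sample HTML are made up for illustration and are not part of the original gist.

```php
<?php
// Hypothetical helper (not from the original gist): pull the
// window._sharedData JSON out of an already-downloaded HTML string.
function extract_shared_data($html) {
    $marker = 'window._sharedData = ';
    $start = strpos($html, $marker);
    if ($start === false) {
        return null; // marker missing: layout changed or a login wall was served
    }
    $start += strlen($marker);
    $end = strpos($html, ';</script>', $start);
    if ($end === false) {
        return null; // no terminator found after the marker
    }
    $data = json_decode(substr($html, $start, $end - $start), true);
    return is_array($data) ? $data : null;
}

// Made-up sample page, for illustration only.
$sample_html = '<html><script>window._sharedData = {"entry_data":{"ProfilePage":[]}};</script></html>';
$sample_data = extract_shared_data($sample_html);
$no_data = extract_shared_data('<html></html>');
```

Keeping the fetch and the parse separate makes it easier to tell whether a failure comes from being blocked or from a changed page layout.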
According to Instagram's API documentation, they want you to have an API key for every user who wishes to pull their photos (keeping the API key in sandbox mode). Again, this seems unrealistic to me. You "can" submit your app to Instagram for review (which theoretically "may" let you pull photos for other users from the same API key), but I highly doubt they'd approve an app that pulls images off their servers (like the above-mentioned scripts do). I also do not see this explicitly supported in their current API documentation. Nonetheless, I have submitted my app for review, so I will let you know how that turns out.
Were you able to find a good solution to this issue? What is the best way to bypass this login page? @bateller
I've hit the same issue and had to spend a fair amount of time on it for my own project.
Here is the code I came up with: https://github.com/restyler/instagram-php-scraper - it uses Rapid API ( https://rapidapi.com/restyler/api/instagram40 ) to bypass IP restrictions.
@restyler are you fetching the user's post details? I.e., if I give you an Instagram post link, does it return the path where the media is stored? Instagram usually detects datacenter IPs.
I can see you have a method getMediaByUrl, but I'm not sure how you're dealing with the IP. Please let me know. Thanks
Yes. Technically there is a proxy method in the API which allows you to submit any instagram.com* link and get the raw HTML/JSON response, and there are helper endpoints like the getMediaByUrl you've mentioned, if you don't need the raw response. I'd recommend using the helpers when feasible, because this approach uses more optimisations on the API side.
To mitigate Instagram IP detection (on the API side) I use proxies which are usually not located in popular data center IP ranges.
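For anyone who wants to try the same general idea directly from PHP, here is a minimal sketch of routing `file_get_contents` through an HTTP proxy with a stream context. This is not how the API above works internally. The helper name `make_proxy_context`, the proxy host and port, and the user agent string are placeholders, and whether a given proxy avoids detection is not something this code can guarantee.

```php
<?php
// Hypothetical helper: build a stream context that sends HTTP requests
// through a proxy. The proxy address used below is a placeholder.
function make_proxy_context($proxy_host, $proxy_port, $user_agent) {
    return stream_context_create(array(
        'http' => array(
            'proxy'           => 'tcp://'.$proxy_host.':'.$proxy_port,
            'request_fullurl' => true, // many HTTP proxies expect the full URL in the request line
            'user_agent'      => $user_agent,
            'timeout'         => 10,   // seconds
            'follow_location' => 0,    // surface redirects instead of silently following them
        ),
    ));
}

$context = make_proxy_context('proxy.example.com', 8080, 'Mozilla/5.0 (compatible; example)');
$options = stream_context_get_options($context);

// Usage (commented out because the proxy above does not exist):
// $html = file_get_contents('http://instagram.com/cosmocatalano', false, $context);
```

Passing the context as the third argument keeps the rest of the scraping code unchanged, so proxies can be swapped per request.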
@restyler thanks for replying, really appreciated. Can you tell me a little more about how you avoid getting blocked by Instagram? Are you using a third-party API or anything else that provides a new IP on each request? Looking at your code, it seems like you're just asking the user for proxy credentials and connecting to that proxy server, if I'm not wrong. Please let me know. Thanks.
Hey, really enjoyed this post. I made a quick lil mockup breaking down how to scrape user tags without login.
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020
By any chance, does anyone happen to have a way to collect followers without logging in?
https://gist.github.com/levlet/23cf1c17e6bb6e353f5823b3392c1e01 gives me "Page not found".
updated link
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020
Looks like Instagram is blocking scraping via file_get_contents/curl. Has anyone got a solution? I wonder how online web scraping tools manage to work without getting blocked.
Hi 'Cosmocatalano' [nomen est omen?] :),
this is a very interesting solution. I have only tried it on localhost, so I have no problem with CORS. But the array key names seem to have changed completely. The only one that is still the same seems to be 'entry_data'. Is the changed response still usable with the alternative key names? This would be very interesting.
Best regards and thanks
Axel Arnold Bangert
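Since the key names keep changing, one way to cope is to look values up defensively instead of chaining array indexes. Here is a minimal sketch. The helper name `array_dig` and the sample structure are made up for illustration, and the real response may use different keys.

```php
<?php
// Hypothetical helper: follow a list of keys into a nested array and
// return $default as soon as one of them is missing.
function array_dig($data, array $path, $default = null) {
    foreach ($path as $key) {
        if (!is_array($data) || !array_key_exists($key, $data)) {
            return $default;
        }
        $data = $data[$key];
    }
    return $data;
}

// Made-up structure for illustration; real key names may differ.
$sample = array(
    'entry_data' => array(
        'ProfilePage' => array(
            array('user' => array('username' => 'example')),
        ),
    ),
);

$name = array_dig($sample, array('entry_data', 'ProfilePage', 0, 'user', 'username'));
$missing = array_dig($sample, array('entry_data', 'ProfilePage', 0, 'user', 'media', 'nodes', 0), 'not found');

// Listing the keys that are actually present helps when mapping a changed layout.
$top_level_keys = array_keys($sample['entry_data']);
```

Calling `array_keys()` at each level of the decoded response is a quick way to discover what the current names are before adjusting the paths.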
I guess it is just a matter of having the right amount of good proxies. I am now using https://rapidapi.com/neotank/api/simple-instagram-api to avoid dealing with proxies, because they fail all the time (for Instagram) and get a 302 redirect to login.
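To tell a login redirect apart from a real profile page, the response headers can be checked before trying to parse anything. Here is a minimal sketch, assuming the login URL contains `/accounts/login` (an assumption, not something confirmed in this thread). The helper name `is_login_redirect` and the sample headers are made up for illustration.

```php
<?php
// Hypothetical helper: given raw response header lines (the format PHP
// exposes in $http_response_header), report whether the last response is
// a redirect whose Location points at a login page.
function is_login_redirect(array $header_lines) {
    $is_redirect = false;
    $location = '';
    foreach ($header_lines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A new status line starts a new response; reset what we know.
            $is_redirect = in_array((int)$m[1], array(301, 302, 303, 307, 308), true);
            $location = '';
        } elseif (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    // Assumption: the login page path contains "/accounts/login".
    return $is_redirect && stripos($location, '/accounts/login') !== false;
}

// Made-up header sets, for illustration only.
$blocked_headers = array('HTTP/1.1 302 Found', 'Location: https://www.instagram.com/accounts/login/');
$ok_headers = array('HTTP/1.1 200 OK', 'Content-Type: text/html');

$blocked = is_login_redirect($blocked_headers);
$ok = is_login_redirect($ok_headers);
```

With redirects not being followed (the `follow_location` HTTP context option set to 0), the headers PHP collects after `file_get_contents` can be passed to this function to decide whether to retry through another proxy.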