<?php
//returns a big old hunk of JSON from a non-private IG account page.
function scrape_insta($username) {
	$insta_source = file_get_contents('http://instagram.com/'.$username);
	$shards = explode('window._sharedData = ', $insta_source);
	$insta_json = explode(';</script>', $shards[1]);
	$insta_array = json_decode($insta_json[0], TRUE);
	return $insta_array;
}
//Supply a username
$my_account = 'cosmocatalano';
//Do the deed
$results_array = scrape_insta($my_account);
//An example of where to go from there
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
/* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
echo 'Taken at '.$latest_array['location']['name'].'<br/>';
//Heck, let's compare it to a useful API, just for kicks.
echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
*/
?>
Is this updated?
change
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
to
$latest_array = $results_array['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']['edges'];
I created a small program which extracts data from this endpoint and creates a gallery kinda thing. It is in JavaScript, but all the arrays and stuff are same so maybe it would be useful for some people trying to migrate to the new changes :)
https://github.com/SwetankPoddar/dynamic-instagram-gallery
Does this still work as of August 2020? It seems all my server-side options have been blocked (curl and file_get_contents). Only JavaScript options (axios) are working. But I want a cron job scraper, so I need a server-side solution.
the fix i posted above worked a couple weeks ago.
Doesn't seem to be working
As of Sep 2020, the following revisions must be made:
$latest_array = $results_array['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']['edges'][0]['node'];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['shortcode'].'"><img src="'.$latest_array['thumbnail_src'].'"></a><br/>';
echo 'Likes: '.$latest_array['edge_media_preview_like']['count'].' - Comments: '.$latest_array['edge_media_to_comment']['count'].'<br/>';
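The hard-coded key path in the revision above raises notices or errors whenever the page structure changes or a login page comes back instead of a profile. A minimal sketch of a more defensive version, assuming the page still embeds `window._sharedData`; the helper names `extract_shared_data` and `dig` are hypothetical and not part of the original gist:

```php
<?php
// Hypothetical helper: pull the embedded JSON out of a profile page's HTML.
// Returns null when the marker is missing (for example, a login page was served).
function extract_shared_data(string $html): ?array {
    $shards = explode('window._sharedData = ', $html, 2);
    if (count($shards) < 2) {
        return null;
    }
    $json = explode(';</script>', $shards[1], 2)[0];
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

// Hypothetical helper: walk a list of keys, returning $default if any key is missing.
function dig(array $data, array $path, $default = null) {
    foreach ($path as $key) {
        if (!is_array($data) || !array_key_exists($key, $data)) {
            return $default;
        }
        $data = $data[$key];
    }
    return $data;
}

// Key path taken from the Sep 2020 revision above.
$latest_path = ['entry_data', 'ProfilePage', 0, 'graphql', 'user',
                'edge_owner_to_timeline_media', 'edges', 0, 'node'];
```

Usage would look like `$node = dig(extract_shared_data($insta_source) ?? [], $latest_path);`, with a `null` result meaning the expected structure was not found.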
The referenced code still works (specifically the update provided by @garudacrafts on Mar 21, 2018).
The issue is that over the last few weeks or so Instagram has restricted logged-out (guest) access, most likely based on IP address.
After a large amount of queries to their servers, they will begin showing a "Please login" page (See below. Mine shows a "processing circle" where the login would be, because I have JavaScript turned off).
The only way around this would be to have each of your Users on Instagram who wish to use this process create an API Key (This is unrealistic, boo). Otherwise you'll need to use a proxy when issuing the request to Instagram so it doesn't see you hit their servers multiple times from the same IP address.
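Routing the request through a proxy can be done by passing an HTTP stream context to `file_get_contents`. A minimal sketch; the proxy address is a placeholder, the function name `proxy_context_options` is hypothetical, and whether a given proxy avoids the login page is not something this sketch can guarantee:

```php
<?php
// Hypothetical helper: build stream context options that send the request through a proxy.
function proxy_context_options(string $proxy): array {
    return [
        'http' => [
            'proxy'           => $proxy, // placeholder address, e.g. 'tcp://proxy.example.com:8080'
            'request_fulluri' => true,   // many proxies expect the full URI in the request line
            'timeout'         => 10,     // seconds
        ],
    ];
}

$options = proxy_context_options('tcp://proxy.example.com:8080');
$context = stream_context_create($options);
// Pass the context as the third argument when fetching the page:
// $insta_source = file_get_contents('http://instagram.com/'.$username, false, $context);
```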
Picture below is what I see when going directly to a valid public user account after multiple requests from the same IP address (the actual number of requests that trigger this 'login' screen is currently unknown). Eg: https://instagram.com/username/
Using the script referenced above, or even postaddictme/instagram-php-scraper, on a brand new IP address that hasn't hit Instagram's servers works just fine. However, after multiple queries (once the IP is blacklisted), both the script referenced above and postaddictme/instagram-php-scraper begin to fail.
According to Instagram's documentation for their API, they want you to have an API key for every User who wishes to pull their photos (keeping the API key in sandbox mode). Again, this seems unrealistic to me. You "can" submit your App on Instagram for review (which theoretically "may" let you pull photos for other Users from the same API key), but I highly doubt they'd approve an app that pulls images off their servers (like the above mentioned scripts do). I also do not specifically see this supported in their current API documentation. Nonetheless, I have submitted my app for review, so I will let you know how that turns out.
Were you able to find any good solution to this issue? What is the best way we can bypass this login page? @bateller
I've hit the same issue and had to spend fair amount of time on it for my own project.
Here is the code I came up with: https://github.com/restyler/instagram-php-scraper - it uses Rapid API ( https://rapidapi.com/restyler/api/instagram40 ) to bypass ip restrictions.
@restyler are you fetching the user's post details? I.e., if I provide an Instagram post link, does it return the path where the media is stored? Instagram usually detects datacenter IPs. I can see you have a method getMediaByUrl, but I'm not sure how you're dealing with the IP. Please let me know. Thanks
Yes. Technically there is a proxy method in the API which allows you to submit any instagram.com* link and get the raw HTML/JSON response, and there are helper endpoints like the getMediaByUrl you've mentioned, if you don't need the raw response. I'd recommend using the helpers when feasible, because this approach uses more optimisations on the API side.
To mitigate Instagram ip detection (on the API side) I use proxies which are usually not located in popular data center ip ranges.
@restyler thanks for replying, really appreciated. Can you tell me a little more about how you avoid getting blocked by Instagram? Are you using any third-party API or anything which provides a new IP on each request? Looking at your code, it seems like you're just asking the user for proxy credentials and connecting to that proxy server, if I'm not wrong. Please let me know your comments. Thanks.
hey really enjoyed this post. i made a quick lil mockup on the break down of scraping user tags without login.
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020
By any chance does anyone happen to have a way to collect followers without logging in?
Page not found
updated link
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020
Looks like Instagram is blocking scraping using file_get_contents/curl. Anyone got a solution? I wonder how online web scraping tools work without getting blocked?
Hi 'Cosmocatalano' [ nomen est omen?] :) ,
this is a very interesting solution. I only try it on local host so I have no problem with CORS. But the array names seem to be changed completely. The only one which is still the same seems to be 'entry_data'. Is this changed response still usable with alternative array 'names'? This would be very interesting.
Best regards and thanks
Axel Arnold Bangert
I guess it is just the right amount of good proxies. I am using https://rapidapi.com/neotank/api/simple-instagram-api to avoid dealing with proxies now, because they fail all the time (for Instagram) and get a 302 redirect to login.
Does anyone know if we can set a cookie with file_get_contents?
Also does anyone know how to work with csrftoken? It looks like instagram needs that to be set..
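On the cookie question: `file_get_contents` accepts a stream context, and the `header` option of the `http` wrapper can carry a `Cookie` header. A minimal sketch; the function name `cookie_context_options` is hypothetical, the cookie values are placeholders, and whether Instagram accepts a request with only these cookies is an open assumption:

```php
<?php
// Hypothetical helper: build stream context options carrying a Cookie request header.
function cookie_context_options(array $cookies): array {
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value; // values are sent as given, without encoding
    }
    return [
        'http' => [
            'method' => 'GET',
            'header' => 'Cookie: ' . implode('; ', $pairs) . "\r\n",
        ],
    ];
}

// Placeholder values; a real csrftoken would have to come from an earlier response.
$cookie_options = cookie_context_options(['csrftoken' => 'PLACEHOLDER', 'sessionid' => 'x']);
$cookie_context = stream_context_create($cookie_options);
// $insta_source = file_get_contents('http://instagram.com/'.$username, false, $cookie_context);
```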