<?php
//returns a big old hunk of JSON from a non-private IG account page.
function scrape_insta($username) {
    $insta_source = file_get_contents('http://instagram.com/'.$username);
    $shards = explode('window._sharedData = ', $insta_source);
    $insta_json = explode(';</script>', $shards[1]);
    $insta_array = json_decode($insta_json[0], TRUE);
    return $insta_array;
}
//Supply a username
$my_account = 'cosmocatalano';
//Do the deed
$results_array = scrape_insta($my_account);
//An example of where to go from there
$latest_array = $results_array['entry_data']['ProfilePage'][0]['user']['media']['nodes'][0];
echo 'Latest Photo:<br/>';
echo '<a href="http://instagram.com/p/'.$latest_array['code'].'"><img src="'.$latest_array['display_src'].'"></a><br/>';
echo 'Likes: '.$latest_array['likes']['count'].' - Comments: '.$latest_array['comments']['count'].'<br/>';
/* BAH! An Instagram site redesign in June 2015 broke quick retrieval of captions, locations and some other stuff.
echo 'Taken at '.$latest_array['location']['name'].'<br/>';
//Heck, let's compare it to a useful API, just for kicks.
echo '<img src="http://maps.googleapis.com/maps/api/staticmap?markers=color:red%7Clabel:X%7C'.$latest_array['location']['latitude'].','.$latest_array['location']['longitude'].'&zoom=13&size=300x150&sensor=false">';
*/
?>
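The parsing step in scrape_insta() assumes the 'window._sharedData = ' marker is always present, so a changed or blocked page leaves $shards[1] undefined. Below is a minimal sketch of a more defensive extraction step. The function name extract_shared_data() and the sample HTML are hypothetical and not part of the original gist; only the marker strings come from the code above.

```php
<?php
// Hypothetical helper (not part of the original gist): pulls the
// window._sharedData JSON out of page HTML and returns null on any failure.
function extract_shared_data(string $html): ?array {
    $marker = 'window._sharedData = ';
    $start = strpos($html, $marker);
    if ($start === false) {
        return null; // marker missing, e.g. a different page was served
    }
    $start += strlen($marker);
    $end = strpos($html, ';</script>', $start);
    if ($end === false) {
        return null; // closing delimiter missing
    }
    $decoded = json_decode(substr($html, $start, $end - $start), true);
    return is_array($decoded) ? $decoded : null; // null when the JSON is invalid
}

// Usage with a made-up page fragment, so no network access is needed.
$sample = '<script>window._sharedData = {"entry_data":{"ProfilePage":[]}};</script>';
var_dump(extract_shared_data($sample) !== null);     // bool(true)
var_dump(extract_shared_data('<html>login</html>')); // NULL
```

Returning null instead of indexing blindly lets the caller decide what to do when the page layout changes or the request is blocked.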
Looks like Instagram is blocking scraping with file_get_contents/curl. Has anyone got a solution? I wonder how online web scraping tools keep working without getting blocked.
Hi 'Cosmocatalano' [nomen est omen?] :),
this is a very interesting solution. I have only tried it on localhost, so I have no problems with CORS. But the array keys seem to have changed completely; the only one that is still the same seems to be 'entry_data'. Is the changed response still usable with the new key names? That would be very interesting.
Best regards and thanks,
Axel Arnold Bangert
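One way to find out which key names a changed response uses is to list the key paths of the decoded array and compare them with the paths the gist expects. The following is a rough sketch. list_key_paths() is a hypothetical helper, and the sample data is made up: only 'entry_data' and 'ProfilePage' appear in the gist, and the real keys under them may differ.

```php
<?php
// Hypothetical helper: lists the key paths of a nested array,
// descending at most $max_depth levels.
function list_key_paths(array $data, int $max_depth = 3, string $prefix = ''): array {
    $paths = [];
    foreach ($data as $key => $value) {
        $path = $prefix === '' ? (string) $key : $prefix . '.' . $key;
        $paths[] = $path;
        if (is_array($value) && $max_depth > 1) {
            $paths = array_merge($paths, list_key_paths($value, $max_depth - 1, $path));
        }
    }
    return $paths;
}

// Made-up stand-in for the decoded response; 'example_key' is a placeholder.
$decoded = ['entry_data' => ['ProfilePage' => [['example_key' => ['nested' => 1]]]]];
echo implode("\n", list_key_paths($decoded, 4)), "\n";
```

Running this against the real decoded array (for example the return value of scrape_insta()) shows where the user and media data sit now, if they are still present at all.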
> Looks like Instagram is blocking scraping with file_get_contents/curl. Has anyone got a solution? I wonder how online web scraping tools keep working without getting blocked.
I guess it is just a matter of having enough good proxies. I am now using https://rapidapi.com/neotank/api/simple-instagram-api to avoid dealing with proxies, because they fail all the time (for Instagram) and get a 302 redirect to the login page.
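The 302 redirect to login mentioned above can be detected before any parsing is attempted. This is only a sketch under assumptions: both function names are hypothetical, and the '/accounts/login' path fragment is an assumption about where the redirect points, so adjust it to whatever the Location header actually contains. The fetch function needs the cURL extension and is defined but not called here, so the block runs offline.

```php
<?php
// Hypothetical helper: true when a response looks like a redirect to a login page.
// The default '/accounts/login' fragment is an assumption, not taken from the gist.
function is_login_redirect(int $status, ?string $location, string $login_fragment = '/accounts/login'): bool {
    $is_redirect = $status >= 300 && $status < 400;
    return $is_redirect && $location !== null && strpos($location, $login_fragment) !== false;
}

// Fetches a URL without following redirects and returns [status, redirect target or null].
// Requires the cURL extension.
function fetch_status_and_redirect(string $url): array {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // keep the body out of the output
        CURLOPT_FOLLOWLOCATION => false, // we want to see the redirect itself
    ]);
    curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL) ?: null;
    return [$status, $location];
}

var_dump(is_login_redirect(302, 'https://www.instagram.com/accounts/login/')); // bool(true)
var_dump(is_login_redirect(200, null));                                        // bool(false)
```

A caller could pass the two values from fetch_status_and_redirect() to is_login_redirect() and then retry through another proxy or give up, rather than trying to parse a login page.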
Updated link:
https://gist.github.com/ycaty/23cf1c17e6bb6e353f5823b3392c1e01#file-instagram-user-tag-scraping-2020