@plallin
Created March 14, 2017 21:55
"""
The account was scraped using the following Twitter library: https://github.com/sixohsix/twitter
I prefer this library over Tweepy as it is much "closer" to the API (less abstraction).

The script works roughly as follows:
- First, I get all of the user's friends (at most 200 friends per call) and store the
  friends' screen names in a list.
- Once I have the list of all the user's friends, I iterate through it to get the friends
  of each friend.
- In a second script, I count the number of times each username occurs and return the list
  of friends sorted by that count (i.e. the most-followed accounts); a sketch of that script
  is included at the end of this file.
- That's it, more or less.

AltShiftX needed his personal account scraped (not his @AltShiftX account). The account
@AltShiftX can be scraped in fewer than 45 calls, which takes about 30 minutes. He follows
~420 people on his personal account, so it took longer. The API allows 15 friends/list calls
per 15-minute window, i.e. roughly one call per minute. Based on that, the minimum time it
could take (that is, if all his friends followed < 200 people, so one call each) is about
7.5 hours. The maximum time it could take (that is, if all his friends followed > 1,000
people, as I only fetch a list of up to 1,000 per friend, so five calls each) is about
33.5 hours. I ran the script on my Raspberry Pi starting Friday around 3-4pm, and it
finished on Saturday circa 1pm.

There are ways to improve this: if I had used friends/ids instead of friends/list, I could
have scraped 5,000 friends per call. For example, 420 friends take 3 friends/list calls
(200+200+20) but a single friends/ids call. The disadvantage is that friends/list gives you
the full details of the accounts scraped, while friends/ids only returns their id numbers.
However, you could then use users/lookup to get the full details of up to 100 ids per call,
so it would still be pretty fast to get a decent amount of detail: 15 calls get you 1,500
(100 * 15) user profiles, which is a lot :-) A sketch of this approach is included right
after the imports below.
"""

import datetime
import json
import time

from twitter import *  # provides Twitter and OAuth
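
# A minimal sketch of the faster friends/ids + users/lookup approach described
# in the docstring, assuming the same sixohsix/twitter client object. The
# function name and batching are mine; it is defined here for illustration but
# never called by this script.
def fast_friend_names(twitter_client, screen_name):
    """Fetch friend ids via friends/ids (up to 5,000 per call), then resolve
    them to screen names via users/lookup (up to 100 ids per call)."""
    ids = []
    cursor = -1  # same convention as friends/list: next_cursor == 0 means no next page
    while cursor != 0:
        page = twitter_client.friends.ids(screen_name=screen_name,
                                          cursor=cursor,
                                          count=5000)
        ids.extend(page['ids'])
        cursor = page['next_cursor']
    names = []
    for i in range(0, len(ids), 100):  # users/lookup takes at most 100 ids per call
        users = twitter_client.users.lookup(
            user_id=",".join(str(x) for x in ids[i:i + 100]))
        names.extend(user['screen_name'] for user in users)
    return names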
print("Starting to scrap! " + str(datetime.datetime.now()))
with open("config.json") as data_file: # just a config file I have to save my keys
data = json.load(data_file)
data = data["MyAccountName"] # Account whose keys I will use (I have a couple of accounts)
CONSUMER_KEY = data["consumer_key"]
SECRET_CONSUMER_KEY = data["secret_consumer_key"]
ACCESS_TOKEN = data["access_token"]
SECRET_ACCESS_TOKEN = data["secret_access_token"]
t = Twitter(auth=OAuth(ACCESS_TOKEN, SECRET_ACCESS_TOKEN, CONSUMER_KEY, SECRET_CONSUMER_KEY))

account_to_be_scraped = "Insert account to be scraped here"
file_out = "recommendations.txt"  # the friends of his friends will be added to that file

friends = []  # holds the list of friends of account_to_be_scraped
next_page_loc = -1  # for cursor purposes; the API returns next_cursor == 0 when there is no next page

friends_list = t.friends.list(screen_name=account_to_be_scraped,
                              count=200,
                              skip_status=True)

while next_page_loc != 0:
    next_page_loc = friends_list['next_cursor']  # get location of next page
    for friend in friends_list['users']:
        friends.append(friend['screen_name'])
    if next_page_loc == 0:
        break  # we reached the end of the user's friend list
    else:
        friends_list = t.friends.list(screen_name=account_to_be_scraped,
                                      cursor=next_page_loc,
                                      count=200,
                                      skip_status=True)

with open(file_out, "a") as out:
    for friend in friends:
        out.write(friend + "\n")

for friend in friends:
    friend_friends_list = t.friends.list(screen_name=friend,
                                         count=200,
                                         skip_status=True)
    count = 0  # needed for friends who follow a large number of people; we only scrape their first 1000 (5 * 200)
    while count < 5:
        next_page_loc = friend_friends_list['next_cursor']
        for follow in friend_friends_list['users']:
            with open(file_out, "a") as out:
                out.write(follow['screen_name'] + "\n")
        remaining_calls = t.application.rate_limit_status(resources="friends")["resources"]["friends"]["/friends/list"]["remaining"]
        if remaining_calls <= 1:
            print("Got sleepy while scraping {}'s data...".format(friend))
            time.sleep(60 * 15)  # wait 15 minutes for the API limit to replenish
            remaining_calls = t.application.rate_limit_status(resources="friends")["resources"]["friends"]["/friends/list"]["remaining"]
            print("Sleep over! I now have {} calls left".format(remaining_calls))
        count += 1
        if next_page_loc == 0:
            break  # end of this friend's list
        if count < 5:  # don't spend a rate-limited call fetching a page we would never process
            friend_friends_list = t.friends.list(screen_name=friend,
                                                 cursor=next_page_loc,
                                                 count=200,
                                                 skip_status=True)
print("over and out :-)" + str(datetime.datetime.now()))