Created November 12, 2017 14:04
wuyongzheng/f05a812fdf95b4bb51a25360c1a7ea11
Twitter user timeline crawler without authentication
$ bash crawl.sh
pos 0
pos 923679343724838912
pos 916681099698364416
pos 913192547035488257
...
pos 695457113452126208
pos 695457113452126208
$ ls
crawl-0.json
crawl-753078410369437696.json
crawl-793631421369757697.json
...
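The `ls` listing above is alphabetical, which is not the order in which the pages were fetched. Since every request after the first uses a max_position smaller than the one before (the printed positions only ever decrease), the fetch order can be recovered from the filenames alone. A minimal sketch; `crawl_order` is a hypothetical helper, not part of the gist:

```bash
#!/bin/bash
# Hypothetical helper (not in the original gist): print the saved pages in
# the order crawl.sh fetched them. crawl-0.json is always the first page;
# later pages were requested with ever smaller positions, so the remaining
# filenames are sorted by position in descending numeric order.
crawl_order() {
    local dir=${1:-.}
    local f n
    if [ -f "$dir/crawl-0.json" ]; then echo "crawl-0.json"; fi
    for f in "$dir"/crawl-*.json; do
        [ -e "$f" ] || continue      # glob matched nothing
        n=${f##*/crawl-}             # strip directory and "crawl-" prefix
        n=${n%.json}                 # strip ".json" suffix
        if [ "$n" != 0 ]; then echo "$n"; fi
    done | sort -rn | sed -e 's/^/crawl-/' -e 's/$/.json/'
}

crawl_order .
```

Running it in the crawl directory lists `crawl-0.json` first, followed by the remaining pages from newest to oldest.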
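#!/bin/bash
# Screen name of the account whose timeline to crawl (change this)
user=SMRT_Singapore
while true ; do
    if [ -f crawl-0.json ] ; then
        # Continue from the smallest (oldest) min_position found in any saved page
        pos=$(cat crawl-*.json | tr ',' '\n' | grep 'min_position.:.[0-9]' | sed -e 's/.*:"//' -e 's/".*//' | sort -n | head -n 1)
        # No min_position in any page (e.g. the first download failed): stop
        if [ -z "$pos" ] ; then echo "no min_position found" >&2 ; break ; fi
        url="https://twitter.com/i/profiles/show/$user/timeline/tweets?include_available_features=1&include_entities=1&max_position=$pos&reset_error_state=false"
    else
        # First request: no max_position, start at the top of the timeline
        pos=0
        url="https://twitter.com/i/profiles/show/$user/timeline/tweets?include_available_features=1&include_entities=1&reset_error_state=false"
    fi
    echo "pos $pos"
    # Page already fetched: the last response did not move min_position back, so stop
    if [ -f "crawl-$pos.json" ] ; then break ; fi
    wget -q -O "crawl-$pos.json" "$url"
done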
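The position-extraction pipeline inside the loop is the least obvious part of the script, and it can be exercised on its own. The sketch below wraps it in a hypothetical function `min_position` and feeds it two made-up response fragments. It assumes only what the `grep` and `sed` patterns already assume: each page contains a field of the form `"min_position":"<digits>"`; the other field in the samples is illustrative.

```bash
#!/bin/bash
# Hypothetical wrapper around the pipeline crawl.sh uses to choose the next
# max_position: split the JSON on commas, keep "min_position":"<digits>"
# fields, strip everything except the digits, take the numerically smallest.
min_position() {
    tr ',' '\n' | grep 'min_position.:.[0-9]' \
        | sed -e 's/.*:"//' -e 's/".*//' | sort -n | head -n 1
}

# Made-up page fragments; real responses carry many more fields.
printf '%s\n' \
    '{"min_position":"923679343724838912","has_more_items":true}' \
    '{"min_position":"916681099698364416","has_more_items":true}' \
    | min_position
# prints 916681099698364416
```

Because the script scans every saved page on each iteration, the smallest value always wins. If the page just fetched does not report a smaller min_position, the same position is computed again, its file already exists, and the loop stops; that is the repeated `pos 695457113452126208` at the end of the sample run.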