@pcolazurdo
Last active February 19, 2021 00:02
Doing some tweet Analysis from a tweet backup

Tweet Analysis

Prerequisite

  1. Set up an environment variable called TWITTER_DATA pointing to the directory where you extracted the contents of your twitter.zip archive
cat tweet.json | jq . | grep expanded_url | grep -v twitter.com | cut -d\" -f4 | sort | uniq -c | wc -l
cat tweet.json | jq . | grep expanded_url | grep -v twitter.com | cut -d\" -f4 | sort | uniq -c | sort -rn | awk '{print $2}' | head -10
cat tweet.json | jq '.[].full_text' | more
cat tweet.json | jq '.[] | .full_text, .urls[]?.expanded_url?' | more
cat tweet.json| jq '.[].tweet.entities?.user_mentions[]?.screen_name?' | sort | uniq -c | sort | less
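
The same counts can be sketched in Python for cases where the shell pipeline gets hard to extend. This is a minimal sketch, assuming tweet.json is a JSON array whose items each carry a "tweet" object with an "entities" field, as the user_mentions filter above implies. The function names are hypothetical, and unlike the grep pipeline (which matches every expanded_url field) this version only looks at entities.urls.

```python
import json
from collections import Counter


def count_mentions(tweets):
    """Count how often each screen name is mentioned across all tweets."""
    counts = Counter()
    for item in tweets:
        # Assumed layout: {"tweet": {"entities": {"user_mentions": [...]}}}
        entities = item.get("tweet", {}).get("entities", {})
        for mention in entities.get("user_mentions", []):
            counts[mention["screen_name"]] += 1
    return counts


def count_external_urls(tweets):
    """Count expanded URLs that do not point back to twitter.com."""
    counts = Counter()
    for item in tweets:
        entities = item.get("tweet", {}).get("entities", {})
        for url in entities.get("urls", []):
            expanded = url.get("expanded_url", "")
            if expanded and "twitter.com" not in expanded:
                counts[expanded] += 1
    return counts


def load_tweets(path):
    """Load the JSON array from a file such as tweet.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

With the data loaded through load_tweets("tweet.json"), count_external_urls(tweets).most_common(10) gives the ten most frequent external URLs and count_mentions(tweets).most_common() gives the mention ranking.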

Example to review one single user

cat tweet.json| jq '.[] | select (.tweet.entities?.user_mentions[]?.screen_name? == "pcolazurdo") | .tweet.full_text'
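
A rough Python equivalent of the filter above, under the same assumption about the tweet.json layout (an array of items with a "tweet" object). The function name tweets_mentioning is hypothetical. One difference worth noting: the jq select emits a tweet once per matching mention, while this sketch returns each matching tweet once.

```python
def tweets_mentioning(tweets, screen_name):
    """Return the full_text of every tweet that mentions screen_name."""
    texts = []
    for item in tweets:
        tweet = item.get("tweet", {})
        mentions = tweet.get("entities", {}).get("user_mentions", [])
        # Keep the tweet if any of its mentions matches the requested user.
        if any(m.get("screen_name") == screen_name for m in mentions):
            texts.append(tweet.get("full_text", ""))
    return texts
```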

Example running the nodejs Extract command

docker run --rm -it -v ${TWITTER_DATA}/data:/opt/local -v `pwd`:/opt/bin node node /opt/bin/likes_extract.js | jq '.[].fullText'

How to extract URLs from the tweet texts

docker run --rm -it -v ${TWITTER_DATA}/data:/opt/local -v `pwd`:/opt/bin node node /opt/bin/likes_extract.js | jq '.[].fullText' >full_text.txt

docker run --rm -it -v `pwd`:/opt/local python bash
    pip install urlextract
    cd /opt/local
    cat full_text.txt | ./url_extract.py | more

url_extract.py

#!/usr/local/bin/python
from urlextract import URLExtract
import fileinput

extractor = URLExtract()
for line in fileinput.input():
    urls = extractor.find_urls(line)
    print(line, urls) 
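
If installing urlextract is not an option, a cruder standard-library fallback can be sketched with a regular expression. This is not what url_extract.py does: it only finds URLs with an explicit http:// or https:// scheme and misses bare domains that urlextract would catch. The function name find_http_urls is hypothetical. Quotes and backslashes are excluded from the match because each line of full_text.txt is a JSON-encoded string as printed by jq.

```python
import re

# Match http(s) URLs up to whitespace, quotes, angle brackets or a backslash
# (JSON escapes such as \n show up as a literal backslash in full_text.txt).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>\\]+")


def find_http_urls(line):
    """Return the http(s) URLs found in one line of text."""
    # Trim punctuation that commonly trails a URL at the end of a sentence.
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(line)]
```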

likes_extract.js

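// The archive's like.js assigns its data to window.YTD.like.part0,
// so provide a global `window` stub before loading it.
window = {}
window.YTD = {}
window.YTD.like = {}
require ('/opt/local/like.js')

// Emit each like as its own JSON document; jq reads them as a stream.
window.YTD.like.part0.forEach(function (x) {
  process.stdout.write(JSON.stringify(x))
});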
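
The same data can also be read without Node. The sketch below assumes that like.js consists of a single assignment, window.YTD.like.part0 = followed by a plain JSON array, which is what the stub above implies but is not verified here. It also assumes each entry has the shape {"like": {"fullText": ...}}, matching the jq filter used with this script. The names parse_ytd_js and like_texts are hypothetical.

```python
import json


def parse_ytd_js(source):
    """Parse the JSON array assigned in a Twitter archive .js file."""
    # Assumption: everything after the first "=" is plain JSON.
    _, _, payload = source.partition("=")
    # Tolerate surrounding whitespace and an optional trailing semicolon.
    return json.loads(payload.strip().rstrip(";"))


def like_texts(likes):
    """Return the fullText of each like entry (empty string if missing)."""
    return [entry.get("like", {}).get("fullText", "") for entry in likes]
```

To use it, read like.js as text, pass the contents to parse_ytd_js, and feed the result to like_texts.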

tweets_extract.js

function prettyJSON(obj) {
    console.log(JSON.stringify(obj, null, 2));
}

window = {};
window.YTD={}
window.YTD.tweet={}
require ('/opt/local/tweet.js')
prettyJSON (window.YTD.tweet.part0)