@dannguyen
Last active October 28, 2021 14:13
Using the command-line tools t and csvkit to track the #NICAR16 hashtag

Using t and csvkit to quickly collect and analyze #nicar16 tweets from the command line

The t command-line Twitter tool is a great way to pull Twitter data into a format you can open in a spreadsheet.

Its homepage, which has good installation instructions, is here:

https://github.com/sferik/t

And I've written some related instructions about how to get an authentication token from Twitter:

http://www.compjour.org/tutorials/twitter-app-authentication-process/

Doing a basic query for a term

Once you have it installed and you're authenticated, you can do a basic search for Tweets like this:

$ t search all 'nicar16'

The default behavior is to present the tweets in a human-readable format:

   @mailbackwards
   Good morning Denver, I'm at #NICAR16. Find me and say hi (and then come to 
   our talk on Sunday)

   @tbtprojx
   RT @MarshallProj: And about building your own criminal justice data w 
   @ultracasual @gabrieldance, @kenandavis + more at 3:30 
   https://t.co/mTNK1a1Xox #NICAR16

   @sickmund
   RT @MarshallProj: And about building your own criminal justice data w 
   @ultracasual @gabrieldance, @kenandavis + more at 3:30 
   https://t.co/mTNK1a1Xox #NICAR16

   @tbtprojx
   RT @MarshallProj: #NICAR16: Learn how to keep those news apps skills sharp at 
   11:30, with @gabrieldancehttp://bit.ly/1nwH4Zd

   @rdmurphy
   RT @A_L: Want to learn how to work with satellite data? @esagara and I will 
   be sharing our secrets today at 11:30 #NICAR16

Getting data in CSV format

But you can get them in CSV format using the --csv flag:

$ t search all 'nicar16' --csv
ID,Posted at,Screen name,Text
707951040855982080,2016-03-10 15:27:35 +0000,MaiAndy,RT @nkhensley: Saturday. #NICAR16 https://t.co/IBbqmP8KIo
707950349508739072,2016-03-10 15:24:51 +0000,ashlynstill,RT @Lindzcook: Join @ashlynstill and me in Denver 4 at 9am to learn programming concepts using fun games! Great place to start for newcomers #NICAR16
707950090355216384,2016-03-10 15:23:49 +0000,karanormal,It's a beautiful day to live in Denver... Because #NICAR16.
707949741179428864,2016-03-10 15:22:26 +0000,HBCompass,Starting off #NICAR16 by tilting off a bench just in case everyone didn't know I'm awkward as hell. https://t.co/9HJ1Z6lvFT
707949606831665153,2016-03-10 15:21:53 +0000,nkhensley,Saturday. #NICAR16 https://t.co/IBbqmP8KIo
707949340040548352,2016-03-10 15:20:50 +0000,AlexSecanove,RT @biologypartners: Investigative journalists & data miners: welcome to Colorado. There are some exciting data analytics startups here for you to meet. #NICAR16
707949060238344193,2016-03-10 15:19:43 +0000,natecarlisle,And @TonySemerad and I just landed at DEN. Next stop: #NICAR16
707949028881731585,2016-03-10 15:19:36 +0000,michelleminkoff,Let #nicar16 officially begin -- my uniform is on! It's go time! https://t.co/K2Z2DIfu04
707948651151122433,2016-03-10 15:18:06 +0000,ryanngro,My sixth NICAR conf and the first where I fell asleep before midnight on the first night. Losing my touch. #NICAR16
707948445131268096,2016-03-10 15:17:17 +0000,1GKh,RT @FerretScot: If you're interested in investigative journalism it's worth keeping an eye on #NICAR16 as it unfolds
707948358275444736,2016-03-10 15:16:56 +0000,cjsinner,SUPER excited for my first #NICAR16 😁😁😁

Getting the max number of tweet results

By default, the 20 most recent tweets are returned. You can change this with the -n flag. I believe the maximum number of results is capped at 3200, or at however many tweets containing the queried term have been posted in the last 7 days, whichever is smaller.

And of course, you'll most likely want to redirect the output straight into a text file that you can open in Excel or what have you:

$ t search all 'nicar16' --csv -n 3200 > nicar16tweets.csv

Searching more specific streams

The t search subcommand lets you narrow the query to just your own timeline (t search timeline 'nicar16') or even to a specific list. Run t search help to see the descriptions:

  t search all QUERY               # Returns the 20 most recent Tweets that match the specified query.
  t search favorites [USER] QUERY  # Returns Tweets you've favorited that match the specified query.
  t search help [COMMAND]          # Describe subcommands or one specific subcommand
  t search list [USER/]LIST QUERY  # Returns Tweets on a list that match the specified query.
  t search mentions QUERY          # Returns Tweets mentioning you that match the specified query.
  t search retweets [USER] QUERY   # Returns Tweets you've retweeted that match the specified query.
  t search timeline [USER] QUERY   # Returns Tweets in your timeline that match the specified query.
  t search users QUERY             # Returns users that match the specified query.

Try csvkit

This is also a good time to try out csvkit, rather than using a spreadsheet.

Use csvcut with the -n flag to see the headers:

$ csvcut -n nicar16tweets.csv
  1: ID
  2: Posted at
  3: Screen name
  4: Text
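
It can also help to check how many tweets actually landed in the file. A plain wc -l on the file may overcount, since a tweet's text can contain line breaks that stay inside a single quoted CSV field. Here is a minimal sketch, assuming csvkit is installed; sample.csv is a hypothetical stand-in for nicar16tweets.csv, built from three of the rows shown earlier:

```shell
# Build a small stand-in file (hypothetical name) from rows shown earlier in this post.
cat > sample.csv <<'EOF'
ID,Posted at,Screen name,Text
707951040855982080,2016-03-10 15:27:35 +0000,MaiAndy,RT @nkhensley: Saturday. #NICAR16 https://t.co/IBbqmP8KIo
707950090355216384,2016-03-10 15:23:49 +0000,karanormal,It's a beautiful day to live in Denver... Because #NICAR16.
707949606831665153,2016-03-10 15:21:53 +0000,nkhensley,Saturday. #NICAR16 https://t.co/IBbqmP8KIo
EOF

# The ID column never contains line breaks, so one output line equals one tweet.
# tail -n +2 drops the header row before counting.
tweet_count=$(csvcut -c ID sample.csv | tail -n +2 | wc -l)
echo "tweets: $tweet_count"
```

Swap sample.csv for nicar16tweets.csv to count the real download.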

Here's how to get the most frequent users (by screen name) of the hashtag in the set of tweets you've downloaded:

$ csvcut -c 'Screen name' nicar16tweets.csv | sort | uniq -c | sort -rn
  82 BizJournalism
  20 MacDiva
  19 ultracasual
  18 Jeremy_CF_Lin
  17 IRE_NICAR
  15 tbtprojx
  15 RajneeshB
  14 palewire
  13 brentajones
  13 KateReports
  13 DanielleAlberti
  12 seecmb
  12 benlkeith
  12 KarrieKehoe
  12 HacksHackersCO
  11 livlab
  11 dougfisher
  10 wjchat
  10 harrisj
   9 onyxfish
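
The pipeline above counts retweets too, and the header row itself ends up in the tally as a "Screen name" entry with a count of 1. If you only care about who wrote original tweets, one possible variation is sketched below. It assumes csvkit's csvgrep is available and that retweets can be recognized by text starting with "RT @", which is a convention and not a guarantee. sample.csv is again a hypothetical stand-in built from rows shown earlier:

```shell
# Small stand-in file (hypothetical name) using rows from the CSV output above.
cat > sample.csv <<'EOF'
ID,Posted at,Screen name,Text
707951040855982080,2016-03-10 15:27:35 +0000,MaiAndy,RT @nkhensley: Saturday. #NICAR16 https://t.co/IBbqmP8KIo
707950090355216384,2016-03-10 15:23:49 +0000,karanormal,It's a beautiful day to live in Denver... Because #NICAR16.
707949606831665153,2016-03-10 15:21:53 +0000,nkhensley,Saturday. #NICAR16 https://t.co/IBbqmP8KIo
707949060238344193,2016-03-10 15:19:43 +0000,natecarlisle,And @TonySemerad and I just landed at DEN. Next stop: #NICAR16
EOF

# csvgrep -r matches a regex against the Text column; -i inverts the match,
# keeping only rows whose text does NOT start with "RT @".
# tail -n +2 then drops the "Screen name" header before counting.
csvgrep -c Text -r '^RT @' -i sample.csv \
  | csvcut -c 'Screen name' \
  | tail -n +2 \
  | sort | uniq -c | sort -rn
```

On the sample, MaiAndy's retweet drops out and the three remaining users are each counted once.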

A note about using Excel

If you need yet another example of why you should stay away from Excel (and any other spreadsheet, but mostly Excel on OS X) until you absolutely need a spreadsheet, here it is: on OS X, opening the CSV file produced by t gives you this inexplicable error:

[image: screenshot of the Excel error]

The reason? When the first two characters of a file are the uppercase letters ID, Excel treats it as a SYLK (symbolic link) file and shits itself. It's hard to imagine the logic that went into the decision to hardcode ID as a magic word: https://support.microsoft.com/en-us/kb/215591
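
If you do need to open the file in Excel, one workaround that should sidestep the problem is to rename the first header so the file no longer begins with an uppercase ID. This is a minimal sketch using plain sed, assuming the file starts with the unquoted header "ID,". The names sample.csv, sample_excel.csv and tweet_id are arbitrary choices for illustration:

```shell
# Small stand-in file (hypothetical name) with the same header that t produces.
cat > sample.csv <<'EOF'
ID,Posted at,Screen name,Text
707949606831665153,2016-03-10 15:21:53 +0000,nkhensley,Saturday. #NICAR16 https://t.co/IBbqmP8KIo
EOF

# Rewrite only line 1, and only a leading "ID," so tweet text is never touched.
sed '1s/^ID,/tweet_id,/' sample.csv > sample_excel.csv

# Show the new header row.
head -n 1 sample_excel.csv
```

Writing to a second file keeps the original download untouched for csvkit, where the ID header causes no trouble.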

