@vanvaridiksha
Last active October 21, 2016 05:06
Description of a mini assignment for students of COMS 6998 (Cloud Computing and Big Data) at Columbia University.

# Twitter Streaming Using Kafka

- Last week, you read the Kafka paper and summarized it. This week, you will use Kafka and Zookeeper to stream Twitter data.
- You can reuse the code from your first homework that reads tweets with a Twitter API library of your choice. This assignment focuses on getting you familiar with Kafka and Zookeeper.

## Installation and Setup

Download and install Zookeeper and Kafka on your machine. The exact steps depend on your platform, and many tutorials are available online. Follow any of them; if you get stuck, your TAs can help.

## Problem Statement

1. Run Zookeeper and Kafka on your machine. Submit screenshots. (4 pts.)
2. Write a producer module that streams tweets and publishes them to a topic that you create. A topic is a named channel to which messages are published; the consumer subscribes to the same topic to read the tweets. Submit a console screenshot and your code. (8 pts.)
3. Write a consumer module that reads from this topic and prints the tweets to the console. Submit a console screenshot and your code. (8 pts.)
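
As a rough illustration of how the producer, topic, and consumer relate, here is a toy in-memory model that you can run before a broker is set up. It is only a sketch under stated assumptions: `ToyBroker`, `ToyProducer`, `ToyConsumer`, `serialize_tweet`, and `deserialize_tweet` are hypothetical names, not part of Kafka's API; tweets are assumed to be dicts encoded as UTF-8 JSON; and each topic is modeled as a single append-only list, whereas a real Kafka topic is partitioned and persisted by the broker.

```python
import json
from typing import Any, Dict, List


def serialize_tweet(tweet: Dict[str, Any]) -> bytes:
    # Kafka stores message values as raw bytes, so encode the tweet as UTF-8 JSON.
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def deserialize_tweet(raw: bytes) -> Dict[str, Any]:
    # Inverse of serialize_tweet: bytes -> JSON text -> dict.
    return json.loads(raw.decode("utf-8"))


class ToyBroker:
    """Hypothetical in-memory stand-in for a Kafka broker (not Kafka's API).

    Each topic is one append-only list of messages; a real Kafka topic is
    split into partitions and stored durably by the broker.
    """

    def __init__(self) -> None:
        self.topics: Dict[str, List[bytes]] = {}

    def create_topic(self, name: str) -> None:
        self.topics.setdefault(name, [])

    def append(self, topic: str, value: bytes) -> int:
        log = self.topics[topic]  # raises KeyError if the topic was never created
        log.append(value)
        return len(log) - 1  # offset = position of the message in the log


class ToyProducer:
    """Hypothetical producer: serializes tweets and publishes them to a topic."""

    def __init__(self, broker: ToyBroker) -> None:
        self.broker = broker

    def send(self, topic: str, tweet: Dict[str, Any]) -> int:
        return self.broker.append(topic, serialize_tweet(tweet))


class ToyConsumer:
    """Hypothetical consumer: subscribes to one topic and tracks its own offset."""

    def __init__(self, broker: ToyBroker, topic: str) -> None:
        self.broker = broker
        self.topic = topic
        self.offset = 0  # next offset to read; this toy starts at the beginning of the log

    def poll(self) -> List[Dict[str, Any]]:
        # Return every message published since the last poll, then advance the offset.
        log = self.broker.topics[self.topic]
        new_messages = log[self.offset:]
        self.offset = len(log)
        return [deserialize_tweet(raw) for raw in new_messages]


if __name__ == "__main__":
    broker = ToyBroker()
    broker.create_topic("tweets")  # "tweets" is an illustrative topic name
    producer = ToyProducer(broker)
    consumer = ToyConsumer(broker, "tweets")

    # Made-up sample tweets; a real producer would take these from the Twitter stream.
    producer.send("tweets", {"user": "alice", "text": "Hello, Kafka!"})
    producer.send("tweets", {"user": "bob", "text": "Streaming tweets"})

    for tweet in consumer.poll():
        print(f"@{tweet['user']}: {tweet['text']}")
```

Because each consumer keeps its own offset, a second consumer on the same topic would read the same messages independently. If you happen to use the kafka-python client (an assumption; any Kafka client will do), functions like `serialize_tweet` and `deserialize_tweet` can be passed as the `value_serializer` argument of `KafkaProducer` and the `value_deserializer` argument of `KafkaConsumer`.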

You have two weeks to complete this mini homework. Submit your code and screenshots in a single folder via the submission link on Courseworks. The deadline is Sunday, November 6, at 11:59 PM.