"""Scrape a specified subreddit for comments using the Pushshift API
(writes JSON objects to disk)"""
import json
import requests
from tqdm import tqdm
from collections import defaultdict
from time import sleep
SUB = "AskReddit"  # subreddit to scrape
### Downloads tweets containing a user-specified query.
### The data is converted to an indexable collection of (JSON) tweets and written to disk.
import TwitterAPI
import yaml
import json
# User parameters.
TOTAL_COUNT = 15000  # Total number of tweets to scrape (generally works up to ~15000)
COUNT = 200  # Twitter API allows 200 tweets per request
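
The two parameters above imply a paging scheme: the total is collected in batches of at most `COUNT` tweets. The sketch below only illustrates that arithmetic, using the values from the script. The helper name `plan_requests` is hypothetical and not part of the original code, and no Twitter API call is made here.

```python
TOTAL_COUNT = 15000  # total number of tweets wanted (value from the script)
COUNT = 200          # tweets per request (value from the script)


def plan_requests(total, per_request):
    """Return the batch sizes needed to collect `total` items.

    Hypothetical helper for illustration: every batch is `per_request`
    items, except a smaller final batch when `total` is not an exact
    multiple of `per_request`.
    """
    if total < 0 or per_request <= 0:
        raise ValueError("total must be >= 0 and per_request must be > 0")
    full_batches, remainder = divmod(total, per_request)
    sizes = [per_request] * full_batches
    if remainder:
        sizes.append(remainder)
    return sizes


batch_sizes = plan_requests(TOTAL_COUNT, COUNT)
print(len(batch_sizes))  # 75 requests of 200 tweets each
```

With these values the plan is 75 requests. A real scraper would likely stop early if a request returns fewer tweets than asked for, since that usually means no more results are available.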