Sean Miller (m-sean)
Brooklyn, NY
m-sean / subscraper.py
Last active February 1, 2022 07:15
Subreddit scraper using the Pushshift API.
"""Scrape a specified subreddit for comments using the Pushshift API
and write them to disk as JSON objects."""
import json
from collections import defaultdict
from time import sleep

import requests
from tqdm import tqdm

SUB = "AskReddit"  # subreddit to scrape
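The snippet above stops before the scraping loop. As a minimal sketch only (not the gist's actual code), such a loop could page backwards through a subreddit's comments, assuming Pushshift's comment-search endpoint (`https://api.pushshift.io/reddit/search/comment/`) accepts `subreddit`, `size` and `before` parameters and returns comments, each with a `created_utc` timestamp, under a `data` key. The names `fetch_page` and `scrape_comments` are hypothetical, and the fetcher is passed in as a parameter so the paging logic can be tested without network access.

```python
import json
import time
from typing import Callable, Dict, List, Optional

# Assumed Pushshift comment-search endpoint (availability not guaranteed).
PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/comment/"


def fetch_page(subreddit: str, before: Optional[int], size: int = 100) -> List[Dict]:
    """Hypothetical fetcher: one page of comments older than `before` (epoch seconds)."""
    import requests  # imported here so the paging logic below runs without it

    params = {"subreddit": subreddit, "size": size}
    if before is not None:
        params["before"] = before
    resp = requests.get(PUSHSHIFT_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["data"]  # assumed response shape: {"data": [...]}


def scrape_comments(
    subreddit: str,
    out_path: str,
    limit: int,
    fetch: Callable[[str, Optional[int]], List[Dict]] = fetch_page,
    delay: float = 1.0,
) -> int:
    """Write up to `limit` comments to `out_path`, one JSON object per line.

    Pages backwards in time: each request asks for comments created before
    the oldest `created_utc` seen so far. Returns the number of comments written.
    """
    before: Optional[int] = None
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        while written < limit:
            page = fetch(subreddit, before)
            if not page:  # no older comments left
                break
            for comment in page[: limit - written]:
                f.write(json.dumps(comment) + "\n")
                written += 1
            # Cursor for the next request: the oldest timestamp on this page.
            before = min(c["created_utc"] for c in page)
            if delay:
                time.sleep(delay)  # pause between requests to go easy on the API
    return written
```

For example, `scrape_comments(SUB, "comments.jsonl", limit=10000)` would write up to 10,000 comments, one JSON object per line. Using `created_utc` as the cursor keeps the loop simple; if `before` is exclusive, as assumed here, comments sharing the oldest timestamp on a page but not returned on it may be skipped.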
m-sean / getquery.py
Last active March 30, 2019 16:04
Downloads tweets containing a user-specified query.
# Downloads tweets containing a user-specified query.
# The tweets are collected into an indexable list of JSON objects and written to disk.
import json

import TwitterAPI
import yaml

# User parameters.
TOTAL_COUNT = 15000  # Total number of tweets to scrape (generally works up to ~15,000)
COUNT = 200  # Tweets requested per API call (the Twitter API allows 200 per request)
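The preview ends at the parameters. A minimal sketch of how `TOTAL_COUNT` and `COUNT` could drive the download: request up to `COUNT` tweets per call and page backwards with `max_id` set to one less than the oldest tweet id already received (the cursoring convention of Twitter's v1.1 REST endpoints), stopping at `TOTAL_COUNT` or when results run out, then write the list to disk as a JSON array. The `search` callable, `collect_tweets` and `save_tweets` are hypothetical; the sketch deliberately does not reproduce the `TwitterAPI` package's interface.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical client wrapper: search(query, count, max_id) returns a list of
# tweet dicts, newest first, each with an integer "id".
SearchFn = Callable[[str, int, Optional[int]], List[Dict]]


def collect_tweets(query: str, search: SearchFn, total: int, per_call: int) -> List[Dict]:
    """Page backwards with max_id until `total` tweets are collected or results run out."""
    tweets: List[Dict] = []
    max_id: Optional[int] = None
    while len(tweets) < total:
        batch = search(query, min(per_call, total - len(tweets)), max_id)
        if not batch:  # no older tweets available
            break
        tweets.extend(batch)
        # Next page: only tweets strictly older than the oldest one seen so far.
        max_id = min(t["id"] for t in batch) - 1
    return tweets[:total]  # trim in case the client returned extra tweets


def save_tweets(tweets: List[Dict], path: str) -> None:
    """Write the tweets as one JSON array, so the file loads back as a list."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tweets, f)
```

Usage would look like `tweets = collect_tweets("some query", search, TOTAL_COUNT, COUNT)` followed by `save_tweets(tweets, "tweets.json")`; calling `json.load` on that file returns a list that can be indexed tweet by tweet.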