J. Nathan Matias natematias

@natematias
natematias / 140sentences.py
Created September 7, 2016 04:14
Find sentences with 140 characters or fewer
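import nltk
import sys
# Load NLTK's pre-trained Punkt sentence tokenizer for English.
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
# Read the input file named by the first command-line argument.
with open(sys.argv[1], "r") as infile:
    fulltext = infile.read()
# Print each sentence of at most 140 characters.
for sentence in sent_detector.tokenize(fulltext.strip()):
    if len(sentence) <= 140:
        print(sentence)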
@natematias
natematias / model_results.txt
Last active October 26, 2016 02:46
Statistical modeling of news related comments on a subreddit
MSM: ("bbc.com", "reuters.com", "nytimes.com", "washingtonpost.com", "cnn.com",
"telegraph.co.uk", "latimes.com", "huffingtonpost.com", "theguardian.com", "forbes.com",
"examiner.com", "usatoday.com", "wsj.com", "cbsnews.com", "cbc.ca", "time.com",
"sfgate.com", "newsweek.com", "bostonglobe.com", "nydailynews.com", "msnbc.com",
"foxnews.com", "aljazeera.com", "nbcnews.com", "npr.org", "bloomberg.com", "abcnews.com",
"aljazeera.com", "bigstory.ap.com", "cbc.ca", "time.com")
TABLOIDS: ['dailymail.co.uk', 'express.co.uk','mirror.co.uk',
'news.com.au', 'nypost.com', 'thesun.co.uk','dailystar.co.uk','metro.co.uk']
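The gist does not show how these domain lists were applied to the subreddit data. One plausible use, sketched here purely as an assumption, is to label each submitted link by its host name. The function name `classify_link`, the labels, and the shortened domain sets are hypothetical and only illustrate the idea.

```python
from urllib.parse import urlparse

# Shortened, illustrative subsets of the lists above (not the full lists).
MSM = {"bbc.com", "reuters.com", "nytimes.com"}
TABLOIDS = {"dailymail.co.uk", "nypost.com"}


def classify_link(url):
    """Return "msm", "tabloid", or "other" for a URL (hypothetical helper)."""
    host = (urlparse(url).hostname or "").lower()
    for label, domains in (("msm", MSM), ("tabloid", TABLOIDS)):
        # Match the bare domain or any subdomain, such as www.bbc.com.
        if any(host == d or host.endswith("." + d) for d in domains):
            return label
    return "other"


if __name__ == "__main__":
    print(classify_link("https://www.bbc.com/news/world"))
    print(classify_link("http://nypost.com/some-story"))
```

Matching on the host suffix rather than on a substring of the whole URL avoids false positives when a listed domain merely appears in a path or query string.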
@natematias
natematias / write_python_twitter_to_unicode
Created February 17, 2018 20:50
Writing python-twitter data to unicode file
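import codecs
import json
# users_info is assumed to be a list of python-twitter User objects; _json holds each raw payload.
all_users_info = [x._json for x in users_info]
# ensure_ascii=False writes non-ASCII characters directly instead of \uXXXX escapes.
with codecs.open(FILENAME, mode="w", encoding="utf-8") as f:
    f.write(json.dumps(all_users_info, ensure_ascii=False))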
from PIL import Image
import sys
## Example via: https://stackoverflow.com/questions/28057722/algorithm-to-turn-a-large-blob-of-text-into-an-image-as-defined-by-the-image-e
def to_ascii(img,maxLen=250.0):
    #resize to maximum line length
    width, height = img.size
    rate = maxLen / max(width, height)
@natematias
natematias / all-subreddit-comments-from-pushshift-zst-file.py
Created July 18, 2022 20:52
Python code for processing pushshift data to output all comments associated with a specific subreddit
import sys,os,io
import simplejson as json
import zstandard as zstd
subreddit = "futurology"
infile = sys.argv[1]
outfile = sys.argv[2]
print("infile: {0}".format(infile))
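The listing cuts the script off at this point, so the rest of the author's code is not shown. As a rough sketch of the general technique, and not a reconstruction of the original, the following streams a zstd-compressed, newline-delimited JSON dump and keeps comments from one subreddit. The helper names `filter_comments` and `read_zst_lines` are hypothetical, and the standard `json` module stands in for `simplejson`.

```python
import io
import json


def filter_comments(lines, subreddit):
    """Yield parsed comments whose "subreddit" field matches, ignoring case."""
    target = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            comment = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the run
        if isinstance(comment, dict) and str(comment.get("subreddit", "")).lower() == target:
            yield comment


def read_zst_lines(path):
    """Stream decoded text lines from a .zst file without loading it all."""
    import zstandard as zstd  # third-party package, imported lazily

    with open(path, "rb") as fh:
        # Assumption: the dump was compressed with a large window, so raise the limit.
        reader = zstd.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
        yield from io.TextIOWrapper(reader, encoding="utf-8")
```

A possible invocation would be `for c in filter_comments(read_zst_lines(infile), subreddit): ...`, writing each match to `outfile` as one JSON object per line.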
@natematias
natematias / tracery-configuration.json
Last active February 2, 2023 05:51
Twitter Farewell Announcement (using Cheap Bots Done Quick)
{
"//ABOUT THIS FILE": "This is a configuration file for a repeating Twitter farewell message that uses cheapbotsdonequick.com. To set it up, I logged into CheapBotsDoneQuick with my Twitter account, pasted this configuration file into the box, and set the announcement for twice daily. The site will now auto-post in perpetuity until one of the systems goes down or my account is banned from Twitter.",
"origin": [
" I have left Twitter, due to the dismantling of the platform's safety & security capacity.\n\n Find me on Mastodon, LinkedIn, or sign up for email updates: https://natematias.com/updates/ \n\n Thanks for #noun#, #people#. \n\nThis message repeats."
],
"noun": [
"all the support and love",
"great conversations",
"all the inspiration",
"so many great discussions",