Tim Hopper tdhopper

@tdhopper
tdhopper / gist:5461553
Created April 25, 2013 17:34
Smallest Missing Number in a Python Iterable
def smallest_missing_number(stream, min_val, max_val, bin_exp):
    ###
    # Returns the smallest counting number _not_ contained in a bounded
    # sequence of counting numbers.
    #
    # Inputs:
    #   stream: iterator of integer values
    #   min_val: lower bound on values in stream
    #   max_val: upper bound on values in stream
    #   bin_exp: size of bin given as exponent of 2 (bin_size = 2^bin_exp)
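The preview above is cut off before the function body, so the following is only a minimal sketch of one way a binned search could work, not the gist's actual implementation. It assumes the values are distinct integers within `[min_val, max_val]` and that the stream can be read twice, which is why it takes a hypothetical `make_stream` callable instead of a single iterator.

```python
def smallest_missing_sketch(make_stream, min_val, max_val, bin_exp):
    """Two-pass binned search (illustrative; assumes distinct in-range values)."""
    bin_size = 1 << bin_exp  # bin_size = 2 ** bin_exp
    n_bins = ((max_val - min_val) >> bin_exp) + 1

    # Pass 1: count how many values land in each bin.
    counts = [0] * n_bins
    for value in make_stream():
        counts[(value - min_val) >> bin_exp] += 1

    # Find the first bin holding fewer values than it has slots.
    for index, count in enumerate(counts):
        low = min_val + index * bin_size
        high = min(low + bin_size - 1, max_val)
        capacity = high - low + 1
        if count < capacity:
            break
    else:
        return None  # every value in [min_val, max_val] is present

    # Pass 2: mark which values of that one bin were seen.
    seen = [False] * capacity
    for value in make_stream():
        if low <= value <= high:
            seen[value - low] = True
    return low + seen.index(False)
```

Memory use is one counter per bin plus one flag per slot in a single bin, so a larger `bin_exp` trades fewer counters for a bigger second-pass buffer.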

So I started developing something remarkably similar about 4 years ago (in Django, too!). I wrote up a business plan with monetization, P&L, etc. I brought the plan and prototype to a few seed folks for early funding before I admittedly lost interest and moved on to something else.

The hardest technical challenge was recipe intake: knowing the system would need a ton of recipes to be truly effective, I had to automate importing them from many places. But ingredient normalization got in the way, even with really good regex/ETL practices. For instance, one recipe says "boneless skinless chicken breast"; another says "skinless boneless chicken breast". Some list the number of breasts. Some list pounds. Some mean the breast is split, some don't. But for the nutrition info to be accurate, the normalization process had to be near perfect. I ended up "buying" the source code to "recipefox", a recipe-parsing plugin for Firefox (for $100 or something like that), which helped tremendously, but st
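To make the word-order problem in this comment concrete, here is a minimal sketch of one normalization step. The `MODIFIERS` vocabulary and the function name are hypothetical, chosen only for illustration; quantities, units, and split versus whole breasts are not handled at all.

```python
import re

# Hypothetical modifier vocabulary; a real system would need a far larger list.
MODIFIERS = {"boneless", "skinless", "fresh", "frozen"}


def normalize_ingredient(name):
    """Return a canonical key: base words first, then modifiers in sorted order."""
    tokens = re.findall(r"[a-z]+", name.lower())
    base = [t for t in tokens if t not in MODIFIERS]
    modifiers = sorted(t for t in tokens if t in MODIFIERS)
    return " ".join(base + modifiers)
```

With this, "boneless skinless chicken breast" and "skinless boneless chicken breast" map to the same key, so they can share one nutrition record.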

tdhopper / URL to Day One
Last active December 20, 2015 13:39
Pythonista script that takes a URL from the clipboard, runs it through Readability, then through Heck Yes Markdown, and finally drops the result into Day One. Borrows heavily from http://www.macdrifter.com/2012/09/pythonista-trick-url-to-markdown.html
import clipboard
import urllib2
import webbrowser
clipString = clipboard.get()
marky = 'http://heckyesmarkdown.com/go/?read=1&u='
queryString = marky + clipString
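The preview stops before the request and hand-off steps, so what follows is a rough Python 3 sketch of how the two URLs in that pipeline might be built, not the script's actual continuation. Two assumptions are not in the preview: the page URL is percent-encoded before being appended, and Day One accepts a `dayone://post?entry=` URL. Both function names are hypothetical.

```python
from urllib.parse import quote

# Endpoint taken from the script preview above.
MARKY = "http://heckyesmarkdown.com/go/?read=1&u="


def build_markdown_request_url(page_url):
    """Append the percent-encoded page URL to the Heck Yes Markdown endpoint."""
    return MARKY + quote(page_url, safe="")


def build_day_one_url(markdown_text):
    """Assumed Day One URL scheme; check Day One's documentation before relying on it."""
    return "dayone://post?entry=" + quote(markdown_text, safe="")
```

In Pythonista, the second URL would then be passed to `webbrowser.open`, which the original script already imports.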
tdhopper / README.md
Last active December 20, 2015 14:39
Details on how I build my Pelican website

My scripts for building stiglerdiet.com with Pelican and uploading it to an Amazon S3 bucket.

Requires:

JSON files should go into the templates folder of ofexport. Modify the ofexport commands to fit your needs. In particular, _Nonurgent is the folder my projects are in, and "Posts to Write" and "To Read" are the projects I'm uploading.

{
"metadata": {
"name": "OTW hollow slopes"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
tdhopper / gist:6356555
Last active December 21, 2015 19:49
Code for unshortening a list of shortened URLs using the unshort.me and unshorten.it APIs.
# Used for unshortening a list of URLs as well as checking the HTTP status code of a webpage.
# When called from the command line, requires either a filename or STDIN as input.
# The file or standard input should contain one URL per line.
# Output is printed to standard output and saved to a JSON file using PickleDB.
__author__ = "Tim Hopper"
__email__ = "[email protected]"
import fileinput
import urllib2
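The rest of the script is not shown, so here is a small Python 3 sketch of the input handling and status check described in the header comments. It swaps in a different technique: rather than calling the unshort.me or unshorten.it APIs, it simply follows redirects with the standard library. The function names are hypothetical.

```python
import urllib.request


def clean_urls(lines):
    """Yield stripped, non-empty URLs from an iterable of text lines."""
    for line in lines:
        url = line.strip()
        if url:
            yield url


def resolve(url, timeout=10):
    """Follow redirects and return (final_url, status_code).

    Makes a network request; urlopen raises urllib.error.HTTPError
    for 4xx/5xx responses, which the caller would need to handle.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl(), response.status
```

Passing `fileinput.input()` to `clean_urls` would give the filename-or-STDIN behavior the comments describe.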
tdhopper / The error I get from the code below
Last active December 23, 2015 22:39
Causes error. Seems to be related to type.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-01896d0e98b9> in <module>()
5 df = pd.DataFrame({"content":["tim hopper", "this data tim"]})
6 mapper = DataFrameMapper([("content", CountVectorizer())])
----> 7 mapper.fit_transform(df)
C:\Anaconda\lib\site-packages\sklearn\base.pyc in fit_transform(self, X, y, **fit_params)
406 if y is None:
407 # fit method of arity 1 (unsupervised transformation)
import re, string, sys, pandas
stops = set(open("../stop_words.txt").read().split(",") + list(string.ascii_lowercase))
words = [x.lower() for x in re.split("[^a-zA-Z]+", open("../pride-and-prejudice.txt").read()) if len(x) > 0 and x.lower() not in stops]
print pandas.Series(words).value_counts().head(25)
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
tdhopper / 0_reuse_code.js
Created April 2, 2014 22:08
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console