@beauvais
beauvais / extractor.py
Created December 11, 2012 13:23
extracting and soupifying
import requests
from sys import argv
from bs4 import BeautifulSoup
script, landing = argv # landing is the first URI for this script
def extractor(landing):  # extractor uses requests to GET pages
    r = requests.get(landing)  # and assigns them variables for use
    response = r.status_code
    c = r.text
discussion = {}
discussion['John'] = 45
discussion['Graham'] = 30
discussion['Michael'] = 20
discussion['Terry'] = 30
def talk_time(argv):
    # argv here is any iterable of minutes per speaker, not sys.argv
    count = 0
    remaining = sum(argv)
    for i in argv:
        print("time elapsed: " + str(count))
        count += i
        remaining -= i
        print("after " + str(i) + " time remaining: " + str(remaining) + " ")
beauvais / gist:4091857
Created November 16, 2012 23:25
out.txt
0{'attribution': [u'Mark Pilgrim'],
'body': u"You KNOW HOW OTHER books go on and on about programming fundamentals, and finally work up to building a complete, working program? Let's skip all that.\n",
'date': u'June 13, 2011',
'dow': u'Monday',
'location': u'218-19',
'time': u'09:54 PM',
'title': u'Dive Into Python',
'type': u'Highlight on Page 11 |'}
1{'attribution': [u'Mark Pilgrim'],
'body': u'Often love this approach to learning, but appreciate context too.',
beauvais / gist:4091833
Created November 16, 2012 23:20
parser.py
import re
import codecs
from pprint import pprint
EOR = u"=========="
def records(file_path):
    clip_file = codecs.open(file_path)
    clip_file.seek(3)  # skip magic cookie
beauvais / gist:4091805
Created November 16, 2012 23:16
traceback
Traceback (most recent call last):
  File "parser.py", line 42, in <module>
    for n, r in enumerate(records(argv[1])): # 'My Clippings.txt') ):
  File "parser.py", line 23, in records
    clip['type'], clip['location'], clip['dow'], clip['date'], clip['time'] = match.groups()
AttributeError: 'NoneType' object has no attribute 'groups'
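This traceback is what Python raises when `re.match` finds no match and returns `None`, and `.groups()` is then called on that result. The pattern used in parser.py is not shown here, so the sketch below uses a hypothetical pattern (`META`) and a hypothetical helper (`parse_meta`) written against the metadata line format in the clippings sample. It shows one way to guard against a non-matching line rather than the author's actual fix.

```python
import re

# Hypothetical pattern for a Kindle metadata line such as:
# "- Highlight on Page 11 | Loc. 218-19 | Added on Monday, June 13, 2011, 09:54 PM"
META = re.compile(
    r"^- (?P<type>.+?) \| Loc\. (?P<location>[\d-]+) \| "
    r"Added on (?P<dow>\w+), (?P<date>\w+ \d{1,2}, \d{4}), "
    r"(?P<time>\d{1,2}:\d{2} [AP]M)$"
)


def parse_meta(line):
    """Return the metadata fields as a dict, or None if the line does not match."""
    match = META.match(line.strip())
    if match is None:
        # re.match returns None on failure; check before touching the groups
        return None
    return match.groupdict()
```

A caller can then skip or log any record whose metadata line comes back as `None` instead of crashing partway through the file.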
beauvais / gist:4089103
Created November 16, 2012 17:14
clippings_and_highlights
Dive Into Python (Mark Pilgrim)
- Highlight on Page 11 | Loc. 218-19 | Added on Monday, June 13, 2011, 09:54 PM
You KNOW HOW OTHER books go on and on about programming fundamentals, and finally work up to building a complete, working program? Let's skip all that.
==========
Dive Into Python (Mark Pilgrim)
- Note on Page 11 | Loc. 219 | Added on Monday, June 13, 2011, 09:54 PM
Often love this approach to learning, but appreciate context too.
beauvais / gist:1924661
Created February 27, 2012 15:28
Python CSV error
16. header = next(csvFile)
line 16, in <module>
header = next(csvFile)
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
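This error typically appears when the file uses bare carriage returns as line endings (for example, a CSV exported from an older Mac spreadsheet application). In Python 2 the usual remedy is the one the message suggests: open the file in `'rU'` mode. The sketch below shows the Python 3 equivalent, opening the file with `newline=''` as the `csv` documentation recommends. The helper name `read_header` is hypothetical, and the code that produced the error above is not shown, so treat this as one plausible fix.

```python
import csv


def read_header(path):
    """Read a CSV file and return (header, rows).

    newline='' keeps universal-newline splitting enabled while leaving
    line endings untranslated, so the csv module can handle \r, \n and \r\n.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(reader)
    return header, rows
```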
beauvais / gist:1901080
Created February 24, 2012 13:55
partial email match
#! /usr/bin/python
import sys
import csv
import re
csv.writer(open('out.csv', 'wb'),).writerows(row for row in csv.reader(open('in.csv', 'rb')) if len(row) >= 6 and "gmail" in row[5])
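The one-liner above is Python 2 style (binary file modes) and never closes its files. A minimal Python 3 sketch of the same filter follows, assuming the email address sits in the sixth column as in the original. The function names `matching_rows` and `filter_csv` are hypothetical.

```python
import csv


def matching_rows(rows, column=5, needle="gmail"):
    """Keep rows that have the column and whose value contains the needle."""
    return [row for row in rows if len(row) > column and needle in row[column]]


def filter_csv(src, dst, column=5, needle="gmail"):
    """Copy matching rows from src to dst, closing both files afterwards."""
    with open(src, newline="") as infile, open(dst, "w", newline="") as outfile:
        csv.writer(outfile).writerows(
            matching_rows(csv.reader(infile), column, needle)
        )
```

Calling `filter_csv('in.csv', 'out.csv')` would mirror the original behaviour. Splitting out `matching_rows` keeps the substring test easy to check without touching the filesystem.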
beauvais / crab-recipes.txt
Created February 10, 2012 17:08
Crab Recipes in Kasabi's Food dataset
Sample SPARQL query to find Crab Recipes in Kasabi's [food](http://kasabi.com/dataset/food) dataset.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX recipe: <http://linkedrecipes.org/schema/>
SELECT ?label ?recipe WHERE {
<http://data.kasabi.com/dataset/food/foods/crab> recipe:ingredient_of ?recipe .
?recipe a recipe:Recipe .
?recipe rdfs:label ?label .