from bs4 import BeautifulSoup

def preproc(infile, outfile):
    # open input file for reading
    file = open(infile, 'r')
    # create BeautifulSoup object with the file contents, naming the
    # parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(file, 'html.parser')
import os

# get a list of the files in the current directory
a = os.listdir(os.getcwd())

def postproc(a):
    # for every file in the directory, tracking its position in the list
    for n, i in enumerate(a):
        # call the preproc function on said file and generate the appropriate outfile
        preproc(i, "out" + str(n) + ".txt")
import os
from bs4 import BeautifulSoup

# get a list of the files in the current directory
here = os.listdir(os.getcwd())

# define preprocessing method to extract email addresses from a given
# html file
def preproc(infile, outfile):
# A simple program to extract the administrator name and school name from
# the html files of an online directory, then output one file each for
# the lists of names and schools, using the json.dumps() approach to generate
# simple json output
import json
from bs4 import BeautifulSoup

def extractor(infile, outfile1, outfile2):
    file = open(infile, 'r')
    soup = BeautifulSoup(file, 'html.parser')
    # calling the soup object is shorthand for soup.find_all('strong')
    commonsoup = soup('strong')
    names = []
    schools = []
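The json.dumps() approach mentioned in the header comment could look roughly like the sketch below. It is an illustration only, not the original script's code: the helper name `write_json_list`, the placeholder names and schools, and the output file names are all assumptions made for the example.

```python
import json
import os
import tempfile

def write_json_list(items, path):
    """Serialize a list with json.dumps() and write it to `path` (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(json.dumps(items, indent=2))

# Placeholder data standing in for the extracted names and schools.
names = ["A. Example", "B. Sample"]
schools = ["Example High School", "Sample Academy"]

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as out_dir:
    names_path = os.path.join(out_dir, "names.json")
    schools_path = os.path.join(out_dir, "schools.json")
    write_json_list(names, names_path)
    write_json_list(schools, schools_path)
    # Read the files back to confirm they contain valid JSON.
    with open(names_path, encoding="utf-8") as handle:
        names_roundtrip = json.load(handle)
    with open(schools_path, encoding="utf-8") as handle:
        schools_roundtrip = json.load(handle)
```

Writing each list to its own file keeps the two outputs independent, which matches the "a file each" description in the comment.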
# A simple Python script to extract names and emails from
# a certain online directory
import os, json
from bs4 import BeautifulSoup

# get a list of the files in the current directory
inputfiles = os.listdir(os.getcwd())

def postproc(inputfiles):
#
# Note that this program does not have any error-handling code, so if it fails,
# you will just get some error messages at the command line. It won't skip the offending
# email and move on with the task; it will fail entirely.
#
# If you don't know how to write error-handling code, check the command prompt
# periodically. If the program does fail, as it occasionally will
# because a server rejects an email or something similar, look in your
# Gmail 'sent' folder to identify the last email sent, then compare it to the list
# of contacts. The failure will almost certainly have been caused by the next email in the list.
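For readers who do want the skip-and-continue behaviour that the note says is missing, one possible shape is sketched below. This is not part of the original program: `send_all` and the `send_one` callable are hypothetical names, and the sketch assumes the sending step raises `smtplib.SMTPException` or `OSError` when a message is rejected.

```python
import smtplib

def send_all(contacts, send_one):
    """Call send_one(address) for each contact, recording failures instead of stopping.

    `send_one` is a hypothetical callable that sends a single email and raises
    on failure. Returns (sent, failed), where failed holds (address, reason) pairs.
    """
    sent, failed = [], []
    for address in contacts:
        try:
            send_one(address)
        except (smtplib.SMTPException, OSError) as exc:
            # Record the offending address and move on to the next contact.
            failed.append((address, str(exc)))
            continue
        sent.append(address)
    return sent, failed
```

Returning the list of failures replaces the manual step of comparing the 'sent' folder against the contact list, since the rejected addresses are reported directly.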
import glob

# glob does Unix shell-style wildcard matching (*, ?, [...]), not regular
# expressions, and it returns a list of strings - nice and easy to work with
# List files whose names contain a dot (names starting with a dot are skipped):
# glob.glob("*.*")
#
# to list only text files, for example, try:
# glob.glob("*.txt")
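The patterns in the comments above can be tried out with a small self-contained sketch. The helper `list_matching` and the three file names are made up for the demonstration; only `glob.glob` and `glob.escape` come from the standard library.

```python
import glob
import os
import tempfile

def list_matching(directory, pattern):
    """Return the sorted base names in `directory` that match a shell-style pattern."""
    # glob.escape protects any wildcard characters that occur in the directory path itself.
    paths = glob.glob(os.path.join(glob.escape(directory), pattern))
    return sorted(os.path.basename(p) for p in paths)

# Build a throwaway directory so the example does not depend on real files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("notes.txt", "page.html", "README"):
        open(os.path.join(demo_dir, name), "w").close()
    text_files = list_matching(demo_dir, "*.txt")    # only the .txt file
    dotted_files = list_matching(demo_dir, "*.*")    # names containing a dot
    all_files = list_matching(demo_dir, "*")         # every non-hidden name
```

Note that `"*.*"` leaves out `README` because that name contains no dot, whereas `"*"` includes it.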
#! /usr/bin/env python
# Twizzer-0-0-1.py: Simple script for pulling streaming data from Twitter using
# the credentials of a given user. You will need a developer account for this
# to work, because of the way Twitter API 1.1 handles authentication. This script is
# based on a very similar script by YouTube user SentDex, with some modifications
# suggested by YouTube user Satish Chandra, and some of my own to handle stdout encoding
# issues and user interaction.
#
# ROADMAP
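// Despite its name, this helper debounces: every call cancels the pending
// timer, so fn runs only once calls have stopped for `time` milliseconds.
function throttle( fn, time ) {
    var t = 0;
    return function() {
        var args = arguments, ctx = this;
        clearTimeout(t);
        t = setTimeout( function() {
            fn.apply( ctx, args );
        }, time );
    };
}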
from bs4 import BeautifulSoup

# open the infile for reading
file = open(infile, 'r')
# convert the contents of the infile to a Beautiful Soup object,
# naming the parser explicitly
soup = BeautifulSoup(file, 'html.parser')
# create lists, a list containing bs4.element.Tag items generated by using
# the .select() syntax - the texts and their author names are contained in