Tyler Pearson tdpearson

tdpearson / extract_comments.sh

Last active April 7, 2021 17:04

Extract comments from MS Word .docx files using command-line tools

	# For background information, DOCX files are ZIP archives containing XML files.
	# I had a recent project that needed comments extracted from several MS Word documents.
	# This would have been painful to do manually - command line to the rescue!

	find . -name "*.docx" -exec sh -c 'unzip -p "$1" word/comments.xml | xmllint --xpath "//text()" -' sh {} \;
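The same idea can be sketched in Python using only the standard library, which may help where `unzip` or `xmllint` are not installed. The function name `extract_comments` is hypothetical and not part of the gist; the sketch assumes comments live in `word/comments.xml` under the usual WordprocessingML namespace, and it groups text per comment rather than dumping every text node.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/comments.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_comments(docx):
    """Return the text of each comment in a .docx (path or file-like object).

    Hypothetical helper for illustration; not part of the original gist.
    """
    with zipfile.ZipFile(docx) as archive:
        try:
            xml_bytes = archive.read("word/comments.xml")
        except KeyError:
            return []  # the archive has no comments part
    root = ET.fromstring(xml_bytes)
    comments = []
    for comment in root.iter(f"{{{W_NS}}}comment"):
        # A comment's text is split across w:t runs; join them back together.
        runs = comment.iter(f"{{{W_NS}}}t")
        comments.append("".join(run.text or "" for run in runs))
    return comments
```

Calling `extract_comments("report.docx")` on a document with two comments would return a two-item list of strings; the file name here is only a placeholder.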

tdpearson / undeleteS3.py

Last active February 2, 2018 16:24

S3 Undelete Example

	#!/usr/bin/env python
	import boto3
	from sys import argv

	"""
	Example of undeleting in S3 with specified key prefix
	"""

	def restore_bag(bagname):
	    """ undelete derivative bag """
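The body of `restore_bag` is not shown in this preview. As a rough sketch of one common approach, and not necessarily what the gist does: in a versioned bucket, a deleted object is hidden by a delete marker, and removing the latest marker makes the previous version visible again. The function name `restore_deleted` and its parameters are hypothetical; `client` is assumed to be a boto3 S3 client and the bucket is assumed to have versioning enabled.

```python
def restore_deleted(client, bucket, prefix):
    """Remove the latest delete marker from every key under `prefix`.

    Hypothetical helper for illustration. `client` is assumed to be a
    boto3 S3 client, e.g. boto3.client("s3"), for a versioned bucket.
    Returns the list of keys whose delete markers were removed.
    """
    restored = []
    paginator = client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "DeleteMarkers" is absent from the page when there are none.
        for marker in page.get("DeleteMarkers", []):
            if marker["IsLatest"]:
                # Deleting the marker itself (by VersionId) undeletes the key.
                client.delete_object(
                    Bucket=bucket,
                    Key=marker["Key"],
                    VersionId=marker["VersionId"],
                )
                restored.append(marker["Key"])
    return restored
```

Passing the client in as an argument keeps the function easy to try against a stub before pointing it at a real bucket.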

tdpearson / jinjajunk.py

Last active November 30, 2017 19:54

Jinja Int Padding Example

	from jinja2 import Environment

	mytemp = """
	my box number is {{ "%(box)04d"|format(box=mybox) }}
	"""

	print(Environment().from_string(mytemp).render(mybox=456))

	# my box number is 0456
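Jinja's `format` filter applies Python's %-style formatting to the string on its left, so the padding can be checked in plain Python without a template. The helper name `pad_box` below is hypothetical and only there for illustration.

```python
def pad_box(box):
    """Zero-pad an integer to four digits using %-style named formatting.

    Hypothetical helper mirroring the template expression
    "%(box)04d"|format(box=mybox).
    """
    return "%(box)04d" % {"box": box}


print("my box number is " + pad_box(456))
```

In `04d`, the `0` requests zero padding, `4` is the minimum width, and `d` formats the value as a decimal integer; values already wider than four digits are left unchanged.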

tdpearson / example.py

Created November 17, 2017 19:53

Example of yielding in iterator for Boris

tdpearson / sraping_example.py

Created October 19, 2017 19:58

Example of scraping a site that requires form submission

	import requests
	from bs4 import BeautifulSoup

	# Iterate over author pages
	for page in range(1, 2):  # TODO: update to number of pages + 1 (i.e. 15 pages would be 16)
	    data = {"gruppo": "autori",
	            "iniziale": "all",
	            "pag": page}
	    response = requests.post("http://digiliblt.lett.unipmn.it/testi.php", data=data)
	    soup = BeautifulSoup(response.text, "lxml")
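The TODO about `range` can be sidestepped by building the form payloads from a page count, so the off-by-one lives in one place. This is a small sketch only; `author_page_payloads` is a hypothetical name, and the form fields are copied from the snippet above.

```python
def author_page_payloads(num_pages):
    """Build one POST payload per author page, numbered from 1 to num_pages.

    Hypothetical helper for illustration; field names come from the gist.
    """
    return [
        {"gruppo": "autori", "iniziale": "all", "pag": page}
        for page in range(1, num_pages + 1)  # range excludes its upper bound
    ]


payloads = author_page_payloads(15)
print(len(payloads), payloads[0]["pag"], payloads[-1]["pag"])
```

Each payload could then be passed as `data=` to `requests.post`, as in the loop above.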

tdpearson / tallyattendance.py

Created December 4, 2016 06:09

	import pandas as pd

	df = pd.read_csv(r"append_test.txt", sep='\t', index_col=0)  # DataFrame.from_csv was removed in pandas 1.0

	frames = []
	for meeting in range(len(df.T)):  # iterate over recorded attendances
	    attendees = df[df.T.iloc[meeting] == 1].index  # get attendees for specific meeting
	    frames.append(pd.DataFrame(index=attendees, columns=attendees).fillna(1))  # create frame from attendees

	# sum up jointly attended meetings
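The summing step is cut off in this preview. As a hedged illustration of the idea, and not the gist's pandas code, the sketch below counts jointly attended meetings with the standard library. It assumes the data is laid out as one row per person and one 0/1 flag per meeting; the name `joint_attendance` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def joint_attendance(attendance):
    """Count how many meetings each pair of people attended together.

    `attendance` maps a person's name to a list of 0/1 flags, one per
    meeting (an assumed layout). Returns a Counter keyed by sorted
    (name, name) pairs.
    """
    names = list(attendance)
    counts = Counter()
    # zip(*rows) transposes the data: one tuple of flags per meeting.
    for flags in zip(*attendance.values()):
        present = sorted(n for n, flag in zip(names, flags) if flag == 1)
        counts.update(combinations(present, 2))
    return counts


sample = {"ann": [1, 1, 0], "bob": [1, 0, 1], "cy": [1, 1, 1]}
print(joint_attendance(sample))
```

Sorting the names within each meeting keeps every pair in a single canonical order, so `("ann", "cy")` and `("cy", "ann")` are never counted separately.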