@tdpearson
tdpearson / extract_comments.sh
Last active April 7, 2021 17:04
Extract comments from MS Word .docx files using command-line tools
# Background: .docx files are ZIP archives of XML files; a document's comments live in word/comments.xml.
# I had a recent project that needed comments extracted from several MS Word documents.
# This would have been painful to do manually - command line to the rescue!
find . -name "*.docx" -exec sh -c 'unzip -p "$1" word/comments.xml | xmllint --xpath "//text()" -' sh {} \;
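The same extraction can be done without unzip or xmllint, since Python's standard library reads ZIP archives and XML directly. A minimal sketch, not the gist's code: `comments_from_xml` and `docx_comments` are hypothetical helper names, and it assumes comments use the standard WordprocessingML namespace. Unlike `//text()`, it keeps each comment's author and separates comments and paragraphs.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace behind the w: prefix in word/comments.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def comments_from_xml(xml_bytes):
    """Return (author, text) for each <w:comment> in a comments.xml payload."""
    root = ET.fromstring(xml_bytes)
    result = []
    for comment in root.iter(W + "comment"):
        # A comment may hold several paragraphs; each paragraph's text is split across <w:t> runs
        paragraphs = ["".join(t.text or "" for t in p.iter(W + "t"))
                      for p in comment.iter(W + "p")]
        result.append((comment.get(W + "author", ""), "\n".join(paragraphs)))
    return result


def docx_comments(path):
    """Read word/comments.xml out of a .docx (a ZIP archive); [] if it has no comments part."""
    with zipfile.ZipFile(path) as zf:
        if "word/comments.xml" not in zf.namelist():
            return []
        return comments_from_xml(zf.read("word/comments.xml"))


if __name__ == "__main__":
    # Walk the given directory (default: current), like `find . -name "*.docx"`
    top = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for docx in sorted(top.rglob("*.docx")):
        if docx.name.startswith("~$"):
            continue  # skip Word's temporary lock files, which are not ZIP archives
        for author, text in docx_comments(docx):
            print(f"{docx}\t{author}\t{text}")
```

Run it as `python extract_comments.py some/dir` (the script name is hypothetical); output is one tab-separated line per comment.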
tdpearson / undeleteS3.py
Last active February 2, 2018 16:24
S3 Undelete Example
#!/usr/bin/env python
"""
Example of undeleting in S3 with specified key prefix
"""
import boto3
from sys import argv


def restore_bag(bagname):
    """ undelete derivative bag """
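The preview stops before `restore_bag`'s body. One common way to undelete, assuming versioning is enabled on the bucket, is to delete the *delete marker* that is currently the latest version of each key; S3 then serves the previous version again. This is a sketch of that approach, not the author's code: `latest_delete_markers` and `undelete_prefix` are hypothetical names, and the client is passed in (e.g. `boto3.client("s3")`) so the listing logic can be checked without AWS.

```python
def latest_delete_markers(page):
    """Return (key, version_id) pairs for delete markers that currently hide an object.

    `page` is one response dict from S3's ListObjectVersions call.
    """
    return [
        (marker["Key"], marker["VersionId"])
        for marker in page.get("DeleteMarkers", [])
        if marker.get("IsLatest")
    ]


def undelete_prefix(s3_client, bucket, prefix):
    """Remove the latest delete marker for every key under `prefix`.

    Assumes bucket versioning is enabled. Returns the restored keys.
    """
    paginator = s3_client.get_paginator("list_object_versions")
    # Collect everything first so deletions don't interfere with pagination
    markers = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        markers.extend(latest_delete_markers(page))
    for key, version_id in markers:
        # Deleting a delete marker by VersionId makes the prior version current again
        s3_client.delete_object(Bucket=bucket, Key=key, VersionId=version_id)
    return [key for key, _ in markers]
```

For example, `undelete_prefix(boto3.client("s3"), "my-bucket", "bags/example/")`, where the bucket and prefix are placeholders.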
tdpearson / jinjajunk.py
Last active November 30, 2017 19:54
Jinja Int Padding Example
from jinja2 import Environment
mytemp = """
my box number is {{ "%(box)04d"|format(box=mybox) }}
"""
print(Environment().from_string(mytemp).render(mybox=456))
# my box number is 0456
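Jinja's `format` filter applies Python's printf-style `%` formatting, so the padding behaviour can be checked in plain Python. A small sketch (`pad_box` is a hypothetical helper) that also makes the width a parameter; note that values wider than `width` are never truncated.

```python
def pad_box(box: int, width: int = 4) -> str:
    # '0' flag pads with zeros; '*' takes the minimum field width from the arguments
    return "%0*d" % (width, box)


print("my box number is " + pad_box(456))  # my box number is 0456
print(pad_box(456) == f"{456:04d}" == str(456).zfill(4))  # True
```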
tdpearson / example.py
Created November 17, 2017 19:53
Example of yielding in iterator for Boris
purchases = [12, 43, 13, 465, 1, 13]
n = 2
d = 3
class Prizes(object):
    def __init__(self, purchases1, n1, d1):
        self.pu = purchases1
        self.n = n1
        self.d = d1
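The preview ends at `__init__`, so the gist's actual rule isn't shown. As a sketch of "yielding in an iterator", here `__iter__` is written as a generator function. The prize rule is a **hypothetical illustration only**: every n-th purchase (1-based) wins if its amount is divisible by d.

```python
class Prizes:
    """Iterate over winning purchase positions (1-based).

    Hypothetical rule for illustration: every n-th purchase wins
    if its amount is divisible by d.
    """

    def __init__(self, purchases, n, d):
        self.purchases = purchases
        self.n = n
        self.d = d

    def __iter__(self):
        # Because this method contains `yield`, calling iter() returns a fresh
        # generator, so the object can be looped over more than once.
        for position in range(self.n, len(self.purchases) + 1, self.n):
            if self.purchases[position - 1] % self.d == 0:
                yield position


print(list(Prizes([12, 43, 13, 465, 1, 13], 2, 3)))  # [4]
```

Using `yield` avoids writing a separate iterator class with `__next__` and manual `StopIteration` handling.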
tdpearson / sraping_example.py
Created October 19, 2017 19:58
Example of scraping a site that requires form submission
import requests
from bs4 import BeautifulSoup
# Iterate over author pages
for page in range(1, 2):  # TODO: update to number of pages + 1 (e.g. 15 pages would be 16)
    data = {"gruppo": "autori",
            "iniziale": "all",
            "pag": page}
    response = requests.post("http://digiliblt.lett.unipmn.it/testi.php", data=data)
    soup = BeautifulSoup(response.text, "lxml")
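Passing a dict as `data=` makes requests send it form-encoded (`application/x-www-form-urlencoded`), the same way a browser submits an HTML form. A stdlib sketch that builds, but does not send, the equivalent POST request, which is handy for seeing exactly what the server receives; `author_page_request` is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def author_page_request(page):
    """Build (without sending) the form POST for one author listing page."""
    body = urlencode({"gruppo": "autori", "iniziale": "all", "pag": page}).encode("ascii")
    return Request(
        "http://digiliblt.lett.unipmn.it/testi.php",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = author_page_request(1)
print(req.get_method(), req.data)  # POST b'gruppo=autori&iniziale=all&pag=1'
```

Sending it is `urllib.request.urlopen(req)`; requests does the encoding step for you.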
import pandas as pd
# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it
df = pd.read_csv(r"append_test.txt", sep='\t', index_col=0)
frames = []
for meeting in df.columns:  # iterate over recorded attendances (one column per meeting)
    attendees = df.index[df[meeting] == 1]  # get attendees for specific meeting
    frames.append(pd.DataFrame(1, index=attendees, columns=attendees))  # create frame from attendees
# sum up jointly attended meetings
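The preview cuts off at the summing step. As a dependency-free sketch of the quantity being built (not the gist's own code), each meeting contributes 1 to every ordered pair of its attendees, including each person paired with themselves; `co_attendance` and the meeting-to-attendees dict are hypothetical.

```python
from collections import Counter
from itertools import product


def co_attendance(attendance):
    """Count meetings attended jointly.

    attendance: mapping of meeting -> iterable of attendee names.
    Returns a Counter keyed by (person_a, person_b); (p, p) is p's total.
    """
    counts = Counter()
    for attendees in attendance.values():
        people = sorted(set(attendees))
        # Same cells as one all-ones attendees x attendees frame
        counts.update(product(people, repeat=2))
    return counts


meetings = {"m1": ["ann", "bob"], "m2": ["ann", "cy"], "m3": ["ann", "bob", "cy"]}
print(co_attendance(meetings)[("ann", "bob")])  # 2
```

If `df` holds only 0/1 values, `df.dot(df.T)` gives the same counts as a DataFrame in one step, since each entry sums the meetings both people attended.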