@kschlottmann
kschlottmann / ead_series_titledates.xq
Created May 4, 2020 21:55
XQuery to retrieve series-level titles containing the string '19' (a proxy for selecting finding aids with dates in unittitles)
<data>
{
for $Record in /ead
where $Record/archdesc/dsc//c[@level='series']/did/unittitle[contains(., '19')]
let $id := $Record/archdesc/did/unitid[1]/text()
let $title := $Record/archdesc/did/unittitle
let $repo := $Record/eadheader/eadid/@mainagencycode
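For readers without an XQuery processor, the same filter can be approximated in Python. This is a minimal sketch, not the gist's code: it assumes un-namespaced EAD (matching the /ead path above), uses the standard-library xml.etree.ElementTree, and returns the collection's first unitid, its unittitle, and eadid/@mainagencycode when any series-level unittitle contains '19'. The function names and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_of(el):
    """Concatenate all text inside an element, like XPath string(); None if missing."""
    return "".join(el.itertext()) if el is not None else None


def series_with_19(ead_xml):
    """Return (unitid, unittitle, mainagencycode) for an un-namespaced EAD
    if any series-level unittitle contains '19', otherwise None."""
    root = ET.fromstring(ead_xml)  # root element is <ead>
    dsc = root.find("archdesc/dsc")
    if dsc is None:
        return None
    hit = any(
        "19" in text_of(title)
        for c in dsc.iter("c")  # every descendant <c>, like dsc//c
        if c.get("level") == "series"
        for title in c.findall("did/unittitle")
    )
    if not hit:
        return None
    unitid = root.find("archdesc/did/unitid")  # first match, like unitid[1]
    eadid = root.find("eadheader/eadid")
    return (
        unitid.text if unitid is not None else None,
        text_of(root.find("archdesc/did/unittitle")),
        eadid.get("mainagencycode") if eadid is not None else None,
    )


# Hypothetical sample record
sample = """<ead>
  <eadheader><eadid mainagencycode="US-XX">ex_001</eadid></eadheader>
  <archdesc level="collection">
    <did><unitid>MS#0001</unitid><unittitle>Example papers</unittitle></did>
    <dsc>
      <c level="series"><did><unittitle>Correspondence, 1920-1935</unittitle></did></c>
    </dsc>
  </archdesc>
</ead>"""

print(series_with_19(sample))  # ('MS#0001', 'Example papers', 'US-XX')
```

To run it over a directory of finding aids, read each file's text and keep the non-None results.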
kschlottmann / ead_date_restrict.xq
Last active January 9, 2021 20:54
Get c-level access restrictions from EAD exported from ArchivesSpace (AS), filtered by date
xquery version "3.0";
<results>
{
for $ead in /ead/archdesc/dsc//accessrestrict[contains(., '2021')]
let $restrict := $ead/p
let $restrictTitle := $ead/../did/unittitle
let $restrictDate := $ead/../did/unitdate[1]
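A rough Python equivalent of the accessrestrict query above, as a sketch that assumes un-namespaced EAD, as the /ead/archdesc/dsc path implies. ElementTree has no parent axis, so instead of stepping up with ".." it walks each element under dsc and checks that element's direct accessrestrict children. The function name, the tuple layout (title, first unitdate, joined p text), and the sample are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def text_of(el):
    """All text inside an element, whitespace-normalized; '' if missing."""
    return " ".join("".join(el.itertext()).split()) if el is not None else ""


def restrictions_mentioning(ead_xml, year="2021"):
    """Return (unittitle, first unitdate, restriction text) for every
    accessrestrict under archdesc/dsc whose text contains `year`."""
    root = ET.fromstring(ead_xml)
    dsc = root.find("archdesc/dsc")
    rows = []
    if dsc is None:
        return rows
    # No parent axis in ElementTree: visit each element and inspect its
    # direct accessrestrict children, so `parent` plays the role of "..".
    for parent in dsc.iter():
        for ar in parent.findall("accessrestrict"):
            if year not in "".join(ar.itertext()):
                continue
            rows.append((
                text_of(parent.find("did/unittitle")),
                text_of(parent.find("did/unitdate")),  # first unitdate only
                " ".join(text_of(p) for p in ar.findall("p")),
            ))
    return rows


# Hypothetical sample components
sample = """<ead><archdesc><dsc>
  <c level="file">
    <did><unittitle>Diaries</unittitle><unitdate>1970-1971</unitdate></did>
    <accessrestrict><p>Closed until 2021.</p></accessrestrict>
  </c>
  <c level="file">
    <did><unittitle>Letters</unittitle></did>
    <accessrestrict><p>Open for research.</p></accessrestrict>
  </c>
</dsc></archdesc></ead>"""

for row in restrictions_mentioning(sample):
    print(" | ".join(row))  # Diaries | 1970-1971 | Closed until 2021.
```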
kschlottmann / getBoxes.py
Last active December 27, 2019 16:05
This takes a series of strings and prepends a box number to each. Use with docx2python for Word container lists with box headings.
import re
#prompt for date
date = input("Date ")
#read file of strings into memory
with open('input.txt', 'r') as f:
    x = f.readlines()
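The preview stops before the main loop, so the following is only a guess at the technique the description names, not the gist's logic. It assumes box headings appear on their own lines in the form "Box 12" (a hypothetical pattern) and uses "|" as the separator, which this gist does not confirm; the date prompt is left out.

```python
import re

# Hypothetical heading pattern: a line that is only "Box" plus an identifier.
BOX_HEADING = re.compile(r"^\s*Box\s+(\S+)\s*$", re.IGNORECASE)


def prepend_boxes(lines, sep="|"):
    """Prefix each non-heading line with the most recent box number."""
    current_box = None
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        match = BOX_HEADING.match(line)
        if match:
            current_box = match.group(1)  # a heading starts a new box
            continue
        # Lines before the first heading are passed through unchanged.
        out.append(f"{current_box}{sep}{line}" if current_box else line)
    return out


lines = ["Box 1\n", "Letters, 1950\n", "Diaries\n", "\n", "Box 2\n", "Photographs\n"]
print(prepend_boxes(lines))  # ['1|Letters, 1950', '1|Diaries', '2|Photographs']
```

A file can be passed directly, since iterating over an open file yields its lines: with open('input.txt') as f: rows = prepend_boxes(f).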
kschlottmann / get_names_ids.xsl
Last active March 20, 2020 19:46
Get unittitles and ArchivesSpace IDs from an EAD file for all c elements at the file level
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ead="urn:isbn:1-931666-22-9"
exclude-result-prefixes="xs"
version="2.0">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="/">
<xsl:for-each select="//ead:c[@level='file']">
<xsl:value-of select="@id"/>
<xsl:text>|</xsl:text>
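The same extraction in Python, as a sketch: it iterates over file-level c elements in the EAD 2002 namespace declared in the stylesheet (urn:isbn:1-931666-22-9) and emits id|unittitle lines. The preview is cut off after the pipe, so pairing the id with a whitespace-normalized unittitle follows the gist description rather than the visible stylesheet; the sample IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

EAD_NS = "urn:isbn:1-931666-22-9"  # EAD 2002 namespace, as in the stylesheet
NS = {"ead": EAD_NS}


def file_level_rows(ead_xml):
    """Return 'id|unittitle' strings for every <c level="file">."""
    root = ET.fromstring(ead_xml)
    rows = []
    for c in root.iter(f"{{{EAD_NS}}}c"):  # like //ead:c
        if c.get("level") != "file":
            continue
        title = c.find("ead:did/ead:unittitle", NS)
        text = " ".join("".join(title.itertext()).split()) if title is not None else ""
        rows.append(f"{c.get('id', '')}|{text}")
    return rows


# Hypothetical sample with one series containing one file
sample = """<ead xmlns="urn:isbn:1-931666-22-9"><archdesc><dsc>
  <c id="aspace_series1" level="series"><did><unittitle>Series I</unittitle></did>
    <c id="aspace_file1" level="file"><did><unittitle>Letters,
      1950</unittitle></did></c>
  </c>
</dsc></archdesc></ead>"""

print("\n".join(file_level_rows(sample)))  # aspace_file1|Letters, 1950
```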
kschlottmann / deleteLocationsTest.py
Created November 14, 2019 15:45
Delete all locations from the ArchivesSpace (AS) test server
import json
import requests
import secretsTest
import time
import os
startTime = time.time()
#call secrets for authentication
baseURL = secretsTest.baseURL
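A minimal sketch of the deletion loop against the ArchivesSpace backend REST API, using the standard-library urllib instead of requests so it runs without extra packages. It assumes the endpoints POST /users/{username}/login (returning a session token), GET /locations?all_ids=true, and DELETE /locations/{id}, with the token sent in the X-ArchivesSpace-Session header; the base URL and credentials stand in for whatever secretsTest provides. The `call` parameter is a hypothetical hook that lets the HTTP layer be swapped out for testing. This deletes data irreversibly, so point it only at a test instance.

```python
import json
import urllib.parse
import urllib.request


def aspace_call(method, url, session=None):
    """Send one request to the ArchivesSpace backend and decode its JSON reply."""
    req = urllib.request.Request(url, method=method)
    if session:
        req.add_header("X-ArchivesSpace-Session", session)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def login(base_url, user, password, call=aspace_call):
    """Authenticate and return the session token (assumed 'session' key)."""
    query = urllib.parse.urlencode({"password": password})
    url = f"{base_url}/users/{urllib.parse.quote(user)}/login?{query}"
    return call("POST", url)["session"]


def delete_all_locations(base_url, session, call=aspace_call):
    """Delete every location record and return the IDs it deleted.
    An HTTP error on any DELETE raises and stops the loop."""
    ids = call("GET", f"{base_url}/locations?all_ids=true", session)
    for loc_id in ids:
        call("DELETE", f"{base_url}/locations/{loc_id}", session)
        print(f"deleted location {loc_id}")
    return ids
```

Usage, with a base URL and credentials from your own secrets file: session = login(base_url, user, password), then delete_all_locations(base_url, session).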
kschlottmann / getclio.py
Last active February 22, 2020 14:27
From a list of IDs, pull down MARCXML records from CLIO
import requests, csv, json, urllib, time
baseURL = 'https://clio.columbia.edu/catalog/'
endURL = '.marcxml'
with open('inputlist.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter='|', quotechar='"')
    for row in reader:
        bib = str(row[0])
        url = baseURL+bib+endURL
        response = requests.get(url)
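A sketch of the same download loop, with standard-library urllib rather than requests. The CLIO base URL and .marcxml suffix come from the gist, and reading the ID from the first pipe-delimited column mirrors row[0] above; the output directory, the file naming ({bib}.xml), and the one-second pause between requests are assumptions.

```python
import csv
import time
import urllib.request
from pathlib import Path

BASE_URL = "https://clio.columbia.edu/catalog/"
END_URL = ".marcxml"


def marcxml_urls(lines):
    """Yield (bib_id, url) from the first column of pipe-delimited lines."""
    for row in csv.reader(lines, delimiter="|", quotechar='"'):
        if row and row[0].strip():  # skip blank lines and empty IDs
            bib = row[0].strip()
            yield bib, f"{BASE_URL}{bib}{END_URL}"


def download_all(csv_path="inputlist.csv", out_dir="marcxml", pause=1.0):
    """Fetch each record and save it as <out_dir>/<bib_id>.xml (hypothetical naming)."""
    Path(out_dir).mkdir(exist_ok=True)
    with open(csv_path, newline="") as f:
        for bib, url in marcxml_urls(f):
            with urllib.request.urlopen(url) as resp:
                Path(out_dir, f"{bib}.xml").write_bytes(resp.read())
            time.sleep(pause)  # space out requests to the catalog server
```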
kschlottmann / convert_wpd_txt.sh
Created October 24, 2019 14:55
Use headless LibreOffice to convert input files (here, mostly WordPerfect .wpd) to .txt files
#!/bin/bash
FILES=/opt/wpdConvert/data/*
for f in $FILES
do
  echo "Processing $f file..."
  # take action on each file; $f stores the current file name
  # quote $f so a file name containing spaces is passed as one argument
  ./soffice.bin --headless --convert-to txt:Text "$f"
done
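The same conversion can be driven from Python's subprocess module. This is a sketch that assumes a soffice executable on the PATH (the gist calls ./soffice.bin from the LibreOffice program directory) and adds --outdir, a standard soffice option, so the text files land in a known place; the output directory is hypothetical.

```python
import subprocess
from pathlib import Path


def soffice_txt_command(src, outdir, soffice="soffice"):
    """Build the headless LibreOffice command that converts one file to .txt."""
    return [soffice, "--headless", "--convert-to", "txt:Text",
            "--outdir", str(outdir), str(src)]


def convert_all(data_dir="/opt/wpdConvert/data", outdir="/opt/wpdConvert/txt"):
    """Convert every regular file in data_dir, one soffice call per file."""
    for src in sorted(Path(data_dir).iterdir()):
        if src.is_file():
            print(f"Processing {src} file...")
            # An argument list (no shell) keeps spaces in file names intact.
            subprocess.run(soffice_txt_command(src, outdir), check=True)
```

check=True makes a failed conversion raise CalledProcessError instead of being skipped silently.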
kschlottmann / ead_tbm_csv.xsl
Last active July 31, 2019 20:22
XSLT to generate a pipe-delimited file with TBM information
<?xml version="1.0" encoding="UTF-8"?>
<!-- IN PROGRESS, 2019-07-29 -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ead="urn:isbn:1-931666-22-9" exclude-result-prefixes="xs" version="2.0">
<xsl:output omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="ead:ead">
<xsl:apply-templates select="//ead:c[@level = 'file']"/>
</xsl:template>
kschlottmann / uniqueElements.xml
Created May 13, 2019 16:25
This XSLT lists every unique element name within a given XML document
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:variable name="newline">
<xsl:text>
</xsl:text>
</xsl:variable>
<xsl:key name="elements" match="*" use="name()"/>
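The stylesheet declares a key on name(), the usual starting point for XSLT 1.0 (Muenchian) grouping. For comparison, here is a Python sketch that lists each element name once, in order of first appearance. ElementTree reports namespaced tags in {uri}localname form rather than the prefixed names that name() returns, so output for namespaced documents will differ.

```python
import xml.etree.ElementTree as ET


def unique_element_names(xml_text):
    """Element names in order of first appearance, each listed once."""
    root = ET.fromstring(xml_text)
    # dict.fromkeys keeps insertion order and drops repeats.
    return list(dict.fromkeys(el.tag for el in root.iter()))


print("\n".join(unique_element_names("<a><b/><c><b/><d/></c></a>")))  # a, b, c, d on separate lines
```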
kschlottmann / Word-to-tsv_styles.xsl
Last active October 7, 2021 15:04
This XSLT takes the Columbia Word container list template, as saved in Word 2003 XML format, and generates pipe-delimited data with levels for use in Excel-based EAD generation. Updated to account for boxes without folders.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2" xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core" exclude-result-prefixes="#all" version="2.0">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="/">
<!--NB: