Nick Krabbenhoeft nkrabben

@nkrabben
nkrabben / xfr_ia_pandas.py
Created December 17, 2019 23:30
Convert IA JSON to CSV using pandas
import json
import pandas as pd

# Load the Internet Archive collection JSON
with open('Downloads/xfrcollective.json', 'r') as f:
    data = json.load(f)

# Flatten the item records into a table and write them out as CSV
df = pd.DataFrame(data['collection_items'])
df.to_csv('Downloads/xfrcollective.csv', index=False)
@nkrabben
nkrabben / xfr_ia_standard.py
Created December 17, 2019 23:29
Convert IA JSON to a CSV using standard libraries
import json
import csv

# Load the Internet Archive collection JSON
with open('Downloads/xfrcollective.json', 'r') as f:
    data = json.load(f)

# Write one row per item, using a fixed field list as the header
with open('Downloads/xfrcollective.csv', 'w', newline='') as f:
    header = ['addeddate', 'backup_location', 'collection', 'color', 'contact', 'contributor', 'coverage', 'creator', 'curation', 'date', 'description', 'designer', 'identifier', 'illustrator', 'language', 'licenseurl', 'mediatype', 'ocr', 'ppi', 'publicdate', 'publisher', 'rights', 'runtime', 'scanner', 'sound', 'sponsor', 'subject', 'title', 'updatedate', 'updater', 'uploader', 'year']
    writer = csv.writer(f)
    writer.writerow(header)
    for item in data['collection_items']:
        writer.writerow([item.get(field, '') for field in header])
@nkrabben
nkrabben / bag_manifests.py
Last active August 16, 2019 18:49
Quick attempt at a BagIt manifest-making utility
import argparse
import bagit

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "directory",
        nargs="+",
        help=_(
            "Directory which will be scanned for files and hashed."
@nkrabben
nkrabben / simultaneous.py
Last active May 16, 2019 17:36
Read from multiple hard drives simultaneously to fill up all 10G of bandwidth
import os
import subprocess
import multiprocessing
import timeit

def work(drive):
    cmd = [
        'snowball',
        'cp',
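The preview above stops inside the snowball command. A minimal sketch of the pattern the description names, assuming one worker process per drive so that reads overlap and together approach the 10G link; the mount points, bucket name, and snowball arguments are placeholders, not the gist's actual values.

import subprocess
import multiprocessing
import timeit

DRIVES = ['/mnt/drive1', '/mnt/drive2', '/mnt/drive3']  # placeholder mount points

def work(drive):
    # Copy one whole drive per process; parallel reads keep the link busy.
    cmd = ['snowball', 'cp', '-r', drive, 's3://example-bucket/']  # placeholder arguments
    start = timeit.default_timer()
    subprocess.run(cmd, check=True)
    return drive, timeit.default_timer() - start

if __name__ == '__main__':
    with multiprocessing.Pool(len(DRIVES)) as pool:
        for drive, seconds in pool.map(work, DRIVES):
            print(drive, 'copied in', round(seconds), 'seconds')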
@nkrabben
nkrabben / snowball.py
Created May 9, 2019 14:09
Multiprocess script that uses a find -exec command to avoid "argument list too long" errors
import os
import subprocess
import multiprocessing
import timeit

def work(drive):
    cmd = [
        'find',
        drive,
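The preview above ends inside the find command. A minimal sketch of the technique the description names: rather than expanding a huge file list on one command line (which trips the shell's argument-length limit), find invokes the copy command itself via -exec; the bucket destination and flags are placeholders.

import subprocess
import multiprocessing

DRIVES = ['/mnt/drive1', '/mnt/drive2']  # placeholder mount points

def work(drive):
    # find calls 'snowball cp' once per file via -exec, so no single
    # invocation ever exceeds the OS argument-length limit (ARG_MAX).
    cmd = [
        'find', drive,
        '-type', 'f',
        '-exec', 'snowball', 'cp', '{}', 's3://example-bucket/', ';',
    ]
    subprocess.run(cmd, check=True)

if __name__ == '__main__':
    with multiprocessing.Pool(len(DRIVES)) as pool:
        pool.map(work, DRIVES)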
@nkrabben
nkrabben / snowball.py
Created May 9, 2019 13:59
Multiprocess naive snowball cp
import os
import subprocess
import multiprocessing
import timeit

def work(drive):
    cmd = [
        'snowball',
        'cp',
@nkrabben
nkrabben / transcode.py
Last active April 8, 2019 14:24
PAL/NTSC-aware v210/MOV to FFV1/MKV recipe
#!/usr/bin/env python
import os
import argparse
import subprocess
import multiprocessing

### For extra utility when running this script remotely:
# nohup transcode.py -d directory/of/mov -e mov -o directory/of/future/mkv &
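The preview above ends before the ffmpeg call. As a rough illustration of the kind of recipe described (not the gist's exact command), a lossless FFV1 version 3 in Matroska transcode that picks the frame rate per broadcast standard; the specific encoder flags chosen here are assumptions.

import subprocess

def transcode(src, dst, standard='NTSC'):
    # FFV1 version 3 (-level 3), intra-only (-g 1), with slice CRCs for fixity checking;
    # the frame rate follows the broadcast standard of the source tape.
    rate = '30000/1001' if standard == 'NTSC' else '25'
    cmd = [
        'ffmpeg', '-i', src,
        '-c:v', 'ffv1', '-level', '3', '-g', '1', '-slicecrc', '1', '-slices', '24',
        '-c:a', 'copy',
        '-r', rate,  # assumed: pin the output to the standard's frame rate
        dst,
    ]
    subprocess.run(cmd, check=True)

transcode('tape01.mov', 'tape01.mkv', standard='PAL')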
@nkrabben
nkrabben / pull_mediainfo.py
Created April 1, 2019 13:49
Quick CLI Python script that uses MediaInfo to extract technical metadata from files/directories
#!/usr/bin/env python3
from pymediainfo import MediaInfo
import argparse
import os
import glob
import logging
import csv
import re
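Only the shebang and imports survive in the preview above. A minimal sketch of the extraction step it describes, assuming the output is one CSV row per file with a handful of common video fields (the field list and CSV layout are assumptions, not the gist's actual schema).

import csv
import glob
import os
from pymediainfo import MediaInfo

def pull_mediainfo(paths, out_csv='mediainfo.csv'):
    fields = ['file', 'format', 'duration', 'width', 'height', 'frame_rate']  # assumed field list
    with open(out_csv, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for path in paths:
            row = {'file': os.path.basename(path)}
            # MediaInfo.parse reads the file with the MediaInfo library and
            # returns one track object per stream.
            for track in MediaInfo.parse(path).tracks:
                if track.track_type == 'Video':
                    row.update(format=track.format, duration=track.duration,
                               width=track.width, height=track.height,
                               frame_rate=track.frame_rate)
            writer.writerow(row)

pull_mediainfo(glob.glob('media/*.mov'))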
@nkrabben
nkrabben / ndsr_skill_cluster.r
Last active May 1, 2017 21:30
Cluster analysis of the NDSR Competencies survey (https://osf.io/zndwq/)
library(readr)
library(dplyr)
library(tidyr)
library(factoextra)
# Read data, drop last 3 columns that don't ask about skills
osf_url = "https://files.osf.io/v1/resources/zndwq/providers/osfstorage/570be0eab83f6901d62b19d9"
responses <- read_csv(osf_url)[,1:30]
# Reshape raw data into number of responses per importance level