Ed Summers (edsu)

sustainable hypermedia

edsu / noaa-host-providers.csv

Created April 3, 2025 21:52

edsu / noaa-hostnames.txt

Created April 3, 2025 21:45

edsu / host_provider

Created April 3, 2025 21:44

edsu / err.log

Created March 27, 2025 13:58

Error output

	Traceback (most recent call last):
	  File "/Users/edsu/.pyenv/versions/3.13.0/bin/sciop", line 8, in <module>
	    sys.exit(_main())
	             ~~~~~^^
	  File "/Users/edsu/Projects/sciop/src/sciop/cli/main.py", line 16, in _main
	    main(max_content_width=100)
	    ~~~~^^^^^^^^^^^^^^^^^^^^^^^
	  File "/Users/edsu/.pyenv/versions/3.13.0/lib/python3.13/site-packages/click/core.py", line 1161, in __call__
	    return self.main(*args, **kwargs)
	           ~~~~~~~~~^^^^^^^^^^^^^^^^^

edsu / warc_text.py

Last active March 19, 2025 19:58

	#!/usr/bin/env python3

	# This program reads WARC or WACZ data, looks for Browsertrix text records,
	# and writes each one out as a file, using the archived URL as the path.
	#
	# You can run it right here from Gist using pipx:
	#
	# pipx run https://gist.githubusercontent.com/edsu/89bd2844b9d3d4536e68956b3a16eaef/raw/warc_text.py file1.warc.gz file2.warc.gz
	#
	# If you give it a WACZ file it will read any WARC files contained in the WACZ:
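The preview stops before any code. As a rough, stdlib-only sketch of the approach the comments describe (not the gist's actual implementation, which may well rely on a WARC library such as warcio), the code below parses WARC records by hand, keeps the ones that look like Browsertrix text records, and maps each archived URL to an output path. It assumes, and you should check this against your own crawls, that text records carry a `WARC-Target-URI` of the form `urn:text:<page URL>`. The function names are illustrative, and the parser skips niceties such as header continuation lines and digest checks.

```python
import gzip
import sys
import zipfile
from pathlib import Path
from urllib.parse import urlparse

# ASSUMPTION: Browsertrix text records use a WARC-Target-URI of "urn:text:<url>".
TEXT_PREFIX = "urn:text:"


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        # `line` is the version line, e.g. b"WARC/1.1\r\n"; headers follow it.
        headers = {}
        for raw in iter(stream.readline, b""):
            raw = raw.rstrip(b"\r\n")
            if not raw:
                break  # an empty line ends the header block
            name, _, value = raw.decode("utf-8", errors="replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        yield headers, stream.read(length)


def iter_text_records(stream):
    """Yield (archived URL, text) for records whose target URI has TEXT_PREFIX."""
    for headers, body in iter_warc_records(stream):
        target = headers.get("warc-target-uri", "")
        if target.startswith(TEXT_PREFIX):
            yield target[len(TEXT_PREFIX):], body.decode("utf-8", errors="replace")


def url_to_path(url, root="."):
    """Map a URL to root/host/path.txt, using index.txt for directory-like paths."""
    parts = urlparse(url)
    # Drop empty, "." and ".." segments so a URL cannot escape the output root.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    if not segments or parts.path.endswith("/"):
        segments.append("index")
    segments[-1] += ".txt"
    return Path(root, parts.netloc or "unknown-host", *segments)


def iter_streams(path):
    """Yield one binary stream per WARC: the file itself, or each WARC in a WACZ."""
    if path.endswith(".wacz"):
        with zipfile.ZipFile(path) as wacz:  # a WACZ is a zip package
            for name in wacz.namelist():
                if name.endswith((".warc", ".warc.gz")):
                    with wacz.open(name) as member:
                        if name.endswith(".gz"):
                            yield gzip.GzipFile(fileobj=member)
                        else:
                            yield member
    else:
        # gzip readers handle the concatenated per-record members of a .warc.gz.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as f:
            yield f


def main(paths, root="."):
    for path in paths:
        for stream in iter_streams(path):
            for url, text in iter_text_records(stream):
                out = url_to_path(url, root)
                out.parent.mkdir(parents=True, exist_ok=True)
                out.write_text(text, encoding="utf-8")
                print(out)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name like `warc_text_sketch.py`, it would be run as `python warc_text_sketch.py crawl.wacz` and print each path it writes. Query strings are ignored when building paths, so two URLs that differ only by query would overwrite each other.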

edsu / diagram.md

Last active March 5, 2025 13:41

Diagram:

flowchart TB
  
  subgraph Harvest-by-ORCID
    direction RL
    Dimensions-by-ORCID
    OpenAlex-by-ORCID
    PubMed-by-ORCID

edsu / subjects.py

Last active February 18, 2025 16:55

	#!/usr/bin/env python3

	# This program will fetch the first page of recently updated Library of Congress
	# Subject Headings from id.loc.gov and print out the MARC records for them.
	#
	# /// script
	# dependencies = ["requests", "pymarc"]
	# ///
	#
	# see PEP 723

edsu / hello.py

Created February 18, 2025 16:07

	#!/usr/bin/env python3

	import getpass

	print(f"Hello {getpass.getuser()}!")

edsu / data-usaid-gov-check.py

Last active February 15, 2025 11:20

	#!/usr/bin/env -S pipx run

	# This program walks through the URLs in the sitemap and checks whether each
	# one has a snapshot in the Internet Archive's Wayback Machine.
	#
	# You can run it like:
	#
	# pipx run data-usaid-gov-check.py > results.csv
	#
	#
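The preview does not show how the check is done or which sitemap URL is used. As a minimal sketch under stated assumptions, the code below reads a standard sitemaps.org `<urlset>` and asks the Internet Archive's Wayback Availability API (`https://archive.org/wayback/available?url=...`) for the closest capture of each URL; the gist itself may use a different endpoint, such as the CDX API. The function names are illustrative, and the sitemap URL is taken from the command line rather than guessed.

```python
import csv
import json
import sys
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org <urlset> documents.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# ASSUMPTION: the Wayback Availability API is used; other lookups would also work.
AVAILABILITY_API = "https://archive.org/wayback/available"


def sitemap_urls(xml_bytes):
    """Return the <loc> values listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def closest_snapshot(response):
    """Return the closest snapshot URL from an availability API response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check(url):
    """Ask the availability API whether the Wayback Machine has a capture of url."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=30) as resp:
        return closest_snapshot(json.load(resp))


def main(sitemap_url):
    with urllib.request.urlopen(sitemap_url, timeout=30) as resp:
        urls = sitemap_urls(resp.read())
    writer = csv.writer(sys.stdout)
    writer.writerow(["url", "wayback_url"])
    for url in urls:
        # An empty wayback_url cell means no capture was reported.
        writer.writerow([url, check(url) or ""])


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Something like `python check_sketch.py https://example.com/sitemap.xml > results.csv` (names hypothetical) would produce a two-column CSV. Sitemap index files (`<sitemapindex>`) are not followed here, and a real run over many URLs should add rate limiting and error handling.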

edsu / swap

Last active July 30, 2024 18:08

See which SDR collection and crawl objects have a snapshot of a given URL.

	#!/usr/bin/env python3

	"""
	Look up a URL in swap.stanford.edu and print out the collections and crawl
	SDR object identifiers that contain a snapshot of the URL.
	"""

	import sys
	import json
	import collections
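The preview stops at the imports, and the format swap.stanford.edu returns is not shown. As a hypothetical sketch of the final "print out" step only, assume the lookup yields one record per snapshot with a collection identifier and a crawl object identifier; the field names `collection` and `crawl` are invented here. The imported `collections` module then makes grouping crawls by collection straightforward.

```python
import collections


def group_crawls(snapshots):
    """Map each collection to the sorted crawl identifiers holding a snapshot.

    HYPOTHETICAL input: an iterable of dicts with "collection" and "crawl" keys.
    Duplicate (collection, crawl) pairs are collapsed by the set.
    """
    crawls = collections.defaultdict(set)
    for snap in snapshots:
        crawls[snap["collection"]].add(snap["crawl"])
    # Sort both levels so the printed report is stable between runs.
    return {coll: sorted(ids) for coll, ids in sorted(crawls.items())}


def print_report(grouped):
    """Print each collection followed by its indented crawl identifiers."""
    for coll, crawl_ids in grouped.items():
        print(coll)
        for crawl_id in crawl_ids:
            print(f"  {crawl_id}")
```

Using a `defaultdict(set)` avoids special-casing the first snapshot seen for a collection and drops duplicate captures from the same crawl automatically.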