Ed Summers edsu

hostname ip provider
research.noaa.gov 3.171.38.80 Amazon Technologies Inc.
research.noaa.gov 3.171.38.79 Amazon Technologies Inc.
research.noaa.gov 3.171.38.59 Amazon Technologies Inc.
research.noaa.gov 3.171.38.3 Amazon Technologies Inc.
epic.noaa.gov 108.138.64.54 Amazon.com, Inc.
epic.noaa.gov 108.138.64.32 Amazon.com, Inc.
epic.noaa.gov 108.138.64.6 Amazon.com, Inc.
epic.noaa.gov 108.138.64.49 Amazon.com, Inc.
adp.noaa.gov 18.165.98.15 Amazon Technologies Inc.
research.noaa.gov
epic.noaa.gov
adp.noaa.gov
ci.noaa.gov
oeab.noaa.gov
orta.research.noaa.gov
testbeds.noaa.gov
qosap.research.noaa.gov
oss.research.noaa.gov
eeo.oar.noaa.gov
#!/usr/bin/env python3
# Resolve each hostname read from stdin (or from files given as arguments)
# to its IP addresses. The body of ips() is reconstructed here -- the gist
# is truncated at this point -- using `dig +short` as an assumed resolver.
import fileinput
import re
import subprocess
from functools import cache

@cache
def ips(hostname):
    out = subprocess.check_output(["dig", "+short", hostname], text=True)
    return [line for line in out.splitlines() if re.fullmatch(r"[\d.]+", line)]
edsu / err.log
Created March 27, 2025 13:58
Error output
Traceback (most recent call last):
File "/Users/edsu/.pyenv/versions/3.13.0/bin/sciop", line 8, in <module>
sys.exit(_main())
~~~~~^^
File "/Users/edsu/Projects/sciop/src/sciop/cli/main.py", line 16, in _main
main(max_content_width=100)
~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/edsu/.pyenv/versions/3.13.0/lib/python3.13/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
~~~~~~~~~^^^^^^^^^^^^^^^^^
#!/usr/bin/env python3
# The program will read WARC or WACZ data looking for Browsertrix text records
# and print them out as files using the archived URL as the path.
#
# You can run it right here from Gist using pipx:
#
# pipx run https://gist.githubusercontent.com/edsu/89bd2844b9d3d4536e68956b3a16eaef/raw/warc_text.py file1.warc.gz file2.warc.gz
#
# If you give it a WACZ file it will read any WARC files contained in the WACZ:
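The header above says text records are written out "as files using the archived URL as the path". A minimal sketch of that URL-to-path mapping (the sanitization rules here are assumptions, not taken from the gist; iterating the WARC records themselves would typically use something like warcio's ArchiveIterator):

```python
from urllib.parse import urlparse

def url_to_path(url: str) -> str:
    # Hypothetical mapping from an archived URL to a relative file path:
    # host + path, with an index file name for directory-style URLs.
    # The gist's actual rules may differ (e.g. query string handling).
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.txt"
    return parts.netloc + path
```

So `https://example.com/reports/` would land at `example.com/reports/index.txt` under whatever output directory the script uses.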

Diagram:

flowchart TB

  subgraph Harvest-by-ORCID
    direction RL
    Dimensions-by-ORCID
    OpenAlex-by-ORCID
    PubMed-by-ORCID
  end
edsu / subjects.py
Last active February 18, 2025 16:55
#!/usr/bin/env python3
# This program will fetch the first page of recently updated Library of Congress
# Subject Headings from id.loc.gov and print out the MARC records for them.
#
# /// script
# dependencies = ["requests", "pymarc"]
# ///
#
# see PEP 723
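The PEP 723 block above declares requests and pymarc; the fetch itself is routine, and the one non-obvious step is turning a subject-heading URI into a MARC download URL. A hypothetical helper (the `.marcxml.xml` suffix is an assumption about id.loc.gov's serialization URLs, which the resulting MARCXML could then be fed to pymarc's parser):

```python
def marcxml_url(heading_uri: str) -> str:
    # id.loc.gov serves alternate serializations via filename-style
    # suffixes; ".marcxml.xml" is assumed here to yield MARCXML.
    return heading_uri.rstrip("/") + ".marcxml.xml"
```

For example, `marcxml_url("https://id.loc.gov/authorities/subjects/sh85101653")` yields the `.marcxml.xml` URL for that heading.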
#!/usr/bin/env python3
import getpass
print(f"Hello {getpass.getuser()}!")
#!/usr/bin/env -S pipx run
# This program walks through the URLs in the sitemap and checks to see if they
# are in the Internet Archive Wayback Machine.
#
# You can run it like:
#
# pipx run data-usaid-gov-check.py > results.csv
#
#
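The Wayback Machine lookup the comments describe can be done with the Internet Archive's availability API, which returns JSON describing the closest archived snapshot of a URL. A sketch of building that query (the query construction is mine, not taken from the gist):

```python
from urllib.parse import quote

AVAILABILITY_API = "https://archive.org/wayback/available"

def availability_url(url: str) -> str:
    # Build a GET URL for the Wayback Machine availability API;
    # the target URL is percent-encoded into the url= parameter.
    return f"{AVAILABILITY_API}?url={quote(url, safe='')}"
```

Each sitemap URL would be run through this, the JSON response checked for an `archived_snapshots` entry, and a CSV row written per URL.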
edsu / swap
Last active July 30, 2024 18:08
See which SDR collection and crawl objects have a snapshot of a given URL.
#!/usr/bin/env python3
"""
Look up a URL in swap.stanford.edu and print out the collection and crawl
SDR object identifiers that contain a snapshot of the URL.
"""
import sys
import json
import collections
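The `collections` import suggests the results are tallied per collection. A sketch of that aggregation step, with a loudly hypothetical record shape (a dict with a "collection" key); the actual JSON returned by swap.stanford.edu may be structured quite differently:

```python
import collections

def tally_collections(records):
    # Count snapshots per collection identifier. The record shape here
    # is a stand-in; adapt the key to the real swap.stanford.edu response.
    return collections.Counter(r["collection"] for r in records)
```

A `Counter` keeps the printout simple: `most_common()` lists the collections holding the most snapshots first.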