A history such as "Org A split off from Org B, Org B split into Org C & Org D, Org A and Org D merged into Org E"
can be turned into Mermaid notation:
graph TD;
    B --> A;
    B --> C;
    B --> D;
    A --> E;
    D --> E;
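The same translation can be done mechanically. The following is a minimal sketch, assuming the organizational history has already been reduced to (predecessor, successor) pairs; the function name `to_mermaid` and the edge list are illustrative and not part of the original gist. A merge is written as one edge per predecessor, so the merge of A and D into E contributes two edges.

```python
def to_mermaid(edges, direction="TD"):
    """Render (predecessor, successor) pairs as a Mermaid graph definition."""
    lines = [f"graph {direction};"]
    for predecessor, successor in edges:
        lines.append(f"    {predecessor} --> {successor};")
    return "\n".join(lines)


# Hypothetical encoding of the history described above.
history = [
    ("B", "A"),  # Org A split off from Org B
    ("B", "C"),  # Org B split into Org C ...
    ("B", "D"),  # ... and Org D
    ("A", "E"),  # Org A and Org D merged into Org E
    ("D", "E"),
]

if __name__ == "__main__":
    print(to_mermaid(history))
```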
[13] Sufjan Stevens - Javelin [Clash, The Fader, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, Mojo, Uncut, Piccadilly Records, Rough Trade]
[12] Kelela - Raven [Clash, The Fader, The Forty-Five, PopMatters, Pitchfork, Crack, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, The Quietus]
[12] Wednesday - Rat Saw God [Clash, The Fader, The Forty-Five, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Paste, Uncut, Rough Trade]
[11] Noname - Sundial [Clash, The Fader, The Forty-Five, The Wire, PopMatters, Pitchfork, Crack, The Line of Best Fit, Rolling Stone, Paste, The Quietus]
[9] Mitski - The Land Is Inhospitable and So Are We [Clash, The Fader, PopMatters, Pitchfork, The Line of Best Fit, Consequence, Rolling Stone, Exclaim, Mojo]
[9] Lankum - False Lankum [Clash, Concrete Islands, Crack, The Line of Best Fit, Fast 'n' Bulbous, Louder Than War, Mojo, Uncut, The Quietus]
[8] Amaarae - Fountain Baby [Clash, The Fader, T
#!/usr/bin/env python3
import csv
import json
from collections import OrderedDict
from collections import Counter

def trace(data, shape=None):
    if isinstance(data, dict):
        new_dict = OrderedDict()
#!/bin/bash
#
# Use the Internet Archive Wayback Machine to demonstrate roughly when the
# NYTimes started blocking GPTBot.
#
# See: https://www.theverge.com/2023/8/21/23840705/new-york-times-openai-web-crawler-ai-gpt
#
wget -q -O robots-20230817.txt https://web.archive.org/web/20230817012138id_/https://www.nytimes.com/robots.txt
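Once a snapshot such as robots-20230817.txt has been downloaded, checking whether it blocks a crawler can be done with Python's standard `urllib.robotparser`. This is a rough sketch and not part of the original script: the helper name `blocks_agent` is hypothetical, and the two sample strings are invented stand-ins for snapshots, not the actual contents of the NYTimes robots.txt.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt, agent="GPTBot", url="https://www.nytimes.com/"):
    """Return True if the given robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Made-up examples standing in for an earlier and a later snapshot.
SAMPLE_EARLIER = "User-agent: *\nDisallow: /search\n"
SAMPLE_LATER = (
    "User-agent: GPTBot\nDisallow: /\n\n"
    "User-agent: *\nDisallow: /search\n"
)

if __name__ == "__main__":
    for label, text in [("earlier", SAMPLE_EARLIER), ("later", SAMPLE_LATER)]:
        print(label, "blocks GPTBot:", blocks_agent(text))
```

To apply this to a real snapshot, read the downloaded file and pass its text to `blocks_agent`; running it over snapshots from successive dates would narrow down when the rule first appeared.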
collection: fatal-encounters
generateWACZ: true
workers: 4
screencastPort: 9037
seeds:
  - url: https://fatalencounters.org/
    scopeType: prefix
  - url: https://www.wsoctv.com/news/1-person-dead-after-attempting-escape-police-troopers-say/QXA244QPUZGJ5GAGRADGDWBAEU/
    scopeType: page
  - url: https://www.wtok.com/2022/01/01/officer-involved-shooting/
#!/usr/bin/env python
from warcio.archiveiterator import ArchiveIterator

with open('archive/rec-20230722210008512613-81a34b41ee13.warc.gz', 'rb') as stream:
    for i, record in enumerate(ArchiveIterator(stream)):
        print(i, record.rec_headers.get_header('WARC-Target-URI'))
        if record.rec_type == 'response':
            content = record.content_stream().read()
from warcio.warcwriter import WARCWriter

with open('test.warc.gz', 'wb') as output:
    writer = WARCWriter(output, gzip=True)

    # write some metadata for the WARC as an info record
    rec = writer.create_warcinfo_record('test.warc.gz', {
        'software': 'warcio',
        'description': 'An example of packaging up two images in a WARC'
    })
#!/usr/bin/env python3
# run like this:
#
# $ python3 warc2mbox.py yahoo-groups-2016-03-20T12:45:19Z-nyzp9w.warc.gz
#
# and it will generate an mbox file for each Yahoo Group:
#
# $ ls -l mboxes
# -rw-r--r-- 1 edsummers staff 12522488 Jul 15 14:14 amicigranata.mbox
#!/usr/bin/env python3
import csv
import sys
import json
import time
import requests

def get_snapshots(url):
    url = f"https://swap.stanford.edu/was/cdx?url={url}&output=json"