Uses the Wayback CDX Server API.
The script expects a urls.json file, which can be retrieved with the following curl command:
curl -o urls.json "https://web.archive.org/cdx/search/cdx?url=*.mit.edu&collapse=urlkey&output=json"
This returns every unique URL under the *.mit.edu namespace that the Wayback Machine has archived. The response is a few hundred megabytes, so the download may take a while.
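To give a feel for what ends up in urls.json, here is a minimal Python sketch of reading it. It assumes the CDX server's JSON output is a list of rows whose first row is a header naming the fields (including "original" for the archived URL). The helper names extract_original_urls and load_urls are hypothetical and are not taken from the script itself.

```python
import json


def extract_original_urls(rows):
    """Return the 'original' column from CDX JSON rows.

    Assumes the first row is a header such as
    ["urlkey", "timestamp", "original", ...] and the rest are records.
    """
    if not rows:
        return []
    header, *records = rows
    idx = header.index("original")
    return [record[idx] for record in records]


def load_urls(path="urls.json"):
    """Load the downloaded CDX response and return the archived URLs."""
    with open(path, encoding="utf-8") as f:
        return extract_original_urls(json.load(f))
```

Note that json.load reads the whole file into memory, which may be heavy for a file of this size; a streaming parser could be substituted if that becomes a problem.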