@atomotic
Last active September 9, 2022 09:39
Internet Archive Save Page Now

Save a page to the Internet Archive Wayback Machine from the shell.

Put the function in your .zshrc or .bashrc, then run:

~  ia-save http://twitter.com/atomotic
https://web.archive.org/web/20140702123925/http://twitter.com/atomotic
function ia-save() { curl -s -I https://web.archive.org/save/$* | grep Content-Location | awk '{print "https://web.archive.org"$2}' }
@hugovk

hugovk commented Apr 20, 2015

I added the function:

function ia-save() { curl -s -I https://web.archive.org/save/$* | grep Content-Location | awk '{print "https://web.archive.org"$2}' }

to the end of my OS X .bashrc and called source ~/.bashrc but got:

-bash: .bashrc: line 12: syntax error: unexpected end of file

It needs a semicolon:

function ia-save() { curl -s -I https://web.archive.org/save/$* | grep Content-Location | awk '{print "https://web.archive.org"$2}'; }
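
The difference comes from shell grammar: in bash, the closing } of a one-line function body is only recognized after the last command has been terminated by ; or a newline, whereas zsh by default accepts the original form. A minimal sketch of the two spellings that bash accepts, using hypothetical function names that are not part of the gist:

```shell
# say_ok and say_ok_multiline are hypothetical names used only for illustration.

# One-line body: bash needs ';' (or a newline) before the closing '}'.
say_ok() { echo ok; }

# Multi-line body: the newline after the command plays the same role.
say_ok_multiline() {
    echo ok
}

say_ok
say_ok_multiline
```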

But it still doesn't work. Running just the curl command returns:

HTTP/1.1 403 Forbidden
Server: Tengine/2.0.3
Date: Mon, 20 Apr 2015 18:10:34 GMT
Content-Type: text/html;charset=utf-8
Connection: keep-alive
set-cookie: wayback_server=46; Domain=archive.org; Path=/; Expires=Wed, 20-May-15 18:10:33 GMT;
X-Archive-Wayback-Liveweb-Error: RobotAccessControlException: Blocked By Robots
X-Archive-Playback: 0
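
With a 403 response like this one, the header block contains no Content-Location line, so the grep and awk stages have nothing to print and the function returns silently. A minimal sketch of a status check that could run before parsing; http_status is a hypothetical helper, and it assumes the header block from curl -s -I is supplied on standard input:

```shell
# http_status: hypothetical helper. Reads an HTTP header block on stdin and
# prints the status code from the first status line it sees.
http_status() {
    awk '/^HTTP\// { print $2; exit }'
}

# Example with a canned response instead of a live request:
printf 'HTTP/1.1 403 Forbidden\r\nServer: Tengine/2.0.3\r\n\r\n' | http_status
```

If the printed code is not a success or redirect code, the save request was refused and there is no snapshot URL to extract.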

@edsu

edsu commented Apr 20, 2015

What were you trying to curl?

@paulkaefer

I'm trying to make this a bash alias. If I add it as-is, it says "unexpected end of file" when I reload my .bashrc.

If I add a semicolon, as hugovk suggests, the .bashrc file works, but when I go to archive a page, I get the following:

awk: cmd. line:1: {print
awk: cmd. line:1:       ^ unexpected newline or end of string

Any ideas? I've been playing around with the quotes (switching between " and '), but with no success.

@paulkaefer

I asked on StackOverflow and the following works for me:

function ia-save() {
    curl -s -I "https://web.archive.org/save/$1" |
    grep Content-Location |
    awk '{printf( "https://web.archive.org/%s\n",$2)}';
}

@lyda

lyda commented Jan 13, 2019

You don't need grep.

function ia-save() {
    curl -s -I "https://web.archive.org/save/$1" |
    awk '/^Content-Location/ {print "https://web.archive.org/" $2}';
}

@jerclarke

Hey! You all are doing something that seems to have broken for me. The web.archive.org/save/ endpoint is no longer returning a Content-Location header for me in an application where it used to work.

Anyone else having this issue since July 10?

Here's a link to a related ticket: berkmancenter/amber_wordpress#59

@atomotic
Author

It seems that GET https://web.archive.org/save/___ is not working anymore; there is a POST now. I will look into it later.

@jerclarke

Thanks!

I tried just looking at dev tools when using the website version of /save/ and it seems like the POST request is super simple, just url=$url.

When I run that request through PHP (WordPress HTTP API) it seems to work based on the content that's returned, but there's still no Content-Location header.
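
A rough sketch of what that POST might look like from the shell. Everything here beyond the form field name url is an assumption: ia_save_post and the IA_CURL override are hypothetical names, and it is not confirmed in this thread that the endpoint accepts the request without extra fields or a login.

```shell
# ia_save_post: hypothetical variant of ia-save that submits the page URL as a
# form field named "url", as seen in the browser dev tools.
# Assumption: the endpoint accepts this POST as-is; not verified here.
# IA_CURL is a hypothetical override so the command can be dry-run.
ia_save_post() {
    # -D - writes the response headers to stdout; -o /dev/null drops the body.
    "${IA_CURL:-curl}" -s -o /dev/null -D - \
        --data-urlencode "url=$1" \
        "https://web.archive.org/save/"
}
```

Printing the response headers makes it easy to see whether any location-style header comes back at all.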

Let me know if you find something different 🙏🏻

@ellcs

ellcs commented Mar 23, 2021

This one did it for me:

function ia-save() {
  curl -s -I "https://web.archive.org/save/$1" | \
  egrep '^location:' | \
  awk '{ print $2 }';
}

@jxu

jxu commented Jul 1, 2021

@jerclarke

It seems to me that a GET request from curl still gets the site to create an archive if none exists yet. I'm not sure whether a new archive is generated when one already exists; it may depend on how recent the last one is.

Also, the header is now named location instead of Content-Location.
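
Since the header name has changed over the life of this thread, a parser that accepts either spelling may be more robust. A minimal sketch, assuming the header block from curl -s -I arrives on standard input; archived_url is a hypothetical helper name, and the handling of relative paths is inferred from the original gist, which prefixed https://web.archive.org to the header value:

```shell
# archived_url: hypothetical helper. Reads an HTTP header block on stdin and
# prints the snapshot URL from the first "Location" or "Content-Location"
# header, matching the header name case-insensitively.
archived_url() {
    awk '
        {
            sub(/\r$/, "")              # drop the CR that ends HTTP header lines
            name = tolower($1)
        }
        name == "location:" || name == "content-location:" {
            url = $2
            # Older responses carried a path such as /web/...; make it absolute.
            if (url ~ /^\//) url = "https://web.archive.org" url
            print url
            exit
        }
    '
}
```

It would be used in place of the grep and awk stages, for example: curl -s -I "https://web.archive.org/save/$1" | archived_url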
