@jrivero
Created September 24, 2012 11:12
Retrieve Google Cached HTML
<?php
// http://hactheplanet.com/blog/11
function cachedHTMLForURL($url)
{
    // Request the cached copy from Google.
    $googleRequestURL = "http://webcache.googleusercontent.com/search?q=" . urlencode("cache:" . $url);
    $googleResponse = file_get_contents($googleRequestURL);
    // Return false if the request failed or Google did not have the page.
    if ($googleResponse === false || preg_match("/^.*<title>cache:/", $googleResponse)) {
        return false;
    }
    // Remove the first 3 lines of the response, which are inserted by Google.
    $importantHTML = preg_replace("/^(.*\n){3}/", "", $googleResponse);
    // Re-insert the <base> tag, if present, so relative paths resolve against the original site.
    if (preg_match("/<base href=\"[^\"]*\">/", $googleResponse, $matches)) {
        return $matches[0] . "\n" . $importantHTML;
    }
    return $importantHTML;
}
echo cachedHTMLForURL("http://news.google.com/");