@robballou
Created February 16, 2012 02:39
<?php
/*
Rather than parse the HTML by hand, this solution pulls in an open source
library, simplehtmldom (http://simplehtmldom.sourceforge.net/). That is
cheating a bit, but it lets us account for both the href and target
attributes: if a link already has a target, we don't want to repeat it.

A solution without the library could use preg_match_all to pull out the
hrefs, but I've found that handling HTML is best done by libraries rather
than by regular expressions alone.
*/
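// A rough sketch of the regex-only alternative mentioned above, shown only
// for comparison. extract_hrefs() is a hypothetical helper (it is not part of
// simplehtmldom), and the pattern assumes quoted href values: unquoted
// attributes and other HTML edge cases will slip past it.
function extract_hrefs($markup)
{
    // Capture the opening quote, then everything up to the matching quote.
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $markup, $matches);
    return $matches[2];
}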
// Adjust this path to wherever simple_html_dom.php lives on your machine.
require_once '/Users/rballou/Sites/php/simple_html_dom.php';
$html = "<html><head>
<title>Example</title>
</head>
<body>
<ul>
<li><a href=\"/about/\">About</a></li>
<li><a href=\"anotherfile.php\">File</a></li>
<li><a href=\"anotherfile2.php\" target=\"_blank\">File 2</a></li>
<li><a href=\"http://cnn.com\">CNN</a></li>
</ul>
</body></html>";
// Parse the markup and collect every anchor that has an href attribute.
$dom = str_get_html($html);
$links = $dom->find('a[href]');
foreach ($links as $link) {
    $href = $link->href;
    // Open every link in a new window. Assigning the attribute overwrites
    // any existing target, so it is never repeated.
    $link->target = "_blank";
    // A leading "scheme:" (http:, mailto:, ...) means the URL is already
    // absolute, so leave the href as is.
    if (preg_match('/^\w+:/', $href)) {
        continue;
    }
    // Relative link: make sure it has a leading slash, then prefix the site.
    // Note that this resolves every relative path against the site root.
    if (substr($href, 0, 1) != '/') {
        $href = '/' . $href;
    }
    $link->href = 'http://zingstudios.com' . $href;
}
echo $dom->save();
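
// A minimal sketch, not part of the original solution: the same href
// normalization pulled out into a standalone function so it can be tried
// without simplehtmldom. The name absolutize_href() and the $base parameter
// are hypothetical, introduced here for illustration only.
function absolutize_href($href, $base = 'http://zingstudios.com')
{
    // Protocol links (http:, mailto:, ...) are returned untouched.
    if (preg_match('/^\w+:/', $href)) {
        return $href;
    }
    // Anything else is treated as a path from the site root.
    if (substr($href, 0, 1) != '/') {
        $href = '/' . $href;
    }
    return $base . $href;
}
// Usage, with hrefs from the sample markup:
//   absolutize_href('/about/')         returns 'http://zingstudios.com/about/'
//   absolutize_href('anotherfile.php') returns 'http://zingstudios.com/anotherfile.php'
//   absolutize_href('http://cnn.com')  returns 'http://cnn.com'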
?>