@bwente
Created August 30, 2018 17:12
Scrape an HTML element from a site by its id and return it as markup or as plain text.
<?php
$url = $modx->getOption('url',$scriptProperties,'https://www.google.com');
$elementId = $modx->getOption('elementId',$scriptProperties,'hplogo');
$formatted = $modx->getOption('formatted',$scriptProperties,true);
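// Fetch the remote page with cURL.
$ch = curl_init();
$timeout = 5; // seconds allowed for establishing the connection
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch);
curl_close($ch);
// curl_exec() returns false on failure; stop here rather than parse nothing.
if ($html === false || $html === '') {
    return '';
}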
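// Parse the markup. Real-world HTML is rarely valid, so collect libxml errors quietly.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // set before loading; it has no effect afterwards
$dom->validateOnParse = true;
$dom->loadHTML($html);
libxml_clear_errors();
// getElementById() returns null when no element has the requested id.
$element = $dom->getElementById($elementId);
if ($element === null) {
    return '';
}
$value = $element->nodeValue;
$html = $dom->saveHTML($element);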
// Return the element's markup, or only its text content when &formatted is falsy.
if ($formatted) {
    $output = $html;
} else {
    $output = $value;
}
return $output;
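
The parsing step can be tried without MODX or a network call. The following is only a minimal sketch under assumptions: `extractElementById` and the sample `$page` string are hypothetical names introduced for illustration and are not part of the snippet above. It takes an HTML string instead of fetching a URL, and returns an empty string when the id is not found.

```php
<?php
// Hypothetical helper (not part of the original snippet): extract one element
// from an HTML string by id, mirroring the snippet's "formatted" switch.
function extractElementById(string $html, string $elementId, bool $formatted = true): string
{
    if ($html === '') {
        return '';
    }
    // Silence parser warnings for imperfect markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $dom->getElementById($elementId);
    if ($element === null) {
        return ''; // no element carries this id
    }
    // Markup of the element when $formatted is true, otherwise its text content.
    return $formatted ? $dom->saveHTML($element) : $element->nodeValue;
}

// Made-up sample page for illustration.
$page = '<html><body><div id="hplogo"><b>Logo</b> text</div></body></html>';
echo extractElementById($page, 'hplogo'), "\n";        // the element's markup
echo extractElementById($page, 'hplogo', false), "\n"; // the element's text only
```

Separating the fetch from the parse in this way makes it possible to check the extraction logic against a fixed string before pointing the snippet at a live site.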