@mslepko
Created October 14, 2023 10:31
PHP script to extract all URLs from a sitemap index file that references other sitemaps
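<?php
// Extract every page URL from the sitemaps listed in sitemap_index.xml.
$urls = [];

// simplexml_load_file() returns false on failure, so check before iterating.
$xml = simplexml_load_file('https://example.com/sitemap_index.xml');
if ($xml === false) {
    exit("Could not load the sitemap index.\n");
}

// Each <sitemap> entry in the index points to a child sitemap.
foreach ($xml->sitemap as $sitemap) {
    $sitemapUrl = (string) $sitemap->loc;
    $sitemapXml = simplexml_load_file($sitemapUrl);
    if ($sitemapXml === false) {
        continue; // Skip child sitemaps that fail to load or parse.
    }

    // Collect the <loc> of every <url> entry in the child sitemap.
    foreach ($sitemapXml->url as $url) {
        $urls[] = (string) $url->loc;
    }
}

print_r($urls);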