Last active
July 28, 2024 21:23
## Exclude Element with ID (replace div with any tag)
and not(ancestor::div[@id='MyID'])
## Exclude Element with Class
and not(ancestor::body[contains(@class, 'My-Class')])
## Exclude Element with Attribute (replace div or attribute/value as needed)
and not(ancestor::div[@data-elementor-type='header'])
and not(ancestor::div[@data-id='981cb'])
## ChatGPT Prompt
Please act as an expert in the Screaming Frog web crawler and custom extractions using XPath. Below is an extraction I configured that finds links located outside the <header> and <footer> tags:
//a[not(ancestor::header) and not(ancestor::footer) and not(ancestor::div[@data-elementor-type='header']) and not(contains(@href, '/wp-content/'))][starts-with(@href, '/') or starts-with(@href, './') or starts-with(@href, '../') or starts-with(@href, '#') or contains(@href, 'mydomain.com')]/@href
Please update this XPath value to also exclude links found ....
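
To sanity-check what the expression in the prompt selects before pasting it into Screaming Frog, the same filtering rules can be emulated in plain Python. This is only a rough sketch under stated assumptions: it uses the standard library's `html.parser` rather than a real XPath engine, and `BodyLinkCollector`, `extract_body_links`, and `SAMPLE_HTML` are hypothetical names made up for this illustration, not part of Screaming Frog or the gist.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Mirrors the starts-with(@href, ...) clauses of the expression.
INTERNAL_PREFIXES = ("/", "./", "../", "#")


def is_excluded_ancestor(tag, attrs):
    """Mirror the not(ancestor::...) clauses: header, footer, Elementor header div."""
    if tag in ("header", "footer"):
        return True
    return tag == "div" and attrs.get("data-elementor-type") == "header"


def keep_href(href):
    """Mirror the @href conditions: skip /wp-content/, keep relative or own-domain links."""
    if "/wp-content/" in href:
        return False
    return href.startswith(INTERNAL_PREFIXES) or "mydomain.com" in href


class BodyLinkCollector(HTMLParser):
    """Hypothetical helper: collects hrefs of <a> tags with no excluded ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, is_excluded) for every currently open element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inside_excluded = any(excluded for _, excluded in self.stack)
        if tag == "a" and not inside_excluded:
            href = attrs.get("href")
            if href is not None and keep_href(href):
                self.hrefs.append(href)
        if tag not in VOID_TAGS:
            self.stack.append((tag, is_excluded_ancestor(tag, attrs)))

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract_body_links(html):
    collector = BodyLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Made-up sample page for illustration only.
SAMPLE_HTML = """
<html><body>
<header><a href="/home">Home</a></header>
<div data-elementor-type="header"><a href="/menu">Menu</a></div>
<main>
  <img src="/logo.png">
  <a href="/about">About</a>
  <a href="https://mydomain.com/contact">Contact</a>
  <a href="/wp-content/uploads/file.pdf">PDF</a>
  <a href="https://example.org/">External</a>
  <a href="#top">Top</a>
</main>
<footer><a href="/privacy">Privacy</a></footer>
</body></html>
"""
```

Calling `extract_body_links(SAMPLE_HTML)` returns only the links inside `<main>` that pass the href checks. Adding another exclusion from the sections above (an ID, class, or attribute) would mean one more condition in `is_excluded_ancestor`, just as it means one more `and not(ancestor::...)` clause in the XPath.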
@githubtelle support is only provided via the private FB group: https://www.facebook.com/groups/agencytrainingcommunity