Created
October 4, 2020 13:35
Gist: LeeMeng2020/0bf5500b768cc96d996850daf075e93a
This one was interesting; I wanted to figure out a way to limit how many times Load More gets clicked. The sitemap below will stop at 200 results. More details in the attached text file.
{
  "_id": "cbc-load-more",
  "startUrl": ["https://www.cbc.ca/search?q=quebec%20tourism&section=all&sortOrder=relevance&media=all"],
  "selectors": [{
    "id": "Separate Load More",
    "type": "SelectorElementClick",
    "parentSelectors": ["_root"],
    "selector": " div.contentListCards",
    "multiple": false,
    "delay": "3700",
    "clickElementSelector": "div > button[class^='sclt-loadmore']:not([class*='loadmore20'])",
    "clickType": "clickMore",
    "discardInitialElements": "do-not-discard",
    "clickElementUniquenessType": "uniqueHTML"
  }, {
    "id": "Row wrappers",
    "type": "SelectorElement",
    "parentSelectors": ["_root"],
    "selector": "div.contentListCards a.card",
    "multiple": true,
    "delay": 0
  }, {
    "id": "Title",
    "type": "SelectorText",
    "parentSelectors": ["Row wrappers"],
    "selector": "h3",
    "multiple": false,
    "regex": "",
    "delay": 0
  }, {
    "id": "Time",
    "type": "SelectorText",
    "parentSelectors": ["Row wrappers"],
    "selector": "time",
    "multiple": false,
    "regex": "",
    "delay": 0
  }, {
    "id": "Link",
    "type": "SelectorLink",
    "parentSelectors": ["Row wrappers"],
    "selector": "_parent_",
    "multiple": false,
    "delay": 0
  }]
}
Answer for forum question at:
https://forum.webscraper.io/t/page-with-load-more-pagination-abruptly-closes-during-scraping/6252
This one was interesting; I wanted to figure out a way to limit how many times Load More gets clicked. Try the sitemap in this gist, which will stop at 200 results. I recommend a Page load delay of at least 5000 (ms).
This sitemap clicks through all the Load More buttons first, so it might look like nothing much is happening for a while. The results are actually loading below the visible part of the page; you can tell from the "Showing results 1 – XXX of" counter, which changes every few seconds.
If you want more or fewer results, you'll need to do some math to figure out which Load More to stop at, and then change the number in the Load More click selector:
div > button[class^='sclt-loadmore']:not([class*='loadmore20'])
Each Load More loads an additional 10 results. In this example it stops at loadmore20, so 20 x 10 = 200 results.
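The math above can be scripted if you change the limit often. This is only a rough sketch in Python: the helper names load_more_selector and with_limit are made up for illustration (they are not part of Web Scraper), and it assumes the button classes keep following the loadmoreN pattern with 10 results per click, as described in this note.

```python
import copy
import math

# Taken from the note above: each Load More click adds 10 results.
RESULTS_PER_CLICK = 10


def load_more_selector(max_results: int) -> str:
    """Build the click selector that stops at roughly max_results results.

    Hypothetical helper. It assumes the Load More button's class contains
    'loadmoreN', where N x 10 is the number of results loaded so far.
    """
    if max_results < RESULTS_PER_CLICK:
        raise ValueError("max_results must be at least 10")
    # Round up so that, for example, 55 results stops at loadmore6.
    stop_at = math.ceil(max_results / RESULTS_PER_CLICK)
    # Note: [class*=...] is a substring match, so 'loadmore2' would also
    # match 'loadmore20'. That is harmless here because clicking has
    # already stopped by then.
    return (
        "div > button[class^='sclt-loadmore']"
        f":not([class*='loadmore{stop_at}'])"
    )


def with_limit(sitemap: dict, max_results: int) -> dict:
    """Return a copy of a sitemap dict with the click selector updated."""
    patched = copy.deepcopy(sitemap)
    for selector in patched["selectors"]:
        if selector.get("type") == "SelectorElementClick":
            selector["clickElementSelector"] = load_more_selector(max_results)
    return patched


if __name__ == "__main__":
    print(load_more_selector(200))
```

To use it, load the sitemap with json.load, pass the result through with_limit(sitemap, 500) (or whatever limit you want), and import the json.dumps output back into Web Scraper.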