@BretCameron
Last active June 20, 2019 09:33
A scraper that handles infinite scrolling
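// assumes `page` is a Puppeteer Page and that this code runs inside an async function
const scrapeNumber = 20;
const scrapeQuery = '.userContentWrapper';
const startTime = Date.now();
const lapseTime = 3600000; // that's 1 hour in milliseconds
let arrayOfItems = [];
let heightBefore = 0;
let heightAfter = 0;

while (arrayOfItems.length < scrapeNumber) {
  // scroll to the bottom of the page to trigger loading of more items
  await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
  // give new content a second to load (plain timer, so it does not depend on page.waitFor)
  await new Promise((resolve) => setTimeout(resolve, 1000));

  // the callback runs in the browser, so scrapeQuery must be passed in as an argument;
  // DOM nodes cannot be serialized back to Node, so return each item's text instead
  // (innerText is an assumption here: extract whichever fields you actually need)
  arrayOfItems = await page.evaluate((query) => {
    return [...document.querySelectorAll(query)].map((el) => el.innerText);
  }, scrapeQuery);

  // break loop if scrollHeight is unchanged, signalling the bottom has been reached
  heightAfter = await page.evaluate('document.body.scrollHeight');
  if (heightBefore === heightAfter) {
    break;
  }
  heightBefore = heightAfter;

  // break loop if one hour passes
  if (Date.now() - startTime >= lapseTime) {
    break;
  }
}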
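
The loop above depends on a `page` object and an enclosing async function that the gist does not show. As a rough sketch of how it could be packaged for reuse, the same three stop conditions (enough items, unchanged scroll height, time limit) can be wrapped in a function. The name `scrollAndCollect` and its option names are hypothetical, not part of the gist or of Puppeteer; the only thing it assumes about `page` is an `evaluate` method that behaves like Puppeteer's.

```javascript
// Hypothetical helper: `scrollAndCollect` and its options are illustrative names.
// `page` is assumed to expose an evaluate() method like a Puppeteer Page.
async function scrollAndCollect(page, { query, limit = 20, delayMs = 1000, timeoutMs = 3600000 } = {}) {
  const startTime = Date.now();
  let items = [];
  let heightBefore = 0;

  while (items.length < limit) {
    // scroll to the bottom, then wait for new content to load
    await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
    await new Promise((resolve) => setTimeout(resolve, delayMs));

    // the selector is passed as an argument because the callback runs in the browser
    items = await page.evaluate(
      (selector) => [...document.querySelectorAll(selector)].map((el) => el.innerText),
      query
    );

    // stop if the page did not grow, i.e. the bottom has probably been reached
    const heightAfter = await page.evaluate('document.body.scrollHeight');
    if (heightAfter === heightBefore) break;
    heightBefore = heightAfter;

    // stop if the time limit has passed
    if (Date.now() - startTime >= timeoutMs) break;
  }

  // trim any extra items loaded by the final scroll
  return items.slice(0, limit);
}
```

With Puppeteer, this might be called as `await scrollAndCollect(page, { query: '.userContentWrapper' })` after `page.goto(...)` has resolved. Note that a fixed delay can end the loop early on a slow connection, because the scroll height will not have changed yet when it is checked; a longer `delayMs` is the simplest guard against that.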