@chrisheseltine
Created August 31, 2018 09:24

const Apify = require('apify');

// PetHarbor adoption search results for shelter list 'HAMP': one listing page for cats, one for dogs.
const CATS_URL = 'https://petharbor.com/results.asp?searchtype=ADOPT&start=1&miles=20&shelterlist=%27HAMP%27&zip=&where=type_CAT&nosuccess=1&nomax=1&rows=25&nobreedreq=1&nopod=1&nocustom=1&samaritans=1&view=sysadm.v_hamp&imgres=detail&stylesheet=https://cbbb1e2ef05c549bf4c2-7b792f487d9839572907a6863bac8ad2.ssl.cf5.rackcdn.com/petharbor.css&grid=1&NewOrderBy=Name&text=000000&link=007c0f&col_bg=ffffff';
const DOGS_URL = 'https://petharbor.com/results.asp?searchtype=ADOPT&start=1&miles=20&shelterlist=%27HAMP%27&zip=&where=type_DOG&nosuccess=1&nomax=1&rows=25&nobreedreq=1&nopod=1&nocustom=1&samaritans=1&view=sysadm.v_hamp&imgres=detail&stylesheet=https://cbbb1e2ef05c549bf4c2-7b792f487d9839572907a6863bac8ad2.ssl.cf5.rackcdn.com/petharbor.css&grid=1&NewOrderBy=Name&text=000000&link=007c0f&col_bg=ffffff';
Apify.main(async () => {
    // Create a request queue and enqueue both start URLs.
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest(new Apify.Request({ url: CATS_URL }));
    await requestQueue.addRequest(new Apify.Request({ url: DOGS_URL }));

    // Create the crawler.
    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        handlePageFunction: async ({ page, request }) => {
            // Collect the detail-page URLs from the results grid.
            const detailUrls = await page.$$eval('.GridResultsContainer a', elems => elems.map(elem => elem.href));
            console.log('detailUrls: ' + detailUrls);
            // pushData() expects an object or an array of objects, so wrap each URL.
            await Apify.pushData(detailUrls.map(url => ({ url })));
        }, // TODO: gotoFunction and handleFailedRequestFunction? See the sketch below.
    });

    // Run the crawler.
    await crawler.run();
});
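
// A minimal sketch of the two options the TODO above asks about, assuming the
// 2018-era Apify SDK (v0.x) PuppeteerCrawler API; option names and handler
// signatures should be verified against the installed SDK version. They would
// be passed alongside requestQueue and handlePageFunction above, so the block
// is left commented out here.
//
// const crawler = new Apify.PuppeteerCrawler({
//     requestQueue,
//     handlePageFunction,
//     // Custom navigation: longer timeout, and only wait for the DOM to load.
//     gotoFunction: async ({ page, request }) => {
//         return page.goto(request.url, { timeout: 60000, waitUntil: 'domcontentloaded' });
//     },
//     // Called once a request has exhausted its retries; record it for inspection.
//     handleFailedRequestFunction: async ({ request }) => {
//         console.log(`Request ${request.url} failed too many times.`);
//         await Apify.pushData({ url: request.url, failed: true });
//     },
// });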