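const Apify = require('apify');

const { log } = Apify.utils;
// Note: debug messages are printed only when the log level is DEBUG,
// e.g. after calling log.setLevel(log.LEVELS.DEBUG).

Apify.main(async () => {
    // Expects actor input of the form { "keyword": "<search term>" }.
    const input = await Apify.getInput();
    if (!input || !input.keyword) {
        throw new Error('Input must contain a "keyword" field.');
    }
    log.debug(`input: ${JSON.stringify(input.keyword)}`);

    // Seed the queue with the Amazon search page for the keyword.
    // The keyword is URL-encoded so spaces and special characters are preserved.
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({
        url: `https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=${encodeURIComponent(input.keyword)}`,
    });

    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        minConcurrency: 10,
        maxConcurrency: 50,
        // One retry means each request is attempted at most twice.
        maxRequestRetries: 1,
        handlePageTimeoutSecs: 30,
        maxRequestsPerCrawl: 20,
        handlePageFunction: async ({ request, $ }) => {
            log.debug(`Processing ${request.url}...`);
            const title = $('title').text();
            // Store the page title alongside the URL.
            await Apify.pushData({
                url: request.url,
                title,
            });
        },
        handleFailedRequestFunction: async ({ request }) => {
            log.debug(`Request ${request.url} failed twice.`);
        },
    });

    await crawler.run();
    log.debug('Crawler finished.');
});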