@sturmenta
Last active September 30, 2022 12:18
basic scraper
const axios = require('axios');
const cheerio = require('cheerio');

const baseUrl = 'https://www.infomerlo.com';
const urlToScrap = baseUrl + '/noticias';
const numberOfFirstElementsToScrap = 10;

axios(urlToScrap)
  .then(response => {
    const html = response.data;
    const $ = cheerio.load(html);
    const articles = [];

    // The callback cannot be an arrow function: cheerio binds `this` to the
    // current element, and arrow functions do not get their own `this`.
    // The document is already loaded into `$`, so no context argument is needed.
    $(`.media:nth-of-type(-n+${numberOfFirstElementsToScrap})`).each(
      function () {
        // console.log(`$(this).html()`, $(this).html());
        // The image URLs are stored as JSON in a data attribute.
        const jsonWithImages = JSON.parse(
          $(this).find('.img-container').attr('data-media-desktop'),
        );
        const imageUrl = jsonWithImages.imagen.jpg.xs;
        const title = $(this).find('h2').text().trim();
        const url = $(this).find('a').attr('href');
        articles.push({title, url: baseUrl + url, imageUrl});
      },
    );

    console.log(`articles`, JSON.stringify(articles, null, 2));
  })
  .catch(err => console.error(err));
// resources:
// https://github.com/cheeriojs/cheerio#selectors
// https://css-tricks.com/useful-nth-child-recipies/#aa-select-only-the-first-five
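
The scraper builds each article link with `baseUrl + url` and calls `JSON.parse` directly on the `data-media-desktop` attribute, so a missing attribute or an already-absolute `href` would break it. Below is a minimal sketch of two more defensive helpers that use only Node built-ins. The names `resolveUrl` and `extractImageUrl` are hypothetical and not part of the gist, and the `imagen.jpg.xs` shape is taken from the code above and assumed to still match the site's markup.

```javascript
// Hypothetical helpers, not part of the original gist.

// Resolve an href against the site root. Relative paths are joined to `base`;
// absolute URLs are returned unchanged. Returns null for missing or malformed input.
function resolveUrl(href, base) {
  if (!href) return null;
  try {
    return new URL(href, base).href;
  } catch (err) {
    return null;
  }
}

// Parse the raw `data-media-desktop` value and read `imagen.jpg.xs`.
// Returns null if the attribute is absent, is not valid JSON,
// or does not have the assumed shape.
function extractImageUrl(rawJson) {
  if (!rawJson) return null;
  try {
    const parsed = JSON.parse(rawJson);
    return parsed?.imagen?.jpg?.xs ?? null;
  } catch (err) {
    return null;
  }
}
```

Inside the `.each` callback these could replace the inline expressions, for example `resolveUrl($(this).find('a').attr('href'), baseUrl)`, with articles skipped when either helper returns `null`.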