In this article I want to introduce Puppeteer, a tool that helps us do cool things like web scraping or automating tasks. Puppeteer lets developers start and control a Chromium browser from a Node.js script run on the command line. By default this is a headless browser that behaves like a real-world browser. The Puppeteer API lets developers do almost anything a user could do in their browser. For example:
- we can open a new page or a new tab
- we can select any element from the DOM through the API
- we can type into and select input elements and change their values
- we can select a button and click on it
- we can create a PDF from the current page
- we can take a screenshot of the current page
- ...
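The capabilities listed above map fairly directly onto Puppeteer calls. The sketch below is only an illustration under a few assumptions: the function name `tourOfCapabilities`, the inline HTML, and the output file names `page.pdf` and `page.png` are made up for this example, and `browser` is assumed to come from `await puppeteer.launch()`.

```javascript
// Sketch: one Puppeteer call per capability from the list above.
// `browser` is assumed to be the result of `await puppeteer.launch()`.
async function tourOfCapabilities(browser) {
  // open a new page (tab)
  const page = await browser.newPage();
  // load a small inline document instead of a real site (hypothetical markup)
  await page.setContent(
    '<input id="name"><button id="go" onclick="document.title = \'clicked\'">Go</button>'
  );
  // select an element from the DOM
  const input = await page.$('#name');
  // type into the input element
  await input.type('puppeteer');
  // select a button and click on it
  await page.click('#go');
  // read back the page title and the input value from inside the page
  const title = await page.title();
  const typed = await page.$eval('#name', el => el.value);
  // create a PDF and a screenshot from the current page
  await page.pdf({ path: 'page.pdf' });
  await page.screenshot({ path: 'page.png' });
  await page.close();
  return { title, typed };
}
```

Because the function receives the browser as an argument, it can be called from any of the scripts later in this article once a browser has been launched.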
Here is the official description of Puppeteer:
"Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full (non-headless) Chrome/Chromium."
To install the tool, follow the instructions below. First, create a package.json file with this command:
npm init
Then install Puppeteer with this command:
npm install puppeteer --save
When npm has installed all dependencies, open package.json and add "type": "module" to it as a key/value pair, so that Node.js treats app.js as an ES module and the import statements in the examples work.
{
  "name": "project-name",
  "version": "1.0.0",
  "description": "",
  "type": "module",
  "main": "app.js",
  "scripts": {
    "run": "node app.js"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "puppeteer": "^19.8.3"
  }
}
OK, with all of the steps above done, we are ready to write our first example. Each example below can be saved as app.js and run with node app.js.
In this example we will learn how to take a screenshot of a web page and save it to disk.
// import puppeteer package
import puppeteer from 'puppeteer';
(async () => {
  // launch a browser
  const browser = await puppeteer.launch();
  // create a new page
  const page = await browser.newPage();
  // set viewport size (width and height) before navigating
  await page.setViewport({width: 1980, height: 1080});
  // go to this address https://developer.mozilla.org/en-US/
  await page.goto('https://developer.mozilla.org/en-US/');
  // take a screenshot of the full page
  await page.screenshot({path: 'mozilla-dev-center.png', fullPage: true});
  // close browser
  await browser.close();
})();
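In the script above, if `page.goto` or `page.screenshot` throws, `browser.close()` is never reached and a Chromium process can be left running. One possible way to guard against that is a small wrapper built on `try`/`finally`. This is only a sketch: `withPage` is a hypothetical helper name, not part of Puppeteer, and `launch` is assumed to be a function such as `() => puppeteer.launch()`.

```javascript
// Hypothetical helper: runs `task(page)` and always closes the browser afterwards.
// `launch` is assumed to be a function such as `() => puppeteer.launch()`.
async function withPage(launch, task) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // the result of the task becomes the result of withPage
    return await task(page);
  } finally {
    // runs on success and on failure, so the browser is not left open
    await browser.close();
  }
}

// Possible usage inside a script that has imported puppeteer:
//   await withPage(() => puppeteer.launch(), async (page) => {
//     await page.goto('https://developer.mozilla.org/en-US/');
//     await page.screenshot({path: 'mozilla-dev-center.png', fullPage: true});
//   });
```

Passing the launch function in, rather than importing Puppeteer inside the helper, keeps the helper independent of launch options.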
In this example we read the Bitcoin price from the CoinMarketCap website. We will learn how to select an element and how to extract data from it.
// import puppeteer package
import puppeteer from 'puppeteer';
(async () => {
  // launch a browser
  const browser = await puppeteer.launch();
  // create a new page
  const page = await browser.newPage();
  // set viewport size (width and height) before navigating
  await page.setViewport({width: 1980, height: 1080});
  // go to this address https://coinmarketcap.com/currencies/bitcoin/
  await page.goto('https://coinmarketcap.com/currencies/bitcoin/');
  // wait for the price element and store it in bitcoinElement
  // (the selector depends on the site's markup and may need updating)
  const bitcoinElement = await page.waitForSelector('.priceValue>span');
  // extract the price text from bitcoinElement with the evaluate method
  const bitcoinPrice = await bitcoinElement.evaluate(el => el.textContent);
  // print bitcoin price
  console.log("bitcoin price on the cmc : " + bitcoinPrice);
  // close browser
  await browser.close();
})();
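The value read from `textContent` is a string, so it has to be converted before it can be compared or used in calculations. The sketch below shows one way to do that. `parsePrice` is a hypothetical helper, and it assumes the price is displayed in a format like "$27,123.45", with "," as the thousands separator and "." as the decimal point; other locales would need different handling.

```javascript
// Hypothetical helper: turns a displayed price such as "$27,123.45" into a number.
// Assumes "," is a thousands separator and "." is the decimal point.
function parsePrice(text) {
  // drop everything except digits and the decimal point
  const cleaned = text.replace(/[^0-9.]/g, '');
  const value = Number.parseFloat(cleaned);
  if (Number.isNaN(value)) {
    throw new Error('No numeric price found in: ' + text);
  }
  return value;
}

// Possible usage with the value extracted by the script:
//   const price = parsePrice(bitcoinPrice);
```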
This example shows how to interact with an HTML form and type text into it.
// import puppeteer package
import puppeteer from 'puppeteer';
(async () => {
  // launch a browser
  const browser = await puppeteer.launch();
  // create a new page
  const page = await browser.newPage();
  // set viewport size (width and height) before navigating
  await page.setViewport({width: 1980, height: 1080});
  // go to this address https://github.com/
  await page.goto('https://github.com/');
  // wait for the search input, selected with input[name="q"]
  const searchBox = await page.waitForSelector('input[name="q"]');
  // type "puppeteer" into the input element with the type method
  await searchBox.type('puppeteer');
  // take a screenshot of the page to check that everything worked
  await page.screenshot({path: 'github-searchbox.png', fullPage: true});
  // close browser
  await browser.close();
})();
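Typing into a field is usually followed by submitting the form. A minimal sketch of that step is shown below. `typeAndSubmit` is a hypothetical helper name, and it assumes that pressing Enter in the field triggers a navigation; on pages where the form updates in place, the `waitForNavigation` call would need to be replaced by waiting for a result element instead.

```javascript
// Hypothetical helper: types `text` into the field matched by `selector`,
// presses Enter, and resolves to the URL the page ends up on.
// Assumes that pressing Enter in this field causes a navigation.
async function typeAndSubmit(page, selector, text) {
  const input = await page.waitForSelector(selector);
  await input.type(text);
  // start waiting for the navigation before pressing Enter,
  // so a fast navigation is not missed
  await Promise.all([
    page.waitForNavigation(),
    input.press('Enter'),
  ]);
  return page.url();
}

// Possible usage inside the script:
//   const resultUrl = await typeAndSubmit(page, 'input[name="q"]', 'puppeteer');
```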
In this example we will try to extract the first product image from a product page on amazon.com and save it to disk.
// import puppeteer package
import puppeteer from 'puppeteer';
// import the promise-based file system API
import * as fs from 'node:fs/promises';
(async () => {
  // launch a browser
  const browser = await puppeteer.launch();
  // create a new page
  const page = await browser.newPage();
  // set viewport size (width and height) before navigating
  await page.setViewport({width: 1980, height: 1080});
  // go to the product page
  await page.goto('https://www.amazon.com/Desktop-Processor-12-Thread-Unlocked-Motherboard/dp/B0972FHS7J');
  // wait 10 seconds so the page can finish loading
  // (a plain timer is used because page.waitForTimeout is deprecated)
  await new Promise(resolve => setTimeout(resolve, 10000));
  // take a screenshot of the product page
  await page.screenshot({path: 'amazon.png', fullPage: true});
  // select the image through its id
  const landingImage = await page.waitForSelector('#landingImage');
  // extract the URL inside the browser with the evaluate method and pass it to landingImageUrl (Node.js environment)
  const landingImageUrl = await landingImage.evaluate(img => img.src);
  // go to the image URL; goto resolves to the HTTP response
  const imageResponse = await page.goto(landingImageUrl);
  // write the image to disk with the fs API and the response's buffer method
  await fs.writeFile(landingImageUrl.split("/").pop(), await imageResponse.buffer());
  // log image url to terminal
  console.log(landingImageUrl);
  // take a screenshot of the image page to check that everything worked
  await page.screenshot({path: 'amazon-image.png', fullPage: true});
  // close browser
  await browser.close();
})();
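The script derives the file name by splitting the image URL on "/" and taking the last segment. That works for simple URLs, but if the URL carried a query string or ended in a slash, the result would not be a clean file name. The sketch below shows a slightly more defensive variant; `fileNameFromUrl` and the fallback name `download.bin` are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: derives a file name from the path part of a URL,
// ignoring any query string or fragment.
function fileNameFromUrl(url, fallback = 'download.bin') {
  // the global URL class splits the address into its components
  const { pathname } = new URL(url);
  const name = pathname.split('/').pop();
  // URLs that end in "/" have no last segment, so use the fallback
  return name || fallback;
}

// Possible usage inside the script:
//   await fs.writeFile(fileNameFromUrl(landingImageUrl), await imageResponse.buffer());
```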
More examples are available in the official Puppeteer repository: https://github.com/puppeteer/puppeteer/tree/main/examples