@jhjguxin
Created April 4, 2019 03:03
headless browser

puppeteer vs jsdom vs phantomjs

  • jsdom

jsdom is a lightweight headless browser implemented in pure JavaScript. It cannot render layout, however; for details see https://github.com/jsdom/jsdom/wiki/jsdom-vs.-PhantomJS

  • PhantomJS

PhantomJS (phantomjs.org) is a headless WebKit scriptable with JavaScript. Its development is currently suspended.

  • puppeteer

puppeteer is Google's Headless Chrome Node API. At present it is the most powerful headless browser of the three.

install

npm i puppeteer

# On CentOS/RHEL, headless Chrome may also need the following system libraries and fonts:
# sudo yum install -y pango.x86_64 libXcomposite.x86_64 libXcursor.x86_64 libXdamage.x86_64 libXext.x86_64 libXi.x86_64 libXtst.x86_64 cups-libs.x86_64 libXScrnSaver.x86_64 libXrandr.x86_64 GConf2.x86_64 alsa-lib.x86_64 atk.x86_64 gtk3.x86_64 ipa-gothic-fonts xorg-x11-fonts-100dpi xorg-x11-fonts-75dpi xorg-x11-utils xorg-x11-fonts-cyrillic xorg-x11-fonts-Type1 xorg-x11-fonts-misc

Usage examples

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox', '--headless', '--disable-gpu', '--disable-setuid-sandbox'],
    });
    // To go through a proxy (template literals need backticks, and there is no space before "="):
    // const browser = await puppeteer.launch({args: [`--proxy-server=${argv.proxy}`, "--no-sandbox", "--disable-setuid-sandbox"]});

    const page = await browser.newPage();
    await page.goto('http://www.gsxt.gov.cn/index.html');
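    // Fixed 3-second pause so the page can finish rendering (page.waitFor is deprecated and gone from recent Puppeteer releases)
    await new Promise(resolve => setTimeout(resolve, 3000));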
    await page.screenshot({path: '/tmp/example.png'});

    await browser.close();
})();
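
The commented-out proxy line in the example above can be turned into a small helper. The following is only a sketch: `buildLaunchOptions` and `screenshotViaProxy` are hypothetical names, not part of Puppeteer, and the proxy address is whatever you supply. It shows the `--proxy-server` flag being built with a template literal and the browser being closed even if navigation fails.

```javascript
// Hypothetical helper (not a Puppeteer API): builds the options object for puppeteer.launch().
function buildLaunchOptions(proxy) {
    const args = ['--no-sandbox', '--disable-setuid-sandbox', '--disable-gpu'];
    if (proxy) {
        // Backticks are required for ${...} interpolation; no space before "=".
        args.push(`--proxy-server=${proxy}`);
    }
    return { headless: true, args };
}

// Hypothetical wrapper: takes one screenshot, optionally through a proxy.
async function screenshotViaProxy(url, path, proxy) {
    // Loaded lazily so the helper above can be used without Puppeteer installed.
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch(buildLaunchOptions(proxy));
    try {
        const page = await browser.newPage();
        await page.goto(url);
        await page.screenshot({ path });
    } finally {
        // Close the browser even when goto() or screenshot() throws.
        await browser.close();
    }
}
```

A call might look like `screenshotViaProxy('http://www.gsxt.gov.cn/index.html', '/tmp/example.png', 'http://127.0.0.1:8080')`, where the proxy address is a placeholder.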


(async () => {
    const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox', '--headless', '--disable-gpu', '--disable-setuid-sandbox'],
    });
    console.log(await browser.version());

    // To go through a proxy (template literals need backticks, and there is no space before "="):
    // const browser = await puppeteer.launch({args: [`--proxy-server=${argv.proxy}`, "--no-sandbox", "--disable-setuid-sandbox"]});
    let url = 'http://www.gsxt.gov.cn/SearchItemCaptcha?t=' + (new Date()).getTime();
    const page = await browser.newPage();
    await page.setJavaScriptEnabled(true);
    // Optional: intercept requests to add a header to navigation requests.
    // await page.setRequestInterception(true);
    // page.on('request', request => {
    //     // Let non-navigation requests through unchanged.
    //     if (!request.isNavigationRequest()) {
    //         request.continue();
    //         return;
    //     }
    //     // Add a new header to the navigation request (header values are strings).
    //     const headers = request.headers();
    //     headers['X-Just-Must-Be-Request-In-Main-Request'] = '1';
    //     request.continue({ headers });
    // });
    // await page.setUserAgent(argv.agent);
    await page.goto(url);
    await page.screenshot({path: '/tmp/example.png'});

    await browser.close();
})();
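
The request interception that is commented out in the second example could be factored out roughly as follows. This is a sketch under assumptions: `addHeaderToNavigation` and `enableNavigationHeader` are hypothetical names, and the header name and value are up to the caller. Only `setRequestInterception`, `isNavigationRequest`, `headers` and `continue` are Puppeteer calls.

```javascript
// Hypothetical helper (not a Puppeteer API): adds one header to navigation
// requests and lets every other request continue unchanged.
function addHeaderToNavigation(request, name, value) {
    if (!request.isNavigationRequest()) {
        return request.continue();
    }
    // Copy the existing headers and add the new one as a string value.
    const headers = Object.assign({}, request.headers(), { [name]: String(value) });
    return request.continue({ headers });
}

// Hypothetical wiring: call this before page.goto().
async function enableNavigationHeader(page, name, value) {
    // Once interception is on, every request must be continued, aborted or responded to.
    await page.setRequestInterception(true);
    page.on('request', request => addHeaderToNavigation(request, name, value));
}
```

In the second example this would be called as `await enableNavigationHeader(page, 'X-Just-Must-Be-Request-In-Main-Request', 1);` just before `page.goto(url)`.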