Gabin Desserprit gahabeen

gahabeen / jsonframedataextraction.js
Last active January 28, 2017 10:41
jsonframe data extraction
let cheerio = require('cheerio');
let jsonframe = require('jsonframe-cheerio');
let $ = cheerio.load('our html page content here'); // cheerio.load() takes an HTML string, not a URL
jsonframe($); // initializes the plugin
var frame = {
  "companies": { // setting the parent item as "companies"
    "selector": ".item", // defines the elements to search for
    "data": [{ // "data": [{}] defines a list of items
gahabeen / barecheerioextractdata.js
Last active February 2, 2018 20:06
Bare Cheerio Code to Extract Data
let cheerio = require('cheerio')
let $ = cheerio.load('our html page content here') // cheerio.load() takes an HTML string, not a URL
var companiesList = [];
// For each .item, we add all the structure of a company to the companiesList array
// Don't try to understand what follows because we will do it differently.
$('.list.items .item').each(function(index, element){
  companiesList[index] = {};
  var header = $(element).find('.header');
gahabeen / selectorsofdata.txt
Last active January 27, 2017 22:45
Selectors Structure of Data
company : .list.items .item
 |_ name : .header [itemprop=name]
 |_ description : .header [rel=description]
 |_ url : .header [itemprop=name] a
 |_ contact : .contact
     |_ telephone : [itemprop=telephone]
     |_ employee
         |_ name : [itemprop=employeeName]
         |_ jobTitle : [itemprop=employeeJobTitle]
         |_ email : [itemprop=email]
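
The selector tree can also be written down as plain data. The sketch below is one possible encoding, not the gist author's code: it assumes that telephone and employee sit under contact and that the second name, jobTitle and email sit under employee. The names `selectors` and `flattenSelectors` are hypothetical. It only joins each field's selector with those of its parents, so it needs no scraping library.

```javascript
// Hypothetical selector map mirroring the tree; the contact > employee
// nesting is an assumption read from the selector names.
const selectors = {
  selector: '.list.items .item',
  fields: {
    name: { selector: '.header [itemprop=name]' },
    description: { selector: '.header [rel=description]' },
    url: { selector: '.header [itemprop=name] a' },
    contact: {
      selector: '.contact',
      fields: {
        telephone: { selector: '[itemprop=telephone]' },
        employee: {
          // No selector of its own in the tree: it only groups fields.
          fields: {
            name: { selector: '[itemprop=employeeName]' },
            jobTitle: { selector: '[itemprop=employeeJobTitle]' },
            email: { selector: '[itemprop=email]' },
          },
        },
      },
    },
  },
};

// Walk the tree and return { 'dotted.path': 'full CSS selector' } per leaf.
function flattenSelectors(node, parentSelector = '', path = '') {
  // Join the parent chain with this node's selector, skipping empty parts.
  const full = [parentSelector, node.selector].filter(Boolean).join(' ');
  if (!node.fields) {
    return { [path]: full };
  }
  const out = {};
  for (const [key, child] of Object.entries(node.fields)) {
    const childPath = path ? `${path}.${key}` : key;
    Object.assign(out, flattenSelectors(child, full, childPath));
  }
  return out;
}

const flat = flattenSelectors(selectors);
console.log(flat);
```

Each resulting value is a descendant selector that could be passed to any CSS-selector engine; whether the real page needs the full chain or only the relative part depends on how the lookup is scoped.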
gahabeen / structureofdata.txt
Last active January 27, 2017 22:45
Companies Structure
company
 |_ name
 |_ description
 |_ url
 |_ contact
     |_ telephone
     |_ employee
         |_ name
         |_ jobTitle
         |_ email
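
As a rough illustration, the structure could map to a nested JavaScript object like the one below. This is a sketch under assumptions: the nesting of telephone and employee under contact is inferred from the field names, the empty strings are placeholders rather than real data, and `emptyCompany` and `matchesShape` are hypothetical helper names.

```javascript
// Illustrative shape only: field names come from the structure,
// the contact > employee nesting is assumed, values are placeholders.
function emptyCompany() {
  return {
    name: '',
    description: '',
    url: '',
    contact: {
      telephone: '',
      employee: { name: '', jobTitle: '', email: '' },
    },
  };
}

// Return true if `value` has every key of `shape`, recursively,
// with leaf values of the same primitive type.
function matchesShape(value, shape) {
  if (typeof shape !== 'object' || shape === null) {
    return typeof value === typeof shape;
  }
  if (typeof value !== 'object' || value === null) return false;
  return Object.keys(shape).every((key) => matchesShape(value[key], shape[key]));
}

// Example: check a scraped record against the expected shape.
const candidate = emptyCompany();
candidate.name = 'placeholder company';
console.log(matchesShape(candidate, emptyCompany()));
```

A check like this can catch a selector that silently matched nothing, since the corresponding key would be missing or of the wrong type.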
gahabeen / index.html
Last active January 27, 2017 22:43
List of companies
<!DOCTYPE html>
<html lang="en">
<head>
</head>
<body>
  <!-- Data we want to scrape starts here -->
  <div class="list items">
    <div class="item">
      <div class="header">