@stekhn
Created January 30, 2017 10:31
Sitemap configuration for scraping Angela Merkel's calendar using the webscraper.io Chrome extension
{
  "selectors": [
    {
      "parentSelectors": [
        "_root"
      ],
      "type": "SelectorLink",
      "multiple": true,
      "id": "month",
      "selector": "div.calendar-switcher.months ul li a",
      "delay": ""
    },
    {
      "parentSelectors": [
        "_root",
        "month"
      ],
      "type": "SelectorText",
      "multiple": false,
      "id": "year",
      "selector": "#main h1",
      "regex": "",
      "delay": ""
    },
    {
      "parentSelectors": [
        "_root",
        "month"
      ],
      "type": "SelectorElement",
      "multiple": true,
      "id": "day",
      "selector": "div.appointment",
      "delay": ""
    },
    {
      "parentSelectors": [
        "day"
      ],
      "type": "SelectorText",
      "multiple": false,
      "id": "date",
      "selector": "em",
      "regex": "",
      "delay": ""
    },
    {
      "parentSelectors": [
        "appointment"
      ],
      "type": "SelectorText",
      "multiple": false,
      "id": "caption",
      "selector": "h3",
      "regex": "",
      "delay": ""
    },
    {
      "parentSelectors": [
        "appointment"
      ],
      "type": "SelectorText",
      "multiple": false,
      "id": "headline",
      "selector": "h2",
      "regex": "",
      "delay": ""
    },
    {
      "parentSelectors": [
        "appointment"
      ],
      "type": "SelectorText",
      "multiple": false,
      "id": "description",
      "selector": "p",
      "regex": "",
      "delay": ""
    },
    {
      "parentSelectors": [
        "day"
      ],
      "type": "SelectorElement",
      "multiple": true,
      "id": "appointment",
      "selector": "div.appointment-box",
      "delay": ""
    }
  ],
  "startUrl": "https://www.bundeskanzlerin.de/Webs/BKin/DE/AngelaMerkel/Terminkalender/kalender_node.html?date=[2008-2017]01",
  "_id": "merkel-final"
}
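
The sitemap relies on two conventions that are easy to get wrong when editing it by hand: every name in `parentSelectors` has to match either `_root` or another selector's `id`, and the `[2008-2017]` part of `startUrl` is a numeric range that the extension expands into one start URL per year. The Python sketch below checks both on a trimmed copy of the sitemap. The helper names `expand_start_url` and `unknown_parents` are hypothetical and not part of the extension, and the range expansion is an assumption about how the extension treats `[start-end]` patterns, not its actual implementation.

```python
import re

# Trimmed copy of the sitemap above: only the fields the checks need.
SITEMAP = {
    "startUrl": (
        "https://www.bundeskanzlerin.de/Webs/BKin/DE/AngelaMerkel/"
        "Terminkalender/kalender_node.html?date=[2008-2017]01"
    ),
    "selectors": [
        {"id": "month", "parentSelectors": ["_root"]},
        {"id": "year", "parentSelectors": ["_root", "month"]},
        {"id": "day", "parentSelectors": ["_root", "month"]},
        {"id": "date", "parentSelectors": ["day"]},
        {"id": "caption", "parentSelectors": ["appointment"]},
        {"id": "headline", "parentSelectors": ["appointment"]},
        {"id": "description", "parentSelectors": ["appointment"]},
        {"id": "appointment", "parentSelectors": ["day"]},
    ],
}


def expand_start_url(url):
    """Expand the first [start-end] range in url into concrete URLs.

    Assumption: the range is inclusive and keeps the digit width of
    the start value.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    width = len(match.group(1))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [prefix + str(n).zfill(width) + suffix for n in range(start, end + 1)]


def unknown_parents(sitemap):
    """Return (selector id, parent) pairs whose parent is not defined."""
    known = {"_root"} | {s["id"] for s in sitemap["selectors"]}
    return [
        (s["id"], parent)
        for s in sitemap["selectors"]
        for parent in s["parentSelectors"]
        if parent not in known
    ]


if __name__ == "__main__":
    for url in expand_start_url(SITEMAP["startUrl"]):
        print(url)
    print("unknown parents:", unknown_parents(SITEMAP))
```

Under these assumptions the scrape starts from the January page of each year from 2008 to 2017, and the `month` link selector then follows the links to the other months. An empty list from `unknown_parents` means the selector tree has no dangling references, even though `appointment` is defined after the selectors that use it as a parent.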