@blahah
Last active August 29, 2015 14:01
MDPI figure scraper definition for ContentMine quickscrape
{
  "url": "mdpi",
  "elements": {
    "dc.source": {
      "selector": "//meta[@name='dc.source']",
      "attribute": "content"
    },
    "figure_img": {
      "selector": "//div[contains(@id, 'fig')]/div/img",
      "attribute": "src",
      "download": true
    },
    "figure_caption": {
      "selector": "//div[contains(@class, 'html-fig_description')]"
    },
    "fulltext_pdf": {
      "selector": "//meta[@name='citation_pdf_url']",
      "attribute": "content",
      "download": true
    },
    "fulltext_html": {
      "selector": "//meta[@name='citation_fulltext_html_url']",
      "attribute": "content",
      "download": true
    },
    "title": {
      "selector": "//meta[@name='citation_title']",
      "attribute": "content"
    },
    "author": {
      "selector": "//meta[@name='citation_author']",
      "attribute": "content"
    },
    "date": {
      "selector": "//meta[@name='citation_date']",
      "attribute": "content"
    },
    "doi": {
      "selector": "//meta[@name='citation_doi']",
      "attribute": "content"
    },
    "volume": {
      "selector": "//meta[@name='citation_volume']",
      "attribute": "content"
    },
    "issue": {
      "selector": "//meta[@name='citation_issue']",
      "attribute": "content"
    },
    "firstpage": {
      "selector": "//meta[@name='citation_firstpage']",
      "attribute": "content"
    },
    "description": {
      "selector": "//meta[@name='description']",
      "attribute": "content"
    }
  }
}
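
To illustrate how the selector/attribute pairs above resolve against a page, here is a minimal Python sketch. It is not how quickscrape itself works: it uses only the standard-library ElementTree parser, which needs well-formed markup and supports only a subset of XPath (so the `contains(...)` figure selectors would not run here). `SAMPLE_PAGE` and `apply_definition` are hypothetical names introduced for the example, and the definition is trimmed to two elements.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed copy of the definition above (two elements only, for illustration).
DEFINITION = json.loads("""
{
  "url": "mdpi",
  "elements": {
    "title": {
      "selector": "//meta[@name='citation_title']",
      "attribute": "content"
    },
    "author": {
      "selector": "//meta[@name='citation_author']",
      "attribute": "content"
    }
  }
}
""")

# Hypothetical, well-formed stand-in for an article page (not real MDPI markup).
SAMPLE_PAGE = """
<html><head>
  <meta name="citation_title" content="An example article"/>
  <meta name="citation_author" content="A. Author"/>
  <meta name="citation_author" content="B. Author"/>
</head><body/></html>
"""


def apply_definition(definition, page):
    """Return {element name: [values]} for each element in the definition."""
    root = ET.fromstring(page)
    results = {}
    for name, spec in definition["elements"].items():
        # ElementTree only accepts relative paths, so "//x" becomes ".//x".
        nodes = root.findall("." + spec["selector"])
        attribute = spec.get("attribute")
        if attribute:
            # An "attribute" key means: capture that attribute's value.
            results[name] = [node.get(attribute) for node in nodes]
        else:
            # Assumption: without "attribute", capture the node's text content.
            results[name] = ["".join(node.itertext()).strip() for node in nodes]
    return results


results = apply_definition(DEFINITION, SAMPLE_PAGE)
print(results)
```

A selector that matches several nodes, such as `citation_author`, yields one value per match, which is why each result is a list.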