Gist jmcarp/d9fbc73e5d9719c04613, created November 8, 2015 05:12
scraping for humans?
```python
"""
Scrapy includes an `ItemLoader` class and associated helpers to abstract
data extraction from `Response` objects. But this API is verbose and can
easily result in more boilerplate, not less. The following is a quick sketch
of a possible interface for using marshmallow, with a few custom fields, to
pull data from Scrapy responses.
"""
from marshmallow import Schema, fields


class PersonSchema(Schema):
    # Proposed custom field (not in marshmallow): run an XPath query and
    # convert the matched text with `fields.Str`.
    name = fields.XPath('//title/text()', fields.Str)
    # Proposed custom field (not in marshmallow): run a CSS query and
    # collect the matches as a list of strings.
    hobbies = fields.CSS('.hobby', fields.List(fields.Str))

    # Proposed decorator form of `fields.Method`: the decorated method
    # computes the value directly from the response.
    @fields.Method()
    def details(self, response):
        labels = response.css('.label::text').extract()
        values = response.css('.value::text').extract()
        return dict(zip(labels, values))


# `response` is a Scrapy `Response`, e.g. the argument of a spider callback.
PersonSchema().dump(response)
```
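The proposed fields (`fields.XPath`, `fields.CSS`, and a decorator form of `fields.Method`) are not part of marshmallow, so the sketch does not run as written. To show the declarative idea on its own, here is a minimal stdlib-only sketch. It assumes a well-formed XHTML page parsed with `xml.etree.ElementTree` as a stand-in for a Scrapy `Response`.

`Field`, `MethodField`, and `Schema` below are hypothetical names, not marshmallow's or Scrapy's API. ElementTree's ElementPath supports only a subset of XPath (no `text()` step, no CSS selectors), so the queries are rewritten in that syntax.

```python
import xml.etree.ElementTree as ET


class Field:
    """Hypothetical field: run an ElementPath query and convert the matched text."""

    def __init__(self, query, cast=str, many=False):
        self.query = query  # ElementPath (limited XPath subset), e.g. ".//title"
        self.cast = cast    # converter applied to each matched string
        self.many = many    # True: return every match; False: only the first

    def extract(self, schema, root):
        # `schema` is unused here; it keeps the signature uniform with MethodField.
        texts = [el.text.strip() for el in root.findall(self.query) if el.text]
        values = [self.cast(text) for text in texts]
        if self.many:
            return values
        return values[0] if values else None


class MethodField:
    """Hypothetical decorator: the wrapped method computes the value itself."""

    def __init__(self, func):
        self.func = func

    def extract(self, schema, root):
        return self.func(schema, root)


class Schema:
    """Hypothetical base class that collects declared fields in definition order."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        collected = {}
        # Walk base classes first so subclasses can override inherited fields.
        for klass in reversed(cls.__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, (Field, MethodField)):
                    collected[name] = value
        cls._fields = collected

    def dump(self, root):
        return {name: field.extract(self, root) for name, field in self._fields.items()}


class PersonSchema(Schema):
    name = Field(".//title")
    # Exact attribute match; the CSS selector `.hobby` would also match
    # elements that carry several classes.
    hobbies = Field(".//li[@class='hobby']", many=True)

    @MethodField
    def details(self, root):
        labels = [el.text for el in root.findall(".//span[@class='label']")]
        values = [el.text for el in root.findall(".//span[@class='value']")]
        return dict(zip(labels, values))


# Toy XHTML page standing in for a downloaded response body.
PAGE = """
<html>
  <head><title>Jane Doe</title></head>
  <body>
    <ul>
      <li class="hobby">chess</li>
      <li class="hobby">hiking</li>
    </ul>
    <span class="label">city</span><span class="value">Springfield</span>
    <span class="label">age</span><span class="value">42</span>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())
print(PersonSchema().dump(root))
# {'name': 'Jane Doe', 'hobbies': ['chess', 'hiking'], 'details': {'city': 'Springfield', 'age': '42'}}
```

marshmallow itself gathers declared fields with a metaclass. Here, `__init_subclass__` is a lighter way to get the same effect for a sketch. Backing these fields with Scrapy's own selectors would restore full XPath and CSS support.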