import encodings.aliases
import lxml.etree

# Print the error for every Python codec name that lxml's HTML parser rejects.
for enc in set(encodings.aliases.aliases.values()):
    try:
        parser = lxml.etree.HTMLParser(recover=True, encoding=enc)
    except LookupError as exc:
        print(str(exc))
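The same probe can be wrapped in a small helper that returns the failures instead of printing them. This is only a sketch under stated assumptions: the names `unsupported_encodings`, `make_parser`, and `lxml_html_parser` are hypothetical and do not come from the original snippet. The last lines use `codecs.lookup` as a standard-library stand-in for the parser factory, so the helper can be tried without lxml installed.

```python
import codecs
import encodings.aliases


def unsupported_encodings(make_parser, names=None):
    """Map each encoding name that make_parser rejects to its error message."""
    if names is None:
        # Canonical codec names known to this Python installation.
        names = set(encodings.aliases.aliases.values())
    failures = {}
    for enc in sorted(names):
        try:
            make_parser(enc)
        except LookupError as exc:
            failures[enc] = str(exc)
    return failures


def lxml_html_parser(enc):
    """Hypothetical factory mirroring the snippet; lxml is imported only when called."""
    import lxml.etree  # third-party dependency, assumed to be installed
    return lxml.etree.HTMLParser(recover=True, encoding=enc)


# Standard-library stand-in: codecs.lookup raises LookupError for unknown names.
report = unsupported_encodings(codecs.lookup, ["utf_8", "no_such_codec"])
print(report)
```

Calling `unsupported_encodings(lxml_html_parser)` would run the probe against lxml itself, assuming lxml is available in the environment.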
# global parameters
global
    # log on syslog of 127.0.0.1 udp port 514 (default) using local0 facility.
    log 127.0.0.1 local0
    # maximum number of concurrent connections
    maxconn 4096
    # drop privileges after port binding
    user nobody
    group nogroup
    # run in daemon mode
Downloading cryptography-0.2.2.tar.gz (13.8MB): 13.8MB downloaded
Running setup.py (path:/tmp/pip_build_root/cryptography/setup.py) egg_info for package cryptography
no previously-included directories found matching 'documentation/_build'
zip_safe flag not set; analyzing archive contents...
six: module references __file__
Installed /tmp/pip_build_root/cryptography/six-1.5.2-py2.7.egg
Searching for cffi>=0.8
Reading http://33.33.33.41:3141/vagrant/dev/+simple/cffi/
Best match: cffi 0.8.1
------------------------------------------------------------
/home/daniel/envs/setup3/bin/pip run on Thu Mar 6 12:23:59 2014
Downloading/unpacking cryptography
Getting page https://pypi.python.org/simple/cryptography/
URLs to search for versions for cryptography:
* https://pypi.python.org/simple/cryptography/
Analyzing links from page https://pypi.python.org/simple/cryptography/
Skipping https://pypi.python.org/packages/cp26/c/cryptography/cryptography-0.2-cp26-none-win32.whl#md5=13e5c4b19520e7dc6f07c6502b3f74e2 (from https://pypi.python.org/simple/cryptography/) because it is not compatible with this Python
Skipping https://pypi.python.org/packages/cp26/c/cryptography/cryptography-0.2.1-cp26-none-win32.whl#md5=00e733648ee5cdb9e58876238b1328f8 (from https://pypi.python.org/simple/cryptography/) because it is not compatible with this Python
Skipping https://pypi.python.org/packages/cp26/c/cryptography/cryptography-0.2.2-cp26-none-win32.whl#md5=b52f9b5f5c980ebbe090f945a44be2a5 (from https:/
$ scrapy shell http://scrapy.org/images/logo.png
2014-04-21 23:53:11-0300 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-04-21 23:53:11-0300 [scrapy] INFO: Optional features available: ssl, http11
2014-04-21 23:53:11-0300 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-04-21 23:53:12-0300 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-04-21 23:53:12-0300 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-04-21 23:53:12-0300 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-04-21 23:53:12-0300 [scrapy] INFO: Enabled item pipelines:
2014-04-21 23:53:12-0300 [scrapy] DEBUG: Telnet console listen
$ scrapy shell https://www.ssehl.co.uk/HALO/publicLogon.do -c "response.xpath('//title').extract()"
2014-05-08 16:33:22-0300 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-05-08 16:33:22-0300 [scrapy] INFO: Optional features available: ssl, http11
2014-05-08 16:33:22-0300 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-05-08 16:33:22-0300 [scrapy] INFO: Enabled item pipelines:
2014-05-08
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
import scrapy


class Spider(scrapy.Spider):
    name = 'loremipsum'
    start_urls = ('https://www.lipsum.com',)

    def parse(self, response):
        for lnk in response.links():