- Backwards incompatible: switched the HTTPCacheMiddleware backend to filesystem (:issue:`541`). To restore the old backend, set HTTPCACHE_STORAGE to scrapy.contrib.httpcache.DbmCacheStorage
- Proxy https:// URLs using the CONNECT method (:issue:`392`, :issue:`397`)
- Add a middleware to crawl AJAX crawlable pages as defined by Google (:issue:`343`)
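
The first note above can be illustrated with a settings fragment. This is a minimal sketch, assuming a project-level ``settings.py`` and that the HTTP cache is switched on through ``HTTPCACHE_ENABLED``; only the ``HTTPCACHE_STORAGE`` value comes from the note itself.

```python
# settings.py (sketch): restore the DBM cache backend that was the
# default before the switch to filesystem storage.

# Assumption: the cache middleware is enabled via this setting.
HTTPCACHE_ENABLED = True

# Value taken from the release note above.
HTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.DbmCacheStorage'
```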
$ scrapy shell http://taobao.com
2013-04-19 11:53:08-0300 [scrapy] INFO: Scrapy 0.17.0 started (bot: scrapybot)
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Enabled item pipelines:
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Web service listening on 0.0.0
$ scrapy runspider extras/qpsclient.py -a download_delay=0 -a slots=10 -a benchurl=http://localhost:9999/p/200:pr,1:dr --logfile log --loglevel ERROR
SLOT REMOVED 127.0.0.8 tested=15 active=14 inactive=1 reset=0 new=1 closefailed=0
SLOT REMOVED 127.0.0.5 tested=36 active=28 inactive=8 reset=6 new=2 closefailed=1
SLOT REMOVED 127.0.0.2 tested=66 active=51 inactive=15 reset=8 new=7 closefailed=6
SLOT REMOVED 127.0.0.5 tested=81 active=67 inactive=14 reset=7 new=7 closefailed=6
SLOT REMOVED 127.0.0.8 tested=126 active=116 inactive=10 reset=2 new=8 closefailed=7
SLOT REMOVED 127.0.0.6 tested=243 active=202 inactive=41 reset=16 new=25 closefailed=24
SLOT REMOVED 127.0.0.2 tested=210 active=154 inactive=56 reset=29 new=27 closefailed=26
SLOT REMOVED 127.0.0.2 tested=96 active=81 inactive=15 reset=8 new=7 closefailed=6
SLOT REMOVED 127.0.0.10 tested=450 active=332 inactive=118 reset=65 new=53 closefailed=52
$ nosetests tests/hstestcase.py
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
test_nop (hstestcase.NopTest) ... ok
----------------------------------------------------------------------
Ran 1 test in 2.235s
OK
diff --git a/scrapy/tests/test_downloader_handlers.py b/scrapy/tests/test_downloader_handlers.py
index 4728177..0ee1f77 100644
--- a/scrapy/tests/test_downloader_handlers.py
+++ b/scrapy/tests/test_downloader_handlers.py
@@ -58,6 +58,7 @@ class HttpTestCase(unittest.TestCase):
         r = static.File(name)
         r.putChild("redirect", util.Redirect("/file"))
         r.putChild("wait", ForeverTakingResource())
+        r.putChild("hang-after-headers", ForeverTakingResource(write=True))
         r.putChild("nolength", NoLengthResource())
#!/usr/bin/env perl
use feature 'switch';
use strict;
use warnings;
use Data::Dumper;
use File::Basename;
use File::Copy;
use File::Path qw/make_path/;
>>> import lxml.etree
>>> root = lxml.etree.fromstring('<html></html>', base_url='http://foo.com/?a=|')
>>> root.getroottree().docinfo.URL
u'http%3A//foo.com/%3Fa=%257C'
>>> root = lxml.etree.fromstring('<html></html>', base_url='http://foo.com/?a=b')
>>> root.getroottree().docinfo.URL
u'http://foo.com/?a=b'
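
The session above shows a ``base_url`` containing ``|`` coming back with the whole URL percent-encoded, while a URL without it is left alone. One possible workaround is to escape unsafe characters before handing the URL over. The sketch below uses only the Python 3 standard library; the helper name ``escape_base_url`` is hypothetical, and whether lxml then keeps the URL intact is an assumption that has not been checked here.

```python
from urllib.parse import quote


def escape_base_url(url):
    """Percent-encode unsafe characters such as '|' in a URL.

    URL delimiters and '%' are listed as safe so the URL structure and any
    existing escapes are left untouched. Hypothetical helper, not an lxml API.
    """
    return quote(url, safe=":/?&=#%")


print(escape_base_url('http://foo.com/?a=|'))
print(escape_base_url('http://foo.com/?a=b'))
```

The escaped string would then be passed as ``base_url`` in place of the raw one.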
include:
  - docker
image:
  docker.pulled:
----------
          ID: app
    Function: docker.running
      Result: False
     Comment: Container 'shipyard' cannot be started
              Traceback (most recent call last):
                File "/var/cache/salt/minion/extmods/modules/dockerio.py", line 904, in start
                  for k, v in port_bindings.iteritems():
              AttributeError: 'list' object has no attribute 'iteritems'
     Changes:
~$ scrapy shell http://www.jobberman.com/jobs-in-nigeria/3/by-industry/vacancies-in-ict-telecommunications-companies-in-nigeria/
2014-01-22 23:09:26-0200 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-01-22 23:09:26-0200 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-01-22 23:09:26-0200 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-01-22 23:09:27-0200 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-01-22 23:09:28-0200 [scrapy]