Daniel Graña dangra
$ scrapy shell http://taobao.com
2013-04-19 11:53:08-0300 [scrapy] INFO: Scrapy 0.17.0 started (bot: scrapybot)
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Enabled item pipelines:
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-04-19 11:53:08-0300 [scrapy] DEBUG: Web service listening on 0.0.0
$ scrapy runspider extras/qpsclient.py -a download_delay=0 -a slots=10 -a benchurl=http://localhost:9999/p/200:pr,1:dr --logfile log --loglevel ERROR
SLOT REMOVED 127.0.0.8 tested=15 active=14 inactive=1 reset=0 new=1 closefailed=0
SLOT REMOVED 127.0.0.5 tested=36 active=28 inactive=8 reset=6 new=2 closefailed=1
SLOT REMOVED 127.0.0.2 tested=66 active=51 inactive=15 reset=8 new=7 closefailed=6
SLOT REMOVED 127.0.0.5 tested=81 active=67 inactive=14 reset=7 new=7 closefailed=6
SLOT REMOVED 127.0.0.8 tested=126 active=116 inactive=10 reset=2 new=8 closefailed=7
SLOT REMOVED 127.0.0.6 tested=243 active=202 inactive=41 reset=16 new=25 closefailed=24
SLOT REMOVED 127.0.0.2 tested=210 active=154 inactive=56 reset=29 new=27 closefailed=26
SLOT REMOVED 127.0.0.2 tested=96 active=81 inactive=15 reset=8 new=7 closefailed=6
SLOT REMOVED 127.0.0.10 tested=450 active=332 inactive=118 reset=65 new=53 closefailed=52
$ nosetests tests/hstestcase.py
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
test_nop (hstestcase.NopTest) ... ok
----------------------------------------------------------------------
Ran 1 test in 2.235s
OK
diff --git a/scrapy/tests/test_downloader_handlers.py b/scrapy/tests/test_downloader_handlers.py
index 4728177..0ee1f77 100644
--- a/scrapy/tests/test_downloader_handlers.py
+++ b/scrapy/tests/test_downloader_handlers.py
@@ -58,6 +58,7 @@ class HttpTestCase(unittest.TestCase):
r = static.File(name)
r.putChild("redirect", util.Redirect("/file"))
r.putChild("wait", ForeverTakingResource())
+ r.putChild("hang-after-headers", ForeverTakingResource(write=True))
r.putChild("nolength", NoLengthResource())
#!/usr/bin/env perl
use feature 'switch';
use strict;
use warnings;
use Data::Dumper;
use File::Basename;
use File::Copy;
use File::Path qw/make_path/;
>>> import lxml.etree
>>> root = lxml.etree.fromstring('<html></html>', base_url='http://foo.com/?a=|')
>>> root.getroottree().docinfo.URL
u'http%3A//foo.com/%3Fa=%257C'
>>> root = lxml.etree.fromstring('<html></html>', base_url='http://foo.com/?a=b')
>>> root.getroottree().docinfo.URL
u'http://foo.com/?a=b'
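The session above suggests that a `base_url` containing a character such as `|` ends up with the whole URL escaped, while a URL made only of safe characters passes through unchanged. One possible workaround, sketched here in Python 3 with the standard library only, is to percent-encode the unsafe characters before handing the URL to lxml. The helper name `escape_base_url` and the chosen set of safe characters are assumptions for illustration, and whether lxml then leaves the pre-escaped URL untouched has not been verified here.

```python
from urllib.parse import quote

# Characters left untouched: URL delimiters plus '%', so that sequences
# that are already percent-encoded are not escaped a second time.
# This set is an assumption, not something taken from lxml or libxml2.
_SAFE_CHARS = ":/?&=#%"


def escape_base_url(url):
    """Percent-encode characters outside the unreserved and delimiter sets.

    Hypothetical helper: it only prepares the string and does not call lxml.
    """
    return quote(url, safe=_SAFE_CHARS)


if __name__ == "__main__":
    print(escape_base_url("http://foo.com/?a=|"))
    print(escape_base_url("http://foo.com/?a=b"))
```

The result could then be passed as `base_url=escape_base_url(url)` to `lxml.etree.fromstring`.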
include:
- docker
image:
docker.pulled:
----------
ID: app
Function: docker.running
Result: False
Comment: Container 'shipyard' cannot be started
Traceback (most recent call last):
File "/var/cache/salt/minion/extmods/modules/dockerio.py", line 904, in start
for k, v in port_bindings.iteritems():
AttributeError: 'list' object has no attribute 'iteritems'
Changes:
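The traceback above shows `port_bindings` arriving as a list where the code expects a mapping. A defensive way to iterate over either shape is sketched below in Python 3. This is not the code of `dockerio.py`: the helper name `iter_port_bindings` is hypothetical, and the sketch assumes that each list entry is itself a small mapping, which the output above does not confirm.

```python
def iter_port_bindings(port_bindings):
    """Yield (key, value) pairs from a dict or from a list of dicts.

    Hypothetical helper; the list-of-dicts shape is an assumption.
    """
    if port_bindings is None:
        return
    if isinstance(port_bindings, dict):
        # The shape the failing loop expected.
        yield from port_bindings.items()
        return
    for entry in port_bindings:
        if not isinstance(entry, dict):
            raise TypeError(
                "unsupported port binding entry: %r" % (entry,)
            )
        yield from entry.items()


if __name__ == "__main__":
    for key, value in iter_port_bindings([{"80/tcp": 8080}, {"443/tcp": 8443}]):
        print(key, value)
```

Raising `TypeError` on an unexpected entry gives a clearer message than the `AttributeError` seen in the log.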

0.22.0 (released 2014-01-16)

Enhancements

  • Backwards incompatible: switched the HTTPCacheMiddleware backend to filesystem (:issue:`541`). To restore the old backend, set HTTPCACHE_STORAGE to scrapy.contrib.httpcache.DbmCacheStorage.
  • Proxy https:// URLs using the CONNECT method (:issue:`392`, :issue:`397`)
  • Add a middleware to crawl AJAX-crawlable pages as defined by Google (:issue:`343`)
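
As a minimal sketch of the first note in the list above, a project that wants to keep the old cache backend could put something like the following in its `settings.py`. The storage path is the one given in the note; enabling the cache with `HTTPCACHE_ENABLED` is an assumption about the project, not part of the release note.

```python
# settings.py (fragment)

# Assumption: the project uses the HTTP cache at all.
HTTPCACHE_ENABLED = True

# Restore the DBM-based backend that was the default before 0.22.0.
HTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.DbmCacheStorage'
```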
~$ scrapy shell http://www.jobberman.com/jobs-in-nigeria/3/by-industry/vacancies-in-ict-telecommunications-companies-in-nigeria/
2014-01-22 23:09:26-0200 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-01-22 23:09:26-0200 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-01-22 23:09:26-0200 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-01-22 23:09:27-0200 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-01-22 23:09:28-0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-01-22 23:09:28-0200 [scrapy]