This proposes some possible improvements to Scrapy contracts. All of them could potentially be implemented, but I'm curious which are good or bad ideas. Note that any of them could break existing custom contracts.
The first idea: allow multiple `@url` contracts, or multiple contracts which generate requests in general:
```python
def parse_response(self, response):
    """
    @url http://example.org/foo
    @url http://example.org/bar
    @returns items 1 1
    """
    return MyItem(url=response.url)
```
`@url` is a contract which generates one request, while `@returns` is a contract with a post-hook which checks that the callback returns exactly 1 item. `@returns` then applies to both requests generated by the `@url` contracts individually (so each request must return exactly 1 item).
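For context, a post-hook in the current contracts API is a `Contract` subclass that overrides `post_process`. Here is a minimal sketch of a simplified `@returns`-style contract; the item-counting logic is an assumption, much looser than the real `ReturnsContract`:

```python
from scrapy import Request
from scrapy.contracts import Contract
from scrapy.exceptions import ContractFail

class ExactItemsContract(Contract):
    """Simplified sketch of a @returns-style post-hook: fail unless the
    callback output contains exactly the requested number of items."""
    name = 'exact_items'  # hypothetical name, not a built-in contract

    def post_process(self, output):
        expected = int(self.args[0])
        # Treat everything that is not a Request as an item (simplification).
        items = [x for x in output if not isinstance(x, Request)]
        if len(items) != expected:
            raise ContractFail(
                'expected %d items, got %d' % (expected, len(items)))
```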
```python
def search(self, keywords):
    """
    @custom kw1 kw2
    @url http://example.org/bar
    @returns items 2 2
    """
    for kw in keywords:
        yield Request('http://example.org/%s' % kw, callback=self.parse_response)
```
In this case, `@custom` is a contract which returns a list of requests that can be treated as a batch, and `@returns` then applies to each batch: it requires both the batch of requests returned by `@custom` (`http://example.org/kw1` and `http://example.org/kw2`) and the batch returned by `@url` (just the one, `http://example.org/bar`) to return 2 items each.
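There is no hook for this in Scrapy today, so the following is only a sketch of how a batch-generating contract might look, assuming a hypothetical `generate_requests` hook that the contracts runner would call once per request-generating contract:

```python
from scrapy import Request
from scrapy.contracts import Contract

class CustomContract(Contract):
    """Hypothetical @custom contract: builds one request per keyword
    argument. The runner would treat the returned list as one batch."""
    name = 'custom'

    def generate_requests(self, method):
        # `generate_requests` is an assumed new hook, not part of the
        # current Contract API; `method` is the spider callback under test.
        return [
            Request('http://example.org/%s' % kw, callback=method)
            for kw in self.args
        ]
```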
This is useful when we want a method to be tested under multiple scenarios:
```python
def parse_response(self, response):
    """
    @url http://example.org/foo
    @returns items 1 1
    <!-- some sort of separator -->
    @url http://example.org/bar
    @returns items 0 0
    """
    pass
```
Here the first `@returns` contract only applies to `http://example.org/foo` but not to `http://example.org/bar`, and the second only applies to `@url http://example.org/bar`. For the separator there are a few options (a splitting sketch follows the list):
- a blank line
- `@@`
- other ideas?
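Whichever separator wins, splitting the docstring into scenarios is straightforward. A minimal sketch using a blank line as the separator (`split_scenarios` is a hypothetical helper, not part of Scrapy):

```python
def split_scenarios(docstring):
    """Split a callback docstring into scenario blocks, one list of
    contract lines per scenario, using a blank line as the separator."""
    scenarios, current = [], []
    for line in (raw.strip() for raw in docstring.splitlines()):
        if not line:  # a blank line closes the current scenario
            if current:
                scenarios.append(current)
                current = []
        elif line.startswith('@'):
            current.append(line)
    if current:
        scenarios.append(current)
    return scenarios

# For the docstring above (with a blank line as the separator) this yields:
# [['@url http://example.org/foo', '@returns items 1 1'],
#  ['@url http://example.org/bar', '@returns items 0 0']]
```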