@ajw0100
Created November 22, 2013 18:57
In [3]: sel.re(r'license_list\.cfm\?.*')
Out[3]:
[u'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=2">2</a>\r',
u'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=3">3</a>\r',
u'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=4">4</a>\r',
u'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=5">5</a>\r',
u'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=6">6</a>\r',
u'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=7">7</a>\r',
u'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=8">8</a>\r',
u'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=2">Next ></a></strong>\r']
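The matches above run past the URL into the closing quote and tag, because the greedy `.*` consumes everything up to the end of the line. A minimal sketch of one way to isolate just the URL, using only the standard library: the `tight` pattern and the `extract_urls` helper are hypothetical names for illustration, not part of the original session, and `html.unescape` assumes Python 3.4 or later rather than the Python 2 shell shown here.

```python
import re
from html import unescape

# Sample line copied from the Out[3] listing above.
sample = 'license_list.cfm?genreID=1&amp;cclicense=1&amp;sort=1&amp;page=2">2</a>\r'

# Original pattern: greedy `.*` runs to the end of the line,
# so the closing quote and the anchor markup are included.
greedy = re.compile(r'license_list\.cfm\?.*')

# Assumed alternative: stop at the first quote, angle bracket, or whitespace.
tight = re.compile(r'license_list\.cfm\?[^"\'<>\s]*')


def extract_urls(text):
    """Return the matched relative URLs with HTML entities such as &amp; decoded."""
    return [unescape(match) for match in tight.findall(text)]


print(extract_urls(sample))
```

The character class ends the match at the attribute's closing quote, and decoding `&amp;` yields a query string that can be joined with the page's base URL.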
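# Import paths as used by 2013-era Scrapy (scrapy.contrib).
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import Rule

# Route pagination links to parse_list_page; `allow` is matched against each extracted URL.
re_allow = r'license_list\.cfm\?.*'
rules = (
    Rule(SgmlLinkExtractor(allow=re_allow),
         callback='parse_list_page'
    ),
)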