
@diegorocha
Created August 10, 2015 19:48
An example of how to use BeautifulSoup to extract pagination data from an HTML page. It came from a question asked in the PythonBrasil group.
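# BeautifulSoup 4 (pip install beautifulsoup4); the old "BeautifulSoup" 3 module only runs on Python 2
from bs4 import BeautifulSoup
s = '''<div class="pages">
<div class="navigation"><span>1</span> <a href="http://www.redcouch.me/filmes/page/2/">2</a> <a href="http://www.redcouch.me/filmes/page/3/">3</a> <a href="http://www.redcouch.me/filmes/page/4/">4</a> <a href="http://www.redcouch.me/filmes/page/5/">5</a> <a href="http://www.redcouch.me/filmes/page/6/">6</a> <a href="http://www.redcouch.me/filmes/page/7/">7</a> <a href="http://www.redcouch.me/filmes/page/8/">8</a> <a href="http://www.redcouch.me/filmes/page/9/">9</a> <a href="http://www.redcouch.me/filmes/page/10/">10</a> <span class="nav_ext">...</span> <a href="http://www.redcouch.me/filmes/page/40/">40</a></div>
<div class="nextprev">
<span><span class="pprev">Anterior</span></span>
<a href="http://www.redcouch.me/filmes/page/2/"><span class="pnext">Seguinte</span></a>
</div>
<div class="clear"></div>
</div>
</div>
</div>'''
# Name the standard-library parser explicitly (avoids bs4's "no parser specified" warning)
soup = BeautifulSoup(s, 'html.parser')
# URLs of the numbered page links; the current page is a <span>, so it is not in this list
paginas = [a['href'] for a in soup.find('div', class_='navigation').find_all('a')]
print(paginas)
# paginas[0] is the next page only while we are on page 1, so take the URL of the
# "Seguinte" (next) link in the nextprev block instead; it is None when there is no next page
pnext = soup.find('span', class_='pnext')
next_link = pnext.find_parent('a') if pnext is not None else None
print('Next: %s' % (next_link['href'] if next_link is not None else None))
print('Last: %s' % paginas[-1])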
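The navigation block hides pages 11 to 39 behind "...", so the scraped list jumps from page 10 straight to page 40. If you need every page URL, a minimal sketch is to rebuild the list from the last-page URL using only the standard library. It assumes every page follows the same `/filmes/page/N/` pattern, including page 1, which the site may instead serve at the bare `/filmes/` URL. `all_page_urls` is a hypothetical helper, not part of BeautifulSoup.

```python
import re


def all_page_urls(last_url):
    """Build the URL of every page from the last page's URL (hypothetical helper)."""
    # Split e.g. ".../filmes/page/40/" into the ".../filmes/page/" prefix and the number 40
    match = re.search(r'^(.*/page/)(\d+)/?$', last_url)
    if match is None:
        raise ValueError('unexpected pagination URL: %r' % last_url)
    prefix, last = match.group(1), int(match.group(2))
    # Assumption: page N lives at <prefix>N/ for every N from 1 to last
    return ['%s%d/' % (prefix, n) for n in range(1, last + 1)]


urls = all_page_urls('http://www.redcouch.me/filmes/page/40/')
print(len(urls))  # 40
print(urls[10])   # http://www.redcouch.me/filmes/page/11/
```

In the gist above, you would pass `paginas[-1]` as `last_url`. Building the URLs this way avoids fetching each page only to find the next link, but it holds only as long as the site keeps this URL scheme.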