@marcelcaraciolo
Created May 5, 2012 05:06
First step
from BeautifulSoup import BeautifulSoup  # imported but not used in this first step
import urllib
import re


def extract_price(url):
    # Download the raw HTML of the page (Python 2 urllib API).
    dados = urllib.urlopen(url).read()
    # Capture each number that follows "R$", optionally preceded by whitespace,
    # lowercase letters, or angle brackets (e.g. a simple tag such as <strong>).
    return re.findall(r'R\$\s*[a-z<> ]*([1-9][0-9,]+)', dados)


if __name__ == '__main__':
    url = 'http://www.submarino.com.br/produto/1/21652587/construcao+do+corredor,+a'
    print extract_price(url)
    url = 'http://www.livrariacultura.com.br/scripts/resenha/resenha.asp?nitem=29401758&sid=872230118142277712224725'
    print extract_price(url)
    url = 'http://www.walmart.com.br/produto/DVDs-e-Blu-Ray/Lancamentos/Microservice/341835-Santos---100-Anos-de-Futebol-Arte---DVD'
    print extract_price(url)
    url = 'http://www.fnac.com.br/radio-portatil-com-cd-philips-mp3-az1137-55-FNAC,,som-549464-12.html'
    print extract_price(url)
    url = 'http://www.americanas.com.br/produto/7091439/radio-portatil-az1137-reproduz-cd-mp3/wma-cd-e-cdrw-sintonizador-fm/mw-20-faixas-cd-programaveis-philips'
    print extract_price(url)
    url = 'http://www.gazin.com.br/produto/34082/r%E1dio-philips-cd-mp3-fm-2w-rms-az1137.html'
    print extract_price(url)
    url = 'http://www.shoptime.com.br/produto/7091439/audio/somportatil/philips/radio-portatil-az1137-reproduz-cd-mp3/wma-cd-e-cdrw-sintonizador-fm/mw-20-faixas-cd-programaveis-philips'
    print extract_price(url)
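To see what the pattern in extract_price does and does not catch without hitting the network, the sketch below applies the same regular expression to a few made-up strings. The sample HTML, the names PRICE_PATTERN, find_prices, and to_float, and the float conversion are assumptions added for illustration; they are not part of the original gist.

```python
import re

# Same pattern as in extract_price: "R$", optional whitespace, an optional run of
# lowercase letters / angle brackets / spaces, then a number starting with 1-9.
PRICE_PATTERN = re.compile(r'R\$\s*[a-z<> ]*([1-9][0-9,]+)')


def find_prices(html):
    """Return all price strings the pattern captures in the given text."""
    return PRICE_PATTERN.findall(html)


def to_float(price):
    """Hypothetical helper: turn a Brazilian-style '129,90' into a float."""
    return float(price.replace(',', '.'))


# Made-up snippets standing in for real store pages.
tagged = '<span class="price">R$ <strong>129,90</strong></span>'
two_prices = 'de R$ 59,90 por R$ 49,90'
thousands = 'R$ 1.299,00'

print(find_prices(tagged))      # the simple <strong> tag is skipped over
print(find_prices(two_prices))  # every occurrence is returned, in order
print(find_prices(thousands))   # the '.' thousands separator defeats the pattern
print([to_float(p) for p in find_prices(two_prices)])
```

The third case suggests one limitation worth keeping in mind: because the capture group accepts only digits and commas, a price written with a dot as thousands separator is not matched at all, so pages listing more expensive items would likely need a broader pattern.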