
@sergiolucero
Created May 13, 2018 00:02
minijumboscraper
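import requests
import pandas as pd
from bs4 import BeautifulSoup

url_base = 'https://nuevo.jumbo.cl/%s'  # category page, e.g. https://nuevo.jumbo.cl/abarrotes

def parse(what):
    """Scrape one Jumbo category page into a DataFrame of product name, brand and best price."""
    print(what)
    bs = BeautifulSoup(requests.get(url_base % what).text, 'lxml')
    infos = bs.find_all('div', attrs={'class': 'product-item__info'})
    # each info block's text is newline-separated: line 1 is the product name, line 2 the brand
    producto = [pt.text.split('\n')[1] for pt in infos]
    marca = [pt.text.split('\n')[2] for pt in infos]
    # Chilean price format: drop '$', drop '.' thousands separators, turn ',' into the decimal point
    precios = [float(p.text.replace('$', '').replace('.', '').replace(',', '.'))
               for p in bs.find_all('span', attrs={'class': 'product-prices__value product-prices__value--best-price'})]
    df = pd.DataFrame({'producto': producto, 'marca': marca, 'precio': precios})
    print(df.head())
    return df

# the plotting comment below uses a combined frame named bdf, assumed to be built like this
bdf = pd.concat([parse(what) for what in ['abarrotes', 'coctel', 'conservas']], ignore_index=True)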
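The price conversion in parse() assumes Chilean peso formatting, where '.' separates thousands and ',' marks decimals. A small standalone helper makes that assumption explicit and can be checked without hitting the site. parse_clp is a hypothetical name, not part of the original gist, and the sample strings are assumed price formats rather than captured page text.

```python
def parse_clp(text):
    """Convert a Chilean-peso price string such as '$1.990' or '$12.345,50' to a float.

    Same replace chain as parse(): drop '$', drop '.' thousands separators,
    then turn the ',' decimal separator into '.'.
    """
    cleaned = text.strip().replace('$', '').replace('.', '').replace(',', '.')
    return float(cleaned)

print(parse_clp('$1.990'))      # 1990.0
print(parse_clp('$12.345,50'))  # 12345.5
```

Note that a plain float() on '$1.990' would fail, and float('1.990') would silently read it as 1.99, which is why the '.' has to be stripped before the ',' is converted.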
@sergiolucero (gist author) commented:

The plot above (a box plot of precio per familia) was as simple as defining familia, the first word of each product name:

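import matplotlib.pyplot as plt

# familia = first word of the product name (e.g. 'Vino', 'Queso')
bdf['familia'] = [p.split()[0] for p in bdf['producto']]
# keep only a handful of families so the box plot stays readable
bdf = bdf[bdf.familia.isin(['Vino', 'Shampoo', 'Crema', 'Desodorante', 'Alimento', 'Queso'])]
# one box of precio per familia
bdf.boxplot('precio', by='familia')
plt.show()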
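A box plot's centre line is each group's median, so the same familia grouping can be checked numerically without pandas or matplotlib. This is a hedged, stdlib-only sketch: price_by_familia and the sample rows are hypothetical, for illustration only, and are not scraped data.

```python
from collections import defaultdict
from statistics import median

def price_by_familia(rows, keep):
    """Group (producto, precio) pairs by familia (first word of the product name)
    and return the median price of each family listed in keep."""
    groups = defaultdict(list)
    for producto, precio in rows:
        words = producto.split()
        if not words:  # skip empty names instead of raising IndexError
            continue
        familia = words[0]
        if familia in keep:
            groups[familia].append(precio)
    return {f: median(ps) for f, ps in sorted(groups.items())}

# hypothetical rows, for illustration only (not scraped data)
rows = [('Vino Tinto Reserva 750 cc', 4990.0), ('Vino Blanco 750 cc', 3990.0),
        ('Queso Gauda Laminado', 2590.0), ('Galletas de Agua', 890.0)]
print(price_by_familia(rows, {'Vino', 'Queso'}))
# {'Queso': 2590.0, 'Vino': 4490.0}
```

Skipping empty names is a small robustness choice; the one-liner p.split()[0] in the comment above would raise on a blank product name.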
