@sergiolucero
Last active April 20, 2020 10:43
Scraper Mercado Público
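import os
import time

import pandas as pd
import requests

TOKEN = os.getenv('TOKEN_MP')  # Mercado Público API ticket, read from the environment
WAIT = 2  # seconds between API calls; 2 seems reasonable, 1 second does not work
url = 's3://quantcldata/CLIENTS/ObservatorioFiscal/abril2020.csv'
df = pd.read_csv(url)  # reading an s3:// URL requires the s3fs package


def get_prod(d):
    """Classify a lower-cased description by the first keyword it contains."""
    if 'camilla' in d:
        return 'camilla'
    if 'mascarilla' in d:
        return 'mascarilla'
    if 'alcohol gel' in d:
        return 'alcohol'
    return 'Otros'


df['desc'] = df.Nombre.fillna('').str.lower()  # missing names become '' so the keyword test cannot fail
df['producto'] = df.desc.apply(get_prod)
cami = df[df.producto == 'camilla']
alco = df[df.producto == 'alcohol']
mask = df[df.producto == 'mascarilla']
print([len(p) for p in (alco, cami, mask)])  # row counts: alcohol, camilla, mascarilla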
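

def get_OC(oc):
    """Return the 'Listado' entries the API reports for purchase order `oc`."""
    BASE = 'http://api.mercadopublico.cl/servicios/v1/publico/'
    url = f'{BASE}ordenesdecompra.json?codigo={oc}&ticket={TOKEN}'
    return requests.get(url).json()['Listado']


def get_detalle(df):
    """Fetch every order listed in df.Codigo and stack the results in one DataFrame."""
    frames = []
    for oc in df.Codigo:
        print(oc, end=',')  # progress indicator
        frames.append(pd.DataFrame(get_OC(oc)))
        time.sleep(WAIT)  # pause between calls (see WAIT above)
    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    return pd.concat(frames) if frames else pd.DataFrame()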
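
The loop above stops at the first request that fails, which is inconvenient on a long list of order codes. Below is a rough sketch of one way to make it more forgiving: retry a failed call with a growing pause and collect the codes that never succeed. `fetch_all` and `flaky_fetch` are hypothetical names that are not part of the gist, and `flaky_fetch` is only a stand-in for `get_OC` so the example runs offline. Real code would probably catch `requests.RequestException` rather than a bare `Exception`.

```python
import time


def fetch_all(codes, fetch, wait=2.0, retries=3):
    """Call fetch(code) for every code, pausing `wait` seconds between codes.

    A failing call is retried up to `retries` times in total, doubling the
    pause after each failure. Codes that never succeed are returned in
    `failed` instead of aborting the whole run.
    """
    results, failed = {}, []
    for code in codes:
        pause = wait
        for attempt in range(retries):
            try:
                results[code] = fetch(code)
                break
            except Exception as exc:  # assumption: any error is worth a retry
                print(f'{code}: attempt {attempt + 1} failed ({exc})')
                time.sleep(pause)
                pause *= 2
        else:
            # the inner loop ran out of attempts without a successful break
            failed.append(code)
        time.sleep(wait)
    return results, failed


# Hypothetical stand-in for get_OC: fails once for 'B-2', always for 'C-3'.
calls = {}


def flaky_fetch(code):
    calls[code] = calls.get(code, 0) + 1
    if code == 'B-2' and calls[code] == 1:
        raise RuntimeError('simulated rate limit')
    if code == 'C-3':
        raise RuntimeError('simulated permanent failure')
    return [{'Codigo': code}]


results, failed = fetch_all(['A-1', 'B-2', 'C-3'], flaky_fetch, wait=0, retries=2)
print(sorted(results), failed)
```

With the real API, the call would presumably look like `fetch_all(mask.Codigo, get_OC, wait=WAIT)`, and each value in `results` could then be wrapped in a `pd.DataFrame` as `get_detalle` does.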
@sergiolucero

PyData_3CAM
