@Vido
Created October 18, 2019 20:27
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fetching B3 futures settlement prices with Python\n",
"\n",
"First, we need to open the B3 website:\n",
"\n",
"* http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/historico/derivativos/ajustes-do-pregao/\n",
"\n",
"Unfortunately, I have to assume that the reader has some familiarity with HTML.\n",
"Using the browser's `Inspect` tool, let's examine the page's HTML. There we can see that the data is organized in a `<table>`, and that this table has `id=\"tblDadosAjustes\"`.\n",
"However, this table sits inside an `<iframe>`. That tag tells the browser to render the content of another HTML page. In this case:\n",
"\n",
"* http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-ajustes-do-pregao-ptBR.asp\n",
"\n",
"Perfect, let's scrape that page:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<Response [200]>\n"
]
}
],
"source": [
"# requests is a library for making HTTP requests\n",
"import requests\n",
"\n",
"bmf_url = 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-ajustes-do-pregao-ptBR.asp'\n",
"response = requests.get(bmf_url)\n",
"print(response)"
]
},
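{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small defensive sketch, not part of the original walkthrough: printing the response only shows the status code, so it may help to set a timeout and fail loudly on an HTTP error. The 30-second timeout is an arbitrary assumption, and the page may have moved or changed since this notebook was written."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Reuses `requests` and `bmf_url` from the cell above.\n",
"# timeout=30 is an assumed value, not something the site documents.\n",
"response = requests.get(bmf_url, timeout=30)\n",
"\n",
"# Raises requests.HTTPError on a 4xx/5xx status instead of continuing silently\n",
"response.raise_for_status()\n",
"print(response.status_code)"
]
},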
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make our lives easier, let's use a library to parse the HTML: lxml. It is not installed by default, however, so we need to run the following command to install it:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting package metadata (current_repodata.json): done\n",
"Solving environment: done\n",
"\n",
"\n",
"==> WARNING: A newer version of conda exists. <==\n",
" current version: 4.7.10\n",
" latest version: 4.7.12\n",
"\n",
"Please update conda by running\n",
"\n",
" $ conda update -n base conda\n",
"\n",
"\n",
"\n",
"## Package Plan ##\n",
"\n",
" environment location: /srv/conda/envs/notebook\n",
"\n",
" added / updated specs:\n",
" - lxml\n",
"\n",
"\n",
"The following packages will be downloaded:\n",
"\n",
" package | build\n",
" ---------------------------|-----------------\n",
" ca-certificates-2019.9.11 | hecc5488_0 144 KB conda-forge\n",
" certifi-2019.9.11 | py36_0 147 KB conda-forge\n",
" libxslt-1.1.33 | h31b3aaa_0 556 KB conda-forge\n",
" lxml-4.4.1 | py36h7ec2d77_0 1.6 MB conda-forge\n",
" ------------------------------------------------------------\n",
" Total: 2.4 MB\n",
"\n",
"The following NEW packages will be INSTALLED:\n",
"\n",
" libxslt conda-forge/linux-64::libxslt-1.1.33-h31b3aaa_0\n",
" lxml conda-forge/linux-64::lxml-4.4.1-py36h7ec2d77_0\n",
"\n",
"The following packages will be UPDATED:\n",
"\n",
" ca-certificates 2019.6.16-hecc5488_0 --> 2019.9.11-hecc5488_0\n",
" certifi 2019.6.16-py36_1 --> 2019.9.11-py36_0\n",
"\n",
"\n",
"\n",
"Downloading and Extracting Packages\n",
"libxslt-1.1.33 | 556 KB | ##################################### | 100% \n",
"ca-certificates-2019 | 144 KB | ##################################### | 100% \n",
"lxml-4.4.1 | 1.6 MB | ##################################### | 100% \n",
"certifi-2019.9.11 | 147 KB | ##################################### | 100% \n",
"Preparing transaction: done\n",
"Verifying transaction: done\n",
"Executing transaction: done\n",
"\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"conda install lxml"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[<Element table at 0x7f1929f270e8>]\n"
]
}
],
"source": [
"from lxml import html\n",
"\n",
"# Here we initialize our parser with the response from the request above.\n",
"# The `response` object has a `.text` attribute that holds the page's source code\n",
"tree = html.fromstring(response.text)\n",
"\n",
"# Here we use a syntax called XPath to access the HTML elements\n",
"table = tree.xpath('//table[@id=\"tblDadosAjustes\"]')\n",
"\n",
"# This XPath can be read as follows:\n",
"# * Anywhere in the document:\n",
"# * look for a <table> with id=\"tblDadosAjustes\"\n",
"\n",
"print(table)"
]
},
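{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what this kind of XPath does without depending on the live page, here is a minimal sketch on a made-up HTML snippet. The `sample_html` string, its contents, and the `sample_tree` name are hypothetical; only the `id` value is taken from the real page."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from lxml import html\n",
"\n",
"# Hypothetical HTML, only to illustrate the XPath; this is not the real B3 page\n",
"sample_html = (\n",
"    '<div>'\n",
"    '<table id=\"other\"><tr><td>ignored</td></tr></table>'\n",
"    '<table id=\"tblDadosAjustes\"><tr><td>WIN - Ibovespa Mini</td></tr></table>'\n",
"    '</div>'\n",
")\n",
"sample_tree = html.fromstring(sample_html)\n",
"\n",
"# `//table` matches <table> elements at any depth; the predicate filters by id\n",
"matches = sample_tree.xpath('//table[@id=\"tblDadosAjustes\"]')\n",
"print(len(matches), matches[0].get('id'))"
]
},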
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mercadoria \n",
" WIN - Ibovespa Mini \n",
"\n",
"Vencimento \n",
" Z19 \n",
"\n",
"Preço de ajuste anterior \n",
" 106.253 \n",
"\n",
"Preço de ajuste Atual \n",
" 105.750 \n",
"\n",
"Variação \n",
" -503 \n",
"\n",
"Valor do ajuste por contrato (R$) \n",
" 100,60 \n",
"\n"
]
}
],
"source": [
"# This XPath can be read as follows:\n",
"# * Anywhere in the document:\n",
"# * look for a <th>\n",
"# * From that node, return its parent\n",
"\n",
"headers = tree.xpath('//th/..')\n",
"\n",
"# This XPath can be read as follows:\n",
"# * Anywhere in the document:\n",
"# * look for a <td> whose text contains the string \"WIN\"\n",
"# * From that node, return its parent\n",
"\n",
"data = tree.xpath('//td[contains(text(), \"WIN\")]/..')\n",
"\n",
"# Here we pair up the headers list (the table's column headers)\n",
"# with the data list (the table's data)\n",
"# using the `zip` function.\n",
"# Iterating over an lxml element yields its children (getchildren() is deprecated).\n",
"\n",
"for header, value in zip(headers[0], data[0]):\n",
"    print(header.text, '\\n', value.text, '\\n')\n"
]
}
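,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The values printed above are strings. As a rough sketch, assuming the page uses Brazilian number formatting (`.` as the thousands separator and `,` as the decimal separator), they could be converted to floats like this. `parse_br_number` is a hypothetical helper, not part of lxml or requests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def parse_br_number(text):\n",
"    # Assumes Brazilian formatting: '.' for thousands, ',' for decimals\n",
"    return float(text.strip().replace('.', '').replace(',', '.'))\n",
"\n",
"# Sample strings copied from the output above\n",
"for raw in ['106.253', '105.750', '-503', '100,60']:\n",
"    print(raw, '->', parse_br_number(raw))"
]
}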
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}