Created
October 18, 2019 20:27
Vido/12e5e3ecef854b35627b6b5ab9a59b64
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fetching the settlement adjustments (ajustes) of B3 futures contracts with Python\n",
"\n",
"First, we need to open the B3 website:\n",
"\n",
"* http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/historico/derivativos/ajustes-do-pregao/\n",
"\n",
"Unfortunately, I have to assume the reader has some familiarity with HTML.\n",
"Using the browser's `Inspect` tool, let's analyze the site's HTML. There we notice that the data is organized in a `<table>`, and this table has `id=\"tblDadosAjustes\"`.\n",
"However, the table sits inside an `<iframe>`. This tag tells the browser to render the content of another HTML page, in this case:\n",
"\n",
"* http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-ajustes-do-pregao-ptBR.asp\n",
"\n",
"Perfect, so let's scrape that page directly:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<Response [200]>\n"
]
}
],
"source": [
"# requests is a library for making HTTP requests\n",
"import requests\n",
"\n",
"bmf_url = 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-ajustes-do-pregao-ptBR.asp'\n",
"response = requests.get(bmf_url)\n",
"# Printing the Response object shows the HTTP status code (200 = OK)\n",
"print(response)"
]
},
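{
"cell_type": "markdown",
"metadata": {},
"source": [
"`<Response [200]>` means the server answered with HTTP status 200 (OK). For a script that runs unattended, it may be worth failing loudly on error statuses and not waiting indefinitely on an unresponsive server. A minimal sketch; the helper name `fetch_page` and the 30-second timeout are arbitrary choices, not part of the original code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"\n",
"def fetch_page(url, timeout=30):\n",
"    \"\"\"Download `url` and return its HTML as a string.\n",
"\n",
"    Raises requests.HTTPError for 4xx/5xx responses and requests.Timeout\n",
"    if connecting, or waiting for the server to send data, takes longer\n",
"    than `timeout` seconds.\n",
"    \"\"\"\n",
"    resp = requests.get(url, timeout=timeout)\n",
"    resp.raise_for_status()  # converts 4xx/5xx status codes into an exception\n",
"    return resp.text\n",
"\n",
"\n",
"# Usage (makes a real network request):\n",
"# page_html = fetch_page(bmf_url)"
]
},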
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make our lives easier, we'll use a library to parse the HTML: lxml. However, it is not installed by default, so we install it with the following command (outside a conda environment, `pip install lxml` does the same):\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting package metadata (current_repodata.json): done\n",
"Solving environment: done\n",
"\n",
"\n",
"==> WARNING: A newer version of conda exists. <==\n",
" current version: 4.7.10\n",
" latest version: 4.7.12\n",
"\n",
"Please update conda by running\n",
"\n",
" $ conda update -n base conda\n",
"\n",
"\n",
"\n",
"## Package Plan ##\n",
"\n",
" environment location: /srv/conda/envs/notebook\n",
"\n",
" added / updated specs:\n",
" - lxml\n",
"\n",
"\n",
"The following packages will be downloaded:\n",
"\n",
" package | build\n",
" ---------------------------|-----------------\n",
" ca-certificates-2019.9.11 | hecc5488_0 144 KB conda-forge\n",
" certifi-2019.9.11 | py36_0 147 KB conda-forge\n",
" libxslt-1.1.33 | h31b3aaa_0 556 KB conda-forge\n",
" lxml-4.4.1 | py36h7ec2d77_0 1.6 MB conda-forge\n",
" ------------------------------------------------------------\n",
" Total: 2.4 MB\n",
"\n",
"The following NEW packages will be INSTALLED:\n",
"\n",
" libxslt conda-forge/linux-64::libxslt-1.1.33-h31b3aaa_0\n",
" lxml conda-forge/linux-64::lxml-4.4.1-py36h7ec2d77_0\n",
"\n",
"The following packages will be UPDATED:\n",
"\n",
" ca-certificates 2019.6.16-hecc5488_0 --> 2019.9.11-hecc5488_0\n",
" certifi 2019.6.16-py36_1 --> 2019.9.11-py36_0\n",
"\n",
"\n",
"\n",
"Downloading and Extracting Packages\n",
"libxslt-1.1.33 | 556 KB | ##################################### | 100% \n",
"ca-certificates-2019 | 144 KB | ##################################### | 100% \n",
"lxml-4.4.1 | 1.6 MB | ##################################### | 100% \n",
"certifi-2019.9.11 | 147 KB | ##################################### | 100% \n",
"Preparing transaction: done\n",
"Verifying transaction: done\n",
"Executing transaction: done\n",
"\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%conda install lxml"
]
},
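{
"cell_type": "markdown",
"metadata": {},
"source": [
"With lxml installed, a side note on the `<iframe>` found at the start: instead of copying its URL by hand from the browser, it can be read from the iframe's `src` attribute with XPath. A minimal sketch; `outer_html` is a small hypothetical HTML string standing in for the real B3 page (whose actual markup may differ), and `urljoin` makes the URL absolute in case `src` is relative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.parse import urljoin\n",
"from lxml import html\n",
"\n",
"# Hypothetical, simplified stand-in for the outer B3 page (the real markup may differ)\n",
"outer_url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/historico/derivativos/ajustes-do-pregao/'\n",
"outer_html = '<html><body><iframe src=\"http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-ajustes-do-pregao-ptBR.asp\"></iframe></body></html>'\n",
"\n",
"outer_tree = html.fromstring(outer_html)\n",
"\n",
"# '//iframe/@src' returns the src attribute of every <iframe>, as a list of strings\n",
"iframe_srcs = outer_tree.xpath('//iframe/@src')\n",
"\n",
"# urljoin keeps absolute URLs and resolves relative ones against the outer page URL\n",
"iframe_urls = [urljoin(outer_url, src) for src in iframe_srcs]\n",
"print(iframe_urls)"
]
},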
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[<Element table at 0x7f1929f270e8>]\n"
]
}
],
"source": [
"from lxml import html\n",
"\n",
"# Initialize the parser with the response from the request above.\n",
"# The `response` object has a `.text` attribute holding the page's source code as a string\n",
"tree = html.fromstring(response.text)\n",
"\n",
"# Here we use a syntax called XPath to access the HTML elements;\n",
"# `xpath` always returns a list of matches\n",
"table = tree.xpath('//table[@id=\"tblDadosAjustes\"]')\n",
"\n",
"# This XPath can be read as follows:\n",
"# * anywhere in the document (`//`):\n",
"# * find a <table> whose id is \"tblDadosAjustes\"\n",
"\n",
"print(table)"
]
},
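{
"cell_type": "markdown",
"metadata": {},
"source": [
"`xpath` always returns a list; here it holds the single `<table>` we are after. A sketch of turning such a table into a list of rows of cell text follows. It runs on a small hypothetical HTML snippet (`sample`) that reuses the headers and the WIN values printed further down. The real page's markup may differ, for example with extra attributes or a `<tbody>`. `text_content()` gathers all the text inside a cell, including text in nested tags, and `strip()` trims the surrounding whitespace:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from lxml import html\n",
"\n",
"# Hypothetical, trimmed-down version of the table, reusing the WIN values shown in this notebook\n",
"sample = '''\n",
"<table id=\"tblDadosAjustes\">\n",
"  <tr><th>Mercadoria</th><th>Vencimento</th><th>Preço de ajuste anterior</th>\n",
"      <th>Preço de ajuste Atual</th><th>Variação</th><th>Valor do ajuste por contrato (R$)</th></tr>\n",
"  <tr><td>WIN - Ibovespa Mini</td><td>Z19</td><td>106.253</td>\n",
"      <td>105.750</td><td>-503</td><td>100,60</td></tr>\n",
"</table>\n",
"'''\n",
"\n",
"sample_tree = html.fromstring(sample)\n",
"sample_table = sample_tree.xpath('//table[@id=\"tblDadosAjustes\"]')\n",
"\n",
"rows = []\n",
"if sample_table:  # an empty list means no table with that id was found\n",
"    # './/tr' searches relative to the table element, not the whole document\n",
"    for tr in sample_table[0].xpath('.//tr'):\n",
"        # './th|./td' selects the row's header and data cells in document order\n",
"        rows.append([cell.text_content().strip() for cell in tr.xpath('./th|./td')])\n",
"\n",
"for row in rows:\n",
"    print(row)"
]
},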
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mercadoria \n",
" WIN - Ibovespa Mini \n",
"\n",
"Vencimento \n",
" Z19 \n",
"\n",
"Preço de ajuste anterior \n",
" 106.253 \n",
"\n",
"Preço de ajuste Atual \n",
" 105.750 \n",
"\n",
"Variação \n",
" -503 \n",
"\n",
"Valor do ajuste por contrato (R$) \n",
" 100,60 \n",
"\n"
]
}
],
"source": [
"# This XPath can be read as follows:\n",
"# * anywhere in the document:\n",
"# * find every <th>\n",
"# * from each of those nodes, return its parent (`..`), i.e. the header row\n",
"\n",
"headers = tree.xpath('//th/..')\n",
"\n",
"# This XPath can be read as follows:\n",
"# * anywhere in the document:\n",
"# * find every <td> whose text contains the string \"WIN\"\n",
"# * from each of those nodes, return its parent (`..`), i.e. the data row\n",
"\n",
"data = tree.xpath('//td[contains(text(), \"WIN\")]/..')\n",
"\n",
"# Here we pair the cells of the first header row (the column headers)\n",
"# with the cells of the first matching data row, using `zip`.\n",
"# Iterating over an lxml element yields its children\n",
"# (the older `getchildren()` method is deprecated).\n",
"\n",
"for header, value in zip(headers[0], data[0]):\n",
"    print(header.text, '\\n', value.text, '\\n')\n"
]
}
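,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The values come back as strings in Brazilian format, where `.` groups thousands and `,` marks decimals: `106.253` is 106,253 points and `100,60` is R$ 100.60. The cell below is a minimal sketch that converts them with `Decimal` (exact decimal arithmetic, with no binary floating-point rounding) and checks the row for consistency. The helper `parse_br_number` is hypothetical. The multiplier of R$ 0.20 per point for the WIN contract is an assumption, not something read from the page. The sign convention of the last column is not shown here, so only magnitudes are compared for it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from decimal import Decimal\n",
"\n",
"\n",
"def parse_br_number(text):\n",
"    \"\"\"Convert a pt-BR number such as '106.253' or '100,60' to Decimal.\"\"\"\n",
"    cleaned = text.strip().replace('.', '').replace(',', '.')\n",
"    return Decimal(cleaned)\n",
"\n",
"\n",
"# WIN Z19 values as printed by the previous cell (with their surrounding spaces)\n",
"previous = parse_br_number(' 106.253 ')\n",
"current = parse_br_number(' 105.750 ')\n",
"change = parse_br_number(' -503 ')\n",
"value_per_contract = parse_br_number(' 100,60 ')\n",
"\n",
"# Assumption: the WIN (mini Ibovespa) contract is worth R$ 0.20 per index point\n",
"WIN_MULTIPLIER = Decimal('0.20')\n",
"\n",
"# Does the change in points match current minus previous settlement price?\n",
"print(current - previous == change)\n",
"# Does |change| x multiplier match the R$ value per contract (magnitude only)?\n",
"print(abs(change) * WIN_MULTIPLIER == value_per_contract)"
]
}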
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
} |