Created
June 22, 2019 17:51
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Final Project for the Machine Learning Course" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Problem type\n", | |
"NER - Named Entity Recognition.\n", | |
"\n", | |
"In this kind of problem, a text prediction model tries to identify, as accurately as possible, which entities appear in a text. NER is hard because the same thing can be expressed in many different ways, and because the same word can mean several different things.\n", | |
"Portuguese, for example, has many verb tenses and conjugation patterns.\n", | |
"\n", | |
"E.g.: conjugar, conjugado, conjugaria, conjugarei, conjugaríamos, conjugaríeis, conjugaste, conjugou, conjuguemos... (all forms of the verb \"conjugar\", to conjugate)\n", | |
"\n", | |
"Try explaining to a foreigner, for example, the difference between \"bota a calça\" (put on the trousers) and \"calça a bota\" (put on the boot), where \"bota\" and \"calça\" swap between verb and noun." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Problem\n", | |
"The goal of this project is to take a dataset with thousands of labeled texts and words and, using a text prediction model, rank the best entity classifications.\n", | |
"\n", | |
"We focus on four types of named entities: persons, locations, organizations, and names of other entities that do not belong to the previous three groups." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Datasets\n", | |
"We use a widely known dataset that was first used at the Conference on Computational Natural Language Learning (CoNLL-2003), available at: https://www.clips.uantwerpen.be/conll2003/ner/" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Dataset layout - rows\n", | |
"The CoNLL-2003 shared task data files contain four columns separated by a single space.\n", | |
"\n", | |
"- The first item on each line is a word. \n", | |
"- The second is a part-of-speech (POS) tag. \n", | |
"- The third is a syntactic chunk tag. \n", | |
"- The fourth is the named entity tag.\n", | |
"\n", | |
"Chunk tags and named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only when two phrases of the same type immediately follow each other does the first word of the second phrase get the tag B-TYPE.\n", | |
"\n", | |
" A word with the tag O is not part of a phrase." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Tool\n", | |
"We use [spaCy](https://spacy.io/),\n", | |
" a tool for applying Natural Language Processing (NLP) techniques. Here it is used for NER - Named Entity Recognition.\n", | |
"\n", | |
"spaCy is a powerful open-source tool that ships with ready-to-use models for several languages.\n", | |
"\n", | |
"We use the model for English text.\n", | |
"\n", | |
"Some spaCy functions are demonstrated further down in this document." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"#U.N. NNP I-NP I-ORG \n", | |
"#official NN I-NP O \n", | |
"#Ekeus NNP I-NP I-PER \n", | |
"#heads VBZ I-VP O \n", | |
"#for IN I-PP O \n", | |
"#Baghdad NNP I-NP I-LOC \n", | |
"#. . O O " | |
] | |
}, | |
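{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The four-column layout described above can be read with plain Python. The cell below is a minimal sketch and is not part of the original notebook: `sample_rows` and `parse_conll_rows` are hypothetical names, and the rows are the commented sample shown just above." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Hypothetical helper, not from the original notebook\n", | |
"sample_rows = [\n", | |
"    'U.N. NNP I-NP I-ORG',\n", | |
"    'official NN I-NP O',\n", | |
"    'Ekeus NNP I-NP I-PER',\n", | |
"    'heads VBZ I-VP O',\n", | |
"    'for IN I-PP O',\n", | |
"    'Baghdad NNP I-NP I-LOC',\n", | |
"    '. . O O',\n", | |
"]\n", | |
"\n", | |
"def parse_conll_rows(lines):\n", | |
"    # Return (word, pos, chunk, ner) tuples, skipping blank and -DOCSTART- rows\n", | |
"    rows = []\n", | |
"    for line in lines:\n", | |
"        fields = line.split()\n", | |
"        if len(fields) != 4 or fields[0] == '-DOCSTART-':\n", | |
"            continue\n", | |
"        rows.append(tuple(fields))\n", | |
"    return rows\n", | |
"\n", | |
"rows = parse_conll_rows(sample_rows)\n", | |
"# Keep only the words whose entity tag is not O\n", | |
"entities = [(word, ner) for word, _, _, ner in rows if ner != 'O']\n", | |
"print(entities)" | |
] | |
}, | |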
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Data file layout\n", | |
"The dataset files are organized as follows:\n", | |
" - There are 3 files per language\n", | |
" - One training file\n", | |
" - Two test files, testa and testb\n", | |
" - The first test file is used during development to find the best parameters\n", | |
" - The second test file is used for the final evaluation\n", | |
" - The data is available in two languages, English and German. This project uses only the English version.\n", | |
" " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Files\n", | |
"- eng.raw.tar - 13.930 MB - English dataset\n", | |
"- ner.tgz - 3.374 MB - Contains the software used to build the data\n", | |
"- 000README.txt - 8 KB - Unpacking instructions\n", | |
"\n", | |
"The English data is a collection of news articles from the Reuters Corpus." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Evaluation metrics\n", | |
"The competition uses three main metrics:\n", | |
"- Precision\n", | |
"- Recall\n", | |
"- F-Score\n", | |
"\n", | |
"Precision is the percentage of named entities found by the learning system that are correct.\n", | |
"\n", | |
"Recall is the percentage of named entities present in the corpus that are found by the system.\n", | |
"\n", | |
"A named entity is correct only if it is an exact match of the corresponding entity in the\n", | |
"data file." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Training data: Examples and their annotations.\n", | |
"\n", | |
"### Text: The input text for which the model should predict a label.\n", | |
"\n", | |
"### Label: The label that the model should predict\n", | |
"\n", | |
"### Gradient: Gradient of the loss function, computed from the difference between the model's prediction for the input and the expected output" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 46, | |
"metadata": { | |
"scrolled": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<IPython.core.display.Image object>" | |
] | |
}, | |
"execution_count": 46, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"from IPython.display import Image\n", | |
"Image(filename=\"training model.png\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Importando bibliotecas" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 107, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import spacy\n", | |
"from spacy import displacy\n", | |
"from spacy.tokens import Span, Doc\n", | |
"from spacy.vocab import Vocab\n", | |
"from spacy.pipeline import EntityRuler\n", | |
"from spacy.lang.en import English\n", | |
"\n", | |
"from sklearn.metrics import classification_report\n", | |
"\n", | |
"import numpy as np \n", | |
"import random" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"nlp = spacy.load(\"en_core_web_sm\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"scrolled": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div class=\"entities\" style=\"line-height: 2.5; direction: ltr\">\n", | |
"<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em; box-decoration-break: clone; -webkit-box-decoration-break: clone\">\n", | |
" Apple\n", | |
" <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n", | |
"</mark>\n", | |
" is looking at buying \n", | |
"<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em; box-decoration-break: clone; -webkit-box-decoration-break: clone\">\n", | |
" U.K.\n", | |
" <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n", | |
"</mark>\n", | |
" startup for \n", | |
"<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em; box-decoration-break: clone; -webkit-box-decoration-break: clone\">\n", | |
" $1 billion\n", | |
" <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; text-transform: uppercase; vertical-align: middle; margin-left: 0.5rem\">MONEY</span>\n", | |
"</mark>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"doc = nlp(u\"Apple is looking at buying U.K. startup for $1 billion\")\n", | |
"spacy.displacy.render(doc, style='ent', jupyter=True)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"C:\\Users\\Jonas Lopes\\Anaconda3\\lib\\runpy.py:193: UserWarning: [W011] It looks like you're calling displacy.serve from within a Jupyter notebook or a similar environment. This likely means you're already running a local web server, so there's no need to make displaCy start another one. Instead, you should be able to replace displacy.serve with displacy.render to show the visualization.\n", | |
" \"__main__\", mod_spec)\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<!DOCTYPE html>\n", | |
"<html lang=\"en\">\n", | |
" <head>\n", | |
" <title>displaCy</title>\n", | |
" </head>\n", | |
"\n", | |
" <body style=\"font-size: 16px; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'; padding: 4rem 2rem; direction: ltr\">\n", | |
"<figure style=\"margin-bottom: 6rem\">\n", | |
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xml:lang=\"en\" id=\"577214f5f2b7446ba4faef2d08af4f77-0\" class=\"displacy\" width=\"750\" height=\"312.0\" direction=\"ltr\" style=\"max-width: none; height: 312.0px; color: #000000; background: #ffffff; font-family: Arial; direction: ltr\">\n", | |
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"222.0\">\n", | |
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"50\">This</tspan>\n", | |
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"50\">DET</tspan>\n", | |
"</text>\n", | |
"\n", | |
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"222.0\">\n", | |
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"225\">is</tspan>\n", | |
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"225\">VERB</tspan>\n", | |
"</text>\n", | |
"\n", | |
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"222.0\">\n", | |
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"400\">a</tspan>\n", | |
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"400\">DET</tspan>\n", | |
"</text>\n", | |
"\n", | |
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"222.0\">\n", | |
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"575\">sentence.</tspan>\n", | |
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"575\">NOUN</tspan>\n", | |
"</text>\n", | |
"\n", | |
"<g class=\"displacy-arrow\">\n", | |
" <path class=\"displacy-arc\" id=\"arrow-577214f5f2b7446ba4faef2d08af4f77-0-0\" stroke-width=\"2px\" d=\"M70,177.0 C70,89.5 220.0,89.5 220.0,177.0\" fill=\"none\" stroke=\"currentColor\"/>\n", | |
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n", | |
" <textPath xlink:href=\"#arrow-577214f5f2b7446ba4faef2d08af4f77-0-0\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nsubj</textPath>\n", | |
" </text>\n", | |
" <path class=\"displacy-arrowhead\" d=\"M70,179.0 L62,167.0 78,167.0\" fill=\"currentColor\"/>\n", | |
"</g>\n", | |
"\n", | |
"<g class=\"displacy-arrow\">\n", | |
" <path class=\"displacy-arc\" id=\"arrow-577214f5f2b7446ba4faef2d08af4f77-0-1\" stroke-width=\"2px\" d=\"M420,177.0 C420,89.5 570.0,89.5 570.0,177.0\" fill=\"none\" stroke=\"currentColor\"/>\n", | |
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n", | |
" <textPath xlink:href=\"#arrow-577214f5f2b7446ba4faef2d08af4f77-0-1\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">det</textPath>\n", | |
" </text>\n", | |
" <path class=\"displacy-arrowhead\" d=\"M420,179.0 L412,167.0 428,167.0\" fill=\"currentColor\"/>\n", | |
"</g>\n", | |
"\n", | |
"<g class=\"displacy-arrow\">\n", | |
" <path class=\"displacy-arc\" id=\"arrow-577214f5f2b7446ba4faef2d08af4f77-0-2\" stroke-width=\"2px\" d=\"M245,177.0 C245,2.0 575.0,2.0 575.0,177.0\" fill=\"none\" stroke=\"currentColor\"/>\n", | |
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n", | |
" <textPath xlink:href=\"#arrow-577214f5f2b7446ba4faef2d08af4f77-0-2\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">attr</textPath>\n", | |
" </text>\n", | |
" <path class=\"displacy-arrowhead\" d=\"M575.0,179.0 L583.0,167.0 567.0,167.0\" fill=\"currentColor\"/>\n", | |
"</g>\n", | |
"</svg>\n", | |
"</figure>\n", | |
"</body>\n", | |
"</html>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"Using the 'dep' visualizer\n", | |
"Serving on http://0.0.0.0:5000 ...\n", | |
"\n", | |
"Shutting down server on port 5000.\n" | |
] | |
} | |
], | |
"source": [ | |
"doc_dep = nlp(u\"This is a sentence.\")\n", | |
"displacy.serve(doc_dep, style=\"dep\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Recognize and update named entities" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"FB 0 2 ORG\n" | |
] | |
} | |
], | |
"source": [ | |
"doc = nlp(u\"FB is hiring a new VP of global policy\")\n", | |
"# Span(doc, 0, 1) covers tokens 0 up to (but not including) 1, i.e. just \"FB\"\n", | |
"doc.ents = [Span(doc, 0, 1, label=doc.vocab.strings[u\"ORG\"])]\n", | |
"for ent in doc.ents:\n", | |
"    print(ent.text, ent.start_char, ent.end_char, ent.label_)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Training and updating neural network models" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"nlp = spacy.load(\"en_core_web_sm\")\n", | |
"train_data = [(u\"Uber blew through $1 million\", {\"entities\": [(0, 4, \"ORG\")]})]\n", | |
"\n", | |
"# Train only the NER component; every other pipe is disabled during the updates\n", | |
"other_pipes = [pipe for pipe in nlp.pipe_names if pipe != \"ner\"]\n", | |
"\n", | |
"with nlp.disable_pipes(*other_pipes):\n", | |
"    optimizer = nlp.begin_training()\n", | |
"    for i in range(10):\n", | |
"        random.shuffle(train_data)\n", | |
"        for text, annotations in train_data:\n", | |
"            nlp.update([text], [annotations], sgd=optimizer)\n", | |
"\n", | |
"# save the model to disk\n", | |
"nlp.to_disk(\"models/modelo.bin\")\n", | |
"# load the model back from disk\n", | |
"nlp = spacy.load(\"models/modelo.bin\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# use the new model on the second test file\n", | |
"with open(\"dados/eng.testb.txt\") as f:\n", | |
"    eng_testb = f.read()\n", | |
"doc = nlp(eng_testb)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Simple and efficient serialization" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from spacy.tokens import Doc\n", | |
"from spacy.vocab import Vocab\n", | |
"\n", | |
"nlp = spacy.load(\"en_core_web_sm\")\n", | |
"saida = open(\"dados/eng.testb.txt\").read()\n", | |
"doc = nlp(saida)\n", | |
"\n", | |
"doc.to_disk(\"dados/saida.bin\")\n", | |
"\n", | |
"new_doc = Doc(Vocab()).from_disk(\"dados/saida.bin\")\n", | |
"\n", | |
"new_doc.ents" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"spacy.explain(\"dobj\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Matcher Pattern" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"New York\n" | |
] | |
} | |
], | |
"source": [ | |
"# Matcher for words\n", | |
"\n", | |
"from spacy.matcher import Matcher\n", | |
"\n", | |
"# Matcher is initialized with the shared vocab\n", | |
"matcher = Matcher(nlp.vocab)\n", | |
"\n", | |
"# Each dict in a pattern represents one token and its attributes\n", | |
"pattern = [{\"LOWER\": \"new\"}, {\"LOWER\": \"york\"}]\n", | |
"# Add with ID, optional callback and pattern(s)\n", | |
"matcher.add('CITIES', None, pattern)\n", | |
"\n", | |
"# Match by calling the matcher on a Doc object\n", | |
"doc = nlp(\"I live in New York\")\n", | |
"matches = matcher(doc)\n", | |
"\n", | |
"# Matches are (match_id, start, end) tuples\n", | |
"for match_id, start, end in matches:\n", | |
" # Get the matched span by slicing the Doc\n", | |
" span = doc[start:end]\n", | |
" print(span.text)\n", | |
"# 'New York'" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Using Match Patterns to build a NER model with spaCy" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 385, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"818268" | |
] | |
}, | |
"execution_count": 385, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"train = open(\"dados/eng.train.txt\").read()\n", | |
"train = train.split()\n", | |
"len(train)\n", | |
"# spaCy only accepts texts of up to nlp.max_length characters at a time (1,000,000 by default)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 386, | |
"metadata": { | |
"scrolled": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Attempt using the training file\n", | |
"nlp = English()\n", | |
"ruler = EntityRuler(nlp)\n", | |
"patterns = []\n", | |
"i = 0\n", | |
"\n", | |
"# Tokens come in groups of four: word, POS tag, chunk tag, entity tag\n", | |
"while i <= len(train) - 4:\n", | |
"    # Skip the document separator row\n", | |
"    if train[i] == '-DOCSTART-':\n", | |
"        i += 4\n", | |
"        continue\n", | |
"\n", | |
"    # One pattern per word, labeled with its entity tag ('O' rows included)\n", | |
"    patterns.append({\"label\": train[i+3], \"pattern\": train[i]})\n", | |
"\n", | |
"    i += 4\n", | |
"\n", | |
"ruler.add_patterns(patterns)\n", | |
"\n", | |
"# add the ruler to the nlp pipeline\n", | |
"nlp.add_pipe(ruler)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 387, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Save the ruler patterns and the full pipeline to disk.\n", | |
"ruler.to_disk(\"models/patterns.json\")\n", | |
"nlp.to_disk(\"models/modelo.bin\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Add the saved ruler to a blank pipeline created from scratch\n", | |
"nlp = English()\n", | |
"\n", | |
"ruler_disk = EntityRuler(nlp).from_disk(\"models/patterns.json\")\n", | |
"\n", | |
"nlp.add_pipe(ruler_disk)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# load the saved model from disk\n", | |
"nlp = spacy.load(\"models/modelo.bin\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[('blew', 'O'), ('through', 'O'), ('$', 'O'), ('1', 'I-MISC'), ('million', 'O')]\n" | |
] | |
} | |
], | |
"source": [ | |
"doc = nlp(u\"Uber blew through $1 million\")\n", | |
"print([(ent.text, ent.label_) for ent in doc.ents])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[('Peter', 'I-PER'), ('is', 'I-MISC'), ('a', 'O'), ('good', 'O'), ('guy', 'O')]\n" | |
] | |
} | |
], | |
"source": [ | |
"doc = nlp(u\"Peter is a good guy\")\n", | |
"print([(ent.text, ent.label_) for ent in doc.ents])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Evaluating with the CoNLL metrics" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"CoNLL uses EM (Exact Match) scoring with precision and recall\n", | |
" - Precision, as shown at the beginning, is the percentage of NEs found by the system that are correct\n", | |
" - Recall is the percentage of NEs in the solution that are found by the system.\n", | |
" - An NE is correct only if it exactly matches the entity in the solution." | |
] | |
}, | |
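{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Exact match is judged on whole entity phrases, so consecutive tokens that share an entity type have to be merged into one span before comparing. The cell below is a minimal sketch of that grouping step and is not part of the original notebook. It assumes the CoNLL-2003 tag scheme, where I-TYPE marks a token inside a phrase and B-TYPE marks the start of a phrase that directly follows another phrase of the same type; `group_entities` and `example_tags` are hypothetical names." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Hypothetical helper, not from the original notebook\n", | |
"def group_entities(tags):\n", | |
"    # Return (start, end, type) spans with an exclusive end index\n", | |
"    spans = []\n", | |
"    start, current = None, None\n", | |
"    for i, tag in enumerate(tags):\n", | |
"        prefix, _, etype = tag.partition('-')\n", | |
"        # Close the open span on O, on a type change, or when B- starts a new phrase\n", | |
"        if current is not None and (tag == 'O' or etype != current or prefix == 'B'):\n", | |
"            spans.append((start, i, current))\n", | |
"            start, current = None, None\n", | |
"        # Open a new span on the first entity token after a gap\n", | |
"        if tag != 'O' and current is None:\n", | |
"            start, current = i, etype\n", | |
"    # Close a span that runs to the end of the sequence\n", | |
"    if current is not None:\n", | |
"        spans.append((start, len(tags), current))\n", | |
"    return spans\n", | |
"\n", | |
"example_tags = ['I-ORG', 'O', 'I-PER', 'I-PER', 'O', 'I-LOC', 'B-LOC', 'O']\n", | |
"spans = group_entities(example_tags)\n", | |
"print(spans)" | |
] | |
}, | |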
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- ### First of all, an example for better understanding" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Suppose we have the following correct (gold) annotations\n", | |
"\n", | |
"Unlike <ENAMEX TYPE=\"PERSON\">Robert</ENAMEX>, <ENAMEX TYPE=\"PERSON\">John\n", | |
"Briggs Jr</ENAMEX> contacted <ENAMEX TYPE=\"ORGANIZATION\">Wonderful\n", | |
"Stockbrockers Inc</ENAMEX> in <ENAMEX TYPE=\"LOCATION\">New York</ENAMEX> and\n", | |
"instructed them to sell all his shares in <ENAMEX\n", | |
"TYPE=\"ORGANIZATION\">Acme</ENAMEX>. " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Suppose a system produced the following output\n", | |
"\n", | |
"<ENAMEX TYPE=\"LOCATION\">Unlike</ENAMEX> Robert, <ENAMEX\n", | |
"TYPE=\"ORGANIZATION\">John Briggs Jr</ENAMEX> contacted Wonderful <ENAMEX\n", | |
"TYPE=\"ORGANIZATION\">Stockbrockers</ENAMEX> Inc <TIMEX TYPE=\"DATE\">in New\n", | |
"York</TIMEX> and instructed them to sell all his shares in <ENAMEX\n", | |
"TYPE=\"ORGANIZATION\">Acme</ENAMEX>. " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- ### The system makes the following errors:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<IPython.core.display.Image object>" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": { | |
"image/png": { | |
"height": 500, | |
"width": 700 | |
} | |
}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"Image(filename=\"saidas eplicar.png\", width=700, height=500)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In the example above there are 5 true entities and 5 system guesses, of which only one exactly matches the solution: <Organization> Acme </Organization>\n", | |
"Precision is therefore 20% (1/5)\n", | |
"Recall is also 20% (1/5) " | |
] | |
}, | |
{ | |
"cell_type": "raw", | |
"metadata": {}, | |
"source": [ | |
"F-score\n", | |
"\n", | |
"F1 = 2 * (precision * recall) / (precision + recall)\n", | |
"F1 = 2 * (20 * 20) / (20 + 20)\n", | |
"F1 = 20" | |
] | |
}, | |
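{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The same numbers can be checked in code. The cell below is a minimal sketch and is not part of the original notebook: `exact_match_scores`, `gold` and `predicted` are hypothetical names, and entities are compared by (text, type) only, which is a simplification because a real scorer would also compare positions in the text." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Hypothetical helper, not from the original notebook\n", | |
"def exact_match_scores(gold, predicted):\n", | |
"    # An entity counts as correct only if both text and type match exactly\n", | |
"    gold_set, pred_set = set(gold), set(predicted)\n", | |
"    true_positives = len(gold_set & pred_set)\n", | |
"    precision = true_positives / len(pred_set) if pred_set else 0.0\n", | |
"    recall = true_positives / len(gold_set) if gold_set else 0.0\n", | |
"    total = precision + recall\n", | |
"    f1 = 2 * precision * recall / total if total else 0.0\n", | |
"    return precision, recall, f1\n", | |
"\n", | |
"# Entities from the annotated example above\n", | |
"gold = [\n", | |
"    ('Robert', 'PERSON'),\n", | |
"    ('John Briggs Jr', 'PERSON'),\n", | |
"    ('Wonderful Stockbrockers Inc', 'ORGANIZATION'),\n", | |
"    ('New York', 'LOCATION'),\n", | |
"    ('Acme', 'ORGANIZATION'),\n", | |
"]\n", | |
"# Entities from the system output above\n", | |
"predicted = [\n", | |
"    ('Unlike', 'LOCATION'),\n", | |
"    ('John Briggs Jr', 'ORGANIZATION'),\n", | |
"    ('Stockbrockers', 'ORGANIZATION'),\n", | |
"    ('in New York', 'DATE'),\n", | |
"    ('Acme', 'ORGANIZATION'),\n", | |
"]\n", | |
"\n", | |
"precision, recall, f1 = exact_match_scores(gold, predicted)\n", | |
"print(f'precision={precision:.0%} recall={recall:.0%} F1={f1:.0%}')" | |
] | |
}, | |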
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Using the metrics" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- First we define the text as a regular Python string so that we can process it with the model we created" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 258, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [], | |
"source": [ | |
"testa = u\"\"\"Their PRP$ I-NP O\n", | |
" stay NN I-NP O\n", | |
" on IN I-PP O\n", | |
" top NN I-NP O\n", | |
" , , O O\n", | |
" though RB I-ADVP O\n", | |
" , , O O\n", | |
" may MD I-VP O\n", | |
" be VB I-VP O\n", | |
" short-lived JJ I-ADJP O\n", | |
" as IN I-PP O\n", | |
" title NN I-NP O\n", | |
" rivals NNS I-NP O\n", | |
" Essex NNP I-NP I-ORG\n", | |
" , , O O\n", | |
" Derbyshire NNP I-NP I-ORG\n", | |
" and CC I-NP O\n", | |
" Surrey NNP I-NP I-ORG\n", | |
" all DT O O\n", | |
" closed VBD I-VP O\n", | |
" in RP I-PRT O\n", | |
" on IN I-PP O\n", | |
" victory NN I-NP O\n", | |
" while IN I-SBAR O\n", | |
" Kent NNP I-NP I-ORG\n", | |
" made VBD I-VP O\n", | |
" up RP I-PRT O\n", | |
" for IN I-PP O\n", | |
" lost VBN I-NP O\n", | |
" time NN I-NP O\n", | |
" in IN I-PP O\n", | |
" their PRP$ I-NP O\n", | |
" rain-affected JJ I-NP O\n", | |
" match NN I-NP O\n", | |
" against IN I-PP O\n", | |
" Nottinghamshire NNP I-NP I-ORG\n", | |
" . . O O\"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 275, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"testa = open(\"dados/eng.testa.txt\").read()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- Next we use the split function to separate the tokens: each whitespace-separated item is added to a Python list" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 276, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"206312" | |
] | |
}, | |
"execution_count": 276, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"testa = testa.split()\n", | |
"len(testa)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- We can then use a NumPy array to make it easier to later compare the correct entities from the test file with the entities predicted by our model\n", | |
"- To do so, we ignore the words tagged 'O', since they are not entities" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 277, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Correct (gold) entities from the test data\n", | |
"# Rows are collected in a Python list first: np.insert casts new values to the array's dtype,\n", | |
"# so inserting into an array built from one-character strings would truncate every value\n", | |
"correct_ent = []\n", | |
"\n", | |
"i = 0\n", | |
"\n", | |
"while i <= len(testa) - 4:\n", | |
"    # Skip the document separator row\n", | |
"    if testa[i] == '-DOCSTART-':\n", | |
"        i += 4\n", | |
"        continue\n", | |
"\n", | |
"    # Keep (word, entity tag) only for rows that are entities\n", | |
"    if testa[i+3] != 'O':\n", | |
"        correct_ent.append([testa[i], testa[i+3]])\n", | |
"\n", | |
"    i += 4\n", | |
"\n", | |
"correct_ent = np.array(correct_ent)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- We now have a NumPy array with the correct entities, which will be compared against the predictions\n", | |
"- In the cell below we build a string using only the first field of each row, that is, the word. Joining all the words into one string gives us a \"document\", as spaCy calls it, which is used as input to the model" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 283, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Put the text into a string so that spaCy's nlp can process it\n", | |
"testa_doc = ''\n", | |
"\n", | |
"i = 0\n", | |
"\n", | |
"while i <= len(testa) - 4:\n", | |
"    # Skip the document separator row\n", | |
"    if testa[i] == '-DOCSTART-':\n", | |
"        i += 4\n", | |
"        continue\n", | |
"\n", | |
"    # Append the word (first column) followed by a space\n", | |
"    testa_doc += testa[i]\n", | |
"    testa_doc += ' '\n", | |
"\n", | |
"    i += 4" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 292, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'CRICKET - LEICESTERSHIRE TAKE OVER AT TOP AFTER INNINGS VICTORY . LONDON 1996-08-30 West Indian all-rounder Phil Simmons took four for 38 on Friday as Leicestershire beat Somerset by an innings and 39 runs in two days to take over at the head of the county championship . Their stay on top , though , may be short-lived as title rivals Essex , Derbyshire and Surrey all closed in on victory while Kent made up for lost time in their rain-affected match against Nottinghamshire . After bowling Somerse'" | |
] | |
}, | |
"execution_count": 292, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# first 500 characters\n", | |
"testa_doc[:500]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 291, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"275559" | |
] | |
}, | |
"execution_count": 291, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# length of the string\n", | |
"len(testa_doc)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- We then call nlp, which builds a document containing the words and the entities predicted by the model" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 293, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"doc = nlp(testa_doc)\n", | |
"predicted_ent = [(ent.text, ent.label_) for ent in doc.ents]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 294, | |
"metadata": { | |
"scrolled": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"('CRICKET', 'O')" | |
] | |
}, | |
"execution_count": 294, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"doc.ents[0].text, doc.ents[0].label_" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- The model was trained with the 'O' annotations included; as noted above, we do not need them since they are not entities.\n", | |
"- We use a loop to drop the words annotated with 'O' and add the remaining ones to a NumPy array with the same layout as the correct entities" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Convert the predicted entities to the same format as the correct entities,\n", | |
"# skipping anything the model labeled 'O' (not an entity)\n", | |
"predicted_ent = np.array(\n", | |
"    [[ent.text, ent.label_] for ent in doc.ents if ent.label_ != 'O'],\n", | |
"    dtype=str,\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 267, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([['Essex', 'I-ORG'],\n", | |
" ['Derbyshire', 'I-ORG'],\n", | |
" ['Surrey', 'I-ORG'],\n", | |
" ['Kent', 'I-ORG'],\n", | |
" ['Nottinghamshire', 'I-ORG']], dtype='<U39')" | |
] | |
}, | |
"execution_count": 267, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"correct_ent" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 268, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([['on', 'I-MISC'],\n", | |
" [',', 'I-LOC'],\n", | |
" [',', 'I-LOC'],\n", | |
" ['Essex', 'I-LOC'],\n", | |
" [',', 'I-LOC'],\n", | |
" ['Derbyshire', 'I-ORG'],\n", | |
" ['and', 'I-LOC'],\n", | |
" ['Surrey', 'I-LOC'],\n", | |
" ['on', 'I-MISC'],\n", | |
" ['Kent', 'I-ORG'],\n", | |
" ['for', 'I-MISC'],\n", | |
" ['against', 'I-MISC'],\n", | |
" ['Nottinghamshire', 'I-ORG'],\n", | |
" ['.', 'I-LOC']], dtype='<U39')" | |
] | |
}, | |
"execution_count": 268, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"predicted_ent" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Now we compute precision and recall from the two lists: correct_ent and predicted_ent" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 118, | |
"metadata": { | |
"scrolled": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAdoAAABUCAIAAACqS2HkAAAKeUlEQVR42uydO3YaPRvHh++8S4EUOVnBeAUmTaq07uwSmncH6d4GytCldZUmsAKzgpwUgb3wnZE0F0mPbthgbH6/jhldR8N/pEePpH8Oh0MFAACvzf94BAAAyDEAACDHAADIMQAAIMcAAMgxAAAgxwAAyDEAACDHAADIMQAAIMcAAMgxAAAgxwAAyDEAACDHAADIMQAAIMcAAMgxAAAgxwAAyDEAACDHAADIMQAAIMcAAMgxAAAgxwAAyDEAvCE2D6PRaPSwOVV4QI4BUooS5Ga5v8gCnkQC939/V1V1/+X2ROEBOQaIiN3PVVXdrw+K9X1zyfr16cP4lUt4+90p12G3qKvV9ARfivHs6XA4fL/1vwdyXnJ4QI4BjuwP3q+NoOjOXv1xou9NPtb9j9cu5KBc1Xj2Y1FX1fbx1xl67vvlt9UlfJUAOYZ3TtO/63p3uz/bqqq/fh73955mY61JN5aNoPvd9hrdAENDg9izHJogUt1cp1y5hg3JnuFaPrzyD2Lpa5P5tqqazrj8AIa5eE+lu+LVsKT6gBzDddot/H5gIyl31Q9tvFj93DRS0vzeLWqtkY209AG+DcRFd2vdFJVGTX8vdq1JZDu/iymSnIqotNNVa9HYLerV1FbkJt8uW1X8XuPHsyd1YTgcUKYIZSep20idZcIL3z4ldXX7Z6cL1F4ZduRLqw8FHADeBVpeOgOtj1amug4FMfrUS5d3oU/FDRTJVkjFj5NxRevvIJn1fSIJz5qe8bz8CCpcd6G4+lAAcgzvBFcoxPvpAANlEVL0wqTlyI2ifwtKaucU+DiEs1LpOndjpZPCCyWx1PiI6gNyDNepxhFhyOzFxqXGFmgjkHE1Muo7xP0kiL3wYKc6kKP4NYp8ouRbst5aalxWfUCOAVNFaec5oMZWDE9aI+nFpDbnSxL6vHRlsG+9jKnCLa1gqCirPhTBVB68C7T3QtitLTDPF05g/+tR9ocYapBx3IhM44VS8bvMVtm1e5q0QuP2u1FJe9JR52PXP7LKQwzfYp7SfnkzmW87N8Jjqg94VgBeFbJ3Q2T9meP+sF/eKQextL/ufrncJL4S5V6/m/+a3Hsl3C9vhu5k489f6wxfOp15ie9d/03aPIwm82292CUWicSrDxcjx71r42WtWjUukyXlOCIKnF+Nn9F5Nvz+u9dv7l31oxuaNy+ybvrbL/fW6o3mzuTx8e8mXq5Uvrf/LupqO/9v06U6XVWWEu7+bPv7Rq0DXe6mTMY/bvKxHtZJ9mQehDds55PRSJfA7v4WVx8ux9FNmMO4BKNTNx+SPwNxRBQ49yReFWuiHAeAPhnzdrYXAtNpmSbhKu/ViaZr3ZST6zILGLvFSb5w7eLOgJiO395UnuDQE3oLXuPfW1KAI6IAAJQxaiT5ZEaBqd7exTY/mev+DQAAbMfnRFnKOpsWAABUl+NZoaf91HzJYAbQnXlw9k8JbxnrzSIOg5qbTmw7bWfSTo5SSZvZhmZL2jStojE3CACvKscR/5tfD2YHKlH2pvb0+WoqCVoT0ktjNY0pn9qbxUp7O58kpVJvpeLO6MuF6mpnF207n3AUAwC8mhwbpyTBT2c7n6+GE2adaXnzoGXPX+fv7CalfYSSi6mcAiknT39BVjzSgxFWv0yNygoiq2vnrXFaTeOCnDjtgrN1AN4N5/WsCK26765LzgvBvVO8HLI8INzkUluzSEHCy/XbO94uLFVoW4ISdykR3O8AcHQ72u/YE6CYMEXk0tmkK0veQnKc3s/AixLZm0XqbGd8T16uXQHgrXVhz26sUBIWWusuLiY1pubuPIMBjuXWhCxckmqOytGLkfJG/tGM9FqoLMcRExQAoKr+OXkOZ3cvLj8ebTx7Onww3tCr6SjXKfoizmELf8B5uQHeFm9mC6GIKcHpa+uTZUq5/W5ZWNyDcSSOy6gUpv
IAkONLQY/oC7TvOetLlCprTV793CSMDHJGx5lMAAA5vnjGHz4pdfyWXDOhdpySDkx39ifM+wBEZN0UabjJVp/VN+2T9+WFLDSm2x6BpeYAyPGZMKuqhaUZaiA/GKu3emyF1Du3BnvX/s6Dxg851r9tF3o7Ng29YbcyrfyLRgLAu5Pjajx7Wrc6G/OsaLqS/UoMK1BCHy23jXbJSazX2RXJitqtDOGYBAB4l3I8PJLGQs3v2ao5nj25Ae/XsWNkxrMnL+EmRtIEIBZJuRBjPQCAI2AlDFwBqXOmnxseaM23uAwEzkXCQ+71d5OTC3gSt73kUXnPDH/uBixuvH4jwVO1O61J7xjiPYK2Q2B3D6xbl9TNyVvl/qK5X/YJL0K37vjGUzFPWV9ak94xBHsE3Wyk7h50qwgnH+vLWFLolKtdr+57Kp4kc+WS+Fru4cGjRFPduqbxjuroOWdl05pM5cG5GM+e+ilF5xj35p6Z2+wGsUYZut/tmNYNMByaiuPe4aA1NTAOHC+fGgpLKuaOlb3yD2Lpa9oNpnOMcR/AMBfvqXRXvBqWVD+JdP611bCBbJcPQubyYdqh53ZEjWhNjBVw9ExGM5hUo7t2/Lu+V7+t630Af8tQN0V9tQ2XHhsLqdhJSMUPRbL39fMLKx6lHNlCsL3VPo1BvoPnNEyhqPoqcMLm4FR1t6jF9KTn46ctlCfx3I6p0ZW25hvYYBMuhOSp9vpdq+v4ae7BVzXwj0hmK6Tix8m44v1j1/eJJOKT7XLB/Qi2ohZWP0OOnefTZBDaX9upffBTGdE9/7mV1eiqWxM5hpeawnF2jc7oXQspemHSL7AbRdj9WdL9wMchnJU0+RUrnThZ5pfE+v8WVz9Djv0Nw/02Cs31uSHDDzJUiNIaXXVrIsfwPFPFobAXG385bQ0IH5YSVZssEQl3wwI5ivIU+UTJt+R/qPX/TVQ/eayLNNqw+2tpVYn2BkOnPkjPrbRBr601kWM4oakidRSKp8bx0wjTZjbZBpjxJQl9XroySOd+PXdw65ZWGNqmql8ox245JMuxUI1Y8SPdRzed0ga9utZEjuF0pooCuQ7OzxS9tTn/3+D0U7C8gUGo3CWL/H8D/UgTw4t+xERPylhx5PORDU/JxLwAhTW69tZ8KXB0e+9r89S53WF/zOSKJcdddb+8Uy5FaQ/P/XK5SXhFlfuJ6t32+u2dnK1Tx5+/1hneV2ZL6gJvrd5LTG8QWC92iZ1J4tXP83ErfT775c/ftRDNr1TyuZXViNbE7xhy1bjQtVVA7/28X97cVT+6wdx+eWO8MdXOpgN//+bO5PHx7yZerlS+ah/Tflfp/fJmuqqs/87uz3a467T6fwdcX5syGe9TvaF1VyfZ93UQ3qC2CdQlsLekKq5+Zrulno+qR5vv5mF0V3352kbzi9860jZXU8+trEa0Jn7HkDeJF54WyZwy7pMxI7j2QmACJtOIGC1XZrr+Jn/ynFQVNnaL00Lh2sX9vPKNjTFjRcHzGWSrA3YX5OpaA/Toc8uuEa35cow44xLg/KiTCj6t2YwVBiDHAAAXAbZjAADkGAAAkGMAAOQYAACQYwAA5BgAAJBjAADkGAAAkGMAAOQYAACQYwAA5BgAAEr5fwAAAP//Xa1wjIrPNwQAAAAASUVORK5CYII=\n", | |
"text/plain": [ | |
"<IPython.core.display.Image object>" | |
] | |
}, | |
"execution_count": 118, | |
"metadata": { | |
"image/png": { | |
"height": 500, | |
"width": 500 | |
} | |
}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"Image(filename=\"precision.png\", width=500, height=500)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 269, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"TP = 0\n", | |
"FP = 0\n", | |
"FN = 0\n", | |
"\n", | |
"for i in range(0, len(correct_ent)):\n", | |
" for j in range(0, len(predicted_ent)):\n", | |
" compare = correct_ent[i] == predicted_ent[j]\n", | |
" if compare[0] and compare[1]:\n", | |
" TP += 1\n", | |
" elif compare[0] == True and compare[1] == False:\n", | |
" FP += 1\n", | |
" \n", | |
"FN = len(predicted_ent) - TP - FP" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 270, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([False, True])" | |
] | |
}, | |
"execution_count": 270, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"compare = correct_ent[0] == predicted_ent[9]\n", | |
"compare" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 271, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"True Positives: 3\n", | |
"False Positives: 2\n", | |
"False Negatives: 9\n" | |
] | |
} | |
], | |
"source": [ | |
"print(\"True Positives:\",TP)\n", | |
"print(\"False Positives:\",FP)\n", | |
"print(\"False Negatives:\",FN)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 272, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.6" | |
] | |
}, | |
"execution_count": 272, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"precision = 0\n", | |
"precision = TP / (TP + FP)\n", | |
"precision" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Cálculo do Recall" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 53, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "\n", | |
"text/plain": [ | |
"<IPython.core.display.Image object>" | |
] | |
}, | |
"execution_count": 53, | |
"metadata": { | |
"image/png": { | |
"height": 500, | |
"width": 500 | |
} | |
}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"Image(filename=\"recall.png\", width=500, height=500)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- Somente pare relembrar, o recall para o CoNNL é a porcentagem de entidades nomeadas - NE na solução, que foram encontradas pelo sistema" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 273, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.25" | |
] | |
}, | |
"execution_count": 273, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"recall = 0\n", | |
"recall = TP / (TP + FN)\n", | |
"recall" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Por fim calcularemos o F-measure amenizando as duas métricas calculadas" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 254, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAQIAAAAoCAIAAACabWPjAAAHmElEQVR42uxc/5miOheO82wB0RJgG9isJTBTAVoCdoCWEK0gWgIzFaAloBUAJQDPNjDf8/nem8mGHyLqXNS8f0kEkpzknJzzJpyXz4dGFEW+73ue5ziO7/sP2UfOeRRFbe4MgoAx5jhOHMdZlnHOP58Dvu9TSgkhdYXksXVAHWnHcRhjD9bHLMsopW3UII7jIAggFihDlmWfTwPf98ujLwsfWQ08z9O0ghAihPg0eD4wxsrugCx8IY+LIAjm87m8/P37NyFkt9sRg+fD4XBwHKeu8JHVwLKsoii0wnKJwcNju90SQsbjcV3hjwfu/H6/Vy+TJCGElE1CA2azGWNsNpu1F3eaprvd7v2ILMvSNB2Px5PJhBCCEvy7Xq9hjXzft21btvDj44NSCl1Vl7I8zxeLhWVZiOpmsxlujqJosVhgoVNv3mw2uBM3qzIJgmA0GuEyy7LlcqnVQikdjUbaX9cV1HVFcbJrh8OBMTYcDrWl4Kuw7C2VbaqElCxQ6YSFYag55b2iC9rEhfIe3/cRVqqFzaQNZAg2BoWUUvzGv5RS13WzI+C54TYhhOu6lYFNGIaMMTQgjmPExHBqwYOVB1G2mRAiWwKmSEbMWqSk1lL35usK6lqiONm1Sp5QLayeyq7rwnDWTXTLsspqEMex7/uEELUP/WGNCCEtiUXP8yzLiuNYCBFFURiGlNKTsXUcx3g/pVSO6Ofnp2VZQogwDEFTEkLCMJSPyOapKipnuZzKsuUQchiGKLEsS+M9gyBQB05WgckhL/8/9splmXEKw7DO0l0oqOuK4mTXcCkrqiys7idmeUN/oAnyUgjBGPM8D7Oth2rAGCsLolltXNd1HMd1Xc/zVJmeVAasyKq45Uytm1vOEUIIzjl0RjVJdTxvuS6p8IwxzrnabO09mLINtQghTqrBJYK6oiiauwaBVJrFrzGqXOng8zR0KcuyurWih2rguq5qnlsCuu04zln6I4RQDQRmqnyD7/uVcmswOqoWld28yrfBQ8Agfhm8v9+jPUsp1WqBpb+doK4oiuaucc7LyqMVVjBFUBRKqYxX6niYuwiUl8vldDpFkFqOm+sCOPTd8zzf9z8+PmzbbvMgCFk1Ckec9/r6WvmvjPnq5ImwvhywAZvNxvM87f4kSSaTyX6/h2VdrVaV71FbkiRJURRaLUEQwDf+HkF1FkVz1+oq0gpfKttHCJlOpw19yLLsLtTg/f2dUip1AKPb5sEwDGezWXHEer2Gh9BhdDebDee8mb3WGAxJOiVJAt5DE/V+v0+SZLvdFkWBroFsIYS8vb1h3gOu6+JZ7T15noMnwbOYypJjgdxgQW4nqGuJ4s+fPw1dU0dESqlcWKsGWhPzPFdJK9u2NTvUT8KUc14UxfJfzOdzdbDrMJlMMDPSNIWL+Pr6qpGSlYBZTdNULkQqjQiiuvI9nuep+3ogXm3bHg6HjuMcDge1U7vdzrbtNE3lQEqbhZBAXYtwORwOLcuSDUPheDzebrcgzl3XhReAXnDO2+wzdhbUFUXx69ev5q5BSfb7vbZvoBYO4FqpGAwGsPeqamJE5eLegMFg4LouzMl/i+FwWN4sC4JAXRxOKpJlWZUmqhLr9Xq1WgkhDoeDxmrjXzDllc9CRcGUO44jpwhocpg39a88z6fTKey9HBdJt+Nmz/Nk45MkWa1WjLGiKOZHYCqghbIWzKfFYtG+190EdUVRNHdtuVzCeVHHQi+sjN+12AiORMvQp59M0bfF4v3cMzFoxo9KjwjLENYEGNS6KM1A83exc2JwX6hWA9VzwOJ7u4B4uVyq7mxLRFHUTGR9P5Ikkf66wX2rAUIQdSwRmtxODebzeYdo+yzP9XsoKSjz4og2YaJBj1AODNQdOLmT0v4TjZ7EBmZkDdrj5SRVimndN+trYHArp6jsEQFteNIernJmdA26qAFWA2
2X4dZ4mBDZ4BFiA7mJeLlTfm5skJ2PRyKtTWqJbp+ONGSa6L5vUBcYfAOeOfDI85xz3kbsOEO2P2I6nY5Go5bnox4POOykHfeoLDzPKcrzHDK9kBjFCUHzve9ZJgBCOwn7CBzFaXmK84HR5tzoGeCca59WgjM9a8ENwxBZgNQ3YPk2G/X3hXsZspMflJ2FgWFUDDRWEMc/+4ztdvv29qad/qws7MIUPTkuzCtx16klLkTvMk10ZooMLskrcaepJa7lFPUt08TZLpaZ/VLWnfNK3G9qiauoQQ8zTRg1uFQZOuSVuOvUEtcKkXuVacKowUXollfiflNLeJ7H/gYoPhUtFaNXmSaMGlwE7fMxsMmqy145bJo/ULb3dYu15oDhfnUFgCmtfI/amMpaKKUdfOXOq0GlcDpIprmntwgMjBqcmJcIDFQrVXfqoTyiqsesZXyKoiiOY9WLkLVYlqXqoRACl9p71OmFZ7W2BUFQPjB/UzWoE865ksH5zuaeqpdaLd0S978YnlSlO7vllbjf1BJX5Jp7lWniXJjtsy9cklfi3lNLXLh91rtME0YNOmMymYxGIzWpk9lFfhIYp0jfhnxyIZQPmD0DzGrwFRj8/PkzjmPzNc8TwqwG/xyJQZC6WCzMGeYnxP8CAAD//0+JnjaTz+5jAAAAAElFTkSuQmCC\n", | |
"text/plain": [ | |
"<IPython.core.display.Image object>" | |
] | |
}, | |
"execution_count": 254, | |
"metadata": { | |
"image/png": { | |
"height": 400, | |
"width": 400 | |
} | |
}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"Image(filename=\"fscore.png\", width=400, height=400)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 274, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.35294117647058826" | |
] | |
}, | |
"execution_count": 274, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"F1_score = 2 * (precision * recall) / (precision + recall)\n", | |
"F1_score" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Training model with train method" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# encontrar uma maneira de treinar para cada entidade\n", | |
"nlp = spacy.load(\"en_core_web_sm\")\n", | |
"train_data = [(u\"Uber blew through $1 million\", {\"entities\": [(0, 4, \"ORG\")]})]\n", | |
"\n", | |
"other_pipes = [pipe for pipe in nlp.pipe_names if pipe != \"ner\"]\n", | |
"\n", | |
"with nlp.disable_pipes(*other_pipes):\n", | |
" optimizer = nlp.begin_training()\n", | |
" for i in range(10):\n", | |
" random.shuffle(train_data)\n", | |
" for text, annotations in train_data:\n", | |
" nlp.update([text], [annotations], sgd=optimizer)\n", | |
" \n", | |
"# salvando o modelo no disco \n", | |
"nlp.to_disk(\"models/modelo.bin\")\n", | |
"# trazendo o modelo de volta ao disco\n", | |
"nlp = spacy.load(\"models/modelo.bin\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# usando um modelo já pronto do Spacy\n", | |
"nlp = spacy.load(\"en_core_web_sm\")\n", | |
"# sobrescrevendo as entidades atuais com as novas entidades dos dados de treino a partir do patterns.json\n", | |
"ruler_disk = EntityRuler(nlp, overwrite_ents=True).from_disk(\"models/patterns.json\")\n", | |
"\n", | |
"nlp.add_pipe(ruler_disk)\n", | |
"\n", | |
"doc = nlp(u\"MyCorp Inc. is a company in the U.S., Microsoft too\")\n", | |
"print([(ent.text, ent.label_) for ent in doc.ents])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"nlp.to_disk(\"models/modelo.bin\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"-" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |