Created
March 2, 2017 01:12
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"comentarios = \"\"\"\n", | |
"<div class=\"grupo_comentarios\" id=\"orden1\"><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">1<!----></div><div>Más vale que México ya vaya desarrollando su programa nuclear, para atacar a los gringos antes que ellos a nosotros, porque este orate tiene planeado atacarnos antes que a otros países, ya ven que nos odia con odio jarocho..<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">El Pirrurris<!----></div><div class=\"localidad\">Monterrey<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309850=escape('Más vale que México ya vaya desarrollando su programa nuclear, para atacar a los gringos antes que ellos a nosotros, porque este orate tiene planeado atacarnos antes que a otros países, ya ven que nos odia con odio jarocho.');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309850=textocomentario3309850;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309850',textocomentario3309850);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309850)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">2<!----></div><div>Si se siente senil e inseguro, que mejor se compre un convertible rojo o una Harley..<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Guantanamo<!----></div><div class=\"localidad\">Monterrey<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309714=escape('Si se siente senil e inseguro, que mejor se compre un convertible rojo o una Harley.');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309714=textocomentario3309714;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309714',textocomentario3309714);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309714)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">3<!----></div><div>Se avecina la tercera guerra civil..<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Don Vittorio Antonio Andolini<!----></div><div class=\"localidad\">Monterrey<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309709=escape('Se avecina la tercera guerra civil.');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309709=textocomentario3309709;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309709',textocomentario3309709);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309709)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">4<!----></div><div>que haga otro memo y ya esta.<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Regio<!----></div><div class=\"localidad\">Mty<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309569=escape('que haga otro memo y ya esta');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309569=textocomentario3309569;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309569',textocomentario3309569);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309569)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">5<!----></div><div>y eso lo va pagar North Korea e Ira.<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Regio<!----></div><div class=\"localidad\">Mty<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309567=escape('y eso lo va pagar North Korea e Ira');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309567=textocomentario3309567;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309567',textocomentario3309567);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309567)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">6<!----></div><div>Contra China e Israel.<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Alex001<!----></div><div class=\"localidad\">Monterrey, N. L.<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309563=escape('Contra China e Israel');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309563=textocomentario3309563;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309563',textocomentario3309563);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309563)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">7<!----></div><div>President Donald \"el patriota\" Trump está preparandose para la guerra con China e Israel. \n", | |
"\n", | |
"Ramón Ayala ya está terminando el corrido: \n", | |
"\n", | |
"\n", | |
" \"EL GÜERO INVENCIBLE\".<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Alex001<!----></div><div class=\"localidad\">Monterrey, N. L.<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309562=escape('President Donald \"el patriota\" Trump está preparandose para la guerra con China e Israel. Ramón Ayala ya está terminando el corrido: \"EL GÜERO INVENCIBLE\"');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309562=textocomentario3309562;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309562',textocomentario3309562);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309562)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">8<!----></div><div>¿Cómo? o sea el si puede fabricar bombas atómicas y los demás paises no. No pues me encanto. ¿Así o más claro? este tipo se quiere apoderar del mundo..<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Elver Galinda<!----></div><div class=\"localidad\">Monterrey<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309558=escape('¿Cómo? o sea el si puede fabricar bombas atómicas y los demás paises no. No pues me encanto. ¿Así o más claro? este tipo se quiere apoderar del mundo.');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309558=textocomentario3309558;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309558',textocomentario3309558);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309558)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">9<!----></div><div>Va de mal en pero ese loco , si no controla esa lengua le van a venir aventando unos bombonazos esos ojos razgados jajaj.<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Fox<!----></div><div class=\"localidad\">Cuerna<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309556=escape('Va de mal en pero ese loco , si no controla esa lengua le van a venir aventando unos bombonazos esos ojos razgados jajaj');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309556=textocomentario3309556;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309556',textocomentario3309556);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309556)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">10<!----></div><div>lo del muro, lo de las deportaciones, lo de los impuestos al NAFTA, lo de los inmigrantes... eso no importa. ESTO si es peligroso.<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">ElmerHomero<!----></div><div class=\"localidad\">mty<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309546=escape('lo del muro, lo de las deportaciones, lo de los impuestos al NAFTA, lo de los inmigrantes... eso no importa. ESTO si es peligroso');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309546=textocomentario3309546;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309546',textocomentario3309546);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309546)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">11<!----></div><div>Seria muy sano que ha este señor Trump, lo atendiera un Psiquiatra, necesita mucha ayuda.<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">Tomas Moro<!----></div><div class=\"localidad\">Monterrey<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309531=escape('Seria muy sano que ha este señor Trump, lo atendiera un Psiquiatra, necesita mucha ayuda');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309531=textocomentario3309531;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309531',textocomentario3309531);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309531)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script></div></div><div class=\"comentario\"><div class=\"texto\"><div class=\"n_comment\">12<!----></div><div>UN LOCO FANFARRON E INESTABLE ATRAS DE UN BOTON ROJO....<br><!----></div><div class=\"datos_comentario\"><div class=\"participante\">JW<!----></div><div class=\"localidad\">MTY<!----></div></div><div class=\"acciones\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309525=escape('UN LOCO FANFARRON E INESTABLE ATRAS DE UN BOTON ROJO...');\n", | |
"\t\t\t\t\t\t\t\t\t\t textocomentario3309525=textocomentario3309525;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script><div class=\"btn\" onclick=\"lanzaVentana('1223617','3309525',textocomentario3309525);\"><img src=\"../../libre/imgdiseno/art/ico_reportar.png\"> Denunciar comentarios\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"btn\" id=\"btncompartir\" onclick=\"compartircomentario(textocomentario3309525)\"><img src=\"../../libre/imgdiseno/art/ico_compartir.png\"> Compartir\n", | |
"\t\t\t\t\t\t\t\t\t </div><div class=\"clear\"> </div></div><div class=\"clear\"> </div></div><div class=\"clear\"><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t contadorAux++;\n", | |
"\t\t\t\t\t\t\t\t\t --></script><script type=\"text/javascript\"><!--\n", | |
"\t\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t idComentarioOpinionAux = 3309525 ;\n", | |
"\t\t\t\t\t\t\t\t\t\t \n", | |
"\t\t\t\t\t\t\t\t\t\t --></script></div></div></div>\n", | |
"\"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from bs4 import BeautifulSoup" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Name the parser explicitly so the result does not depend on which parsers are installed.\n", | |
"content = BeautifulSoup(comentarios, \"lxml\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"comment_sections = content.find_all(\"div\", {\"class\": \"comentario\"})" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Each comment block keeps its body inside a nested div with class 'texto'.\n", | |
"text_sections = [comment.find(\"div\", {\"class\": \"texto\"}) for comment in comment_sections]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"comentarios_text = []\n", | |
"for section in text_sections:\n", | |
"    # The first nested div is the comment number; the second holds the comment text.\n", | |
"    comentarios_text.append(section.find_all(\"div\")[1].text)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['Más vale que México ya vaya desarrollando su programa nuclear, para atacar a los gringos antes que ellos a nosotros, porque este orate tiene planeado atacarnos antes que a otros países, ya ven que nos odia con odio jarocho..',\n", | |
" 'Si se siente senil e inseguro, que mejor se compre un convertible rojo o una Harley..',\n", | |
" 'Se avecina la tercera guerra civil..',\n", | |
" 'que haga otro memo y ya esta.',\n", | |
" 'y eso lo va pagar North Korea e Ira.',\n", | |
" 'Contra China e Israel.',\n", | |
" 'President Donald \"el patriota\" Trump está preparandose para la guerra con China e Israel. \\n\\nRamón Ayala ya está terminando el corrido: \\n\\n\\n \"EL GÜERO INVENCIBLE\".',\n", | |
" '¿Cómo? o sea el si puede fabricar bombas atómicas y los demás paises no. No pues me encanto. ¿Así o más claro? este tipo se quiere apoderar del mundo..',\n", | |
" 'Va de mal en pero ese loco , si no controla esa lengua le van a venir aventando unos bombonazos esos ojos razgados jajaj.',\n", | |
" 'lo del muro, lo de las deportaciones, lo de los impuestos al NAFTA, lo de los inmigrantes... eso no importa. ESTO si es peligroso.',\n", | |
" 'Seria muy sano que ha este señor Trump, lo atendiera un Psiquiatra, necesita mucha ayuda.',\n", | |
" 'UN LOCO FANFARRON E INESTABLE ATRAS DE UN BOTON ROJO....']" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"comentarios_text" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import requests" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
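"def sentiment_edgar(text):\n", | |
"    # Query the sentiment endpoint and return the score stored under its 'value' key.\n", | |
"    direccion = \"https://damp-spire-78293.herokuapp.com/sentiment?text=\"+text\n", | |
"    print(direccion)\n", | |
"    # A timeout keeps a stalled request from hanging the notebook indefinitely.\n", | |
"    content = requests.get(direccion, timeout=30)\n", | |
"    content.raise_for_status()  # surface HTTP errors before trying to parse JSON\n", | |
"    content_json = content.json()\n", | |
"    sentiment = content_json[\"value\"]\n", | |
"    return sentiment" | |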
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=el edgar es bien cool\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"1.0" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sentiment_edgar(\"el edgar es bien cool\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Más vale que México ya vaya desarrollando su programa nuclear, para atacar a los gringos antes que ellos a nosotros, porque este orate tiene planeado atacarnos antes que a otros países, ya ven que nos odia con odio jarocho..\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=mas vale que mexico ya vaya desarrollando su programa nuclear para atacar a los gringos antes que ellos a nosotros porque este orate tiene planeado atacarnos antes que a otros paises ya ven que nos odia con odio jarocho..\n", | |
"Si se siente senil e inseguro, que mejor se compre un convertible rojo o una Harley..\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=si se siente senil e inseguro que mejor se compre un convertible rojo o una harley..\n", | |
"Se avecina la tercera guerra civil..\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=se avecina la tercera guerra civil..\n", | |
"que haga otro memo y ya esta.\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=que haga otro memo y ya esta.\n", | |
"y eso lo va pagar North Korea e Ira.\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=y eso lo va pagar north korea e ira.\n", | |
"Contra China e Israel.\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=contra china e israel.\n", | |
"President Donald \"el patriota\" Trump está preparandose para la guerra con China e Israel. \n", | |
"\n", | |
"Ramón Ayala ya está terminando el corrido: \n", | |
"\n", | |
"\n", | |
" \"EL GÜERO INVENCIBLE\".\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=president donald el patriota trump esta preparandose para la guerra con china e israel. \n", | |
"\n", | |
"ramon ayala ya esta terminando el corrido \n", | |
"\n", | |
"\n", | |
" el gero invencible.\n", | |
"¿Cómo? o sea el si puede fabricar bombas atómicas y los demás paises no. No pues me encanto. ¿Así o más claro? este tipo se quiere apoderar del mundo..\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=como o sea el si puede fabricar bombas atomicas y los demas paises no. no pues me encanto. asi o mas claro este tipo se quiere apoderar del mundo..\n", | |
"Va de mal en pero ese loco , si no controla esa lengua le van a venir aventando unos bombonazos esos ojos razgados jajaj.\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=va de mal en pero ese loco si no controla esa lengua le van a venir aventando unos bombonazos esos ojos razgados jajaj.\n", | |
"lo del muro, lo de las deportaciones, lo de los impuestos al NAFTA, lo de los inmigrantes... eso no importa. ESTO si es peligroso.\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=lo del muro lo de las deportaciones lo de los impuestos al nafta lo de los inmigrantes... eso no importa. esto si es peligroso.\n", | |
"Seria muy sano que ha este señor Trump, lo atendiera un Psiquiatra, necesita mucha ayuda.\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=seria muy sano que ha este seor trump lo atendiera un psiquiatra necesita mucha ayuda.\n", | |
"UN LOCO FANFARRON E INESTABLE ATRAS DE UN BOTON ROJO....\n", | |
"https://damp-spire-78293.herokuapp.com/sentiment?text=un loco fanfarron e inestable atras de un boton rojo....\n" | |
] | |
} | |
], | |
"source": [ | |
"from time import sleep\n", | |
"\n", | |
"sentiments = []\n", | |
"for comment in comentarios_text:\n", | |
"    print(comment)\n", | |
"    sentiment_value = 0  # fallback score if every attempt fails\n", | |
"    for attempt in range(5):  # try each comment at most five times\n", | |
"        try:\n", | |
"            sentiment_value = sentiment_edgar(clean_string(comment))\n", | |
"            break  # success, stop retrying\n", | |
"        except Exception:\n", | |
"            sleep(1)  # brief pause before the next attempt\n", | |
"    sentiments.append(sentiment_value)\n", | |
"    sleep(1)  # pause between requests" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import re" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
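"def clean_string(my_str):\n", | |
"    # Lowercase, strip accents, then keep only ASCII letters, digits, spaces, newlines and periods.\n", | |
"    my_str = my_str.lower()\n", | |
"    my_str = replace_accents(my_str)\n", | |
"    return re.sub(r'[^a-zA-Z0-9 \\n.]', '', my_str)" | |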
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'algun familiar.'" | |
] | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"clean_string(\"¿algún familiar?.\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
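"def replace_accents(string):\n", | |
"    # Map accented vowels to plain ones. Characters without an entry here,\n", | |
"    # such as ñ and ü, are later removed entirely by clean_string.\n", | |
"    replacement = {\"á\":\"a\",\"é\":\"e\", \"í\":\"i\", \"ó\":\"o\", \"ú\":\"u\"}\n", | |
"    for key, value in replacement.items():\n", | |
"        string = string.replace(key, value)\n", | |
"    return string" | |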
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.43818521480072459" | |
] | |
}, | |
"execution_count": 23, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"np.average(sentiments)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import pandas as pd" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%matplotlib inline\n", | |
"sentiment_df = pd.DataFrame(sentiments)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x10b76c048>]], dtype=object)" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
}, | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAEKCAYAAAAGvn7fAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAE9ZJREFUeJzt3XGM5HV5x/H3wx1areKWEEkLR9YKGDHWQ1tE1LLENoFr\niklrUmmtWWwjtaE19I/aGlP6h23/vVIVqVE5awI10iLiqTGWtdoqjcodqNBy1UsOaY8KgsBpc+DT\nP3aG3Q5zO7P7m5nfc795v5IN+5353c5z33n2ud9+Zn5LZCaSpO44oe0CJEmT5WCXpI5xsEtSxzjY\nJaljHOyS1DEOdknqGAe7JHWMg11zKyJOjoh/jIjHIuJgRFzWdk3SJGxvuwCpRe8FfgQ8HzgX+FRE\n7M/Mb7VbltRMeOWp5lFE/CTwEPCSzDzQu20PcH9m/mmrxUkNGcVoXp0NPNEf6j37gZe0VI80MQ52\nzavnAD8YuO1R4Lkt1CJNlINd8+ox4KSB257H6nCXjmsOds2r/wC2R8SZ6257GfCNluqRJsYXTzW3\nIuIGIIHfBV4O3Aq8KjPvbrUwqSHP2DXPfh94FvAA8FHg9xzq6oINz9gj4ieALwDPBJ4BfGLYW8Ei\n4hrgEuAIsJyZd0ynXEnSKBteoJSZP4qIizLzSERsB74UEa/JzC/1j4mIXcCZmXlWRLwSuBY4f7pl\nS5KOZWQUk5lHep8+A9jG6kUd610K7OkdezuwEBGnTrJISdL4Rg72iDghIvYBh4HbhlxufRpwaN36\nPuD0yZUoSdqMcc7Yf5yZO1kd1r8YEUtDDovBPzaB2iRJWzD2LwHLzEci4lPAzwMr6+76LrBj3fr0\n3m3/T0Q47CVpCzJz8OR5QxuesUfEKRGx0Pv8WcAvA4PveLkFeHPvmPOBhzPz8DGK8yOTq6++uvUa\nqnxU2Itedxb48HukUl9U+diKUWfsPw3siYgTWP1H4O8y8/MRcUWvCa/LzL0RsSsiDgCPA5dvqZI5\ncvDgwbZLKMO90DD2RTOj3u54F6tX5A3eft3A+soJ1yVJ2iKvPG3B8vJy2yWU4V5oGPuimZn9rpiI\nyFk9lrQZEUGNN3LFljNVdVdEkJN88VTTsbKy0nYJZbgXGsa+aMbBLkkdYxSjuWcUo8qMYiRJDvY2\nmB+ucS80jH3RjINdkjrGjF1zz4xdlZmxS5Ic7G0wP1zjXmgY+6IZB7skdYwZu+aeGbsqM2OXJDnY\n22B+uMa90DD2RTMOdknqGDN2zT0zdlVmxi5JcrC3wfxwjXuhYeyLZhzsktQxZuyae2bsqsyMXZLk\nYG+D+eEa90LD2BfNONglqWPM2DX3zNhVmRm7JMnB3gbzwzXuhYaxL5pxsEtSx2yYsUfEDuAjwPNZ\nDSH/NjOvGThmCfgE8O3eTTdl5ruHfC0zdpVkxq7KtpKxbx9x/1HgqszcFxHPAb4WEZ/LzLsHjvtC\nZl66mQeWJE3HhlFMZv53Zu7rff4YcDfwM0MO3dS/JvPO/HCNe6Fh7Itmxs7YI2IROBe4feCuBC6I\niP0RsTcizplceZKkzRrrfey9GGYFeHdm3jxw33OBJzPzSERcAvx1Zp495GuYsaskM3ZVNo2MnYg4\nEbgJ+OjgUAfIzEfXff7piHhfRJycmQ8NHru8vMzi4iIACwsL7Ny5k6WlJWDtRy/XrttYr563ALS9\nZqx6XXd3vbKywvXXXw/w1LzcrFHviglgD/BgZl51jGNOBR7IzIyI84CPZebTqvGMfc3Kysq6gTLf\nKuyFZ+z1VOiLKqZxxv5q4E3AnRFxR++2dwJnAGTmdcAbgLdFxBPAEeCNm6pakjRR/q4YzT3P2FWZ\nvytGkuRgb0P/hRK5FxrOvmjGwS5JHWPGrrlnxq7KzNglSQ72NpgfrnEvNIx90YyDXZI6xoxdc8+M\nXZWZsUuSHOxtMD9c415oGPuiGQe7JHWMGbvm
nhm7KjNjlyQ52NtgfrjGvdAw9kUzDnZJ6hgzds09\nM3ZVZsYuSXKwt8H8cI17oWHsi2Yc7JLUMWbsmntm7KrMjF2S5GBvg/nhGvdCw9gXzTjYJaljzNg1\n98zYVZkZuyTJwd4G88M17oWGsS+acbBLUseYsWvumbGrMjN2SdLGgz0idkTEbRHxzYj4RkT84TGO\nuyYi7o2I/RFx7nRK7Q7zwzXuhYaxL5rZPuL+o8BVmbkvIp4DfC0iPpeZd/cPiIhdwJmZeVZEvBK4\nFjh/eiVLkjayqYw9Im4G/iYzP7/utvcDt2Xm3/fW9wAXZubhgT9rxq6SzNhV2VQz9ohYBM4Fbh+4\n6zTg0Lr1fcDpmylCkjQ5o6IYAHoxzMeBt2fmY8MOGVgPPe1YXl5mcXERgIWFBXbu3MnS0hKwlqlN\nc33RRRcd6684l2677baZ7v+wdf+2th6/v4Z+PW2vGaverq9379498/kwuD6e58XIKCYiTgRuBT6d\nmbuH3P9+YCUzb+yty0YxlX7kbr+OGj/2r6ysrBuu7ajUFxWekwrsi/U2H8VsONhj9W+2B3gwM686\nxjG7gCszc1dEnA/szsynvXjqYF+vQh0Okb5KfeFzUkexvtjUYB8VxbwaeBNwZ0Tc0bvtncAZAJl5\nXWbujYhdEXEAeBy4fJNVS5ImaK6uPK30L3D7ddQ4O/RH7vVqPCcV2BfreeWpJM09z9hbUaEOzw77\nKvWFz0kdxfrCM3ZJmmcOdrXK3wmiYeyLZhzsktQxZuytqFCHeW5fpb7wOamjWF+YsUvSPHOwq1Vm\nqRrGvmjGwS5JHWPG3ooKdZjn9lXqC5+TOor1hRm7JM0zB7taZZaqYeyLZhzsktQxZuytqFCHeW5f\npb7wOamjWF+YsUvSPHOwq1VmqRrGvmjGwS5JHWPG3ooKdZjn9lXqC5+TOor1hRm7JM0zB7taZZaq\nYeyLZhzsktQxZuytqFCHeW5fpb7wOamjWF+YsUvSPHOwq1VmqRrGvmjGwS5JHWPG3ooKdZjn9lXq\nC5+TOor1hRm7JM0zB7taZZaqYeyLZkYO9oj4UEQcjoi7jnH/UkQ8EhF39D7eNfkyJUnjGpmxR8Rr\ngceAj2TmS4fcvwT8UWZeOuLrmLE/pUId5rl9lfrC56SOYn0x2Yw9M78IfH/kI0uSSphExp7ABRGx\nPyL2RsQ5E/iamhNmqRrGvmhm+wS+xteBHZl5JCIuAW4Gzh524PLyMouLiwAsLCywc+dOlpaWgLUn\nctrrNf310pyuV/dk1vt/rOejrcfvr9t/Pvprxqq36+t9+/aVqGdNf700g/UKcH1vvchWjPU+9ohY\nBD45LGMfcux3gFdk5kMDt5uxP6VCHea5fZX6wuekjmJ9Mdv3sUfEqbG6A0TEeaz+Y/HQiD8mSZqS\ncd7ueAPwr8CLIuJQRLwlIq6IiCt6h7wBuCsi9gG7gTdOr1x1jVmqhrEvmhmZsWfmZSPufy/w3olV\nJElqxN8V04oKdZjn9lXqC5+TOor1hb8rRpLmmYNdrTJL1TD2RTMOdknqGDP2VlSowzy3r1Jf+JzU\nUawvzNglaZ452NUqs1QNY18042CXpI4xY29FhTrMc/sq9YXPSR3F+sKMXZLmmYNdrTJL1TD2RTMO\ndknqGDP2VlSowzy3r1Jf+JzUUawvzNglaZ452NUqs1QNY18042CXpI4xY29FhTrMc/sq9YXPSR3F\n+sKMXZLmmYNdrTJL1TD2RTMOdknqGDP2VlSowzy3r1Jf+JzUUawvzNglaZ452NUqs1QNY18042CX\npI4xY29FhTrMc/sq9YXPSR3F+sKMXZLmmYNdrTJL1TD2RTMOdknqmJEZe0R8CPgV4IHMfOkxjrkG\nuAQ4Aixn5h1DjjFjf0qFOsxz+yr1hc9JHcX6YuIZ+4eBi4/5kBG7gDMz8yzgrcC1mylAkjRZIwd7\nZn4R+P4G
h1wK7OkdezuwEBGnTqY8dZ1ZqoaxL5rZPoGvcRpwaN36PuB04PDggffff/8EHk6StJFJ\nDHZYDY3XGxpMnXHGC4lYfciI4IQTTmTbtmcC8OST/wswtfXRo48PVLPS++/SnK77GaJWrfT+u9Ty\nurfqnbEuLS3N5bp/W9v1rOmvl2awXgGu760X2YqxLlCKiEXgk8NePI2I9wMrmXljb30PcGFmHh44\nLtt9IeLrwCuo8mJI+3VUqAFq1FGhBvDF01q6/uLpKLcAbwaIiPOBhweHuiRthhl7MyOjmIi4AbgQ\nOCUiDgFXAycCZOZ1mbk3InZFxAHgceDyaRYsSdrYTH9XjFFMX4Uf8SrUADXqqFADGMXUMu9RjCSp\nEAe7pHLM2JtxsEtSx5ixt6JCdlehBqhRR4UawIy9FjN2SVIZDnZJ5ZixN+Ngl6SOMWNvRYXsrkIN\nUKOOCjWAGXstZuySpDIc7JLKMWNvxsEuSR1jxt6KCtldhRqgRh0VagAz9lrM2CVJZTjYJZVjxt6M\ng12SOsaMvRUVsrsKNUCNOirUAGbstZixS5LKcLBLKseMvRkHuyR1jBl7KypkdxVqgBp1VKgBzNhr\nMWOXJJXhYJdUjhl7Mw52SeoYM/ZWVMjuKtQANeqoUAOYsddixi5JKsPBLqkcM/ZmHOyS1DFm7K2o\nkN1VqAFq1FGhBjBjr6XTGXtEXBwR90TEvRHxjiH3L0XEIxFxR+/jXZspQJI0WRsO9ojYBrwHuBg4\nB7gsIl485NAvZOa5vY93T6FOSXPEjL2ZUWfs5wEHMvNgZh4FbgReP+S4Tf2YIEmanlGD/TTg0Lr1\nfb3b1kvggojYHxF7I+KcSRYoaf4sLS21XcJxbfuI+8d55eDrwI7MPBIRlwA3A2c3rkyStCWjBvt3\ngR3r1jtYPWt/SmY+uu7zT0fE+yLi5Mx86OlfbhlY7H2+AOwElnrrld5/p7X+6kAt03686uv+bW3X\nw4j7523dW/Uy5v6Z67ytd+/ezc6dO1uvZ01/vTSD9QpwfW+9yFZs+HbHiNgO/DvwOuB+4N+AyzLz\n7nXHnAo8kJkZEecBH8vMp1Xj2x3Xq/A2qgo1QI06KtQAvt1xzcrKSutxzPH8dscNz9gz84mIuBL4\nLLAN+GBm3h0RV/Tuvw54A/C2iHgCOAK8cUu1S1JP20P9eOcFSq2ocCZQoQaoUUeFGsAz9lqO5zN2\nf6WApHJ8H3szDnZJ6hijmFZU+BGvQg1Qo44KNYBRTC1GMZKkMhzsksoxY2/GwS5JHWPG3ooK2V2F\nGqBGHRVqADP2WszYJUllONgllWPG3oyDXZI6xoy9FRWyuwo1QI06KtQAZuy1mLFLkspwsEsqx4y9\nGQe7JHWMGXsrKmR3FWqAGnVUqAHM2GsxY5ckleFgl1SOGXszDnZJ6hgz9lZUyO4q1AA16qhQA5ix\n12LGLkkqw8EuqRwz9mYc7JLUMWbsraiQ3VWoAWrUUaEGMGOvxYxdklSGg11SOWbszTjYJaljzNhb\nUSG7q1AD1KijQg1gxl6LGbskqYyRgz0iLo6IeyLi3oh4xzGOuaZ3//6IOHfyZUqaJ2bszWw42CNi\nG/Ae4GLgHOCyiHjxwDG7gDMz8yzgrcC1U6pV0pzYt29f2yUc10adsZ8HHMjMg5l5FLgReP3AMZcC\newAy83ZgISJOnXilkubGww8/3HYJx7VRg/004NC69X2920Ydc3rz0iRJW7F9xP3jviQ8+Irt0D93\n0km/OuaXm7wnn3yYxx9v7eElbcLBgwfbLuG4NmqwfxfYsW69g9Uz8o2OOb1329P84Ae3bra+KdjU\nu4amqEIdFWqAGnVUqKH/FjsB7Nmzp+0SqNIXmzVqsH8VOCsiFoH7gd8ALhs45hbgSuDGiDgfeDgz\nDw9+oc2+D1OStDUbDvbMfCIirgQ+C2wDPpiZd0fEFb37r8vMvRGxKyIOAI
8Dl0+9aknSMc3sylNJ\n0mxM/MpTL2haM2ovIuK3entwZ0T8S0T8XBt1Tts4PdE77hci4omI+LVZ1jdLY35/LEXEHRHxjYhY\nmXGJMzPG98cpEfGZiNjX24vlFsqciYj4UEQcjoi7Njhm/LmZmRP7YDWuOQAsAicC+4AXDxyzC9jb\n+/yVwFcmWUOVjzH34lXA83qfX9zFvRhnH9Yd90/ArcCvt113iz2xAHwTOL23PqXtulvciz8H/qq/\nD8CDwPa2a5/SfrwWOBe46xj3b2puTvqM3Qua1ozci8z8cmY+0lveTjff/z9OTwD8AfBx4H9mWdyM\njbMXvwnclJn3AWTm92Zc46yMsxf/BZzU+/wk4MHMfGKGNc5MZn4R+P4Gh2xqbk56sHtB05px9mK9\n3wH2TrWidozch4g4jdVv6v6vo+jqCz/j9MRZwMkRcVtEfDUifntm1c3WOHvxAeAlEXE/sB94+4xq\nq2hTc3PU2x03a6IXNB3nxv47RcRFwFuAV0+vnNaMsw+7gT/JzIzVN3J39a2x4+zFicDLgdcBzwa+\nHBFfycx7p1rZ7I2zF+8E9mXmUkS8EPhcRLwsMx+dcm1VjT03Jz3YJ3pB03FunL2g94LpB4CLM3Oj\nH8WOV+PswytYvQ4CVrPUSyLiaGbeMpsSZ2acvTgEfC8zfwj8MCL+GXgZ0LXBPs5eXAD8BUBm/mdE\nfAd4EavX18ybTc3NSUcxT13QFBHPYPWCpsFvzluANwNsdEFTB4zci4g4A/gH4E2ZeaCFGmdh5D5k\n5s9m5gsy8wWs5uxv6+BQh/G+Pz4BvCYitkXEs1l9oexbM65zFsbZi3uAXwLo5ckvAr490yrr2NTc\nnOgZe3pB01PG2Qvgz4CfAq7tna0ezczz2qp5Gsbch7kw5vfHPRHxGeBO4MfABzKzc4N9zL74S+DD\nEbGf1ZPQP87Mh1oreooi4gbgQuCUiDgEXM1qLLeluekFSpLUMf6v8SSpYxzsktQxDnZJ6hgHuyR1\njINdkjrGwS5JHeNgl6SOcbBLUsf8H3VnclXqhx0OAAAAAElFTkSuQmCC\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x10a9714a8>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"sentiment_df.hist()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>0.287673</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>0.208612</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>0.492372</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>0.542989</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>0.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>0.591604</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>1.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>0.165844</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>0.332553</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>1.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>0.136576</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 0\n", | |
"0 0.000000\n", | |
"1 0.287673\n", | |
"2 0.208612\n", | |
"3 0.492372\n", | |
"4 0.542989\n", | |
"5 0.500000\n", | |
"6 0.591604\n", | |
"7 1.000000\n", | |
"8 0.165844\n", | |
"9 0.332553\n", | |
"10 1.000000\n", | |
"11 0.136576" | |
] | |
}, | |
"execution_count": 27, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sentiment_df" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'¿Cómo? o sea el si puede fabricar bombas atómicas y los demás paises no. No pues me encanto. ¿Así o más claro? este tipo se quiere apoderar del mundo..'" | |
] | |
}, | |
"execution_count": 31, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"comentarios_text[7]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 50, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Concatenate every comment into one lowercase string, replacing periods with spaces\n", | |
"string_larga = \"\"\n", | |
"for comment in comentarios_text:\n", | |
"    string_larga = string_larga + \" \" + comment.lower().replace(\".\", \" \")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from nltk.corpus import stopwords" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
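"# Spanish stopwords from NLTK (needs nltk.download('stopwords') once); a set makes membership tests fast\n", | |
"stop = set(stopwords.words('spanish'))" | |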
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 51, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'vale mexico vaya desarrollando programa nuclear atacar gringos nosotros orate planeado atacarnos paises ven odia odio jarocho si senil inseguro mejor compre convertible rojo harley avecina tercera guerra civil haga memo va pagar north korea ira china israel president donald el patriota trump preparandose guerra china israel ramon ayala terminando corrido el gero invencible como si puede fabricar bombas atomicas demas paises pues encanto asi claro tipo quiere apoderar mundo va mal loco si controla lengua van venir aventando bombonazos ojos razgados jajaj muro deportaciones impuestos nafta inmigrantes importa si peligroso seria sano seor trump atendiera psiquiatra necesita mucha ayuda loco fanfarron inestable atras boton rojo'" | |
] | |
}, | |
"execution_count": 51, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Drop Spanish stopwords first, then normalize the result with clean_string (from an earlier cell)\n", | |
"clean_string(\" \".join([word for word in string_larga.split() if word not in stop]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"http://www.elnorte.com/aplicaciones/articulo/default.aspx?id=1052730" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.4.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |