Last active
September 16, 2016 18:48
-
-
Save crawles/bd22ceb6410ec1c6580b408ea1cc4f5d to your computer and use it in GitHub Desktop.
Example
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This notebook implements a pre-trained sentiment analysis pipeline including a regex pre-processing step, tokenization, n-gram computation, and logistic regreression model as a RESTful API." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2016-08-24T11:16:17.753340", | |
"start_time": "2016-08-24T11:16:17.733692" | |
}, | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import cPickle\n", | |
"import json\n", | |
"\n", | |
"import pandas as pd\n", | |
"import sklearn\n", | |
"import requests" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Import the trained model" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2016-08-24T14:20:40.129088", | |
"start_time": "2016-08-24T14:20:40.044663" | |
}, | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"resp = requests.get(\"https://raw.githubusercontent.com/crawles/gpdb_sentiment_analysis_twitter_model/master/twitter_sentiment_model.pkl\")\n", | |
"resp.raise_for_status()\n", | |
"cl = cPickle.loads(resp.content)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2016-09-16T11:57:34.885062", | |
"start_time": "2016-09-16T11:57:34.874853" | |
} | |
}, | |
"source": [ | |
"### Import data pre-processing function" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def regex_preprocess(raw_tweets):\n", | |
" pp_text = pd.Series(raw_tweets)\n", | |
" \n", | |
" user_pat = '(?<=^|(?<=[^a-zA-Z0-9-_\\.]))@([A-Za-z]+[A-Za-z0-9]+)'\n", | |
" http_pat = '(https?:\\/\\/(?:www\\.|(?!www))[^\\s\\.]+\\.[^\\s]{2,}|www\\.[^\\s]+\\.[^\\s]{2,})'\n", | |
" repeat_pat, repeat_repl = \"(.)\\\\1\\\\1+\",'\\\\1\\\\1'\n", | |
"\n", | |
" pp_text = pp_text.str.replace(pat = user_pat, repl = 'USERNAME')\n", | |
" pp_text = pp_text.str.replace(pat = http_pat, repl = 'URL')\n", | |
" pp_text.str.replace(pat = repeat_pat, repl = repeat_repl)\n", | |
" return pp_text" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Setup the API" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Jupyter Kernel Gateway utilizes a global REQUEST JSON string that will be replaced on each invocation of the API." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"REQUEST = json.dumps({\n", | |
" 'path' : {},\n", | |
" 'args' : {}\n", | |
"})" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Compute sentiment using trained model and serve using POST" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Using the kernel gateway, a cell is created as an HTTP handler using a single line comment. The handler supports common HTTP verbs (GET, POST, DELETE, etc). For more information, view the <a href=\"https://jupyter-kernel-gateway.readthedocs.io/en/latest/http-mode.html\">docs</a>." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2016-08-22T17:52:15.653020", | |
"start_time": "2016-08-22T17:52:15.636712" | |
}, | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# POST /polarity_compute\n", | |
"req = json.loads(REQUEST)\n", | |
"tweets = req['body']['data']\n", | |
"print(cl.predict_proba(regex_preprocess(tweets))[:][:,1])" | |
] | |
} | |
], | |
"metadata": { | |
"anaconda-cloud": {}, | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.12" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment