@vabarbosa
Last active June 28, 2017 05:22
intro to notebooks with pixiedust - part 2
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intro to Notebooks with PixieDust \n",
"\n",
"<center>\n",
"<img style=\"max-width:200px; display:inline-block; padding-right:25px;\" src=\"https://libraries.mit.edu/news/files/2016/02/jupyter.png\"/>\n",
"<img style=\"max-width:200px; display:inline-block; padding-left:25px;\" src=\"https://github.com/ibm-watson-data-lab/pixiedust/raw/master/docs/_static/PixieDust%202C%20%28512x512%29.png\"/>\n",
" \n",
"<br/> \n",
"</center> \n",
"\n",
"### PART II \n",
"\n",
"* Package Manager\n",
"* Scala Bridge\n",
"* Custom Visualization \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br/> \n",
"\n",
"#### Import PixieDust\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pixiedust"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br/>\n",
"\n",
"## Package Manager\n",
"\n",
"You can use the PackageManager component of Pixiedust to install and uninstall maven packages into your notebook kernel without editing configuration files. You can find more info at https://ibm-cds-labs.github.io/pixiedust/packagemanager.html.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Install package from spark-packages.org\n",
"pixiedust.installPackage(\"graphframes:graphframes:0.1.0-spark1.6\")\n",
"\n",
"# Install package from maven\n",
"# pixiedust.installPackage(\"org.apache.commons:commons-csv:0\")\n",
"\n",
"# Install jar from URL\n",
"# pixiedust.installPackage(\"https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"pixiedust.printAllPackages()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br/> \n",
"\n",
"## Scala Bridge\n",
"\n",
"Data scientists working with Spark may occasionaly need to call out to one of the hundreds of libraries available on spark-packages.org which are written in Scala or Java. PixieDust provides a solution to this problem by letting users directly write and run scala code in a Python notebook.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define Python variables"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"python_var = \"Hello From Python\"\n",
"python_num = 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Python variables in Scala code"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%scala\n",
"\n",
"println(python_var)\n",
"println(python_num+10)\n",
"\n",
"val __scala_var = \"Hello From Scala\"\n",
"val __scala_num = 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use Scala variables in Python code"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(__scala_var)\n",
"print(__scala_num * 3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Environment info\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%scala\n",
"\n",
"val __scala_version = util.Properties.versionNumberString"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import platform\n",
"\n",
"print('PYTHON VERSON = ' + platform.python_version())\n",
"print('SPARK VERSON = ' + sc.version)\n",
"print('SCALA VERSON = ' + __scala_version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br/>\n",
"\n",
"## Custom (Simple Word Cloud) Visualization\n",
"\n",
"Create a PixieDust word cloud visualization using [a little word cloud generator](https://github.com/amueller/word_cloud) that already exists and is easy to use."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Word cloud library\n",
"\n",
"Install the word cloud generator library if not already available"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# !pip install --upgrade wordcloud"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get data\n",
"\n",
"Crime data from the city of Boston (over a two-week span)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"pixiedust": {
"displayParams": {
"handlerId": "dataframe"
}
}
},
"outputs": [],
"source": [
"import pixiedust\n",
"\n",
"df = pixiedust.sampleData(\"https://raw.githubusercontent.com/ibm-cds-labs/open-data/master/crime/boston_crime_sample.csv\")\n",
"\n",
"display(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define the Template\n",
"\n",
"First step will be to create the HTML fragment for the template of your visualization.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from pixiedust.display.display import *\n",
"from wordcloud import WordCloud\n",
"import cStringIO\n",
"import base64\n",
"\n",
"class SimpleWordCloudDisplay(Display):\n",
" def doRender(self, handlerId):\n",
" # convert from dataframe to dict\n",
" dfdict = {}\n",
" df = self.entity.toPandas()\n",
" for x in range(len(df)):\n",
" currentid = df.iloc[x,0] or 'NoStreet'\n",
" currentvalue = df.iloc[x,1]\n",
" dfdict.setdefault(currentid, 0)\n",
" dfdict[currentid] = dfdict[currentid] + currentvalue\n",
"\n",
" # create word cloud from dict\n",
" wc = WordCloud(background_color=\"white\").fit_words(dfdict)\n",
"\n",
" # encode word cloud image to base64 string\n",
" b = cStringIO.StringIO()\n",
" wc.to_image().save(b, format=\"PNG\")\n",
" img_str = base64.b64encode(b.getvalue())\n",
"\n",
" self._addHTMLTemplateString(\n",
"\"\"\"\n",
"<center><img src=\"data:image/png;base64,{0}\"></center>\n",
"\"\"\".format(img_str.decode(\"ascii\"))\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Specify the Metadata\n",
"\n",
"The menu options can be added to the toolbar area by including some menu info metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@PixiedustDisplay()\n",
"class SimpleWordCloudMeta(DisplayHandlerMeta):\n",
" @addId\n",
" def getMenuInfo(self,entity,dataHandler):\n",
" if entity.__class__.__name__ == \"DataFrame\":\n",
" return [\n",
" {\n",
" \"categoryId\": \"Chart\",\n",
" \"title\": \"Simple Word Cloud\",\n",
" \"icon\": \"fa-cloud\",\n",
" \"id\": \"mySimpleWordCloud\"\n",
" }\n",
" ]\n",
" else:\n",
" return []\n",
"\n",
" def newDisplayHandler(self,options,entity):\n",
" return SimpleWordCloudDisplay(options,entity)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Behold, the cloud \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"pixiedust": {
"displayParams": {
"handlerId": "dataframe"
}
}
},
"outputs": [],
"source": [
"df2 = df.groupBy(\"street\").count()\n",
"\n",
"display(df2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "pySpark (Spark 1.6.0) Python 2",
"language": "python",
"name": "pyspark1.6"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}