vabarbosa · June 28, 2017 05:22
diff --git a/notebooks with pixiedust - 2.ipynb b/notebooks with pixiedust - 2.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Intro to Notebooks with PixieDust  \n",
    "\n",
    "<center>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-right:25px;\" src=\"https://libraries.mit.edu/news/files/2016/02/jupyter.png\"/>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-left:25px;\" src=\"https://github.com/ibm-watson-data-lab/pixiedust/raw/master/docs/_static/PixieDust%202C%20%28512x512%29.png\"/>\n",
    "    \n",
    "<br/>  \n",
    "</center>  \n",
    "\n",
    "### PART II  \n",
    "\n",
    "* Package Manager\n",
    "* Scala Bridge\n",
    "* Custom Visualization  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br/>  \n",
    "\n",
    "#### Import PixieDust\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pixiedust"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br/>\n",
    "\n",
    "## Package Manager\n",
    "\n",
    "You can use the PackageManager component of PixieDust to install and uninstall Maven packages in your notebook kernel without editing configuration files. You can find more info at https://ibm-cds-labs.github.io/pixiedust/packagemanager.html.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Install package from spark-packages.org\n",
    "pixiedust.installPackage(\"graphframes:graphframes:0.1.0-spark1.6\")\n",
    "\n",
    "# Install package from Maven\n",
    "# pixiedust.installPackage(\"org.apache.commons:commons-csv:0\")\n",
    "\n",
    "# Install jar from URL\n",
    "# pixiedust.installPackage(\"https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "pixiedust.printAllPackages()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br/>  \n",
    "\n",
    "## Scala Bridge\n",
    "\n",
    "Data scientists working with Spark may occasionally need to call one of the hundreds of libraries on spark-packages.org that are written in Scala or Java. PixieDust addresses this by letting users write and run Scala code directly in a Python notebook.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Define Python variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "python_var = \"Hello From Python\"\n",
    "python_num = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use Python variables in Scala code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%scala\n",
    "\n",
    "println(python_var)\n",
    "println(python_num+10)\n",
    "\n",
    "val __scala_var = \"Hello From Scala\"\n",
    "val __scala_num = 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use Scala variables in Python code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(__scala_var)\n",
    "print(__scala_num * 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "#### Environment info\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%scala\n",
    "\n",
    "val __scala_version = util.Properties.versionNumberString"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import platform\n",
    "\n",
    "print('PYTHON VERSION = ' + platform.python_version())\n",
    "print('SPARK VERSION = ' + sc.version)\n",
    "print('SCALA VERSION = ' + __scala_version)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br/>\n",
    "\n",
    "## Custom (Simple Word Cloud) Visualization\n",
    "\n",
    "Create a PixieDust word cloud visualization using [a little word cloud generator](https://github.com/amueller/word_cloud) that already exists and is easy to use."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Word cloud library\n",
    "\n",
    "Install the word cloud generator library if it is not already available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# !pip install --upgrade wordcloud"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Get data\n",
    "\n",
    "Crime data from the city of Boston (over a two-week span)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pixiedust": {
     "displayParams": {
      "handlerId": "dataframe"
     }
    }
   },
   "outputs": [],
   "source": [
    "import pixiedust\n",
    "\n",
    "df = pixiedust.sampleData(\"https://raw.githubusercontent.com/ibm-cds-labs/open-data/master/crime/boston_crime_sample.csv\")\n",
    "\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Define the Template\n",
    "\n",
    "The first step is to create the HTML fragment that serves as the template for your visualization.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pixiedust.display.display import *\n",
    "from wordcloud import WordCloud\n",
    "from io import BytesIO  # replaces the Python 2-only cStringIO; works on Python 2.7 and 3\n",
    "import base64\n",
    "\n",
    "class SimpleWordCloudDisplay(Display):\n",
    "    def doRender(self, handlerId):\n",
    "        # convert from dataframe to dict, summing the counts per street\n",
    "        dfdict = {}\n",
    "        df = self.entity.toPandas()\n",
    "        for x in range(len(df)):\n",
    "            # rows with no street name are grouped under 'NoStreet'\n",
    "            currentid = df.iloc[x,0] or 'NoStreet'\n",
    "            currentvalue = df.iloc[x,1]\n",
    "            dfdict.setdefault(currentid, 0)\n",
    "            dfdict[currentid] = dfdict[currentid] + currentvalue\n",
    "\n",
    "        # create word cloud from dict\n",
    "        wc = WordCloud(background_color=\"white\").fit_words(dfdict)\n",
    "\n",
    "        # encode word cloud image to base64 string\n",
    "        b = BytesIO()\n",
    "        wc.to_image().save(b, format=\"PNG\")\n",
    "        img_str = base64.b64encode(b.getvalue())\n",
    "\n",
    "        self._addHTMLTemplateString(\n",
    "\"\"\"\n",
    "<center><img src=\"data:image/png;base64,{0}\"></center>\n",
    "\"\"\".format(img_str.decode(\"ascii\"))\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Specify the Metadata\n",
    "\n",
    "Add the visualization to the toolbar menu by returning menu info metadata from `getMenuInfo`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "@PixiedustDisplay()\n",
    "class SimpleWordCloudMeta(DisplayHandlerMeta):\n",
    "    @addId\n",
    "    def getMenuInfo(self,entity,dataHandler):\n",
    "        if entity.__class__.__name__ == \"DataFrame\":\n",
    "            return [\n",
    "                {\n",
    "                    \"categoryId\": \"Chart\",\n",
    "                    \"title\": \"Simple Word Cloud\",\n",
    "                    \"icon\": \"fa-cloud\",\n",
    "                    \"id\": \"mySimpleWordCloud\"\n",
    "                }\n",
    "            ]\n",
    "        else:\n",
    "            return []\n",
    "\n",
    "    def newDisplayHandler(self,options,entity):\n",
    "        return SimpleWordCloudDisplay(options,entity)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "#### Behold, the cloud  \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pixiedust": {
     "displayParams": {
      "handlerId": "dataframe"
     }
    }
   },
   "outputs": [],
   "source": [
    "df2 = df.groupBy(\"street\").count()\n",
    "\n",
    "display(df2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "pySpark (Spark 1.6.0) Python 2",
   "language": "python",
   "name": "pyspark1.6"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
 }
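
The two data-handling steps inside `doRender` (summing counts per street and embedding the rendered image as a base64 data URI) can be tried outside PixieDust and Spark. The following is a minimal sketch under stated assumptions, not the notebook's own code: the helper names `aggregate_counts` and `to_png_data_uri` and the sample rows are hypothetical, and the PNG signature bytes stand in for a real image produced by the word cloud library.

```python
import base64
from collections import defaultdict


def aggregate_counts(rows, fallback="NoStreet"):
    """Sum counts per label, filing empty or missing labels under `fallback`."""
    totals = defaultdict(int)
    for label, count in rows:
        totals[label or fallback] += count
    return dict(totals)


def to_png_data_uri(png_bytes):
    """Encode raw PNG bytes as a data URI usable in an <img src="..."> attribute."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# Hypothetical sample rows standing in for the (street, count) pairs
# that df.groupBy("street").count() would produce.
rows = [("MAIN ST", 3), (None, 2), ("MAIN ST", 1), ("", 4), ("ELM ST", 5)]
frequencies = aggregate_counts(rows)

# The 8-byte PNG signature stands in for a real rendered image.
uri = to_png_data_uri(b"\x89PNG\r\n\x1a\n")
```

In the notebook, a dictionary like `frequencies` is what gets passed to `fit_words`, and a string like `uri` is what ends up in the `src` attribute of the `<img>` tag in the HTML template.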