@vabarbosa
Last active June 28, 2017 05:22

Revisions

  1. vabarbosa revised this gist Jun 28, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion notebooks with pixiedust - 2.ipynb
    @@ -8,7 +8,7 @@
    "\n",
    "<center>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-right:25px;\" src=\"https://libraries.mit.edu/news/files/2016/02/jupyter.png\"/>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-left:25px;\" src=\"https://github.com/ibm-cds-labs/pixiedust/raw/master/docs/_static/PixieDust%202C%20%28512x512%29.png\"/>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-left:25px;\" src=\"https://github.com/ibm-watson-data-lab/pixiedust/raw/master/docs/_static/PixieDust%202C%20%28512x512%29.png\"/>\n",
    " \n",
    "<br/> \n",
    "</center> \n",
  2. vabarbosa revised this gist May 10, 2017. 1 changed file with 7 additions and 5 deletions.
    12 changes: 7 additions & 5 deletions notebooks with pixiedust - 2.ipynb
    @@ -6,16 +6,18 @@
    "source": [
    "# Intro to Notebooks with PixieDust \n",
    "\n",
    "### PART II \n",
    "\n",
    "> Interactive notebooks are powerful tools for fast and flexible experimentation and data analysis. Notebooks can contain live code, static text, equations and visualizations. In this lab, you create a notebook via the IBM Data Science Experience to explore and visualize data to gain insight. We will be using PixieDust, an open source Python notebook helper library, to visualize the data in different ways (e.g., charts, maps, etc.) with one simple call. \n",
    "\n",
    "<center>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-right:25px;\" src=\"https://libraries.mit.edu/news/files/2016/02/jupyter.png\"/>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-left:25px;\" src=\"https://github.com/ibm-cds-labs/pixiedust/raw/master/docs/_static/PixieDust%202C%20%28512x512%29.png\"/>\n",
    " \n",
    "<br/> \n",
    "</center> \n"
    "</center> \n",
    "\n",
    "### PART II \n",
    "\n",
    "* Package Manager\n",
    "* Scala Bridge\n",
    "* Custom Visualization \n"
    ]
    },
    {
  3. vabarbosa created this gist May 10, 2017.
    393 changes: 393 additions & 0 deletions notebooks with pixiedust - 2.ipynb
    @@ -0,0 +1,393 @@
    {
    "cells": [
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "# Intro to Notebooks with PixieDust \n",
    "\n",
    "### PART II \n",
    "\n",
    "> Interactive notebooks are powerful tools for fast and flexible experimentation and data analysis. Notebooks can contain live code, static text, equations and visualizations. In this lab, you create a notebook via the IBM Data Science Experience to explore and visualize data to gain insight. We will be using PixieDust, an open source Python notebook helper library, to visualize the data in different ways (e.g., charts, maps, etc.) with one simple call. \n",
    "\n",
    "<center>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-right:25px;\" src=\"https://libraries.mit.edu/news/files/2016/02/jupyter.png\"/>\n",
    "<img style=\"max-width:200px; display:inline-block; padding-left:25px;\" src=\"https://github.com/ibm-cds-labs/pixiedust/raw/master/docs/_static/PixieDust%202C%20%28512x512%29.png\"/>\n",
    " \n",
    "<br/> \n",
    "</center> \n"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "<br/> \n",
    "\n",
    "#### Import PixieDust\n"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "import pixiedust"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "<br/>\n",
    "\n",
    "## Package Manager\n",
    "\n",
    "You can use the PackageManager component of PixieDust to install and uninstall Maven packages in your notebook kernel without editing configuration files. You can find more info at https://ibm-cds-labs.github.io/pixiedust/packagemanager.html.\n"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "# Install package from spark-packages.org\n",
    "pixiedust.installPackage(\"graphframes:graphframes:0.1.0-spark1.6\")\n",
    "\n",
    "# Install package from maven\n",
    "# pixiedust.installPackage(\"org.apache.commons:commons-csv:0\")\n",
    "\n",
    "# Install jar from URL\n",
    "# pixiedust.installPackage(\"https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.6.jar\")"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "pixiedust.printAllPackages()"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "<br/> \n",
    "\n",
    "## Scala Bridge\n",
    "\n",
    "Data scientists working with Spark may occasionally need to call out to one of the hundreds of libraries available on spark-packages.org that are written in Scala or Java. PixieDust addresses this by letting users write and run Scala code directly in a Python notebook.\n"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "#### Define Python variables"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "python_var = \"Hello From Python\"\n",
    "python_num = 10"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "#### Use Python variables in Scala code"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "%%scala\n",
    "\n",
    "println(python_var)\n",
    "println(python_num+10)\n",
    "\n",
    "val __scala_var = \"Hello From Scala\"\n",
    "val __scala_num = 5"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "#### Use Scala variables in Python code"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "print(__scala_var)\n",
    "print(__scala_num * 3)"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "\n",
    "#### Environment info\n"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "%%scala\n",
    "\n",
    "val __scala_version = util.Properties.versionNumberString"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "import platform\n",
    "\n",
    "print('PYTHON VERSION = ' + platform.python_version())\n",
    "print('SPARK VERSION = ' + sc.version)\n",
    "print('SCALA VERSION = ' + __scala_version)"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "<br/>\n",
    "\n",
    "## Custom (Simple Word Cloud) Visualization\n",
    "\n",
    "Create a PixieDust word cloud visualization using [a little word cloud generator](https://github.com/amueller/word_cloud) that already exists and is easy to use."
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "#### Word cloud library\n",
    "\n",
    "Install the word cloud generator library if not already available"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false
    },
    "outputs": [],
    "source": [
    "# !pip install --upgrade wordcloud"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "#### Get data\n",
    "\n",
    "Crime data from the city of Boston (over a two-week span)\n"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false,
    "pixiedust": {
    "displayParams": {
    "handlerId": "dataframe"
    }
    }
    },
    "outputs": [],
    "source": [
    "import pixiedust\n",
    "\n",
    "df = pixiedust.sampleData(\"https://raw.githubusercontent.com/ibm-cds-labs/open-data/master/crime/boston_crime_sample.csv\")\n",
    "\n",
    "display(df)"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "#### Define the Template\n",
    "\n",
    "The first step is to create the HTML fragment that serves as the template for your visualization.\n"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": true
    },
    "outputs": [],
    "source": [
    "from pixiedust.display.display import *\n",
    "from wordcloud import WordCloud\n",
    "import cStringIO\n",
    "import base64\n",
    "\n",
    "class SimpleWordCloudDisplay(Display):\n",
    "    def doRender(self, handlerId):\n",
    "        # convert from dataframe to dict\n",
    "        dfdict = {}\n",
    "        df = self.entity.toPandas()\n",
    "        for x in range(len(df)):\n",
    "            currentid = df.iloc[x,0] or 'NoStreet'\n",
    "            currentvalue = df.iloc[x,1]\n",
    "            dfdict.setdefault(currentid, 0)\n",
    "            dfdict[currentid] = dfdict[currentid] + currentvalue\n",
    "\n",
    "        # create word cloud from dict\n",
    "        wc = WordCloud(background_color=\"white\").fit_words(dfdict)\n",
    "\n",
    "        # encode word cloud image to base64 string\n",
    "        b = cStringIO.StringIO()\n",
    "        wc.to_image().save(b, format=\"PNG\")\n",
    "        img_str = base64.b64encode(b.getvalue())\n",
    "\n",
    "        self._addHTMLTemplateString(\n",
    "\"\"\"\n",
    "<center><img src=\"data:image/png;base64,{0}\"></center>\n",
    "\"\"\".format(img_str.decode(\"ascii\"))\n",
    "        )"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "#### Specify the Metadata\n",
    "\n",
    "Menu options are added to the toolbar area by providing menu info metadata (category, title, icon, and id)."
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": true
    },
    "outputs": [],
    "source": [
    "@PixiedustDisplay()\n",
    "class SimpleWordCloudMeta(DisplayHandlerMeta):\n",
    "    @addId\n",
    "    def getMenuInfo(self,entity,dataHandler):\n",
    "        if entity.__class__.__name__ == \"DataFrame\":\n",
    "            return [\n",
    "                {\n",
    "                    \"categoryId\": \"Chart\",\n",
    "                    \"title\": \"Simple Word Cloud\",\n",
    "                    \"icon\": \"fa-cloud\",\n",
    "                    \"id\": \"mySimpleWordCloud\"\n",
    "                }\n",
    "            ]\n",
    "        else:\n",
    "            return []\n",
    "\n",
    "    def newDisplayHandler(self,options,entity):\n",
    "        return SimpleWordCloudDisplay(options,entity)"
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "\n",
    "#### Behold, the cloud \n"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": false,
    "pixiedust": {
    "displayParams": {
    "handlerId": "dataframe"
    }
    }
    },
    "outputs": [],
    "source": [
    "df2 = df.groupBy(\"street\").count()\n",
    "\n",
    "display(df2)"
    ]
    },
    {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
    "collapsed": true
    },
    "outputs": [],
    "source": []
    }
    ],
    "metadata": {
    "anaconda-cloud": {},
    "kernelspec": {
    "display_name": "pySpark (Spark 1.6.0) Python 2",
    "language": "python",
    "name": "pyspark1.6"
    },
    "language_info": {
    "codemirror_mode": {
    "name": "ipython",
    "version": 2
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython2",
    "version": "2.7.11"
    }
    },
    "nbformat": 4,
    "nbformat_minor": 0
    }
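
The `doRender` method in the notebook does two things that are independent of PixieDust itself: it sums the counts per street name (falling back to `'NoStreet'` for empty names), and it embeds the rendered PNG in the HTML as a base64 data URI. The following is a minimal standalone sketch of those two steps, assuming plain Python tuples in place of the DataFrame rows. The helper names `aggregate_counts` and `to_data_uri` and the sample rows are hypothetical, and none of them are part of PixieDust or the wordcloud library.

```python
import base64


def aggregate_counts(rows, missing_label="NoStreet"):
    """Sum counts per name, mirroring the loop in doRender.

    rows is an iterable of (name, count) pairs; empty or None names
    are grouped under missing_label.
    """
    totals = {}
    for name, count in rows:
        key = name or missing_label
        totals[key] = totals.get(key, 0) + count
    return totals


def to_data_uri(png_bytes):
    """Encode raw PNG bytes as a data URI usable in an <img> src attribute."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# Hypothetical sample rows standing in for the (street, count) DataFrame
rows = [("WASHINGTON ST", 3), (None, 2), ("BOYLSTON ST", 1), ("WASHINGTON ST", 4)]
totals = aggregate_counts(rows)

# Only the PNG signature bytes are used here, not a full image
uri = to_data_uri(b"\x89PNG\r\n\x1a\n")
```

In the notebook, a dictionary like `totals` is what gets passed to `WordCloud.fit_words`, and a string like `uri` is what ends up in the `src` attribute of the `<img>` tag.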