@jackparmer
Last active December 29, 2015 21:19
{
"metadata": {
"name": "Plotly - Bubble Charts"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Bubble Charts & Hover Text with Plotly"
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "This notebook is out-of-date (Apr-14 2015)"
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Up-to-date version: <a href=\"https://plot.ly/python/bubble-charts-tutorial/\">Bubble chart tutorial</a> in the <a href=\"https://plot.ly/python/user-guide/\">User Guide</a>"
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "See <a href=\"https://plot.ly/ipython-notebook/\">IPython notebook</a> for more Plotly IPython notebooks."
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "***"
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "I'm Jack Parmer"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Plotly is like graphing crack. It standardizes the graphing interface across scientific computing languages (Python, R, MATLAB, ect) while giving rich interactivity and web shareability that has not been possible before with matplotlib, ggplot, MATLAB, ect. On the Plotly website, you can style your graphs with a GUI, so you don't have to spend hours writing code that simply changes the legend opacity.\n\nPlotly does this all while backing up your graphs on the cloud, so that years later, you can find data that may have otherwise been on a harddrive in a landfill. If you make your data public, <i>other people</i> can also find your graphs and data. The best practice that we have today for saving and sharing research data is to entomb it as a thesis in the engineering library basement. All that is changing.\n\nLike d3.js? Like interactive, NYT graphics? So do we. Now, with the <a href='https://plot.ly/api'>Plotly APIs</a>, you can make them yourself without being an expert web programmer. If you <i>are</i> an expert web programmer, now you have scientific languages and tools like R, Python, Pandas, and MATLAB instead of javascript to wrangle your data and create beautiful data vis. Science meets the world-wide-web. Engineering meets design. Let's do this.\n\nI'm going to show you this brave new world below, starting with bubble charts. Bubble charts are sweet because they take advantage of the innate interactivity of Plotly graphs. When you hover on a bubble chart point, you want to see what its size represents, you want to zoom-in to points that are clustered, and you want to pan around once you're zoomed-in. You become a Bubble Chart Explorer. Plotly lets you do all this, all while upping the game for scientific, publication-quality graphics. 
\n\nCheck out the graphs below, then follow us on twitter <a href=\"https://twitter.com/plotlygraphs\">@plotlygraphs</a> for more graphing inspiration.\n\n*Hearts*<br>\nTeam Plotly<br>\nMontreal | San Francisco | Boston"
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "1: The Never Ending Story Bubble Chart"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Here is a simple, single-trace bubble chart showing how to add hover-text and custom colors. Most of the code is for color interpolation and typecasting - not necessary for making a bubble chart, but perhaps handy for some readers. The lovely colorscale is borrowed from <a href=\"colorbrewer.com\">colorbrewer.com</a>."
},
{
"cell_type": "code",
"collapsed": false,
"input": "import plotly\nimport math\nimport numpy as np",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Plotly installs by pip, easy_install, or tar ball. See <a href=\"https://plot.ly/api/python\">https://plot.ly/api/python</a> for deets."
},
{
"cell_type": "code",
"collapsed": false,
"input": "un='jackp'\nk='11m2qbzob9'\nplotly.plotly('IPython.Demo', '1fw3zw2o13')\npy = plotly.plotly(username=un, key=k)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Sign up for Plotly and grap your API key at <a href=\"https://plot.ly/api/python\">https://plot.ly/api/python</a>"
},
{
"cell_type": "code",
"collapsed": false,
"input": "# RGB color scale from colorbrewer.com\nGnBu = [(247, 252, 240),(224, 243, 219),(204, 235, 197),\\\n (168, 221, 181),(123, 204, 196),(78, 179, 211),\\\n (43, 140, 190),(8, 104, 172),(8, 64, 129)]\n\ndef rgbToHsl(rgb):\n ''' Adapted from M Bostock's RGB to HSL converter in d3.js\n https://github.com/mbostock/d3/blob/master/src/color/rgb.js '''\n r,g,b = float(rgb[0])/255.0,\\\n float(rgb[1])/255.0,\\\n float(rgb[2])/255.0\n mx = max(r, g, b)\n mn = min(r, g, b)\n h = s = l = (mx + mn) / 2\n if mx == mn: # achromatic\n h = 0\n s = 0 if l > 0 and l < 1 else h\n else:\n d = mx - mn; \n s = d / (mx + mn) if l < 0.5 else d / (2 - mx - mn)\n if mx == r:\n h = (g - b) / d + ( 6 if g < b else 0 )\n elif mx == g:\n h = (b - r) / d + 2\n else:\n h = r - g / d + 4\n\n return (round(h*60,4), round(s*100,4), round(l*100,4))\n\ndef interp3(fraction, start, end):\n ''' Interpolate between values of 2, 3-member tuples '''\n def intp(f, s, e):\n return s + (e - s)*f \n return tuple([intp(fraction, start[i], end[i]) for i in range(3)])\n\ndef colorscale(scl, r):\n ''' Interpolate a hsl colorscale from \"scl\" with length \"r\" '''\n c = []\n SCL_FI = len(scl)-1 # final index of color scale \n \n for i in r:\n c_i = int(i*math.floor(SCL_FI)/round(r[-1])) # start color index\n hsl_o = rgbToHsl( scl[c_i] ) # convert rgb to hls\n hsl_f = rgbToHsl( scl[c_i+1] ) \n section_min = c_i*r[-1]/SCL_FI\n section_max = (c_i+1)*(r[-1]/SCL_FI)\n fraction = (i-section_min)/(section_max-section_min)\n hsl = interp3( fraction, hsl_o, hsl_f )\n c.append( 'hsl'+str(hsl) )\n return c\n\nr = np.arange(0,20,0.1)\nx = [2*np.cos(i)*i+(i*0.2*np.random.rand()) for i in r]\ny = [2*np.sin(i)*i+(i*0.2*np.random.rand()) for i in r]\ns = [(i+5)*i/5 for i in r] # diameter of bubble size in pixels\nt = [('Area: '+str(round(3.14*math.pow(d/2,2),2))+' (sq. 
pixels)<br>\\\n Radius: '+str(round(d/2,2))+' pixels') for d in s] # Show hover text as bubble area\nc = colorscale(GnBu, r)\n\n# set hovermode to 'closest' to turn off showing all points near the same x on hover\nlayout = { 'hovermode':'closest','title':'click-drag to zoom-in<br>double-click to zoom-out',\\\n'xaxis':{'showticklabels':False,'ticks':'','linecolor':'white','showgrid':False,'zeroline':False},\\\n'yaxis':{'showticklabels':False,'ticks':'','linecolor':'white','showgrid':False,'zeroline':False} }\n\ndata = [ {'x':x,'y':y,'mode':'markers','opacity':0.7,'text':t,\\\n 'marker':{'size':s,'color':c,'line':{'width':2}}} ]\n\npy.iplot(data, layout=layout)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\n\n\n"
},
{
"html": "<iframe height=\"650\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~jackp/1310/600/600\" width=\"650\"></iframe>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": "<IPython.core.display.HTML at 0x1014f7f50>"
}
],
"prompt_number": 3
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "2. Cool, Lets look at this as subplots"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Lets look at the previous graph as subplots with color variations. The x and y-axes are coupled, so if you zoom and pan in one subplot, the other subplots zoom and pan as well. Sweet!"
},
{
"cell_type": "code",
"collapsed": false,
"input": "# color scales from colorbrewer.com\n\nBuGn = [(247,252,253), (229,245,249), (204,236,230),\\\n (153,216,201), (102,194,164), (65,174,118),\\\n (35,139,69), (0,109,44), (0,68,27)]\nBuPu = [(247,252,253), (224,236,244), (191,211,230),\\\n (158,188,218), (140,150,198), (140,107,177),\\\n (136,65,157), (129,15,124), (77,0,75)]\nGnBu = [(247, 252, 240),(224, 243, 219),(204, 235, 197),\\\n (168, 221, 181),(123, 204, 196),(78, 179, 211),\\\n (43, 140, 190),(8, 104, 172),(8, 64, 129)]\n\nPuBu = [(255,247,251), (236,231,242), (208,209,230),\\\n (166,189,219), (116,169,207), (54,144,192),\\\n (5,112,176), (4,90,141), (2,56,88)]\nPuBuGn = [(255,247,251), (236,226,240), (208,209,230),\\\n (166,189,219), (103,169,207), (54,144,192),\\\n (2,129,138), (1,108,89), (1,70,54)]\nPuRd = [(247,244,249), (231,225,239), (212,185,218),\\\n (201,148,199), (223,101,176), (231,41,138),\\\n (206,18,86), (152,0,67), (103,0,31)]\n\nRdPu = [(255,247,243), (253,224,221), (252,197,192),\\\n (250,159,181), (247,104,161), (221,52,151),\\\n (174,1,126), (122,1,119), (73,0,106)]\nYlGn = [(255,255,229), (247,252,185), (217,240,163),\\\n (173,221,142), (120,198,121), (65,171,93),\\\n (35,132,67), (0,104,55), (0,69,41)]\nYlGnBu = [(255,255,217), (237,248,177), (199,233,180),\\\n (127,205,187), (65,182,196), (29,145,192),\\\n (34,94,168), (37,52,148), (8,29,88)]\n\ndata = []\nlayout = {'showlegend':False,'hovermode':'closest',\\\n 'title':'drag your mouse along the left and bottom border to pan<br>\\\n or hold down shift and drag inside. 
double-click to re-center'}\npadding = 0.0\ndomains = [[i*(1-3*padding)/3, ((i+1)*(1-3*padding)/3)] for i in range(3)]\ncscl = [[BuGn,BuPu,GnBu],[PuBu,PuBuGn,PuRd],[RdPu,YlGn,YlGnBu]] # colorscale\ns = [(i+5)*i/10 for i in r] # diameter of bubble size in pixels\n\nfor j in range(3):\n for k in range(3):\n c = colorscale(cscl[j][k], r)\n data.append({'name': cscl[j][k]+' Spiral '+str(j)+str(k), \n 'x': [2*np.cos(i*(k/4+1))*i+(i*0.2*np.random.rand()) for i in r],\n 'y': [2*np.sin(i)*i+(i*0.2*np.random.rand()) for i in r],\n 'type':'scatter','mode':'markers','width': 950,'height': 950,'opacity': 0.7, \n 'marker': {'color':c, 'size':s, 'opacity':0.9, 'line':{'width':2}}, \n 'xaxis': 'x' + str(j) if j!=0 else '', \n 'yaxis': 'y' + str(k) if k!=0 else '' })\n xy_i = str(j) if j!=0 else ''\n layout['xaxis'+xy_i] = layout['yaxis'+xy_i] = \\\n {'domain':domains[j],'showticklabels':False,'ticks':'',\\\n 'linecolor':'#E3E3E3','linewidth':8,'showgrid':False,'zeroline':False }\n \npy.iplot(data, layout=layout, width=1000, height=1000) # iframe size",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\n\n\n"
},
{
"html": "<iframe height=\"1050\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~jackp/1251/1000/1000\" width=\"1050\"></iframe>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": "<IPython.core.display.HTML at 0x105026e10>"
}
],
"prompt_number": 14
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "3. Bubble Chart of US Crime per state in 2005"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This example is from Nathan Yau's blog, <a href=\"Flowing Data\">Flowing Data</a>. The original post is <a href=\"http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/\">here</a>, and you can download the data <a href=\"http://datasets.flowingdata.com/crimeRatesByState2005.tsv\">here</a>. "
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": "3.1 Check out the first few rows"
},
{
"cell_type": "code",
"collapsed": false,
"input": "import pandas as pd\npd.set_option('display.max_columns', 15)\npd.set_option('display.line_width', 400)\npd.set_option('display.mpl_style', 'default')\ncrime_data = pd.read_csv('crimeRatesByState2005.tsv',sep='\\t')\ncrime_data[:2]",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>state</th>\n <th>murder</th>\n <th>Forcible_rate</th>\n <th>Robbery</th>\n <th>aggravated_assult</th>\n <th>burglary</th>\n <th>larceny_theft</th>\n <th>motor_vehicle_theft</th>\n <th>population</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td> Alabama </td>\n <td> 8.2</td>\n <td> 34.3</td>\n <td> 141.4</td>\n <td> 247.8</td>\n <td> 953.8</td>\n <td> 2650.0</td>\n <td> 288.3</td>\n <td> 4627851</td>\n </tr>\n <tr>\n <th>1</th>\n <td> Alaska </td>\n <td> 4.8</td>\n <td> 81.1</td>\n <td> 80.9</td>\n <td> 465.1</td>\n <td> 622.5</td>\n <td> 2599.1</td>\n <td> 391.0</td>\n <td> 686293</td>\n </tr>\n </tbody>\n</table>\n</div>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": " state murder Forcible_rate Robbery aggravated_assult burglary larceny_theft motor_vehicle_theft population\n0 Alabama 8.2 34.3 141.4 247.8 953.8 2650.0 288.3 4627851\n1 Alaska 4.8 81.1 80.9 465.1 622.5 2599.1 391.0 686293"
}
],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": "3.2 Burrrrglary versus Murrrrder per state"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Size the bubbles by state population and include hover text for the name of the state."
},
{
"cell_type": "code",
"collapsed": false,
"input": "data = []\nfor i in range(len(crime_data['state'])):\n # Create 1 data object per point, so every point is a different color\n mx = float(crime_data['population'].max())\n s = [ math.sqrt(float(crime_data['population'][i])/mx)*60.0 ]\n t = [ 'State: %s<br>Population: %s' % (crime_data['state'][i], crime_data['population'][i]) ]\n d = {'x':[crime_data['murder'][i]],\\\n 'y':[crime_data['burglary'][i]],\\\n 'marker': {'size':s, 'opacity':0.9, 'line':{'width':1}},\\\n 'type':'scatter','mode':'markers','text':t}\n data.append(d)\n\ncitation = {'showarrow':False, 'font':{'size':10},'xref':'paper','yref':'paper','x':-0.18,'y':-0.18,'align':'left',\\\n 'text':'Data source and inspiration:<br>http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/'}\n\nlayout = {'showlegend':False,'hovermode':'closest', 'title':'','annotations':[citation],\\\n 'title':'US Crime Rate by State<br>Bubble Size is State Population',\\\n 'xaxis':{ 'ticks':'','linecolor':'white','showgrid':False,'zeroline':False, 'title': 'Number of Murders', 'nticks':12 },\n 'yaxis':{ 'ticks':'','linecolor':'white','showgrid':False,'zeroline':False, 'title': 'Number of Burglaries', 'nticks':12 }}\n\npy.iplot(data, layout=layout)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\n\n\n"
},
{
"html": "<iframe height=\"650\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~jackp/1328/600/600\" width=\"650\"></iframe>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 37,
"text": "<IPython.core.display.HTML at 0x1007c4d50>"
}
],
"prompt_number": 37
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "4. Hans Rosling Bubble Chart and Pandas"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Literally and figurative, Hans Rosling has put bubble charts on the map. You can watch one of his sweet TED talks with animated bubble charts in this <a href=\"http://www.youtube.com/watch?v=hVimVzgtD6w\">YouTube</a>. Plotly can't do animations (email us us if you're interested in this - feedback [at] plot [dot] ly), but we can take snapshots through time with subplots. I grabbed this Gap Minder data from this <a href=\"http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt\">UC Berkeley stats page</a>."
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": "4.1 Scope out a few rows"
},
{
"cell_type": "code",
"collapsed": false,
"input": "gdp_data = pd.read_csv('gapMinderDataFiveYear.txt',sep='\\t')\ngdp_data[20:30]",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>country</th>\n <th>year</th>\n <th>pop</th>\n <th>continent</th>\n <th>lifeExp</th>\n <th>gdpPercap</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>20</th>\n <td> Albania</td>\n <td> 1992</td>\n <td> 3326498</td>\n <td> Europe</td>\n <td> 71.581</td>\n <td> 2497.437901</td>\n </tr>\n <tr>\n <th>21</th>\n <td> Albania</td>\n <td> 1997</td>\n <td> 3428038</td>\n <td> Europe</td>\n <td> 72.950</td>\n <td> 3193.054604</td>\n </tr>\n <tr>\n <th>22</th>\n <td> Albania</td>\n <td> 2002</td>\n <td> 3508512</td>\n <td> Europe</td>\n <td> 75.651</td>\n <td> 4604.211737</td>\n </tr>\n <tr>\n <th>23</th>\n <td> Albania</td>\n <td> 2007</td>\n <td> 3600523</td>\n <td> Europe</td>\n <td> 76.423</td>\n <td> 5937.029526</td>\n </tr>\n <tr>\n <th>24</th>\n <td> Algeria</td>\n <td> 1952</td>\n <td> 9279525</td>\n <td> Africa</td>\n <td> 43.077</td>\n <td> 2449.008185</td>\n </tr>\n <tr>\n <th>25</th>\n <td> Algeria</td>\n <td> 1957</td>\n <td> 10270856</td>\n <td> Africa</td>\n <td> 45.685</td>\n <td> 3013.976023</td>\n </tr>\n <tr>\n <th>26</th>\n <td> Algeria</td>\n <td> 1962</td>\n <td> 11000948</td>\n <td> Africa</td>\n <td> 48.303</td>\n <td> 2550.816880</td>\n </tr>\n <tr>\n <th>27</th>\n <td> Algeria</td>\n <td> 1967</td>\n <td> 12760499</td>\n <td> Africa</td>\n <td> 51.407</td>\n <td> 3246.991771</td>\n </tr>\n <tr>\n <th>28</th>\n <td> Algeria</td>\n <td> 1972</td>\n <td> 14760787</td>\n <td> Africa</td>\n <td> 54.518</td>\n <td> 4182.663766</td>\n </tr>\n <tr>\n <th>29</th>\n <td> Algeria</td>\n <td> 1977</td>\n <td> 17152804</td>\n <td> Africa</td>\n <td> 58.014</td>\n <td> 4910.416756</td>\n </tr>\n </tbody>\n</table>\n</div>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 50,
"text": " country year pop continent lifeExp gdpPercap\n20 Albania 1992 3326498 Europe 71.581 2497.437901\n21 Albania 1997 3428038 Europe 72.950 3193.054604\n22 Albania 2002 3508512 Europe 75.651 4604.211737\n23 Albania 2007 3600523 Europe 76.423 5937.029526\n24 Algeria 1952 9279525 Africa 43.077 2449.008185\n25 Algeria 1957 10270856 Africa 45.685 3013.976023\n26 Algeria 1962 11000948 Africa 48.303 2550.816880\n27 Algeria 1967 12760499 Africa 51.407 3246.991771\n28 Algeria 1972 14760787 Africa 54.518 4182.663766\n29 Algeria 1977 17152804 Africa 58.014 4910.416756"
}
],
"prompt_number": 50
},
{
"cell_type": "code",
"collapsed": false,
"input": "data = []\nyears = (1987,2007,1952,1967) # ordering of years is funny for placement in subplot quadrants\nsp = [('x','y'), ('x2','y'), ('x','y2'), ('x2','y2')]\n# color scale from d3's 'category10' colorscale\ncmap = {'Asia':'#1f77b4','Europe':'#ff7f0e','Africa':'#2ca02c',\\\n 'Americas':'#d62728','Oceania':'#9467bd'}\nfor i in range(len(years)):\n # grab all rows of this year, turn off Kuwait - its gdp per cap is extremely high\n df = gdp_data[(gdp_data['year']==years[i]) & (gdp_data['country']!='Kuwait')] \n for name, g in df.groupby('continent'):\n mx = g['pop'].max()\n s = [ math.sqrt(j/mx)*60.0 for j in g['pop'] ]\n t = g.apply(lambda x:'Country: %s<br>Life Expectancy: %s<br>GDP per capita: %s<br>Population: %s<br>Year: %s' \\\n % (x['country'], x['lifeExp'],x['gdpPercap'], x['pop'], str(years[i])),axis=1)\n d = {'name':name,'x':g['gdpPercap'],'y':g['lifeExp'],\\\n 'type':'scatter','mode':'markers','text':t,\\\n 'marker': {'color':cmap[name],'size':s, 'opacity':0.9, 'line':{'width':1}}}\n d['xaxis'] = sp[i][0]\n d['yaxis'] = sp[i][1]\n data.append(d) \n\nlayout = { 'showlegend':False, 'width':1000, 'height': 700, 'hovermode':'closest',\\\n 'title':'drag to zoom-in; double-click to zoom-out<br>shift-drag to pan' }\n\nfor ax in ('xaxis','yaxis','xaxis2','yaxis2'):\n layout[ax] = { 'ticks':'','linecolor':'white','showgrid':False,'zeroline':False }\n layout[ax]['domain'] = [0.5,1] if '2' in ax else [0,0.5]\n layout[ax]['title'] = 'gdp per capita (usd, 2000)' if 'x' in ax else 'life expectancy (years)'\n\nlayout['annotations'] = []\n# annotation positions correspond to order of 'years' tuple: 1987,2007,1952,1967\nanno_positions = ((0.05,0.4), (0.6,0.4), (0.05,0.95), (0.6,0.95))\n\nfor yr in range(len(years)):\n anno_obj = { 'xref': 'paper', 'yref': 'paper', 'showarrow': False,\\\n 'font': {'family':'','size': 36,'color':'rgb(23, 190, 207)'} } \n anno_obj['x'] = anno_positions[yr][0]\n anno_obj['y'] = anno_positions[yr][1]\n anno_obj['text'] = 
str(years[yr])\n layout['annotations'].append( anno_obj )\n\npy.iplot(data,layout=layout,width=1050,height=750)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\n\n\n"
},
{
"html": "<iframe height=\"800\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~jackp/1309/1050/750\" width=\"1100\"></iframe>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 159,
"text": "<IPython.core.display.HTML at 0x10798e8d0>"
}
],
"prompt_number": 159
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "5. NYT Graphics with Plotly"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This graph is a remake from a 2009 NYT article <a href=\"http://www.nytimes.com/interactive/2009/03/01/business/20090301_WageGap.html\">Why is Her Paycheck Smaller?</a>. Hover over points to see the occupation and salary gap. Male physicians make 40% more salary on average than females? Its not a bubble chart, but it shows some impressive use of hover text."
},
{
"cell_type": "code",
"collapsed": false,
"input": "service = {'name': 'Service, sales, and office',\n\t\t 'x': [359, 363, 381, 403, 413, 429, 437, 439, 448, 473, 501, 502, 505, 509, 523, 532, 547, 556, 574, 575, 584, 585, 594,\\\n 618, 629, 635, 647, 662, 683, 689, 744, 800, 807, 887, 896, 905, 931, 958, 973, 1024, 1239],\n\t\t 'y': [333, 351, 341, 348, 363, 385, 379, 357, 442, 389, 483, 418, 522, 503, 468, 486, 406, 568, 521, 489, 550, 426, 525,\\\n 519, 513, 414, 555, 608, 579, 602, 541, 678, 855, 660, 684, 796, 806, 645, 787, 705, 1030],\n\t\t 'text': ['Food preparation workers<br>Women make 8% less than men', \n\t\t \t\t\t'Dining room attendants and bartender helpers<br>Women make 6% less than men',\n\t\t\t\t\t'Cooks<br>Women make 9% less than men', 'Cashiers<br>Women make 15% less than men',\n\t\t\t\t\t'Waiters and waitresses<br>Women make 13% less than men', 'Telemarketers<br>Women make 7% less than men',\n\t\t\t\t\t'Personal home and care aides<br>Women make 14% less than men', 'Maids and housekeeping cleaners<br>Women make 18% less than men',\n\t\t\t\t\t'Stock clerks<br>Women make 1% less than men', 'Janitors and building cleaners<br>Women make 18% less than men',\n\t\t\t\t\t'Receptionists<br>Women make 4% less than men', 'Nursing, psychiatric, and home health aides<br>Women make 16% less than men', \n\t\t\t\t\t'Data entry keyers<br>Women make 1% more than men', 'Shipping and receiving clerks<br>Women make 2% less than men', \n\t\t\t\t\t'Security guards<br>Women make 11% less than men', 'Chefs and head cooks<br>Women make 9% less than men',\n\t\t\t\t\t'Bartenders<br>Women make 26% less than men', 'Ticket agents and travel clerks<br>Women make 9% less than men',\n\t\t\t\t\t'File clerks<br>Women make 9% less than men', 'Medical assistants<br>Women make 15% less than men',\n\t\t\t\t\t'Office clerks<br>Women make 5% less than men', 'Supervisors of food preparation workers<br>Women make 27% less than men',\n\t\t\t\t\t'Bill and account collectors<br>Women make 11% less than men', 'Customer service 
representatives<br>Women make 14% less than men',\n\t\t\t\t\t'Recreation and fitness workers<br>Women make 18% less than men', 'Retail sales workers<br>Women make 35% less than men',\n\t\t\t\t\t'Dispatchers<br>Women make 15% less than men', 'Bookkeeping, accounting, and auditing clerks<br>Women make 9% less than men',\n\t\t\t\t\t'Bailers, correctional officers, and jailers<br>Women make 15% less than men', 'Secretaries and administrative assistants<br>Women make 13% less than men',\n\t\t\t\t\t'Supervisors of retail workers<br>Women make 27% less than men', 'Supervisors of office and administrative support<br>Women make 15% less than men',\n\t\t\t\t\t'Postal service clerks<br>Women make 4% more than men', 'Production clerks<br>Women make 25% less than men', \n\t\t\t\t\t'Advertising sales agents<br>Women make 24% less than men', 'Police officers<br>Women make 13% less than men', \n\t\t\t\t\t'Postal service mail carriers<br>Women make 13% less than men', 'Insurance sales agents<br>Women make 32% less than men', \n\t\t\t\t\t'Sales representatives<br>Women make 19% less than men', 'Real estate brokers<br>Women make 31% less than men',\n\t\t\t\t\t'Financial services sales agents<br>Women make 17% less than men'],\n\t\t\t\t'type':'scatter',\n\t\t\t\t'mode':'markers',\n\t\t\t\t'marker':{'size':9,'color':'#CA4D64'}}\n\nproduction = {'name': 'Production and transportation',\n\t\t 'x': [482, 495, 499, 541, 543, 564, 590, 668, 732, 858],\n\t\t 'y': [419, 342, 403, 448, 474, 411, 484, 504, 514, 619],\n\t\t 'text': ['Laborers and freight movers<br>Women make 13% less than men', \n\t\t \t\t\t'Laundry workers<br>Women make 31% less than men',\n\t\t\t\t\t'Bakers<br>Women make 18% less than men',\n\t\t\t\t\t'Electronics assemblers<br>Women make 17% less than men', 'Bus drivers<br>Women make 11% less than men',\n\t\t\t\t\t'Butchers and other meat processing workers<br>Women make 27% less than men', 'Metal workers and plastic workers<br>Women make 18% less than men',\n\t\t\t\t\t'Truck 
drivers<br>Women make 25% less than men', 'Inspectors and testers<br>Women make 31% less than men',\n\t\t\t\t\t'Supervisors of production workers<br>Women make 28% less than men'],\n\t\t\t'type':'scatter',\n\t\t\t'mode':'markers',\n\t\t\t'marker':{'size':9,'color':'#88A4B8'}}\n\nscience = {'name': 'Science, computers, and health care',\n\t\t 'x': [683, 904, 1050, 1050, 1098, 1159, 1239, 1245, 1266, 1351, 1367, 1508, 1793, 1883],\n\t\t 'y': [540, 766, 848, 805, 983, 1039, 1045, 1099, 1078, 985, 861, 1323, 1065, 1609],\n\t\t 'text': ['Health diagnosing and treatment technicians<br>Women make 21% less than men', \n\t\t \t\t\t'Computer support specialists<br>Women make 15% less than men',\n\t\t\t\t\t'Diagnostic related technicians<br>Women make 19% less than men',\n\t\t\t\t\t'Clinical laboratory technicians<br>Women make 23% less than men', 'Registered nurses<br>Women make 11% less than men',\n\t\t\t\t\t'Market and survey researchers<br>Women make 10% less than men', 'Computer scientists and system analysts<br>Women make 16% less than men',\n\t\t\t\t\t'Physical therapists<br>Women make 12% less than men', 'Computer programmers<br>Women make 15% less than men',\n\t\t\t\t\t'Chemists and material scientists<br>Women make 27% less than men', 'Medical scientists<br>Women make 37% less than men',\n\t\t\t\t\t'Computer software engineers<br>Women make 12% less than men', 'Physicians and surgeons<br>Women make 40% less than men',\n\t\t\t\t\t'Pharmacists<br>Women make 15% less than men'],\n\t\t 'type':'scatter',\n\t\t 'mode':'markers',\n 'marker':{'size':9,'color':'#8D9F69'}}\n\nmanagement = {'name': 'Management, business, and financial',\n\t\t 'x': [730, 790, 964, 991, 1038, 1064, 1123, 1129, 1183, 1328, 1364, 1376, 1384, 1409, 1450, 1511, 1507, 1597, 1917],\n\t\t 'y': [586, 741, 731, 747, 814, 920, 749, 849, 863, 993, 967, 1051, 1088, 1072, 915, 1033, 1077, 1370, 1540],\n\t\t 'text': ['Food service managers<br>Women make 20% less than men', \n\t\t \t\t\t'Whole and retail 
buyers<br>Women make 15% less than men',\n\t\t\t\t\t'Property managers<br>Women make 24% less than men',\n\t\t\t\t\t'Claims adjusters<br>Women make 24% less than men', 'Human resources specialists<br>Women make 21% less than men',\n\t\t\t\t\t'Social and community service managers<br>Women make 14% less than men', 'Compliance officers<br>Women make 33% less than men',\n\t\t\t\t\t'Loan officers<br>Women make 25% less than men', 'Accountants and auditors<br>Women make 27% less than men',\n\t\t\t\t\t'General and operations managers<br>Women make 25% less than men', 'Education administrators<br>Women make 29% less than men',\n\t\t\t\t\t'Personal financial advisors<br>Women make 23% less than men', 'Management analysts<br>Women make 21% less than men',\n\t\t\t\t\t'Medical and health services managers<br>Women make 24% less than men', 'Financial managers<br>Women make 37% less than men',\n\t\t\t\t\t'Marketing and sales managers<br>Women make 31% less than men', 'Human resources managers<br>Women make 32% less than men',\n\t\t\t\t\t'Computer and information systems managers<br>Women make 14% less than men', 'Chief executives<br>Women make 19% less than men'],\n\t\t\t'type':'scatter',\n\t\t\t'mode':'markers',\n\t\t\t'marker':{'size':9,'color':'#005082'}}\n\nentertainment = {'name': 'Entertainment, education, and law',\n\t\t 'x': [762, 828, 860, 890, 938, 957, 979, 1000, 1236, 1779],\n\t\t 'y': [755, 723, 887, 702, 848, 781, 809, 904, 971, 1388],\n\t\t 'text': ['Social workers<br>Women make 1% less than men', \n\t\t \t\t\t'Counselors<br>Women make 13% less than men',\n\t\t\t\t\t'Special education teachers<br>Women make 3% more than men',\n\t\t\t\t\t'Designers<br>Women make 22% less than men', 'Elementary and middle school teachers<br>Women make 9% less than men', \n\t\t\t\t\t'Engineering technicians, except drafters<br>Women make 18% less than men', 'Editors<br>Women make 17% less than men', \n\t\t\t\t\t'High school teachers<br>Women make 10% less than men', 'Professors and 
postsecondary teachers<br>Women make 22% less than men',\n\t\t\t\t\t'Lawyers<br>Women make 22% less than men'],\n\t\t\t'type':'scatter',\n\t\t\t'mode':'markers',\n\t\t\t'marker':{'size':9,'color':'#D28628'}}\n\nblank = {'name': '', 'x': [0, 0], 'y': [0, 0], 'line':{'color':'#F0F0F0','width':3},'mode':'lines'}\n\nsource = {'name': '<i> Source: Bureau of Labor Statistics:</i><br><i>Census Bureau </i>', 'x': [0, 0], 'y': [0, 0],\\\n 'line':{'color':'#F0F0F0','width':3},'mode':'lines'}\n\nequal = {'name': 'Equal Wages', 'x': [0, 1650], 'y': [0, 1650],'line':{'color':'black','width':3},'mode':'lines'}\n\ntenpercentless = {'name': 'Women earn 10% less than men','x': [0, 1833.3], 'y': [0, 0.9*1833.3],'line':{'color':'#606060', 'width':2},'mode':'lines'}\ntwentypercentless = {'name': 'Women earn 20% less than men', 'x': [0, 2062.5], 'y': [0, 0.8*2062.5],'line':{'color':'#909090','width':2},'mode':'lines'}\nthirtypercentless = {'name': 'Women earn 30% less than men', 'x': [0, 2357.143], 'y': [0, 0.7*2357.143],'line':{'color':'#D2D2D2','width':2},'mode':'lines'}\n\nlayout = {'autosize':False,\n\t\t\t'font':{'color':\"rgb(33, 33, 33)\",'family':\"Arial, sans-serif\",'size':12},\n\t\t\t'height':650,\n\t\t\t'width':1100,\n\t\t\t'xaxis':{\n\t\t\t\t'range':[0,2437],\n\t\t\t\t'type': 'linear',\n\t\t\t\t'ticks': 'none',\n\t\t\t\t'autorange': False,\n\t\t\t\t'zeroline': False,\n\t\t\t\t'mirror': False,\n\t\t\t\t'linecolor':'white',\n\t\t\t\t'tickcolor':'white',\n\t\t\t\t'autotick':False,\n\t\t\t\t'dtick': 250,\n\t\t\t\t'gridwidth': .7\n\t\t\t},\n\t\t\t'yaxis':{\n\t\t\t\t'range':[0,1650],\n\t\t\t\t'type': 'linear',\n\t\t\t\t'ticks': 'none',\n\t\t\t\t'autorange': False,\n\t\t\t\t'zeroline': False,\n\t\t\t\t'mirror': False,\n\t\t\t\t'linecolor':'white',\n\t\t\t\t'tickcolor':'white',\n\t\t\t\t'autotick':False,\n\t\t\t\t'dtick': 250,\n\t\t\t\t'gridwidth': .7\n\t\t\t},\n\t\t\t'legend':{\n\t\t\t\t 'bgcolor': \"#F0F0F0\",\n\t\t\t\t 'bordercolor': \"#F0F0F0\",\n\t\t\t\t 'borderwidth': 
10,\n\t\t\t\t 'x': 0.0845912623961403,\n\t\t\t\t 'y': 0.9811399147727271,\n\t\t\t\t 'traceorder': 'reversed'\n\t\t\t},\n\t\t\t'margin':{'b':80,'l':100,'pad':2,'r':250,'t':80},\n\t\t\t'annotations':[{\n\t\t\t\t\t'text':\"<i>Source: Bureau of Labor Statistics - Census Bureau</i>\",\n\t\t\t\t\t'x':2100,\n\t\t\t\t\t'y':30,\n\t\t\t\t\t'showarrow':False,\n\t\t\t\t\t'ref':'plot',\n\t\t\t\t\t'align':'left',\n\t\t\t\t\t'font':{'size':'9'}\n\t\t\t\t},{\n\t\t\t\t\t'text':\"<b>Men's</b> median weekly earnings\",\n\t\t\t\t\t'x':1120,\n\t\t\t\t\t'y':70,\n\t\t\t\t\t'showarrow':False,\n\t\t\t\t\t'ref':'plot',\n\t\t\t\t\t'align':'left',\n\t\t\t\t\t'font':{'size':'13'}\n\t\t\t\t},{\n\t\t\t\t\t'text':\"<b>Women's</b><br>median weekly<br>earnings\",\n\t\t\t\t\t'x':40,\n\t\t\t\t\t'y':870,\n\t\t\t\t\t'showarrow':False,\n\t\t\t\t\t'ref':'plot',\n\t\t\t\t\t'align':'left',\n\t\t\t\t\t'font':{'size':'13'}\n\t\t\t\t},{\n\t\t\t\t\t'text':\"<i>Roll over</i><br><i>dots for</i><br><i>information</i>\",\n\t\t\t\t\t'x':1870,\n\t\t\t\t\t'y':630,\n\t\t\t\t\t'showarrow':False,\n\t\t\t\t\t'ref':'plot',\n\t\t\t\t\t'align':'left',\n\t\t\t\t\t'font':{'size':'14'}\n\t\t\t\t},\n\t\t\t]\n\t\t}\npy.iplot([equal, tenpercentless, twentypercentless, thirtypercentless,\\\n service, production, science, management, entertainment], layout=layout, width=1150, height=700)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\n\n\n"
},
{
"html": "<iframe height=\"750\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~jackp/1316/1150/700\" width=\"1200\"></iframe>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 23,
"text": "<IPython.core.display.HTML at 0x10777e6d0>"
}
],
"prompt_number": 23
}
],
"metadata": {}
}
]
}