Created
August 19, 2021 00:26
using templates to generate alt text for figures in matplotlib
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "f1dd85ab-e176-4915-bf8a-79adf54f942d", | |
"metadata": {}, | |
"source": [ | |
"# generating alt text for an image from a dataframe.\n", | |
"\n", | |
"to share generated alt text we need to understand that both the plots and alt text are projections of a dataframe;\n", | |
"one in pure form and the other in pure typography. to generate an example scenario we need:\n", | |
"1. a dataframe\n", | |
"2. a plot of the dataframe\n", | |
"3. formatted text derived from the dataframe\n", | |
"\n", | |
"warning: the alt text in this example can be improved, please help me. i wanted to demonstrate an end-to-end workflow for generated alt text.\n", | |
"\n", | |
"consult with [chartability](https://chartability.fizz.studio/ \"a methodology for ensuring that data visualizations, systems, and interfaces are accessible\")." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "83b2c83c-ff70-4c71-a8cd-7f3bd2a4a1b7", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" %matplotlib agg\n", | |
" import pandas, IPython.display as display, io, jinja2, base64" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "312d441d-0c83-4c2b-9c75-23b833d34f2b", | |
"metadata": {}, | |
"source": [ | |
"create some sample data `df`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "60e47590-5c96-4575-b5e5-433dfd6ae9a8", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"/home/tonyfast/miniforge3/lib/python3.9/site-packages/pandas/util/__init__.py:15: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n", | |
" import pandas.util.testing\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>DeWbsEAqVl</th>\n", | |
" <th>u7Shqke0mg</th>\n", | |
" <th>DNIa9Ydxjh</th>\n", | |
" <th>JhBydInTBG</th>\n", | |
" <th>QzFCquTMgL</th>\n", | |
" <th>iiTLaCjbvK</th>\n", | |
" <th>eTJR8I4inM</th>\n", | |
" <th>VeRukIzX7j</th>\n", | |
" <th>3t3PjjOxRt</th>\n", | |
" <th>7uNCtGznIg</th>\n", | |
" <th>...</th>\n", | |
" <th>ZWDCUmkC9H</th>\n", | |
" <th>SKxnmjbznW</th>\n", | |
" <th>OJZQgDldhm</th>\n", | |
" <th>d8jXEUjVX1</th>\n", | |
" <th>zZwmXOGn2R</th>\n", | |
" <th>itPeesfNjT</th>\n", | |
" <th>nCUaPaX4hN</th>\n", | |
" <th>7XYLZeOtEn</th>\n", | |
" <th>7eun8Zwowd</th>\n", | |
" <th>bvKwo0nVBO</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>A</th>\n", | |
" <td>-0.513266</td>\n", | |
" <td>-0.569885</td>\n", | |
" <td>1.046707</td>\n", | |
" <td>-0.040230</td>\n", | |
" <td>0.252344</td>\n", | |
" <td>-0.869195</td>\n", | |
" <td>0.400589</td>\n", | |
" <td>1.186798</td>\n", | |
" <td>1.121367</td>\n", | |
" <td>0.379662</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.295613</td>\n", | |
" <td>-0.130283</td>\n", | |
" <td>1.396725</td>\n", | |
" <td>1.436834</td>\n", | |
" <td>-0.376181</td>\n", | |
" <td>0.083873</td>\n", | |
" <td>0.432758</td>\n", | |
" <td>0.569672</td>\n", | |
" <td>1.730417</td>\n", | |
" <td>-0.608284</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>B</th>\n", | |
" <td>-0.053815</td>\n", | |
" <td>1.635761</td>\n", | |
" <td>0.236135</td>\n", | |
" <td>0.674369</td>\n", | |
" <td>0.299866</td>\n", | |
" <td>-0.743593</td>\n", | |
" <td>1.655680</td>\n", | |
" <td>-0.240112</td>\n", | |
" <td>0.203308</td>\n", | |
" <td>1.455075</td>\n", | |
" <td>...</td>\n", | |
" <td>0.017701</td>\n", | |
" <td>-0.122268</td>\n", | |
" <td>-0.969117</td>\n", | |
" <td>-0.305535</td>\n", | |
" <td>-1.006477</td>\n", | |
" <td>-0.807684</td>\n", | |
" <td>-0.169746</td>\n", | |
" <td>1.277646</td>\n", | |
" <td>0.098450</td>\n", | |
" <td>-0.430554</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>C</th>\n", | |
" <td>-1.979477</td>\n", | |
" <td>0.762999</td>\n", | |
" <td>0.931630</td>\n", | |
" <td>-0.867674</td>\n", | |
" <td>1.146874</td>\n", | |
" <td>-0.828406</td>\n", | |
" <td>1.091406</td>\n", | |
" <td>-0.166953</td>\n", | |
" <td>-0.571468</td>\n", | |
" <td>2.372057</td>\n", | |
" <td>...</td>\n", | |
" <td>1.664442</td>\n", | |
" <td>0.899626</td>\n", | |
" <td>-1.266544</td>\n", | |
" <td>1.674116</td>\n", | |
" <td>-1.782052</td>\n", | |
" <td>-1.735131</td>\n", | |
" <td>0.320543</td>\n", | |
" <td>-0.457179</td>\n", | |
" <td>1.483902</td>\n", | |
" <td>0.120748</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>D</th>\n", | |
" <td>0.234598</td>\n", | |
" <td>-0.114717</td>\n", | |
" <td>-0.136272</td>\n", | |
" <td>-1.135210</td>\n", | |
" <td>0.606916</td>\n", | |
" <td>-0.102985</td>\n", | |
" <td>-0.614704</td>\n", | |
" <td>-1.935681</td>\n", | |
" <td>-0.640178</td>\n", | |
" <td>-0.397213</td>\n", | |
" <td>...</td>\n", | |
" <td>0.186925</td>\n", | |
" <td>-2.806571</td>\n", | |
" <td>0.806676</td>\n", | |
" <td>-0.019412</td>\n", | |
" <td>-0.824382</td>\n", | |
" <td>-1.037335</td>\n", | |
" <td>-1.542543</td>\n", | |
" <td>1.022760</td>\n", | |
" <td>-0.300296</td>\n", | |
" <td>0.441676</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>4 rows × 30 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" DeWbsEAqVl u7Shqke0mg DNIa9Ydxjh JhBydInTBG QzFCquTMgL iiTLaCjbvK \\\n", | |
"A -0.513266 -0.569885 1.046707 -0.040230 0.252344 -0.869195 \n", | |
"B -0.053815 1.635761 0.236135 0.674369 0.299866 -0.743593 \n", | |
"C -1.979477 0.762999 0.931630 -0.867674 1.146874 -0.828406 \n", | |
"D 0.234598 -0.114717 -0.136272 -1.135210 0.606916 -0.102985 \n", | |
"\n", | |
" eTJR8I4inM VeRukIzX7j 3t3PjjOxRt 7uNCtGznIg ... ZWDCUmkC9H \\\n", | |
"A 0.400589 1.186798 1.121367 0.379662 ... -0.295613 \n", | |
"B 1.655680 -0.240112 0.203308 1.455075 ... 0.017701 \n", | |
"C 1.091406 -0.166953 -0.571468 2.372057 ... 1.664442 \n", | |
"D -0.614704 -1.935681 -0.640178 -0.397213 ... 0.186925 \n", | |
"\n", | |
" SKxnmjbznW OJZQgDldhm d8jXEUjVX1 zZwmXOGn2R itPeesfNjT nCUaPaX4hN \\\n", | |
"A -0.130283 1.396725 1.436834 -0.376181 0.083873 0.432758 \n", | |
"B -0.122268 -0.969117 -0.305535 -1.006477 -0.807684 -0.169746 \n", | |
"C 0.899626 -1.266544 1.674116 -1.782052 -1.735131 0.320543 \n", | |
"D -2.806571 0.806676 -0.019412 -0.824382 -1.037335 -1.542543 \n", | |
"\n", | |
" 7XYLZeOtEn 7eun8Zwowd bvKwo0nVBO \n", | |
"A 0.569672 1.730417 -0.608284 \n", | |
"B 1.277646 0.098450 -0.430554 \n", | |
"C -0.457179 1.483902 0.120748 \n", | |
"D 1.022760 -0.300296 0.441676 \n", | |
"\n", | |
"[4 rows x 30 columns]" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" df = pandas.util.testing.makeDataFrame(); df.T" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "e0e6f2dd-cdb3-4846-ba2c-40a35488044c", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style scoped>\n", | |
" .dataframe tbody tr th:only-of-type {\n", | |
" vertical-align: middle;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>count</th>\n", | |
" <th>mean</th>\n", | |
" <th>std</th>\n", | |
" <th>min</th>\n", | |
" <th>25%</th>\n", | |
" <th>50%</th>\n", | |
" <th>75%</th>\n", | |
" <th>max</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>A</th>\n", | |
" <td>30.0</td>\n", | |
" <td>0.309370</td>\n", | |
" <td>0.864548</td>\n", | |
" <td>-0.891069</td>\n", | |
" <td>-0.356039</td>\n", | |
" <td>0.113096</td>\n", | |
" <td>1.033456</td>\n", | |
" <td>2.519034</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>B</th>\n", | |
" <td>30.0</td>\n", | |
" <td>0.119557</td>\n", | |
" <td>0.923898</td>\n", | |
" <td>-1.570313</td>\n", | |
" <td>-0.563011</td>\n", | |
" <td>0.055682</td>\n", | |
" <td>0.754481</td>\n", | |
" <td>1.752108</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>C</th>\n", | |
" <td>30.0</td>\n", | |
" <td>-0.007826</td>\n", | |
" <td>1.213352</td>\n", | |
" <td>-1.979477</td>\n", | |
" <td>-0.872796</td>\n", | |
" <td>0.052330</td>\n", | |
" <td>0.923629</td>\n", | |
" <td>2.372057</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>D</th>\n", | |
" <td>30.0</td>\n", | |
" <td>-0.505479</td>\n", | |
" <td>0.884101</td>\n", | |
" <td>-2.806571</td>\n", | |
" <td>-1.074367</td>\n", | |
" <td>-0.393693</td>\n", | |
" <td>0.135341</td>\n", | |
" <td>1.022760</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" count mean std min 25% 50% 75% max\n", | |
"A 30.0 0.309370 0.864548 -0.891069 -0.356039 0.113096 1.033456 2.519034\n", | |
"B 30.0 0.119557 0.923898 -1.570313 -0.563011 0.055682 0.754481 1.752108\n", | |
"C 30.0 -0.007826 1.213352 -1.979477 -0.872796 0.052330 0.923629 2.372057\n", | |
"D 30.0 -0.505479 0.884101 -2.806571 -1.074367 -0.393693 0.135341 1.022760" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" statistics = df.describe(); statistics.T" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "0237333b-0413-463a-a7c7-a58154480714", | |
"metadata": {}, | |
"source": [ | |
"## about the data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "ccbf698d-72bf-47a2-8ba0-45794305af51", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/markdown": [ | |
"`df` is a dataframe with 30 rows and 4 columns with the names A, B, C, D" | |
], | |
"text/plain": [ | |
"<IPython.core.display.Markdown object>" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" display.Markdown(F\"`df` is a dataframe with {len(df)} rows and {len(df.columns)} columns with the names {', '.join(df.columns)}\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3257f61c-ef66-49d7-9104-149f64e8d5f5", | |
"metadata": {}, | |
"source": [ | |
"## capturing the `matplotlib` figure\n", | |
"\n", | |
"use `io.BytesIO` as an in-memory buffer to capture the saved figure, then format the image bytes as a base64 encoded [data uri](https://en.wikipedia.org/wiki/Data_URI_scheme \"data uri scheme wiki\"). this first example uses a `boxplot`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"id": "6ae4fc03-834c-47ae-b8f2-93f97167e08d", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" data = io.BytesIO()\n", | |
" df.plot.box().figure.savefig(data)\n", | |
" image = F\"data:image/png;base64,{base64.b64encode(data.getvalue()).decode()}\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ee8406b6-f208-4922-b761-fdfde259c231", | |
"metadata": {}, | |
"source": [ | |
"write a formatted string describing the `boxplot` figure." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "1b5365d4-26bc-422b-9191-05575869c615", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" alt = F\"\"\"A box plot showing the columns with names {\", \".join(df.columns)}. The averages for each column are: {\n", | |
" \", \".join(f'{k} is {v:.2f}' for k, v in statistics.loc['mean'].items())\n", | |
" }.\"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "dfa2735f-318e-4ce4-90aa-e0a404d0349c", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/markdown": [ | |
"" | |
], | |
"text/plain": [ | |
"<IPython.core.display.Markdown object>" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" display.Markdown(F\"\"\"![{alt}]({image})\"\"\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "8f944b44-4d06-4c7b-b171-0f52d311c536", | |
"metadata": {}, | |
"source": [ | |
"other figures will need different alt text, and `jinja2` is a powerful candidate for templating it. we put these concepts together in another figure below." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4717a172-2e75-4e64-a690-d28968abe2e0", | |
"metadata": {}, | |
"source": [ | |
"`capture` the figure as a data uri" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "b6dc4f34-4e74-4c61-9406-b2a254d8bd77", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" def capture(figure):\n", | |
"     buffer = io.BytesIO()\n", | |
"     figure.savefig(buffer)\n", | |
"     return F\"data:image/png;base64,{base64.b64encode(buffer.getvalue()).decode()}\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "a43ae6c0-374c-408a-ad21-28491c44c94b", | |
"metadata": {}, | |
"source": [ | |
"use a template for the alt text to make an `accessible` figure." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"id": "13f6f5f5-4c94-4ed9-bb9d-8450f205d7c1", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
" def accessible(figure, template, **kwargs):\n", | |
"     text = template.render(**globals(), **kwargs)  # alt text, reused as the image title\n", | |
"     return display.Markdown(F\"\"\"![{text}]({capture(figure)} \"{text}\")\"\"\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"id": "d94fae2f-747b-4b31-952a-0d44e14cbfc6", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/markdown": [ | |
"" | |
], | |
"text/plain": [ | |
"<IPython.core.display.Markdown object>" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
" accessible(df.plot.scatter(*\"AB\").figure, jinja2.Template(\"A scatter plot comparing {{len(df)}} points of A vs B.\"), len=len)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "f0ec843b-194a-485b-bc6e-81101b464749", | |
"metadata": {}, | |
"source": [ | |
"These alt text examples are probably incomplete, but the goal was to demonstrate templating alt text from generated data with `pandas` and `matplotlib`." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.9.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
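
The notebook's `capture` and `accessible` steps can also be sketched with only the standard library. This is a minimal sketch, not the notebook's method. The names `data_uri`, `describe_means`, and `accessible_img` are hypothetical, `string.Template` stands in for `jinja2`, a plain dict stands in for the dataframe, and the PNG bytes are only the file signature, not a real figure. It emits an HTML `<img>` tag instead of Markdown image syntax so the alt text can be escaped with `html.escape`.

```python
import base64
import html
from statistics import mean
from string import Template


def data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URI."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


def describe_means(columns: dict) -> str:
    """Summarise each column's mean, e.g. 'A is 2.00, B is 1.00'."""
    return ", ".join(f"{name} is {mean(values):.2f}" for name, values in columns.items())


def accessible_img(png_bytes: bytes, template: Template, **context) -> str:
    """Render the alt text template and return an HTML <img> tag."""
    alt = template.substitute(**context)
    # Escape quotes and angle brackets so the alt text cannot break the attribute.
    return f'<img src="{data_uri(png_bytes)}" alt="{html.escape(alt, quote=True)}">'


# Hypothetical stand-ins for the dataframe and the saved figure.
columns = {"A": [1.0, 2.0, 3.0], "B": [0.5, 1.5]}
png_bytes = b"\x89PNG\r\n\x1a\n"  # PNG signature only, not a full image

template = Template("A box plot of columns $names. The averages are: $means.")
tag = accessible_img(
    png_bytes,
    template,
    names=", ".join(columns),
    means=describe_means(columns),
)
print(tag)
```

Escaping matters because generated alt text may contain characters such as `"` or `]` that would end a Markdown image's alt or title early. In a notebook, the resulting tag could be passed to `IPython.display.HTML` in place of `display.Markdown`.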