@taldcroft
Created September 30, 2014 14:26
{
"metadata": {
"name": "",
"signature": "sha256:86e93fe7a23208c585c0c1f576f1901fcbd70a50706a7bb98b0f1d69bd6c29ee"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Astropy MaskedColumn troubles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I've been struggling to process table data with missing values using `astropy.table.Table`.\n",
"\n",
"Here's a simple test case that illustrates the issue I was running into.\n",
"\n",
"Bug or feature?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import sys\n",
"import numpy as np\n",
"import astropy\n",
"print(sys.version)\n",
"print(np.__version__)\n",
"print(astropy.__version__)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2.7.8 |Continuum Analytics, Inc.| (default, Aug 21 2014, 15:21:46) \n",
"[GCC 4.2.1 (Apple Inc. build 5577)]\n",
"1.8.1\n",
"1.0.dev9990\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Define a test case"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"from astropy.table import Table, Column, MaskedColumn\n",
"\n",
"def make_test_table():\n",
"    # Build a masked table: `a` is fully populated, `b` and `c` have missing values\n",
"    t = Table(masked=True)\n",
"    t['a'] = Column([1, 2, 3, 4])\n",
"    t['b'] = MaskedColumn([11, 22, 33, 44], mask=[True, True, False, False])\n",
"    t['c'] = MaskedColumn([111, 222, 333, 444], mask=[True, False, True, False])\n",
"    return t\n",
"\n",
"def show_table(heading, table, verbose=True):\n",
"    # Use the `table` argument rather than the global `t`\n",
"    print('\\n\\n*** {} ***\\n'.format(heading))\n",
"    if verbose:\n",
"        print(Table(np.array(table)))  # show data\n",
"        print(Table(table.mask))  # show mask\n",
"    print(table)  # show masked table"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Example 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'm trying to fill in the missing values in column `b` from column `c`.\n",
"\n",
"Apparently this doesn't work. Isn't this a common use case? Is it documented how to do this correctly?\n",
"\n",
"What happens is that the data is updated, but the mask is not updated the way I expected ... I'm not exactly sure what happens to the mask."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"t = make_test_table()\n",
"show_table('Before', t)\n",
"\n",
"# Try to update missing information in column `b` from column `c`\n",
"mask = t['b'].mask\n",
"t['b'][mask] = t['c'][mask]\n",
"print('mask: ', mask)\n",
"\n",
"show_table('After', t)\n",
"\n",
"print('This is weird ... should a table ever be in such a state???')\n",
"print(t['b'][1])\n",
"print(t[1]['b'])\n",
"print('The reason is that the `b` column mask is not the one shown when I print the table:')\n",
"print(t['b'].mask)\n",
"print(t.mask['b'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"*** Before ***\n",
"\n",
" a b c \n",
"--- --- ---\n",
" 1 11 111\n",
" 2 22 222\n",
" 3 33 333\n",
" 4 44 444\n",
" a b c \n",
"----- ----- -----\n",
"False True True\n",
"False True False\n",
"False False True\n",
"False False False\n",
" a b c \n",
"--- --- ---\n",
" 1 -- --\n",
" 2 -- 222\n",
" 3 33 --\n",
" 4 44 444\n",
"('mask: ', array([ True, True, False, False], dtype=bool))\n",
"\n",
"\n",
"*** After ***\n",
"\n",
" a b c \n",
"--- --- ---\n",
" 1 111 111\n",
" 2 222 222\n",
" 3 33 333\n",
" 4 44 444\n",
" a b c \n",
"----- ----- -----\n",
"False True True\n",
"False False False\n",
"False False True\n",
"False False False\n",
" a b c \n",
"--- --- ---\n",
" 1 -- --\n",
" 2 222 222\n",
" 3 33 --\n",
" 4 44 444\n",
"This is weird ... should a table ever be in such a state???\n",
"222\n",
"222\n",
"The reason is that the `b` column mask is not the one shown when I print the table:\n",
"[ True False False False]\n",
" b \n",
"-----\n",
" True\n",
"False\n",
"False\n",
"False\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Example 2 (not expected to work because a temporary copy is made)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"t = make_test_table()\n",
"show_table('Before', t, verbose=False)\n",
"\n",
"# Try to update missing information in column `b` from column `c`\n",
"# This time we do t[mask]['b'] instead of t['b'][mask].\n",
"# This doesn't change `t`, probably because a temp copy `t[mask]` is modified.\n",
"mask = t['b'].mask\n",
"t[mask]['b'] = t[mask]['c']\n",
"\n",
"show_table('After', t, verbose=False)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Example 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why does adding an independent column `asdf` change `t.mask['b']`?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"t = make_test_table()\n",
"\n",
"# Try to update missing information in column `b` from column `c`\n",
"mask = t['b'].mask\n",
"t['b'][mask] = t['c'][mask]\n",
"\n",
"print(t.mask['b'])\n",
"\n",
"t['asdf'] = np.arange(len(t))\n",
"print(t.mask['b'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" b \n",
"-----\n",
" True\n",
"False\n",
"False\n",
"False\n",
" b \n",
"-----\n",
" True\n",
"False\n",
"False\n",
"False\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
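
For anyone trying to follow what happens to the mask in Example 1, here is a minimal sketch using plain `numpy.ma` arrays instead of an astropy `Table`. It rests on the assumption that `MaskedColumn` behaves like the `numpy.ma.MaskedArray` it subclasses for this kind of item assignment; I have not checked this against the astropy version used in the notebook. The names `b_naive`, `b_filled`, and `fill` are only for illustration.

```python
import numpy as np

# Same values and masks as columns `b` and `c` in the notebook
c = np.ma.MaskedArray([111, 222, 333, 444], mask=[True, False, True, False])

# Example 1 pattern: assign c into every masked slot of b.
# The mask of the assigned values travels with them, so row 0 stays masked
# (c is masked there) even though its underlying data is overwritten.
b_naive = np.ma.MaskedArray([11, 22, 33, 44], mask=[True, True, False, False])
sel = b_naive.mask.copy()
b_naive[sel] = c[sel]

# Alternative: only touch rows where b is missing AND c is available,
# so the data under a still-masked entry is left alone.
b_filled = np.ma.MaskedArray([11, 22, 33, 44], mask=[True, True, False, False])
fill = b_filled.mask & ~c.mask
b_filled[fill] = c[fill]

print(b_naive.data, b_naive.mask)
print(b_filled.data, b_filled.mask)
```

If that assumption holds, the `[ True False False False]` mask printed for `t['b']` in Example 1 is the mask copied over from `c`, which would mean the column-level result is at least self-consistent. It does not explain any disagreement between the column mask and what printing the table shows.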