Last active
August 29, 2015 14:12
{ | |
"metadata": { | |
"name": "", | |
"signature": "sha256:cfd825ef318eb3d896cf218060d459df23b04a94c05ea842ebee858bae869d2b" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's read a small tab-delimited file into Orange:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from __future__ import print_function\n", | |
"import Orange\n", | |
"\n", | |
"FILE_A = 'tiny.attach.tab'\n", | |
"table_a = Orange.data.Table(FILE_A)\n", | |
"\n", | |
"print(table_a.domain)\n", | |
"for inst in table_a:\n", | |
" print(inst)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"[id, foo, CLASS]\n", | |
"['a1', 'x', 'False']\n", | |
"['a2', 'y', 'True']\n", | |
"['a3', 'z', 'False']\n" | |
] | |
} | |
], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"How nice. We can also ask for the range of possible values associated with a variable:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"class_values = table_a.domain['CLASS']\n", | |
"list(class_values)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 2, | |
"text": [ | |
"[<orange.Value 'CLASS'='False'>, <orange.Value 'CLASS'='True'>]" | |
] | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Splendid! So in table A's domain, the 'CLASS' variable is associated with the values 'False' and 'True', in that order.\n",
"\n", | |
"So shall we look at a second table?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"FILE_B = 'tiny.relate.tab'\n", | |
"table_b = Orange.data.Table(FILE_B)\n", | |
"\n", | |
"print(table_b.domain)\n", | |
"for inst in table_b:\n", | |
" print(inst)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"[id, bar, CLASS]\n", | |
"['a1', 'b', 'Narration']\n", | |
"['a2', 'd', 'Elaboration']\n", | |
"['a3', 'e', 'Background']\n" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"So far so good. Now, what do you think the values associated with the variable 'CLASS' in table B should be?"
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"list(table_b.domain['CLASS'])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 4, | |
"text": [ | |
"[<orange.Value 'CLASS'='False'>,\n", | |
" <orange.Value 'CLASS'='True'>,\n", | |
" <orange.Value 'CLASS'='Background'>,\n", | |
" <orange.Value 'CLASS'='Elaboration'>,\n", | |
" <orange.Value 'CLASS'='Narration'>]" | |
] | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Wait, what? Where did 'False' and 'True' come from?\n", | |
"Hang on, let's check back in table A:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"list(table_a.domain['CLASS'])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 5, | |
"text": [ | |
"[<orange.Value 'CLASS'='False'>,\n", | |
" <orange.Value 'CLASS'='True'>,\n", | |
" <orange.Value 'CLASS'='Background'>,\n", | |
" <orange.Value 'CLASS'='Elaboration'>,\n", | |
" <orange.Value 'CLASS'='Narration'>]" | |
] | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Huh. Did you know that variable descriptors in Orange are global and, of course, mutable? I didn't, but [looking more closely at the manual](http://docs.orange.biolab.si/reference/rst/Orange.feature.descriptor.html#Orange.feature.Descriptor), I see it says quite plainly:\n",
"\n", | |
"> Orange considers two variables (e.g. in two different data tables) the same if they have the same descriptor. It is allowed - but not recommended - to have different descriptors with the same name.\n", | |
"\n", | |
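"So loading table B reused the existing 'CLASS' descriptor and added its own values to it, which is why table A's domain changed too. This matters if we use two different tables that each have their own 'CLASS' feature, and particularly if we're doing things like looking up the probability for 'True' by its presumed index `1` (`\u0ca0_\u0ca0`)"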
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
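The surprise in the notebook can be imitated without Orange at all. What follows is a toy model only, not Orange's implementation or API: `load_values` and the registry dictionaries are hypothetical names, and sorting the incoming values is an assumption made purely so the result matches the output seen above. It shows why both tables end up with the same five values, and why looking up 'True' by a presumed index is fragile.

```python
# Toy model only: this is NOT Orange's implementation or API.
# A registry maps a variable name to one shared, mutable list of values.


def load_values(registry, name, values):
    """Return the descriptor for `name`, reusing an existing one if present."""
    descriptor = registry.setdefault(name, [])
    # Assumption: new values are added in sorted order, to mimic the notebook.
    for value in sorted(values):
        if value not in descriptor:
            descriptor.append(value)
    return descriptor


registry = {}
class_a = load_values(registry, 'CLASS', ['False', 'True', 'False'])
snapshot_a = list(class_a)  # copy taken before the second table is loaded
class_b = load_values(registry, 'CLASS',
                      ['Narration', 'Elaboration', 'Background'])

print(snapshot_a)          # ['False', 'True']
print(class_a is class_b)  # True: one descriptor shared by both tables
print(class_a)  # ['False', 'True', 'Background', 'Elaboration', 'Narration']

# Loading the tables in the opposite order moves 'True' away from index 1.
reversed_registry = {}
load_values(reversed_registry, 'CLASS',
            ['Narration', 'Elaboration', 'Background'])
reversed_a = load_values(reversed_registry, 'CLASS',
                         ['False', 'True', 'False'])
print(reversed_a.index('True'))  # 4
```

In this model the safer habit is to find a value's position by name (`descriptor.index('True')`) at the point of use instead of hard-coding `1`. Whether the same habit is enough in real Orange code depends on how the classifier orders its probabilities, which this sketch does not cover.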
CLASS	id	foo
d	d	d
class
False	a1	x
True	a2	y
False	a3	z
CLASS	id	bar
d	d	d
class
Narration	a1	b
Elaboration	a2	d
Background	a3	e
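Both data files use Orange's three-line header: column names, then types (`d` for discrete), then flags (`class` marks the class column). As a rough sketch of how that layout reads, here is a hand-rolled parser using only the standard library. It assumes the columns are tab-separated, as the `.tab` extension suggests. `TAB_TEXT` is a reconstruction of the first file (the one with the `foo` column, which appears to be `tiny.attach.tab`), and `read_tab` is a hypothetical helper, not part of Orange.

```python
import csv
import io

# Assumed reconstruction of the first data file, with real tab separators.
TAB_TEXT = (
    "CLASS\tid\tfoo\n"
    "d\td\td\n"
    "class\t\t\n"
    "False\ta1\tx\n"
    "True\ta2\ty\n"
    "False\ta3\tz\n"
)


def read_tab(text):
    """Split a three-line-header tab file into column info and records."""
    rows = list(csv.reader(io.StringIO(text), delimiter='\t'))
    names, types, flags = rows[0], rows[1], rows[2]
    # The flags line may be shorter than the header; pad it with blanks.
    flags = flags + [''] * (len(names) - len(flags))
    columns = [{'name': n, 'type': t, 'flag': f}
               for n, t, f in zip(names, types, flags)]
    records = [dict(zip(names, row)) for row in rows[3:]]
    return columns, records


columns, records = read_tab(TAB_TEXT)
class_column = [c['name'] for c in columns if c['flag'] == 'class']
class_values = sorted({r['CLASS'] for r in records})

print(class_column)  # ['CLASS']
print(class_values)  # ['False', 'True']
```

Read this way, each file on its own only ever yields its own class values. The five-value list in the notebook comes from Orange reusing one descriptor across both tables, not from anything in the files themselves.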