@naveenrajm7
Created June 21, 2018 22:13
PgmPy Model for Student Example
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Student Model \n",
"using pgmpy\n",
"\n",
"Below is the model used for this example\n",
"![img](https://www.uni-oldenburg.de/fileadmin/_processed/a/1/csm_Koller_Fig_3.4_Bayesian-student-network_4daf5d8f7a.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building CPD tables\n",
"\n",
"Specifying the cpds of the above figure. each node has a CPD \n",
"The 'TabularCPD' function within the pgmpy.factors.discrete sub-module allows storage and retrieval of CPDs in a tabular format. \n",
"Given the above example, specify all CPDs for the student model:\n",
"* difficulty_cpd\n",
"* intelligence_cpd\n",
"- sat_cpd\n",
"- grade_cpd\n",
"- letter_cpd"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"from pgmpy.factors.discrete import TabularCPD\n",
"from pgmpy.models import BayesianModel\n",
"\n",
"difficulty_cpd = TabularCPD(variable='D',\n",
" variable_card=2,\n",
" values=[[.6, .4]])\n",
"\n",
"intelligence_cpd = TabularCPD(variable='I',\n",
" variable_card=2,\n",
" values=[[.7, .3]])\n",
"\n",
"sat_cpd = TabularCPD(variable='S',\n",
" variable_card=2,\n",
" values=[[.95, 0.2],\n",
" [.05, 0.8]],\n",
" evidence=['I'],\n",
" evidence_card=[2])\n",
"\n",
"# grade\n",
"grade_cpd = TabularCPD(variable='G',\n",
" variable_card=3,\n",
" values=[[.3, .05, .9, .5 ],\n",
" [.4, .25, .08, .3],\n",
" [.3, .7, .02, .2]],\n",
" evidence=['I', 'D'],\n",
" evidence_card=[2, 2])\n",
"\n",
"letter_cpd = TabularCPD(variable='L',\n",
" variable_card=2,\n",
" values=[[.1, 0.4, .99],\n",
" [.9, 0.6, .01]],\n",
" evidence=['G'],\n",
" evidence_card=[3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the Student Model\n",
"In the Student Modelling example, Grade node is a child node of Difficulty and Intelligence nodes. Similarly SAT is a child of Intelligence node. You can start building the Bayesian Model by specifying the dependencies in the Bayesian Network as arguments to BayesianModel() instance:\n",
"```\n",
"[('D', 'G'),\n",
"('I', 'G'),\n",
"('I', 'S'),\n",
"('G', 'L')]\n",
"```\n",
"Assign the instance to student_model."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# buildind model\n",
"student_model = BayesianModel([('D', 'G'),('I', 'G'), ('I', 'S'), ('G', 'L')])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, the Bayesian model is built and stored within the variable student_model. In the subsequent sections, we will see how to use the bayesian model to store probabilistic dependencies between nodes (or variables) and how to make inferences on nodes based on observed evidences.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add CPDs\n",
"In Bayesian Networks, the relationship between nodes are specified by CPDs. In order to start working with a BayesianModel in pgmpy, we need to add CPDs created in the previous sections to the model object.\n",
"\n",
"Add the pre-defined CPDs using BayesianModel's add_cpds() method and then validate the model"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"# adding cpds\n",
"student_model.add_cpds(difficulty_cpd, intelligence_cpd, sat_cpd, grade_cpd, letter_cpd)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Obtain CPDs, Leaves and Independencies\n",
"You can now look at the CPDs, leaves, independencies by invoking the BayesianModel's get_cpds(), get_leaves() and get_independencies() methods respectively."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPDS\n",
"[<TabularCPD representing P(D:2) at 0x7f7ce8087d68>, <TabularCPD representing P(I:2) at 0x7f7ce8087d30>, <TabularCPD representing P(S:2 | I:2) at 0x7f7ce8087da0>, <TabularCPD representing P(G:3 | I:2, D:2) at 0x7f7ce8087cf8>, <TabularCPD representing P(L:2 | G:3) at 0x7f7ce8087cc0>]\n",
"Independencies\n",
"(D _|_ S, I)\n",
"(D _|_ S | I)\n",
"(D _|_ I | S)\n",
"(D _|_ L | G)\n",
"(D _|_ S | L, I)\n",
"(D _|_ L, S | G, I)\n",
"(D _|_ L | S, G)\n",
"(D _|_ S | G, L, I)\n",
"(D _|_ L | G, S, I)\n",
"(I _|_ D)\n",
"(I _|_ D | S)\n",
"(I _|_ L | G)\n",
"(I _|_ L | D, G)\n",
"(I _|_ L | S, G)\n",
"(I _|_ L | D, S, G)\n",
"(L _|_ S | I)\n",
"(L _|_ D, S, I | G)\n",
"(L _|_ S | D, I)\n",
"(L _|_ S, I | D, G)\n",
"(L _|_ D, S | G, I)\n",
"(L _|_ D, I | S, G)\n",
"(L _|_ S | D, G, I)\n",
"(L _|_ I | D, S, G)\n",
"(L _|_ D | G, S, I)\n",
"(S _|_ D)\n",
"(S _|_ D, L, G | I)\n",
"(S _|_ L | G)\n",
"(S _|_ L, G | D, I)\n",
"(S _|_ L | D, G)\n",
"(S _|_ D, G | L, I)\n",
"(S _|_ D, L | G, I)\n",
"(S _|_ G | D, L, I)\n",
"(S _|_ L | D, G, I)\n",
"(S _|_ D | G, L, I)\n",
"(G _|_ S | I)\n",
"(G _|_ S | D, I)\n",
"(G _|_ S | L, I)\n",
"(G _|_ S | D, L, I)\n"
]
}
],
"source": [
"print(\"CPDS\")\n",
"print(student_model.get_cpds())\n",
"#student_model.get_leaves()\n",
"print(\"Independencies\")\n",
"print(student_model.get_independencies())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Verifying the CPDs\n",
"In order to verify the CPDs, we could use the get_cpds method on the student_model.\n",
"```python\n",
"for cpd in fraud_model.get_cpds():\n",
" print(\"CPD of {variable}:\".format(variable=cpd.variable))\n",
" print(cpd)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPD of D:\n",
"╒═════╤═════╕\n",
"│ D_0 │ 0.6 │\n",
"├─────┼─────┤\n",
"│ D_1 │ 0.4 │\n",
"╘═════╧═════╛\n",
"CPD of I:\n",
"╒═════╤═════╕\n",
"│ I_0 │ 0.7 │\n",
"├─────┼─────┤\n",
"│ I_1 │ 0.3 │\n",
"╘═════╧═════╛\n",
"CPD of S:\n",
"╒═════╤══════╤═════╕\n",
"│ I │ I_0 │ I_1 │\n",
"├─────┼──────┼─────┤\n",
"│ S_0 │ 0.95 │ 0.2 │\n",
"├─────┼──────┼─────┤\n",
"│ S_1 │ 0.05 │ 0.8 │\n",
"╘═════╧══════╧═════╛\n",
"CPD of G:\n",
"╒═════╤═════╤══════╤══════╤═════╕\n",
"│ I │ I_0 │ I_0 │ I_1 │ I_1 │\n",
"├─────┼─────┼──────┼──────┼─────┤\n",
"│ D │ D_0 │ D_1 │ D_0 │ D_1 │\n",
"├─────┼─────┼──────┼──────┼─────┤\n",
"│ G_0 │ 0.3 │ 0.05 │ 0.9 │ 0.5 │\n",
"├─────┼─────┼──────┼──────┼─────┤\n",
"│ G_1 │ 0.4 │ 0.25 │ 0.08 │ 0.3 │\n",
"├─────┼─────┼──────┼──────┼─────┤\n",
"│ G_2 │ 0.3 │ 0.7 │ 0.02 │ 0.2 │\n",
"╘═════╧═════╧══════╧══════╧═════╛\n",
"CPD of L:\n",
"╒═════╤═════╤═════╤══════╕\n",
"│ G │ G_0 │ G_1 │ G_2 │\n",
"├─────┼─────┼─────┼──────┤\n",
"│ L_0 │ 0.1 │ 0.4 │ 0.99 │\n",
"├─────┼─────┼─────┼──────┤\n",
"│ L_1 │ 0.9 │ 0.6 │ 0.01 │\n",
"╘═════╧═════╧═════╧══════╛\n"
]
}
],
"source": [
"# Iterate over fraud_model.get_cpds()\n",
"for cpd in student_model.get_cpds():\n",
" print(\"CPD of {variable}:\".format(variable=cpd.variable))\n",
" print(cpd)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computations of Probabilities\n",
"The next logical step will be the computation of probabilities and CPDs of various nodes within the Bayesian Model by specifying evidence. This will give us inferences of different variables based on the evidences observed."
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"╒═════╤══════════╕\n",
"│ L │ phi(L) │\n",
"╞═════╪══════════╡\n",
"│ L_0 │ 0.1418 │\n",
"├─────┼──────────┤\n",
"│ L_1 │ 0.8582 │\n",
"╘═════╧══════════╛\n"
]
}
],
"source": [
"from pgmpy.inference.base import Inference\n",
"from pgmpy.factors import factor_product\n",
"\n",
"import itertools\n",
"\n",
"\n",
"class SimpleInference(Inference):\n",
" def query(self, var, evidence):\n",
" # self.factors is a dict of the form of {node: [factors_involving_node]}\n",
" factors_list = set(itertools.chain(*self.factors.values()))\n",
" product = factor_product(*factors_list)\n",
" reduced_prod = product.reduce(evidence, inplace=False)\n",
" reduced_prod.normalize()\n",
" var_to_marg = set(self.model.nodes()) - set(var) - set([state[0] for state in evidence])\n",
" marg_prod = reduced_prod.marginalize(var_to_marg, inplace=False)\n",
" return marg_prod\n",
"\n",
"\n",
"infer = SimpleInference(student_model)\n",
"l1 = infer.query(var=['L'], evidence=[('I', 1), ('D', 0)])\n",
"print(l1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### V-structure\n",
"**structure**: X-> W <- Y \n",
"1. X inflences Y , if W is Observered &nbsp;&nbsp;&nbsp; Mathematically, P not |= (X_|_Y | W) \n",
"2. X doesn't influence Y, if W is not Observered &nbsp;&nbsp;&nbsp; Mathematically, P |= (X_|_Y) \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see a V-structure in our example in 'D' -> 'G' <- 'I' \n",
"let us illustrate the two cases \n",
"1\\. 'D' inflences 'I' , if 'G' is Observered &nbsp;&nbsp;&nbsp; Mathematically, P not |= ('D'_|_'I' | 'G') "
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"for D=0\n",
"╒═════╤══════════╕\n",
"│ I │ phi(I) │\n",
"╞═════╪══════════╡\n",
"│ I_0 │ 0.4375 │\n",
"├─────┼──────────┤\n",
"│ I_1 │ 0.5625 │\n",
"╘═════╧══════════╛\n",
"for D=1\n",
"╒═════╤══════════╕\n",
"│ I │ phi(I) │\n",
"╞═════╪══════════╡\n",
"│ I_0 │ 0.1892 │\n",
"├─────┼──────────┤\n",
"│ I_1 │ 0.8108 │\n",
"╘═════╧══════════╛\n"
]
}
],
"source": [
"infer = SimpleInference(student_model)\n",
"l1 = infer.query(var=['I'], evidence=[('D', 0), ('G', 0)])\n",
"print(\"for D=0\")\n",
"print(l1)\n",
"l2 = infer.query(var=['I'], evidence=[('D', 1), ('G', 0)])\n",
"print(\"for D=1\")\n",
"print(l2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can clearly see that 'D' influences 'I' given 'G'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2\\. 'D' doesn't influence 'I', if 'G' is not Observered &nbsp;&nbsp;&nbsp; Mathematically, P |= ('D'_|_'I') \n"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"for D=0\n",
"╒═════╤══════════╕\n",
"│ I │ phi(I) │\n",
"╞═════╪══════════╡\n",
"│ I_0 │ 0.7000 │\n",
"├─────┼──────────┤\n",
"│ I_1 │ 0.3000 │\n",
"╘═════╧══════════╛\n",
"for D=1\n",
"╒═════╤══════════╕\n",
"│ I │ phi(I) │\n",
"╞═════╪══════════╡\n",
"│ I_0 │ 0.7000 │\n",
"├─────┼──────────┤\n",
"│ I_1 │ 0.3000 │\n",
"╘═════╧══════════╛\n"
]
}
],
"source": [
"l3 = infer.query(var=['I'], evidence=[('D', 0)])\n",
"print(\"for D=0\")\n",
"print(l3)\n",
"l4 = infer.query(var=['I'], evidence=[('D', 1)])\n",
"print(\"for D=1\")\n",
"print(l4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can clearly see that 'D' does not influence 'I' when 'G' is not given"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tree-structure\n",
"**structure**: X < - W - > Y \n",
"1. X inflences Y , if W is not Observered &nbsp;&nbsp;&nbsp; Mathematically, P not |= (X_|_Y ) \n",
"2. X doesn't influence Y, if W is Observered &nbsp;&nbsp;&nbsp; Mathematically, P |= (X_|_Y | W) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see a Tree-structure in our example in 'G' < - 'I' - > 'S' \n",
"let us illustrate the two cases \n",
"1\\. 'G' inflences 'S' , if 'I' is not Observered &nbsp;&nbsp;&nbsp; Mathematically, P not |= ('G'_|_'S') "
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"For G=0\n",
"╒═════╤══════════╕\n",
"│ S │ phi(S) │\n",
"╞═════╪══════════╡\n",
"│ S_0 │ 0.4901 │\n",
"├─────┼──────────┤\n",
"│ S_1 │ 0.5099 │\n",
"╘═════╧══════════╛\n",
"For G=1\n",
"╒═════╤══════════╕\n",
"│ S │ phi(S) │\n",
"╞═════╪══════════╡\n",
"│ S_0 │ 0.8189 │\n",
"├─────┼──────────┤\n",
"│ S_1 │ 0.1811 │\n",
"╘═════╧══════════╛\n"
]
}
],
"source": [
"infer = SimpleInference(student_model)\n",
"t1 = infer.query(var=['S'], evidence=[('G', 0)])\n",
"print(\"For G=0\")\n",
"print(t1)\n",
"t2 = infer.query(var=['S'], evidence=[('G', 1)])\n",
"print(\"For G=1\")\n",
"print(t2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can clearly see that 'G' influences 'S' when 'I' is not given"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2\\. 'G' doesn't influence 'S', if 'I' is Observered &nbsp;&nbsp;&nbsp; Mathematically, P |= ('G'_|_'S' | 'I')"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"For G=0\n",
"╒═════╤══════════╕\n",
"│ S │ phi(S) │\n",
"╞═════╪══════════╡\n",
"│ S_0 │ 0.9500 │\n",
"├─────┼──────────┤\n",
"│ S_1 │ 0.0500 │\n",
"╘═════╧══════════╛\n",
"For G=1\n",
"╒═════╤══════════╕\n",
"│ S │ phi(S) │\n",
"╞═════╪══════════╡\n",
"│ S_0 │ 0.9500 │\n",
"├─────┼──────────┤\n",
"│ S_1 │ 0.0500 │\n",
"╘═════╧══════════╛\n"
]
}
],
"source": [
"infer = SimpleInference(student_model)\n",
"s1 = infer.query(var=['S'], evidence=[('G', 0), ('I', 0)])\n",
"print(\"For G=0\")\n",
"print(s1)\n",
"s2 = infer.query(var=['S'], evidence=[('G', 1), ('I', 0)])\n",
"print(\"For G=1\")\n",
"print(s2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can clearly see that 'G' does not influences 'S' when 'I' is given (Observed)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## To-Do list\n",
"- <input type=\"checkbox\" disabled checked> Creating a Bayesian Model\n",
"- <input type=\"checkbox\" disabled checked> Computation of Probabilities (Inference)\n",
"- <input type=\"checkbox\" disabled checked> Knowing Flow of Influence V-structure\n",
"- <input type=\"checkbox\" disabled> Using above knowledge to write function, that takes Bayesian Model and outputs dataframe which shows variation of final variable for _min , max_ values of other random variables\n",
"- <input type=\"checkbox\" disabled> Develop a generic form of the above function which can take any Bayesian Model Object"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
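
The probabilities printed by the notebook's SimpleInference queries can be cross-checked without pgmpy by brute-force enumeration of the joint distribution. The sketch below is not part of the original notebook. It copies the CPD values from the cells above into plain Python containers, and the names `joint`, `query`, `CARD`, and `ORDER` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

# CPD values copied from the notebook; state 0 is listed first.
P_D = [0.6, 0.4]                                   # P(D)
P_I = [0.7, 0.3]                                   # P(I)
P_S = {0: [0.95, 0.05], 1: [0.2, 0.8]}             # P(S | I), keyed by i
P_G = {(0, 0): [0.3, 0.4, 0.3],                    # P(G | I, D), keyed by (i, d)
       (0, 1): [0.05, 0.25, 0.7],
       (1, 0): [0.9, 0.08, 0.02],
       (1, 1): [0.5, 0.3, 0.2]}
P_L = {0: [0.1, 0.9], 1: [0.4, 0.6], 2: [0.99, 0.01]}  # P(L | G), keyed by g

# Hypothetical helpers: number of states per variable and a fixed variable order.
CARD = {"D": 2, "I": 2, "S": 2, "G": 3, "L": 2}
ORDER = ["D", "I", "S", "G", "L"]


def joint(a):
    """Probability of one full assignment, using the network factorization."""
    return (P_D[a["D"]] * P_I[a["I"]] * P_S[a["I"]][a["S"]]
            * P_G[(a["I"], a["D"])][a["G"]] * P_L[a["G"]][a["L"]])


def query(var, evidence):
    """Return P(var | evidence) as a list indexed by the state of var."""
    totals = [0.0] * CARD[var]
    for states in product(*(range(CARD[name]) for name in ORDER)):
        a = dict(zip(ORDER, states))
        # Skip assignments that contradict the observed evidence.
        if all(a[name] == value for name, value in evidence.items()):
            totals[a[var]] += joint(a)
    norm = sum(totals)
    return [t / norm for t in totals]


# Same queries as in the notebook cells.
print(query("L", {"I": 1, "D": 0}))   # letter given intelligence and difficulty
print(query("I", {"D": 0, "G": 0}))   # V-structure, G observed
print(query("I", {"D": 1, "G": 0}))
print(query("I", {"D": 1}))           # V-structure, G not observed
print(query("S", {"G": 0}))           # tree-structure, I not observed
print(query("S", {"G": 0, "I": 0}))   # tree-structure, I observed
```

Rounded to four decimals, these calls should agree with the tables in the notebook output, for example 0.1418 / 0.8582 for 'L' given I=1 and D=0. Enumeration visits all 48 joint assignments, which is fine for a five-node network but grows exponentially with the number of variables.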