Skip to content

Instantly share code, notes, and snippets.

@shotahorii
Created March 27, 2019 22:45
Show Gist options
  • Save shotahorii/fb7209a2c64093aaa74be330819cfa80 to your computer and use it in GitHub Desktop.
Save shotahorii/fb7209a2c64093aaa74be330819cfa80 to your computer and use it in GitHub Desktop.
GLMM
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generalized Linear Mixed Model (GLMM)\n",
"- データ解析のための統計モデリング入門(久保 拓弥) 7章\n",
"- [Data](https://github.com/tushuhei/statisticalDataModeling/blob/master/data5.csv)\n",
"\n",
"GLMについて[ここ](https://gist.github.com/shotahorii/cc2508accc0b2e6815663b0531b1edb9)で書いた。また、その一つのインスタンスであるロジスティック回帰について[ここ](https://gist.github.com/shotahorii/63b10aa9337a4e3893ed3dac28af84e5)で書いた。 \n",
"このロジスティック回帰の際に用いた例と同じように、個体iにおける種子の生存確率(8種子中y<sub>i</sub>種子生存)とその個体の葉数x<sub>i</sub>との関係をモデル化したい。"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from scipy import stats\n",
"import math"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>N</th>\n",
" <th>y</th>\n",
" <th>x</th>\n",
" <th>id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" N y x id\n",
"0 8 0 2 1\n",
"1 8 1 2 2"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('./data5.csv')\n",
"df.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD7CAYAAABDld6xAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEjhJREFUeJzt3V+MXPdZxvHn9b9208iGgEVobDxFTYKhCfmDNkhL1UHU\n05hGcSSC0xJAQgrcdHHUWKhVpXiXXCC4CBBUbgIldWK7tbOibmyVaiKFqbRzkXH+uE6zEyGINrWr\nJl4k1BLFUlx4udizsXczu3MWn5nzO+9+P9LKu5Pj1ftqdp49+7Pjx9xdAIB0rSt7AADAyghqAEgc\nQQ0AiSOoASBxBDUAJI6gBoDEbSjqE5kZf88PAFbJ3a3fNYXeUbt7yLeJiYnSZ2A/9mO/eG95cfQB\nAIkjqAEgcQR1DvV6vewRBor9qo394rPVnJOs+InMvKjPBQBrgZnJh/2HiQCA4hHUAJA4ghoAEkdQ\nA0DiCGoASBxBDQCJI6gBIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4nIFtZl93sy+Z2ZnzOywmW0a\n9GAAgHl9g9rMPizpTyXd5u43a76+6zODHgzD0e12dfDgQXW73bJHGYh2u62JiQm12+2yRxmIw4cP\na8+ePTp8+HDZowzE/v37tWPHDu3fv7/sUcqVoyrmw5LekPTTmg/pE5I+2eM6R7WMjz/o0ohLN7g0\n4uPj+8oeqVC7du3O9rvepRFvNHaXPVKhtm37yKL9tm+vlT1Sodat++Ci/dav31T2SIXLcrN/Due6\nSNon6b8lvSXpqWWuGd52uGIzMzPZi+C7Lnn264jPzMyUPVohpqene+43PT1d9miFOHToUM/9Dh06\nVPZohXjooYd67vfQQw+VPVqh8gZ13xZyM/spSXsk7ZD0I0lTZvZ77n5k6bWTk5PvvV+v12lmSFin\n05G0XdLN2SM3S9qmTqejnTt3ljdYQZrNpqRtWrzfdWo2mxobGytvsIIcO3ZMvfY7duyY7r///vIG\nK8jU1JR67Tc1NaVHH320vMGuUKvVUqvVWv1v7Jfkku6V9A+XffwHkr7c47qhfRfCleOOutq4o45B\nRR19SBqV9IqkD0oySV+V9Lke1w1zPxRgfHzfojPAaGfUjcbCGfVHQ55Rb99eW7RftDPq9es3Ldpv\nLZ9R5+pMNLMJzf9Nj4uSXpb0gLtfXHKN5/lcSEu321Wn09Ho6GiII4+l2u22ms2mGo1GiCOPpQ4f\nPqxjx45p7969IY48ltq/f7+mpqZ07733VvrIYzl5OxMptwWAklBuCwBBENQAkDiCGgASR1ADQOII\nagBIHEENAIkjqAEgcQQ1ACSOoAaAxBHUAJA4ghoAEkdQA0DiCGoASBxBDQCJI6j7iN7SHR0t5NXG\n6y+Tp10gz5sCNrxEb+mOjhbyalsLrz8V2UKe6xMFC+ronYLR0ZlYbWvl9Zc3qDn6WMZKLd1I30ot\n5BGs1EIeAa+/xQjqZYyOjko6K+lM9sgZSeeyx5G6RqMh6ZwWP38/yB6vvr1796rXfvOPVx+vvyXy\n3HbneVOwow/3+C3d0dFCXm1r4fWnIlvI84habhu9pTs6WsirLfrrjxZyAEgcLeQAEARBDQCJI6gB\nIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0AiSOoASBxBDUAJI6gBoDE5QpqM9tiZk+b\nWdfMXjWzOwY9GABg3oac1z0m6Vvu/rtmtkHSVQOcCQBwmb531Ga2WdLH3f0JSXL3n7j7jwc+WSLm\n5uZ06tQpzc3NlT3KQETfr9vt6uDBg+p2u2WPMhDRn7/o++XWrwJG0q9Kel7SE5JekvS4pJEe1w26\ntWbojhz5uo+MXONbttzmIyPX+JEjXy97pEJF3298/MGsyumGkFVO0Z+/6Pu556/iyhPUt0u6KOnX\nso//VtKf97huiOsN3vnz531k5JpFdfUjI9f4+fPnyx6tENH3m5mZyUL60n7SiM/MzJQ9WiGiP3/R\n91uQN6jznFGfk3TW3V/IPp6S9IVeF05OTr73fr1eV71eX8W9fVpmZ2e1aVNNFy5cqqvfuHGHZmdn\ntXXr1lJnK0L0/TqdjqTtki7tJ21Tp9MJ0b0X/fmLul+r1VKr1Vr9b8yT5pK+I+mG7P0JSX/V45qh\nfRcahujf0aPvxx11tUXfb4GKOvrwS+fUpySdlvTPkrb0uGaI6w3HwhnZ5s23hjwji77f+Pi+LKyv\nD31GHfX5i76fe/6gpoW8j7m5Oc3OzqpWq1X6R67lRN+v2+2q0+lodHQ0xJHHUtGfv+j75W0hJ6gB\noCR5g5r/hRwAEkdQA0DiCGoASBxBDQCJI6gBIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgc\nQQ0AiSOoASBxBDUAJI6g7iN6CzIt3dXGfmtEnnaBPG8K3PAStQWZlu5qY7/qU5FVXLk+UbCgjt7Z\nRqdgtbFfDHmDmqOPZSy0IF/eYr3QghzBSi3dEUR//thvbSGol1Gr1fTuu7OSzmSPnNHFi2+oVquV\nN1SBRkdHJZ3V5ftJ57LHqy/688d+a0ye2+48bwp29OEevwWZlu5qY7/qEy3kxYjegkxLd7WxX7XR\nQg4AiaOFHACCIKgBIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0AiSOoASBxBDUAJI6g\nBoDEEdQAkLjcQW1m68zsJTN7ZpADAQAWW80d9YOSZgY1SKqityCzX7VFb5FHJk+7gKRtkp6VVJf0\nzDLXDL4OYciityCzX7VFb5FfC1RkC7mkpyXdIukTayWoo7cgs1+1RW+RXyvyBvWGfnfcZvZpSW+5\n+2kzq0tato1gcnLyvffr9brq9fqq7/BTsdCCfOHC+1uQI1QCsV+1rdQiH7FSLYpWq6VWq7X639gv\nySX9haTvS3pd0g8lvS3pyR7XDfH70OBFvyNjv2rjjjoGFXn04ZfCeM0cfbjHb0Fmv2qL3iK/FuQN\n6lWV25rZJyTtd/e7e/w3X83nqoroLcjsV23RW+Sjo4UcABJHCzkABEFQA0DiCGoASBxBDQCJI6gB\nIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0AiSOoASBxBDUAJI6gBoDEEdR9dLtdHTx4\nUN1ut+xRBmJubk6nTp3S3Nxc2aMMRPTnj/3WiDw1MHneFLCKa3z8wazq6IaQVUcLVVVbttwWtKoq\n9vPHftWnQXQmrviJggV19PJQyl+rjf1iyBvUHH0so9PpSNou6ebskZslbcser77Z2Vlt2lTT5ftt\n3LhDs7Oz5Q1VoOjPH/utLQT1MkZHRyWdlXQme+SMpHPZ49VXq9X07ruzuny/ixffUK1WK2+oAkV/\n/thvjclz253nTcGOPtzdx8f3ZT9+XR/yjGzhjHrz5luDnlHHfv7Yr/qU8+iDFvI+ut2uOp2ORkdH\ntXPnzrLHKdzc3JxmZ2dVq9W0devWsscpXPTnj/2qLW8LOUENACXJG9ScUQNA4ghqAEgcQQ0AiSOo\nASBxBDUAJI6gBoDEEdQAkDiCGgASR1ADQOIIagBIHEENAIkjqAEgcX2D2sy2mdlzZvaqmb1iZvuG\nMRgAYF7ffz3PzK6VdK27nzazqyW9KGmPu7+25Dr+9TwAWIXC/vU8d3/T3U9n778tqSvpuisfsRqi\nt3S3221NTEyo3W6XPcpAnDx5Ug888IBOnjxZ9igDEb2lO/rrL7c87QJ+qcWlJmlW0tU9/tugShBK\nE72le9eu3YsaNBqN3WWPVKiPfezWRfvddNMtZY9UqOgt3dFff+4DaCGXdLWkFzR/7BG+iit6S/f0\n9HTPlufp6emyRyvEiRMneu534sSJskcrRPSW7uivvwV5g3pDnrtuM9sgaUrSU+7+zeWum5ycfO/9\ner2uer2+yvv7dCy0dF+48P6W7giVVc1mU9I2LW55vk7NZlNjY2PlDVaQ48ePq9d+x48f11133VXe\nYAVZqaU7QmVV1Ndfq9VSq9Va/W/Mk+aSnpT0132uGdL3oOGI/h2dO+pq4446BhV19CFpTNL/SDot\n6WVJL0m6s8d1w9xvKKK3dDcaC2fUHw15Rn3TTbcs2i/eGXXslu7orz93WsgLE72lu91uq9lsqtFo\nhDjyWOrkyZM6fvy47rnnnhBHHktFb+mO/vqjhRwAEkcLOQAEQVADQOIIagBIHEENAIkjqAEgcQQ1\nACSOoAaAxBHUAJA4ghoAEkdQA0DiCGoASBxBDQCJI6gBIHEENQAkjqDugxbraoveYh19v+hfn7nl\naRfI86aADS+0WFdb9Bbr6PtF//p0H0ALed9PFCyo6dyrtuide9H3i/71uSBvUHP0sYyVWqwjWKnF\nOoKFFuvL91tosY4g+n7Rvz5Xi6Bexj333CPpnKQz2SNnJP0ge7z6RkdHJZ3V4v3OZY9XX61W07vv\nzury/S5efEO1Wq28oQoUfb/oX5+rlue2O8+bgh19uNNiXXXRW6yj7xf969OdFvLC0GJdbdFbrKPv\nF/3rkxZyAEgcLeQAEARBDQCJI6gBIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0AiSOo\nASBxBDUAJI6gBoDE5QpqM7vTzF4zs38zsy8MeigAwCV9g9rM1kn6sqRPSfoVSZ81s18a9GCpOHDg\ngG688UYdOHCg7FEGIvp+7XZbExMTarfbZY8yELSQrxH9mgUk/bqkf7ns4y9K+kKP6wZfhzBkGzd+\naFHDxKZNI2WPVKjo++3atXvRfo3G7rJHKhQt5NWnolrIJf2OpMcv+/j3Jf1dj+uGt90QPPzwwz1b\nkB9++OGyRytE9P2mp6d77jc9PV32aIWghTyGvEG9oci788nJyffer9frqtfrRX76oTp69Kh6tZAf\nPXpUjzzySHmDFST6fs1mU732azabGhsbK2+wgiy0kF+48P4W8giVXCu1kFe5kqvVaqnVaq3+N/ZL\ncs0ffXz7so/XxNFH9DvO6PtxR11t3FGv/uhjvaR/l7RD0iZJpyXt7HHdMPcbik2bRha1kEc7w42+\nX6Oxe9F+Uc+oaSGvrrxBnavc1szulPSY5v+WyFfc/S97XON5PlfVHDhwQEePHtV9990X4khgqej7\ntdttNZtNNRqNEEceS9FCXm20kANA4mghB4AgCGoASBxBDQCJI6gBIHEENQAkjqAGgMQR1ACQOIIa\nABJHUANA4ghqAEgcQQ0AiSOoASBxBDUAJI6gBoDEEdQ5/L+qcyqE/aqN/eIjqHOI/oXCftXGfvER\n1ACQOIIaABJXaBVXIZ8IANaQoXYmAgAGg6MPAEgcQQ0AibvioDazr5jZW2Z2poiBUmJm28zsOTN7\n1cxeMbN9Zc9UJDP7gJk9b2YvZ/tNlD1T0cxsnZm9ZGbPlD1L0cxs1sy+mz1/nbLnKZqZbTGzp82s\nm70G7yh7pqKY2Q3Z8/ZS9uuPVsqXKz6jNrPfkPS2pCfd/eYr+mSJMbNrJV3r7qfN7GpJL0ra4+6v\nlTxaYczsKnd/x8zWS2pL2ufuYV70ZvZ5SbdL2uzud5c9T5HM7HVJt7v7f5U9yyCY2VclfcfdnzCz\nDZKucvcflzxW4cxsnaRzku5w97O9rrniO2p3n5YU8gvF3d9099PZ+29L6kq6rtypiuXu72TvfkDS\nBklh/nTZzLZJ+m1J/1j2LANiCnp8aWabJX3c3Z+QJHf/ScSQznxS0n8sF9JS0Cd5EMysJukWSc+X\nO0mxsqOBlyW9KelZdz9V9kwF+htJf6ZA33yWcEnPmtkpM/vjsocp2Eck/aeZPZEdDzxuZiNlDzUg\n90n62koXENQ5ZMceU5IezO6sw3D3/3X3WyVtk3SHmf1y2TMVwcw+Lemt7Cciy96iGXP32zT/U8Pn\nsmPIKDZIuk3S32c7viPpi+WOVDwz2yjpbklPr3QdQd1HdjY2Jekpd/9m2fMMSvZj5b9KurPsWQoy\nJunu7Bz3a5J+08yeLHmmQrn7D7Nf5yR9Q9JouRMV6pyks+7+QvbxlOaDO5rdkl7MnsNlFRXUUe9Y\nJOmfJM24+2NlD1I0M/tZM9uSvT8iaZekEH9Q6u5fcvdfcPdflPQZSc+5+x+WPVdRzOyq7Cc9mdmH\nJDUkfa/cqYrj7m9JOmtmN2QP/ZakmRJHGpTPqs+xhzT/48UVMbMjkuqSfsbMvi9pYuEPAKrOzMYk\n3S/plewc1yV9yd2/Xe5khfl5SQezP3VeJ+mou3+r5JmQz89J+kb2TzdskHTY3Zslz1S0fZIOZ8cD\nr0v6o5LnKZSZXaX5P0j8k77X8r+QA0DaOKMGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0AiSOoASBx\nBDUAJO7/AJ8b9lyH+FDqAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x111d152b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from matplotlib import pyplot as plt\n",
"%matplotlib inline\n",
"fig, ax = plt.subplots(1,1)\n",
"chart = ax.scatter(df.x,df.y)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import statsmodels.api as sm \n",
"# add constant to the data\n",
"x_c = sm.add_constant(df[['x']])\n",
"# define target\n",
"df['y_failure'] = df['N']-df['y']\n",
"# modeling with GLM (logit link function is automatically selected for Binomial)\n",
"model = sm.GLM(df[['y','y_failure']], x_c, family=sm.families.Binomial())\n",
"result = model.fit()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<table class=\"simpletable\">\n",
"<caption>Generalized Linear Model Regression Results</caption>\n",
"<tr>\n",
" <th>Dep. Variable:</th> <td>['y', 'y_failure']</td> <th> No. Observations: </th> <td> 100</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Model:</th> <td>GLM</td> <th> Df Residuals: </th> <td> 98</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Model Family:</th> <td>Binomial</td> <th> Df Model: </th> <td> 1</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Link Function:</th> <td>logit</td> <th> Scale: </th> <td>1.0</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Method:</th> <td>IRLS</td> <th> Log-Likelihood: </th> <td> -322.80</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Date:</th> <td>Mon, 25 Mar 2019</td> <th> Deviance: </th> <td> 513.84</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Time:</th> <td>23:23:43</td> <th> Pearson chi2: </th> <td> 428.</td> \n",
"</tr>\n",
"<tr>\n",
" <th>No. Iterations:</th> <td>6</td> <th> </th> <td> </td> \n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <td></td> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
"</tr>\n",
"<tr>\n",
" <th>const</th> <td> -2.1487</td> <td> 0.237</td> <td> -9.057</td> <td> 0.000</td> <td> -2.614 -1.684</td>\n",
"</tr>\n",
"<tr>\n",
" <th>x</th> <td> 0.5104</td> <td> 0.056</td> <td> 9.179</td> <td> 0.000</td> <td> 0.401 0.619</td>\n",
"</tr>\n",
"</table>"
],
"text/plain": [
"<class 'statsmodels.iolib.summary.Summary'>\n",
"\"\"\"\n",
" Generalized Linear Model Regression Results \n",
"==============================================================================\n",
"Dep. Variable: ['y', 'y_failure'] No. Observations: 100\n",
"Model: GLM Df Residuals: 98\n",
"Model Family: Binomial Df Model: 1\n",
"Link Function: logit Scale: 1.0\n",
"Method: IRLS Log-Likelihood: -322.80\n",
"Date: Mon, 25 Mar 2019 Deviance: 513.84\n",
"Time: 23:23:43 Pearson chi2: 428.\n",
"No. Iterations: 6 \n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"const -2.1487 0.237 -9.057 0.000 -2.614 -1.684\n",
"x 0.5104 0.056 9.179 0.000 0.401 0.619\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.summary()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"b1 = -2.1487\n",
"b2 = 0.5104"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"ここで、各xにおける誤差構造(二項分布)のパラメータは以下のように表される。\n",
"<br><br>\n",
"$$logit(q) = \\beta_1 + \\beta_2x$$<br>\n",
"$$q = logistic(\\beta_1 + \\beta_2x) = \\frac{1}{1 + e^{-(\\beta_1 + \\beta_2x)}}$$\n",
"<br><br>\n",
"例えば$$x=4$$の場合を考えると以下である。"
],
"text/plain": [
"<IPython.core.display.Latex object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%latex\n",
"ここで、各xにおける誤差構造(二項分布)のパラメータは以下のように表される。\n",
"<br><br>\n",
"$$logit(q) = \\beta_1 + \\beta_2x$$<br>\n",
"$$q = logistic(\\beta_1 + \\beta_2x) = \\frac{1}{1 + e^{-(\\beta_1 + \\beta_2x)}}$$\n",
"<br><br>\n",
"例えば$$x=4$$の場合を考えると以下である。"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.47325056402972265\n"
]
}
],
"source": [
"x = 4\n",
"q = 1/(1+math.exp(-(b1+b2*x)))\n",
"print(q)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEACAYAAAB8nvebAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEgxJREFUeJzt3WtsXGl9x/HfPzghs2yzg9oRKrvyTKAKbrc1EKlcOqo6\nlptyiURbqZWgrZB4UWkl2rUARdC1qnVfuGo1KmkQmxcrzKqttm5EhFrAgKDZnSJGKpjuJcvejLQ+\nSZbLalD3tKVrVyz8+8LjXDwzdpI5nuc8Pt+PZGXm+Og5f52T+fmZ5zkXc3cBAOKyL3QBAIAbR3gD\nQIQIbwCIEOENABEivAEgQoQ3AEQok/A2s9vM7DNm9rSZPWlmb82iXQBAf2MZtXNK0hfd/ffNbEzS\nLRm1CwDow4a9SMfMDkl61N1fn01JAICdZDFscljSD83sATN7xMzuN7NSBu0CAAbIIrzHJB2VdJ+7\nH5X0kqSPZdAuAGCALMa8n5d0yd2/1X1/VtJHt65kZtxEBQBugrvb1mVD97zd/QVJl8zsSHfRtKSn\nBqwb9Ofee+8NXkNeftgX7Av2RRz7YpCszja5W9KDZrZf0nOSPpBRuwCAPjIJb3d/XNKvZtEWAGBn\nhbrCstFohC4hN9gXV7AvrmBfXJH3fTH0ed7XvSEzH9W2AGCvMDP5bkxYAgBGj/AGgAgR3gAQIcIb\nACJEeANAhAhvAIgQ4Q0AESK8ASBChDcARIjwRuEsrSwpXU/7/i5dT7W0sjTiioAbR3ijcOrjdc2e\nm+0J8HQ91ey5WdXH64EqA64f4Y3CKR8sa356/poA3wzu+el5lQ+WA1cI7IwbU6GwNgP7RP2Emu0m\nwY1cGnRjKsIbhZakiQ6fOqzVmVXVyrXQ5QA9uKsgsEW6nqrZbmp1ZlXNdnPgJCaQR4Q3CunqMe5a\nudYzBg7kHeGNwuk3OdlvEhPIM8a8UThLK0uqj9f7Tk6m66naF9s6fuR4gMqAXkxYAkCEmLAEgD2E\n8AaACI1l0YiZJZL+S9JPJf3Y3d+SRbsAgP4yCW9thHbD3V/MqD0AwDayGjaxDNsCAOwgq8B1SV81\ns2Uz++OM2gQADJDVsEnd3b9vZhVthPjT7v71rSvNzc1dft1oNNRoNDLaPADsDa1WS61Wa8f1Mj/P\n28zulfQ/7v7xLcs5zxsAbtCunedtZreY2a3d16+S9FuSvj1su9ibeIoNkI0sxrxfI+nrZvaopH+X\n9Hl3/0oG7WIP4ik2QDa4PB4jt/XGUDzFBhiMe5sgV3iKDXB9CG/kDk+xAXbGjamQKzzFBhgO4Y2R\n4yk2wPAIb4wUT7EBssGYN0aKp9gAN4YJSwCIEBOWALCHEN4AECHCGwAiRHgDQIQIbwCIEOENABEi\nvAEgQoQ3AESI8AaACBHeABAhwhsAIkR4A0CECG8AiBDhDQARIrwBIEKENwBEKLPwNrN9ZvaImX0u\nqzYBAP1l2fOekfRUhu1hl3Q6HS0vL6vT6YQuBcBNyiS8zewOSe+W9Kks2sPuWVw8o2p1QseO3aVq\ndUKLi2dClwTgJmTyDEsz+4ykeUm3SfqIu7+nzzo8wzKwTqejanVCa2sPS5qUdF6l0pQuXHhGlUol\ndHkA+hj0DMuxDBo+LukFd3/MzBqSejayaW5u7vLrRqOhRqMx7OZxA5Ik0YEDNa2tTXaXTGr//qqS\nJCG8A1haWVJ9vK7ywXLP79L1VO2LbR0/cjxAZQip1Wqp1WrtuN7QPW8z+0tJfyTpZUklST8j6bPu\n/v4t69HzDoyed76k66lmz81qfnr+mgAftBzFtGtPj3f3e9x93N1fJ+m9kh7aGtzIh0qlooWF0yqV\npnTo0FGVSlNaWDhNcAdSPljW/PS8Zs/NKl1PJRHcuH6ZjHlfbszsN8SYd+51Oh0lSaJarUZw58Bm\nYJ+on1Cz3SS4cY1BPe9Mw3uHAghvYIAkTXT41GGtzqyqVq6FLgc5smvDJgCGk66narabWp1ZVbPd\nvDyEAmyH8AYCunqMu1au9YyBA4MQ3kAg/SYn+01iAv0w5g0EwnneuB5MWAJAhJiwBIA9hPAGgAgR\n3gAQIcIbACJEeANAhAhvAIgQ4Q0AESK8ASBChDcARIjwBoAIEd4AECHCGwAiRHgDQIQIbwCIEOEN\nABEivAEgQoQ3AESI8AaACI0N24CZvVLS1yQd6LZ31t3/Yth2AQCDDd3zdvf/kzTl7m+W9CZJ7zKz\ntwxdWcY6nY6Wl5fV6XRClwJgG3n4rOahhp1kMmzi7i91X75SG73vXD1peHHxjKrVCR07dpeq1Qkt\nLp4JXRKAPvLwWc1DDdcjk6fHm9k+Sf8h6fWS7nP3P+uzTpCnx3c6HVWrE1pbe1jSpKTzKpWmdOHC\nM6pUKiOvB0B/efis5qGGrQY9PX7oMW9JcvefSnqzmR2S9M9m9kvu/tTW9ebm5i6/bjQaajQaWWx+\nW0mS6MCBmtbWJrtLJrV/f1VJkhDeQI7k4bOahxparZZardaO62XS876mQbM/l/S/7v7xLcvpeQMY\nKA+f1TzUsNWgnvfQY95m9nNmdlv3dUnSMUnPDNtuViqVihYWTqtUmtKhQ0dVKk1pYeE0wQ3kTB4+\nq3mo4XoN3fM2s1+R9Hfa+EOwT9IZd5/vs16QnvemTqejJElUq9VyeSAAbMjDZzUPNWwa1PPOfNhk\nmwKChjcAxGjXhk0AAKNHeBfE0sqS0vW07+/S9VRLK0sjrgjAMAjvgqiP1zV7brYnwNP1VLPnZlUf\nrweqDMDNILwLonywrPnp+WsCfDO456fnVT5YDlwhgBvBhGXBbAb2ifoJNdtNghvIOc42wWVJmujw\nqcNanVlVrVwLXQ6AbXC2CSRt9Lyb7aZWZ1bVbDcHTmICyDfCu0CuHuOulWs9Y+AA4kF4F0S/ycl+\nk5gA4sCYd0EsrSypPl7vOzmZrqdqX2zr+JHjASoDsB0mLAEgQkxYAsAeQngDQIQIbwCIEOENABEi\nvAEgQoQ3AESI8AaACBHeABAhwhsAIkR4A0CECG8AiBDhDQARGjq8zewOM3vIzJ40syfM7O4sCgMA\nDJZFz/tlSR929zslvV3SB81sIoN2scd1Oh0tLy+r0+mELgXieMRm6PB29x+4+2Pd1z+S9LSk24dt\nF3vb4uIZVasTOnbsLlWrE1pcPBO6pELjeMQn0/t5m1lNUkvSL3eD/OrfcT9vSNro4VWrE1pbe1jS\npKTzKpWmdOHCM6pUKqHLKxyOR74Nup/3WIYbuFXSWUkzW4N709zc3OXXjUZDjUYjq80jIkmS6MCB\nmtbWJrtLJrV/f1VJkhAWAXA88qXVaqnVau24XiY9bzMbk/QFSV9y91MD1qHnDUn09PKG45Fvu/0k\nnU9LempQcANXq1QqWlg4rVJpSocOHVWpNKWFhdMERSAcjzgN3fM2s7qkr0l6QpJ3f+5x9y9vWY+e\nN67R6XSUJIlqtRpBkQMcj3ziAcQAeiytLKk+Xlf5YLnnd+l6qvbFto4fOR6gMmziAcQAetTH65o9\nN6t0Pb1mebqeavbcrOrj9UCVYSeEN1Bg5YNlzU/PXxPgm8E9Pz3ft0eOfGDYBMDlwD5RP6Fmu0lw\n5whj3gC2laSJDp86rNWZVdXKtdDloIsxbwADpeupmu2mVmdW1Ww3e8bAkT+EN1BwV49x18q1njFw\n5BPhDRRYv8nJfpOYyB/GvIEC4zzv/GPCEgAixIQlAOwhhDcARIjwBoAIEd4AECHCGwAiRHgDQIQI\nbwCIEOENABEivAEgQoQ3AESI8AaACBHeABAhwhsAIkR4A0CEMglvM1swsxfM7HwW7QEAtpdVz/sB\nSe/IqK09rdPpaHl5WZ1OJ3QpUH6OR17qQDwyCW93/7qkF7Noay9bXDyjanVCx47dpWp1QouLZ0KX\nVGh5OR55qQNxyexJOmZWlfR5d58c8PtCP0mn0+moWp3Q2trDkiYlnVepNKULF55RpVIJXV7h5OV4\n5KUO5NegJ+mMjbKIubm5y68bjYYajcYoNx9UkiQ6cKCmtbXNv22T2r+/qiRJ+JAGkJfjkZc6kB+t\nVkutVmvH9eh5j8DSypLecMsbNHnkrT09rPMr39CzLz3LQ15HLC893rzUgfwaxTMsrfuDLerjdZ18\n/KQ+cX9TpdKUDh06qlJpSp+4v6mTj59UfbweusTCqVQqWlg4fc3xWFg4PfLAzEsdiE8mPW8z+0dJ\nDUk/K+kFSfe6+wNb1ilsz1uS0vVUs+dm9aE3fkgvfv9FvfrnX62Tj5/U/PS8ygfLocsrrE6noyRJ\nVKvVggZmXupA/gzqeWc2bHIdBRQ6vKUrAX6ifkLNdpPgBrAjwjsnkjTR4VOHtTqzqlq5FrocADk3\nijFv7CBdT9VsN7U6s6pmu6l0PQ1dEoBIEd4jsjlkMj89r1q5pvnpec2emyXAAdwUwnsErg7uzTHu\n8sEyAQ7gpjHmPQJLK0uqj9f7Tk6m66naF9uc5w2gLyYsASBCTFgCwB5CeANAhAhvAIgQ4Q0guKWV\npYFnXaXrqZZWlkZcUf4R3gCCq4/X+542u3maLTdv60V4Awiu33UP/a6PwBWcKgggN7h5Wy/O8wYQ\nBW7edi3O8waQe9y87foR3gBygZu33RjCG0Bw3LztxjHmDSA4bt42GBOWABChQk5YctUWgL1qT4c3\nV20B2Kv2dHhz1RaA6xXbN/U9Hd7StQGepAnBDaCv2L6pZzJhaWbvlPS32vhjsODuf91nnaATlly1\nBWAnW7+Z5+Gb+q5NWJrZPkmflPQOSXdKep+ZTQzbbpa4agvA9Yjpm/rQPW8ze5uke939Xd33H5Pk\nW3vfoXreefxLCiDf8vRNfTdPFbxd0qWr3j/fXRYcV20BuFGxfFMfG+XG5ubmLr9uNBpqNBq7ur32\nxXbfHvZmgBf5qi0AvbZ2+DY7eqP8pt5qtdRqtXZcL6thkzl3f2f3fa6GTQDgegwaUg091Lprl8eb\n2SskPStpWtL3JX1T0vvc/ekt6xHeAHIrr/dX2dV7m3RPFTylK6cK/lWfdQhvALhB3JgKACJUyBtT\nAcBeRXgDQIQIbwCIEOENABEivAEgQoQ3AESI8AaACBHeABAhwhsAIkR4A0CECG8AiBDhDQARIrwB\nIEKENwBEiPAGgAgR3gAQIcIbACJEeANAhAhvAIgQ4Q0AESK8ASBChDcARGio8Daz3zOzb5vZT8zs\naFZFAQC2N2zP+wlJvyvp3zKoZde1Wq3QJeQG++IK9sUV7Isr8r4vhgpvd3/W3b8jyTKqZ1fl/WCM\nEvviCvbFFeyLK/K+LxjzBoAIje20gpl9VdJrrl4kySXNuvvnd6swAMBg5u7DN2L2sKSPuPsj26wz\n/IYAoIDcvWdoesee9w3Ydty738YBADdn2FMFf8fMLkl6m6QvmNmXsikLALCdTIZNAACjVZizTczs\nnWb2jJmtmNlHQ9cTipndYWYPmdmTZvaEmd0duqbQzGyfmT1iZp8LXUtIZnabmX3GzJ7u/v94a+ia\nQjGzD3UvQDxvZg+a2YHQNW1ViPA2s32SPinpHZLulPQ+M5sIW1UwL0v6sLvfKentkj5Y4H2xaUbS\nU6GLyIFTkr7o7r8o6Y2Sng5cTxBm9lpJfyrpqLtPamNu8L1hq+pViPCW9BZJ33H3C+7+Y0n/JOm3\nA9cUhLv/wN0f677+kTY+oLeHrSocM7tD0rslfSp0LSGZ2SFJv+7uD0iSu7/s7v8duKyQXiHpVWY2\nJukWSd8LXE+PooT37ZIuXfX+eRU4sDaZWU3SmyR9I2wlQZ2UdEIb1y4U2WFJPzSzB7pDSPebWSl0\nUSG4+/ck/Y2ki5K+Kyl1938NW1WvooQ3tjCzWyWdlTTT7YEXjpkdl/RC95uIKZLbPOySMUlHJd3n\n7kclvSTpY2FLCsPMytr4Zl6V9FpJt5rZH4StqldRwvu7ksaven9Hd1khdb8KnpX0D+7+L6HrCagu\n6T1m9pykRUlTZvb3gWsK5XlJl9z9W933Z7UR5kX0m5Kec/f/dPefSPqspF8LXFOPooT3sqRfMLNq\nd9b4vZKKfGbBpyU95e6nQhcSkrvf4+7j7v46bfyfeMjd3x+6rhDc/QVJl8zsSHfRtIo7iXtR0tvM\n7KCZmTb2Re4mb7O8wjK33P0nZvYnkr6ijT9YC+6eu4MxCmZWl/SHkp4ws0e1MdZ7j7t/OWxlyIG7\nJT1oZvslPSfpA4HrCcLdv2lmZyU9KunH3X/vD1tVLy7SAYAIFWXYBAD2FMIbACJEeANAhAhvAIgQ\n4Q0AESK8ASBChDcARIjwBoAI/T9CDxZH7oZxeAAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1128b1978>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# extract data with x = 4\n",
"df4 = df.query('x==4').reset_index(drop=True)\n",
"df4_cnt = df4[['y','N']].groupby(by='y').count().reset_index().rename(columns={'N':'cnt'})\n",
"\n",
"# visualise\n",
"fig, ax = plt.subplots(1,1)\n",
"datapoint = ax.scatter(df4_cnt.y, df4_cnt.cnt)\n",
"\n",
"# theoretical\n",
"n = 8\n",
"k = df4_cnt.y.tolist()\n",
"theoretical = ax.plot(k, len(df4)*stats.binom.pmf(k, n, q), 'gx', ms=8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"上のグラフで、グリーンのバツPLOTがq=0.47の二項分布に従った場合の予測カウントで、ブルーの丸PLOTが実際のデータである。かなりズレがあることがわかる。実際のデータの方が推定された二項分布よりもばらつきが大きい**過分散**の状態である。"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"theoretical mean: 3.7860045122377812\n",
"theoretical var: 1.9942757414021763\n",
"data mean: 4.05\n",
"data var: 8.365789473684211\n"
]
}
],
"source": [
"# 二項分布の平均と分散\n",
"print('theoretical mean:',n*q)\n",
"print('theoretical var:',n*q*(1-q))\n",
"\n",
"# 実際のデータの平均と分散\n",
"print('data mean:',df4.y.mean())\n",
"print('data var:',df4.y.var())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"上のロジスティック回帰モデルでは、各xにおけるyのばらつきは、xの値から算出されるパラメータをもつ二項分布に従うと過程した。つまり、パラメータはxのみに依存するのであり、他の観測されていない影響は考慮されない。が、過分散になってしまっているので、ここにはxでは説明できない個体差が存在していることがわかる。"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"この個体差を誤差構造の式に追加してみる。\n",
"<br><br>\n",
"$$logit(q_i) = \\beta_1 + \\beta_2x_i + \\gamma_i$$\n",
"<br><br>\n",
"ちなみに用語として、この線形予測子における$$\\beta_1$$と$$\\beta_2x_i$$は固定効果(fixed effects),\n",
"$$\\gamma_i$$はランダム効果(random effects)と呼ばれる。\n",
"<br><br>\n",
"ここで$$\\gamma$$は$$-\\infty \\le \\gamma \\le \\infty $$の連続値である。<br>\n",
"GLMでは観測できない個体差を想定していないので、$$\\gamma_i=0$$とおいていることになる。<br>\n",
"一方、GLMMでは、パラメータ$$\\{\\gamma_1,\\gamma_2,...,\\gamma_{100} \\}$$が何らかの確率分布に従っていると仮定する。\n",
"<br><br>\n",
"ここでは$$\\gamma_i$$は平均0標準偏差$$s$$の正規分布に従うと仮定する。\n",
"<br>\n",
"各個体の$$\\gamma_i$$は個体間で相互に独立した確率変数であると仮定すると、確率密度関数は以下となる。\n",
"<br><br>\n",
"$$p(\\gamma_i|s)=\\frac{1}{\\sqrt{2\\pi s^2}}exp(-\\frac{\\gamma_i^2}{2s^2})$$"
],
"text/plain": [
"<IPython.core.display.Latex object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%latex\n",
"この個体差を誤差構造の式に追加してみる。\n",
"<br><br>\n",
"$$logit(q_i) = \\beta_1 + \\beta_2x_i + \\gamma_i$$\n",
"<br><br>\n",
"ちなみに用語として、この線形予測子における$$\\beta_1$$と$$\\beta_2x_i$$は固定効果(fixed effects),\n",
"$$\\gamma_i$$はランダム効果(random effects)と呼ばれる。\n",
"<br><br>\n",
"ここで$$\\gamma$$は$$-\\infty \\le \\gamma \\le \\infty $$の連続値である。<br>\n",
"GLMでは観測できない個体差を想定していないので、$$\\gamma_i=0$$とおいていることになる。<br>\n",
"一方、GLMMでは、パラメータ$$\\{\\gamma_1,\\gamma_2,...,\\gamma_{100} \\}$$が何らかの確率分布に従っていると仮定する。\n",
"<br><br>\n",
"ここでは$$\\gamma_i$$は平均0標準偏差$$s$$の正規分布に従うと仮定する。\n",
"<br>\n",
"各個体の$$\\gamma_i$$は個体間で相互に独立した確率変数であると仮定すると、確率密度関数は以下となる。\n",
"<br><br>\n",
"$$p(\\gamma_i|s)=\\frac{1}{\\sqrt{2\\pi s^2}}exp(-\\frac{\\gamma_i^2}{2s^2})$$"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"ここで問題となるのが、$$\\gamma_i$$は最尤推定できないということ。ひとつの対処法として、個体ごとの尤度$$L_i$$\n",
"の式の中で$$\\gamma_i$$を積分することで、尤度から$$\\gamma_i$$を消してしまう。\n",
"<br><br>\n",
"$$L_i = \\int_{-\\infty}^\\infty p(y_i|\\beta_1,\\beta_2,\\gamma_i) p(\\gamma_i|s) d\\gamma_i$$\n",
"<br><br>\n",
"これはつまり、取り得る$$\\gamma_i$$の各値における尤度を評価し、その期待値を算出することに相当する。<br>\n",
"ちなみに、二項分布$$p(y_i|\\beta_1,\\beta_2,\\gamma_i)$$と正規分布$$p(\\gamma_i|s)$$をかけて$$\\gamma_i$$\n",
"で積分する操作は二種類の分布を混ぜていることに相当する。(久保本のp157参照)\n",
"<br><br>\n",
"さて、この問題におけるパラメータ推定に戻ると、全データの尤度は各個体の尤度$$L_i$$の積なので、\n",
"$$L(\\beta_1,\\beta_2,s)=\\prod_iL_i$$となる。対数をとって、対数尤度$$logL(\\beta_1,\\beta_2,s)$$\n",
"を最大化するパラメータの値を探す。"
],
"text/plain": [
"<IPython.core.display.Latex object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%latex\n",
"ここで問題となるのが、$$\\gamma_i$$は最尤推定できないということ。ひとつの対処法として、個体ごとの尤度$$L_i$$\n",
"の式の中で$$\\gamma_i$$を積分することで、尤度から$$\\gamma_i$$を消してしまう。\n",
"<br><br>\n",
"$$L_i = \\int_{-\\infty}^\\infty p(y_i|\\beta_1,\\beta_2,\\gamma_i) p(\\gamma_i|s) d\\gamma_i$$\n",
"<br><br>\n",
"これはつまり、取り得る$$\\gamma_i$$の各値における尤度を評価し、その期待値を算出することに相当する。<br>\n",
"ちなみに、二項分布$$p(y_i|\\beta_1,\\beta_2,\\gamma_i)$$と正規分布$$p(\\gamma_i|s)$$をかけて$$\\gamma_i$$\n",
"で積分する操作は二種類の分布を混ぜていることに相当する。(久保本のp157参照)\n",
"<br><br>\n",
"さて、この問題におけるパラメータ推定に戻ると、全データの尤度は各個体の尤度$$L_i$$の積なので、\n",
"$$L(\\beta_1,\\beta_2,s)=\\prod_iL_i$$となる。対数をとって、対数尤度$$logL(\\beta_1,\\beta_2,s)$$\n",
"を最大化するパラメータの値を探す。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import statsmodels.genmod as gm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Statsmodelsを使って実装したいところではあったが、ドキュメントがほぼ無く動作方法が不明だったので諦める。PythonでのGLMMの良い方法が見つかり次第後から実装したい。"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment