Last active
January 26, 2019 09:42
HW.ipynb
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"source": [ | |
"import pandas as pd\n", | |
"import statsmodels.api as sm\n", | |
"from statsmodels.iolib.summary2 import summary_col\n", | |
"import ggplot" | |
], | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stderr", | |
"text": [ | |
"C:\\Program Files\\Anaconda3\\lib\\site-packages\\statsmodels\\compat\\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.\n", | |
" from pandas.core import datetools\n", | |
"C:\\Program Files\\Anaconda3\\lib\\site-packages\\ggplot\\utils.py:81: FutureWarning: pandas.tslib is deprecated and will be removed in a future version.\n", | |
"You can access Timestamp as pandas.Timestamp\n", | |
" pd.tslib.Timestamp,\n", | |
"C:\\Program Files\\Anaconda3\\lib\\site-packages\\ggplot\\stats\\smoothers.py:4: FutureWarning: The pandas.lib module is deprecated and will be removed in a future version. These are private functions and can be accessed from pandas._libs.lib instead\n", | |
" from pandas.lib import Timestamp\n" | |
] | |
} | |
], | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"art_museum_data = pd.read_excel(\"artmuseum.xlsx\")\n", | |
"art_museum_data.columns=[\"fee\",\"nh\",\"ne\",\"space\",\"year\",\"rail\",\"bus\",\"comp\",\"piif\",\"pub\",\"lib\",\"lec\",\"ws\",\"guide\",\"club\",\"private\",\"tokyo\"]" | |
], | |
"outputs": [], | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"X = art_museum_data.loc[:,\"nh\":\"tokyo\"]\n", | |
"Y = art_museum_data.loc[:,\"fee\"]" | |
], | |
"outputs": [], | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"X = sm.add_constant(X) # adding a constant\n", | |
"model = sm.OLS(Y, X)\n", | |
"res = model.fit()\n", | |
"print(res.summary())\n", | |
"# the condition number is very large (2.2e+06), which suggests multicollinearity or badly scaled regressors, so the estimates may be numerically unstable" | |
], | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: fee R-squared: 0.771\n", | |
"Model: OLS Adj. R-squared: 0.578\n", | |
"Method: Least Squares F-statistic: 3.990\n", | |
"Date: Sat, 26 Jan 2019 Prob (F-statistic): 0.00251\n", | |
"Time: 14:10:44 Log-Likelihood: -232.78\n", | |
"No. Observations: 36 AIC: 499.6\n", | |
"Df Residuals: 19 BIC: 526.5\n", | |
"Df Model: 16 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"const 3453.0041 3259.777 1.059 0.303 -3369.787 1.03e+04\n", | |
"nh -0.0038 0.003 -1.219 0.238 -0.010 0.003\n", | |
"ne 0.6252 0.737 0.848 0.407 -0.917 2.168\n", | |
"space 0.0168 0.005 3.672 0.002 0.007 0.026\n", | |
"year -1.0351 1.630 -0.635 0.533 -4.446 2.376\n", | |
"rail 330.2386 165.396 1.997 0.060 -15.940 676.417\n", | |
"bus -120.6369 122.653 -0.984 0.338 -377.352 136.078\n", | |
"comp -326.4530 202.917 -1.609 0.124 -751.163 98.257\n", | |
"piif -249.1368 154.648 -1.611 0.124 -572.818 74.545\n", | |
"pub -601.5254 172.905 -3.479 0.003 -963.419 -239.632\n", | |
"lib 132.4595 144.636 0.916 0.371 -170.267 435.186\n", | |
"lec 30.7109 143.474 0.214 0.833 -269.584 331.006\n", | |
"ws -28.8792 110.058 -0.262 0.796 -259.233 201.475\n", | |
"guide -460.0881 192.169 -2.394 0.027 -862.302 -57.875\n", | |
"club 135.0211 130.438 1.035 0.314 -137.989 408.031\n", | |
"private -165.2505 189.250 -0.873 0.393 -561.355 230.854\n", | |
"tokyo 43.9299 152.037 0.289 0.776 -274.287 362.146\n", | |
"==============================================================================\n", | |
"Omnibus: 1.556 Durbin-Watson: 1.890\n", | |
"Prob(Omnibus): 0.459 Jarque-Bera (JB): 1.047\n", | |
"Skew: -0.039 Prob(JB): 0.592\n", | |
"Kurtosis: 2.168 Cond. No. 2.20e+06\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.2e+06. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n" | |
] | |
} | |
], | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
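"# Model specification\n", | |
"- $$Y(fee) = X\\beta+\\epsilon=1\\cdot\\beta_0+nh\\cdot\\beta_1+ne\\cdot\\beta_2+\\dots+tokyo\\cdot\\beta_{16}+\\epsilon$$\n", | |
"- $\\beta_j=\\partial{fee}/\\partial{x_j}$: a one-unit increase in $x_j$, holding the other regressors fixed, raises or lowers the fee by $\\beta_j$ yen.\n" | |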
], | |
"metadata": {} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
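"# Interpreting the regression results\n", | |
"- First, the coefficient on space is positive and significant at the 1% level (p = 0.002): each additional square metre is associated with a fee about 0.0168 yen higher, holding the other regressors fixed.\n", | |
"- Second, public (pub) museums charge significantly less than non-public museums (p = 0.003): on average the fee is about 600 yen lower.\n", | |
"- Third, museums with a guide charge less than the others, significant at the 5% level (p = 0.027). This is hard to explain.\n", | |
"- Finally, $R^2=\\frac{SSR}{SST}=1-\\frac{SSE}{SST}$ and Adj $R^2=1-(\\frac{n-1}{n-p})(\\frac{SSE}{SST})$, where $n$ is the number of observations and $p$ the number of estimated coefficients including the constant. $R^2$ measures how much of the variation in the dependent variable the regressors explain; adjusted $R^2$ adds a penalty for the number of regressors. Here $R^2=0.771$ and adjusted $R^2=0.578$, which is still fairly high, although the gap between the two reflects the 16 regressors used with only 36 observations.\n", | |
"- With all explanatory variables included, the AIC is 499.6." | |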
], | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
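"# Why is the coefficient on guide negative?\n", | |
"## Hypothesis 1: guide may be correlated with other variables such as pub; perhaps public museums are more likely to offer a guide" | |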
], | |
"metadata": {} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Share of public museums that offer a guide\n", | |
"guide_pub = X.loc[X['pub'] == 1, 'guide'].mean()\n", | |
"# Share of non-public museums that offer a guide\n", | |
"guide_non_pub = X.loc[X['pub'] == 0, 'guide'].mean()\n", | |
"print(guide_pub, guide_non_pub)" | |
], | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"0.8 0.875\n" | |
] | |
}, | |
{ | |
"output_type": "error", | |
"ename": "TypeError", | |
"evalue": "corr() missing 1 required positional argument: 'other'", | |
"traceback": [ | |
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", | |
"\u001b[1;32m<ipython-input-40-b8cc5278261c>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 13\u001b[0m \u001b[1;31m# 结果差不多,否决\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 14\u001b[0m \u001b[1;31m### 尝试2:查看guide和其他解释变量的相关性\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 15\u001b[1;33m \u001b[0mX\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mpub\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcorr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[1;31mTypeError\u001b[0m: corr() missing 1 required positional argument: 'other'" | |
] | |
} | |
], | |
"execution_count": 40, | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
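"The share of museums with a guide is 0.8 among public museums and 0.875 among non-public ones.\n", | |
"The two shares are close, so hypothesis 1 is rejected." | |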
], | |
"metadata": {} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
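"# Attempt 2: how guide relates to the other explanatory variables\n", | |
"- 2.1 Look at correlation coefficients.\n", | |
"- 2.2 Regress guide on the other variables, pick the ones with large, significant coefficients, and drop them from the next fee regression." | |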
], | |
"metadata": {} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
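"### Attempt 2.1: pairwise correlations among the explanatory variables\n", | |
"# Note: as written, this correlates pub (not guide) with the first 15 columns of X,\n", | |
"# so the resulting list contains pub itself.\n", | |
"pub_cor = [X.pub.corr(X[i]) for i in X.columns]\n", | |
"Possible_corr_var = [X.columns[i] for i in range(15) if pub_cor[i]>0.4]\n", | |
"print(Possible_corr_var)\n", | |
"## 2.2 Linear probability model: find which variables significantly affect guide, then drop them from the next regression.\n", | |
"model_guide = sm.OLS(X.guide,X.iloc[:,X.columns!='guide']).fit().summary()\n", | |
"print(model_guide)\n", | |
"## Variables with large, significant positive coefficients on guide: ['club','lib','rail']" | |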
], | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"['ne', 'space', 'pub', 'lib']\n", | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: guide R-squared: 0.752\n", | |
"Model: OLS Adj. R-squared: 0.565\n", | |
"Method: Least Squares F-statistic: 4.034\n", | |
"Date: Sat, 26 Jan 2019 Prob (F-statistic): 0.00217\n", | |
"Time: 17:07:44 Log-Likelihood: 9.5210\n", | |
"No. Observations: 36 AIC: 12.96\n", | |
"Df Residuals: 20 BIC: 38.29\n", | |
"Df Model: 15 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"const 5.0422 3.622 1.392 0.179 -2.512 12.597\n", | |
"nh -6.271e-06 3.39e-06 -1.848 0.080 -1.34e-05 8.09e-07\n", | |
"ne 0.0003 0.001 0.404 0.690 -0.001 0.002\n", | |
"space -1.107e-05 4.72e-06 -2.345 0.029 -2.09e-05 -1.22e-06\n", | |
"year -0.0024 0.002 -1.339 0.196 -0.006 0.001\n", | |
"rail 0.5902 0.140 4.213 0.000 0.298 0.882\n", | |
"bus 0.3023 0.126 2.405 0.026 0.040 0.564\n", | |
"comp -0.2342 0.230 -1.017 0.321 -0.714 0.246\n", | |
"piif -0.3642 0.160 -2.269 0.034 -0.699 -0.029\n", | |
"pub -0.4317 0.177 -2.446 0.024 -0.800 -0.063\n", | |
"lib 0.5158 0.123 4.208 0.000 0.260 0.771\n", | |
"lec -0.0249 0.167 -0.149 0.883 -0.373 0.323\n", | |
"ws 0.1129 0.126 0.899 0.379 -0.149 0.375\n", | |
"club 0.3173 0.134 2.365 0.028 0.037 0.597\n", | |
"private -0.0606 0.220 -0.276 0.786 -0.519 0.398\n", | |
"tokyo 0.2229 0.170 1.313 0.204 -0.131 0.577\n", | |
"==============================================================================\n", | |
"Omnibus: 0.236 Durbin-Watson: 1.786\n", | |
"Prob(Omnibus): 0.889 Jarque-Bera (JB): 0.042\n", | |
"Skew: -0.082 Prob(JB): 0.979\n", | |
"Kurtosis: 2.965 Cond. No. 2.10e+06\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.1e+06. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n" | |
] | |
} | |
], | |
"execution_count": 106, | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"## Next, drop these variables and see whether the significance of guide changes" | |
], | |
"metadata": {} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"### Attempt 2.1 continued: drop the variables listed in Possible_corr_var and re-run the fee regression\n", | |
"model2 = sm.OLS(Y, X.iloc[:,~X.columns.isin(Possible_corr_var)])\n", | |
"res2 = model2.fit()\n", | |
"print(res2.summary())\n", | |
"# R^2 falls sharply (0.771 to 0.413) and AIC rises (499.6 to 525.4), so this approach is abandoned\n" | |
], | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: fee R-squared: 0.413\n", | |
"Model: OLS Adj. R-squared: 0.106\n", | |
"Method: Least Squares F-statistic: 1.346\n", | |
"Date: Sat, 26 Jan 2019 Prob (F-statistic): 0.260\n", | |
"Time: 17:08:19 Log-Likelihood: -249.71\n", | |
"No. Observations: 36 AIC: 525.4\n", | |
"Df Residuals: 23 BIC: 546.0\n", | |
"Df Model: 12 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"const 6526.6179 4260.828 1.532 0.139 -2287.576 1.53e+04\n", | |
"nh 0.0014 0.004 0.346 0.733 -0.007 0.010\n", | |
"year -2.7408 2.139 -1.281 0.213 -7.165 1.684\n", | |
"rail 255.3356 185.842 1.374 0.183 -129.109 639.780\n", | |
"bus -111.6011 162.732 -0.686 0.500 -448.239 225.036\n", | |
"comp 185.7578 183.966 1.010 0.323 -194.806 566.321\n", | |
"piif 69.1083 144.591 0.478 0.637 -230.000 368.217\n", | |
"lec 156.4431 202.753 0.772 0.448 -262.984 575.870\n", | |
"ws 37.6614 125.003 0.301 0.766 -220.926 296.249\n", | |
"guide -381.7440 178.043 -2.144 0.043 -750.054 -13.434\n", | |
"club 25.1796 159.059 0.158 0.876 -303.859 354.218\n", | |
"private -298.1011 271.183 -1.099 0.283 -859.086 262.884\n", | |
"tokyo -49.7687 201.684 -0.247 0.807 -466.985 367.447\n", | |
"==============================================================================\n", | |
"Omnibus: 0.208 Durbin-Watson: 1.941\n", | |
"Prob(Omnibus): 0.901 Jarque-Bera (JB): 0.038\n", | |
"Skew: -0.075 Prob(JB): 0.981\n", | |
"Kurtosis: 2.950 Cond. No. 1.68e+06\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 1.68e+06. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n", | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: fee R-squared: 0.720\n", | |
"Model: OLS Adj. R-squared: 0.554\n", | |
"Method: Least Squares F-statistic: 4.344\n", | |
"Date: Sat, 26 Jan 2019 Prob (F-statistic): 0.00124\n", | |
"Time: 17:08:19 Log-Likelihood: -236.40\n", | |
"No. Observations: 36 AIC: 500.8\n", | |
"Df Residuals: 22 BIC: 523.0\n", | |
"Df Model: 13 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"const 2586.9108 3180.851 0.813 0.425 -4009.771 9183.592\n", | |
"nh -0.0019 0.003 -0.688 0.499 -0.008 0.004\n", | |
"ne 0.8677 0.718 1.208 0.240 -0.622 2.357\n", | |
"space 0.0199 0.004 4.830 0.000 0.011 0.028\n", | |
"year -0.6063 1.600 -0.379 0.708 -3.925 2.712\n", | |
"bus -221.9872 112.812 -1.968 0.062 -455.944 11.970\n", | |
"comp -226.8406 185.080 -1.226 0.233 -610.673 156.992\n", | |
"piif -85.5670 124.392 -0.688 0.499 -343.540 172.406\n", | |
"pub -456.0002 143.674 -3.174 0.004 -753.963 -158.038\n", | |
"lec 19.8057 129.767 0.153 0.880 -249.315 288.927\n", | |
"ws -58.8600 102.880 -0.572 0.573 -272.219 154.499\n", | |
"guide -212.0717 124.662 -1.701 0.103 -470.605 46.461\n", | |
"private -107.3435 187.088 -0.574 0.572 -495.341 280.654\n", | |
"tokyo 55.8507 135.820 0.411 0.685 -225.823 337.524\n", | |
"==============================================================================\n", | |
"Omnibus: 0.285 Durbin-Watson: 1.756\n", | |
"Prob(Omnibus): 0.867 Jarque-Bera (JB): 0.472\n", | |
"Skew: 0.055 Prob(JB): 0.790\n", | |
"Kurtosis: 2.450 Cond. No. 2.09e+06\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.09e+06. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n" | |
] | |
} | |
], | |
"execution_count": 107, | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"### Attempt 2.2 continued: drop club, lib and rail and re-run the fee regression\n", | |
"model3 = sm.OLS(Y, X.iloc[:,~X.columns.isin(['club','lib','rail'])])\n", | |
"res3 = model3.fit()\n", | |
"print(res3.summary())\n", | |
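"# R^2 changes little (0.771 to 0.720) and AIC is almost unchanged (499.6 to 500.8), so the explanatory power is similar. guide is no longer significant (p = 0.103), so no firm conclusion can be drawn about guide; the space and pub results are stable." | |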
], | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
" OLS Regression Results \n", | |
"==============================================================================\n", | |
"Dep. Variable: fee R-squared: 0.720\n", | |
"Model: OLS Adj. R-squared: 0.554\n", | |
"Method: Least Squares F-statistic: 4.344\n", | |
"Date: Sat, 26 Jan 2019 Prob (F-statistic): 0.00124\n", | |
"Time: 17:20:33 Log-Likelihood: -236.40\n", | |
"No. Observations: 36 AIC: 500.8\n", | |
"Df Residuals: 22 BIC: 523.0\n", | |
"Df Model: 13 \n", | |
"Covariance Type: nonrobust \n", | |
"==============================================================================\n", | |
" coef std err t P>|t| [0.025 0.975]\n", | |
"------------------------------------------------------------------------------\n", | |
"const 2586.9108 3180.851 0.813 0.425 -4009.771 9183.592\n", | |
"nh -0.0019 0.003 -0.688 0.499 -0.008 0.004\n", | |
"ne 0.8677 0.718 1.208 0.240 -0.622 2.357\n", | |
"space 0.0199 0.004 4.830 0.000 0.011 0.028\n", | |
"year -0.6063 1.600 -0.379 0.708 -3.925 2.712\n", | |
"bus -221.9872 112.812 -1.968 0.062 -455.944 11.970\n", | |
"comp -226.8406 185.080 -1.226 0.233 -610.673 156.992\n", | |
"piif -85.5670 124.392 -0.688 0.499 -343.540 172.406\n", | |
"pub -456.0002 143.674 -3.174 0.004 -753.963 -158.038\n", | |
"lec 19.8057 129.767 0.153 0.880 -249.315 288.927\n", | |
"ws -58.8600 102.880 -0.572 0.573 -272.219 154.499\n", | |
"guide -212.0717 124.662 -1.701 0.103 -470.605 46.461\n", | |
"private -107.3435 187.088 -0.574 0.572 -495.341 280.654\n", | |
"tokyo 55.8507 135.820 0.411 0.685 -225.823 337.524\n", | |
"==============================================================================\n", | |
"Omnibus: 0.285 Durbin-Watson: 1.756\n", | |
"Prob(Omnibus): 0.867 Jarque-Bera (JB): 0.472\n", | |
"Skew: 0.055 Prob(JB): 0.790\n", | |
"Kurtosis: 2.450 Cond. No. 2.09e+06\n", | |
"==============================================================================\n", | |
"\n", | |
"Warnings:\n", | |
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", | |
"[2] The condition number is large, 2.09e+06. This might indicate that there are\n", | |
"strong multicollinearity or other numerical problems.\n" | |
] | |
} | |
], | |
"execution_count": 108, | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
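"# 2.2 Interpreting the robustness check (for the guide variable)\n", | |
"R^2 changes little (0.771 to 0.720) and AIC is almost unchanged (499.6 to 500.8), so the explanatory power of the model is similar. However, guide is no longer significant (p = 0.103), so no firm conclusion can be drawn about guide, while the results for space and pub are stable." | |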
], | |
"metadata": {} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
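"# Possible problems and remedies\n", | |
"- Insufficient data: 1. The sample is small, only 36 museums; the survey could be extended or the geographic coverage widened. 2. There are too few other explanatory variables.\n", | |
"- Candidate explanatory variables:\n", | |
"- A: Museum pricing (supply side). 1. Ownership: public museums charge less; knowing the amount and share of government subsidies to public museums would explain more precisely why. 2. Costs: the number of staff, the total wage bill, conservation and maintenance costs for the collection, and other operating costs would improve the analysis. 3. Surroundings: if museums or galleries cluster nearby (for example the 798 Art District in Beijing), visitors are more likely to travel to this museum.\n", | |
"- B: Demand side. 1. Visitor characteristics such as age, interests, gender and income, which determine willingness to pay. 2. The number of potential visitors (market size); for example, the number and popularity of social-media posts tagged #XX museum could serve as a proxy.\n", | |
"- C: Museum category (which market does the museum belong to?): for example, an anime museum and a traditional art museum hardly compete and should not be pooled; a variable for museum category would allow a finer model." | |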
], | |
"metadata": { | |
"collapsed": false, | |
"outputHidden": false, | |
"inputHidden": false | |
} | |
} | |
], | |
"metadata": { | |
"kernel_info": { | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"version": 3, | |
"name": "ipython" | |
}, | |
"mimetype": "text/x-python", | |
"nbconvert_exporter": "python", | |
"version": "3.5.5", | |
"pygments_lexer": "ipython3", | |
"name": "python", | |
"file_extension": ".py" | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"language": "python", | |
"display_name": "Python 3" | |
}, | |
"nteract": { | |
"version": "0.12.3" | |
}, | |
"gist_id": "8d751a22f389ab93028c15aa95a5f7ce" | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
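The interpretation cell quotes R², adjusted R² and AIC from the first regression summary. As a rough sanity check, the sketch below recomputes adjusted R², AIC and BIC from the other numbers printed in that summary (n = 36, R² = 0.771, log-likelihood = -232.78). The helper names `adjusted_r2`, `aic` and `bic` are illustrative and do not appear in the notebook. It assumes p = 17 estimated coefficients (16 regressors plus the constant) and the conventions AIC = 2p - 2 ln L and BIC = p ln(n) - 2 ln L, which are consistent with the printed values.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (n - 1) / (n - p) * (1 - R^2), p counts the constant."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)


def aic(log_likelihood, p):
    """Akaike information criterion with p estimated coefficients."""
    return 2 * p - 2 * log_likelihood


def bic(log_likelihood, n, p):
    """Bayesian information criterion with p coefficients and n observations."""
    return p * math.log(n) - 2 * log_likelihood


# Values copied from the first OLS summary in the notebook
n, p = 36, 17            # observations; coefficients including the constant
r2, llf = 0.771, -232.78  # R-squared and log-likelihood as printed

print(round(adjusted_r2(r2, n, p), 3))  # summary reports 0.578
print(round(aic(llf, p), 1))            # summary reports 499.6
print(round(bic(llf, n, p), 1))         # summary reports 526.5
```

The same helpers can be applied to the reduced models (for example n = 36, p = 14, log-likelihood = -236.40 for the regression without club, lib and rail) to compare fit across specifications on a like-for-like basis.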