Created
October 16, 2018 08:21
-
-
Save leetschau/5ef9007f568954cd583e06f00fb9a929 to your computer and use it in GitHub Desktop.
二次型的Python实现,用于测试 RStudio 中是否能够实现跨 chunk 的 Python 运行
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "二次型系统的参数估计和异常检测" | |
output: html_notebook | |
--- | |
二次型系统([Quadratic form](https://en.wikipedia.org/wiki/Quadratic_form))是只包含二次项,不包含常数和一次项的单变量或者多变量系统,例如下面分别是包含1, 2 和 3个特征变量的二次型系统: | |
$$ | |
y = a x^2 \\ | |
y = a x^2 + b xy + c y^2 \\ | |
y = a x^2 + b y^2 + c z^2 + d xy + e xz + f yz | |
$$ | |
```{r setup, include=FALSE} | |
library(reticulate) | |
use_condaenv(condaenv = 'anaconda', conda = '/home/leo/apps/miniconda3/bin/conda', required = TRUE) | |
``` | |
# Python 实现 | |
创建包含3个特征变量和一个响应变量的 **输入** dataframe: | |
```{python} | |
import pandas as pd | |
import numpy as np | |
from itertools import combinations | |
import statsmodels.api as sm | |
from statsmodels.stats.outliers_influence import OLSInfluence | |
n = 200 | |
x1 = np.random.uniform(-10, 10, n) | |
x2 = np.random.uniform(-4, 4, n) | |
x3 = np.random.uniform(-2, 8, n) | |
y = 2.89 * x1 ** 2 + 4.33 * x2 ** 2 + 6.1 * x1 * x2 + 5.9 * x2 * x3 + np.random.normal(size=n) | |
label = 'y' | |
features = 'x1,x2,x3' | |
data = pd.DataFrame(data=[x1, x2, x3, y]).T | |
``` | |
实现公式生成函数,输入特征变量名称列表和向量变量名称,返回对应的二次型计算公式: | |
```{python} | |
def build_formula(label: str, features: str) -> str: | |
featlist = features.split(',') | |
quads = ' + '.join(map(lambda feat: 'I(' + feat + ' ** 2)', featlist)) | |
ints = ' + '.join( | |
map(lambda feat_pair: 'I(%s * %s)' % (feat_pair[0], feat_pair[1]), | |
combinations(featlist, 2))) | |
return "%s ~ %s + %s" % (label, quads, ints) | |
print(build_formula(label, features)) | |
``` | |
使用上面的测试数据,结合二次型生成函数,检验计算结果: | |
```{python} | |
res = sm.OLS.from_formula(build_formula(label, features), data=data).fit() | |
print(res.params) | |
rst = OLSInfluence(res).summary_frame().student_resid | |
``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment