Skip to content

Instantly share code, notes, and snippets.

@leetschau
Created October 16, 2018 08:21
Show Gist options
  • Save leetschau/5ef9007f568954cd583e06f00fb9a929 to your computer and use it in GitHub Desktop.
Save leetschau/5ef9007f568954cd583e06f00fb9a929 to your computer and use it in GitHub Desktop.
二次型的Python实现,用于测试 RStudio 中是否能够实现跨 chunk 的 Python 运行
---
title: "二次型系统的参数估计和异常检测"
output: html_notebook
---
二次型系统([Quadratic form](https://en.wikipedia.org/wiki/Quadratic_form))是只包含二次项,不包含常数和一次项的单变量或者多变量系统,例如下面分别是包含1, 2 和 3个特征变量的二次型系统:
$$
y = a x^2 \\
y = a x^2 + b xy + c y^2 \\
y = a x^2 + b y^2 + c z^2 + d xy + e xz + f yz
$$
```{r setup, include=FALSE}
library(reticulate)
use_condaenv(condaenv = 'anaconda', conda = '/home/leo/apps/miniconda3/bin/conda', required = TRUE)
```
# Python 实现
创建包含3个特征变量和一个响应变量的 **输入** dataframe:
```{python}
import pandas as pd
import numpy as np
from itertools import combinations
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence
n = 200
x1 = np.random.uniform(-10, 10, n)
x2 = np.random.uniform(-4, 4, n)
x3 = np.random.uniform(-2, 8, n)
y = 2.89 * x1 ** 2 + 4.33 * x2 ** 2 + 6.1 * x1 * x2 + 5.9 * x2 * x3 + np.random.normal(size=n)
label = 'y'
features = 'x1,x2,x3'
data = pd.DataFrame(data=[x1, x2, x3, y]).T
```
实现公式生成函数,输入特征变量名称列表和向量变量名称,返回对应的二次型计算公式:
```{python}
def build_formula(label: str, features: str) -> str:
featlist = features.split(',')
quads = ' + '.join(map(lambda feat: 'I(' + feat + ' ** 2)', featlist))
ints = ' + '.join(
map(lambda feat_pair: 'I(%s * %s)' % (feat_pair[0], feat_pair[1]),
combinations(featlist, 2)))
return "%s ~ %s + %s" % (label, quads, ints)
print(build_formula(label, features))
```
使用上面的测试数据,结合二次型生成函数,检验计算结果:
```{python}
res = sm.OLS.from_formula(build_formula(label, features), data=data).fit()
print(res.params)
rst = OLSInfluence(res).summary_frame().student_resid
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment