# Reference: http://www.statsmodels.org
statsmodels exposes several APIs; here we only cover statsmodels.api and statsmodels.formula.api. The syntax for specifying a model differs between the two.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Prepare some example data
data = pd.DataFrame(np.random.randint(3, 10, (10, 3)), columns=["y", "x1", "x2"])
#------------------------ First: OLS linear regression with statsmodels.api
# Specify the dependent and independent variables of the model
model = sm.OLS(data["y"], data.loc[:,["x1","x2"]])
# Fit the model
results = model.fit()
# Inspect the estimation results
print(results.summary())  # this report is much closer to what traditional statistics software prints than sklearn's output
                                 OLS Regression Results
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.752
Model:                            OLS   Adj. R-squared (uncentered):              0.690
Method:                 Least Squares   F-statistic:                              12.12
Date:                Fri, 27 Nov 2020   Prob (F-statistic):                     0.00379
Time:                        19:30:18   Log-Likelihood:                         -24.375
No. Observations:                  10   AIC:                                      52.75
Df Residuals:                       8   BIC:                                      53.35
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0033      0.443      0.007      0.994      -1.018       1.025
x2             0.8768      0.468      1.873      0.098      -0.203       1.956
==============================================================================
Omnibus:                        1.234   Durbin-Watson:                   0.858
Prob(Omnibus):                  0.540   Jarque-Bera (JB):                0.186
Skew:                          -0.333   Prob(JB):                        0.911
Kurtosis:                       3.066   Cond. No.                         5.05
==============================================================================
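Note that sm.OLS does not add an intercept automatically, which is why the report shows the uncentered R-squared. If an intercept is wanted, a minimal sketch is to add a constant column with sm.add_constant before fitting:
# sm.add_constant prepends a column of ones named "const" to the regressors
X = sm.add_constant(data.loc[:, ["x1", "x2"]])
results_const = sm.OLS(data["y"], X).fit()
print(results_const.summary())  # now reports the usual centered R-squared and a "const" coefficient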
# The results object also exposes many more methods and attributes, such as predict, resid, ssr, etc.
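For example, a minimal sketch using the fitted results object from above:
print(results.params)   # estimated coefficients
print(results.resid)    # residuals of the fitted model
print(results.ssr)      # sum of squared residuals
print(results.predict(data.loc[:, ["x1", "x2"]]))  # predictions for data with the same regressor columns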
#------------------------ Next: OLS linear regression with statsmodels.formula.api
We can also fit a regression with syntax much closer to R. Note that the formula interface adds an intercept by default, and here x2 enters the model log-transformed directly inside the formula:
model2 = smf.ols('y ~ x1 + np.log(x2)', data=data).fit()
print(model2.summary())
Users coming from R will likely prefer this syntax for regressions. It also lets you write a mathematical transformation of a variable directly in the formula, without having to create that variable in the DataFrame first.
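As a small sketch of this, transformations and interactions can be written inline in the formula (I() protects arithmetic expressions), and the coefficients are labelled with the formula terms:
model3 = smf.ols('y ~ x1 + I(x2**2) + x1:x2', data=data).fit()
print(model3.params)  # index includes terms such as Intercept, x1, I(x2 ** 2), x1:x2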