2020-12-02
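Computing and plotting the autocorrelation function of a time series with Python's statsmodels library
A time series analysis course usually requires computing the autocorrelation function, which measures the relationship between the current value of a series and its lagged values. Econometrics software makes this easy, but how do we do it in Python? The code is as follows: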
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Prepare the data: simulate a time series
np.random.seed(100)
price=pd.Series(np.random.rand(20),index=pd.date_range('2011-1-20', freq='D',periods=20) )
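# Compute the statistics related to the autocorrelation function
# We use the function sm.tsa.acf
# The first positional argument x is the time series; the keyword unbiased sets whether the autocovariance uses the denominator n-k (True) instead of n (False)
# The keyword nlags sets the number of lags; the default is 40, or the largest number of lags that can be computed
# The keyword qstat sets whether to report the Ljung-Box Q statistic for each autocorrelation coefficient; the default is False
# The keyword fft sets whether the ACF is computed via the fast Fourier transform (FFT); the default in the signature below is None
# The keyword alpha sets the significance level; if it is given, the matching confidence intervals are returned
# The keyword missing sets how missing values (NaN) are handled; the default is the string 'none'
# The function returns the autocorrelations, the confidence intervals (only if alpha is set), and the Q statistics and p-values (only if qstat=True)
# The help text for this function follows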
'''
help(sm.tsa.acf)
Help on function acf in module statsmodels.tsa.stattools:
acf(x, unbiased=False, nlags=40, qstat=False, fft=None, alpha=None, missing='none')
Calculate the autocorrelation function.
Parameters
----------
x : array_like
The time series data.
unbiased : bool
If True, then denominators for autocovariance are n-k, otherwise n.
nlags : int, optional
Number of lags to return autocorrelation for.
qstat : bool, optional
If True, returns the Ljung-Box q statistic for each autocorrelation
coefficient. See q_stat for more information.
fft : bool, optional
If True, computes the ACF via FFT.
alpha : scalar, optional
If a number is given, the confidence intervals for the given level are
returned. For instance if alpha=.05, 95 % confidence intervals are
returned where the standard deviation is computed according to
Bartlett's formula.
missing : str, optional
A string in ['none', 'raise', 'conservative', 'drop'] specifying how the NaNs
are to be treated.
Returns
-------
acf : ndarray
The autocorrelation function.
confint : ndarray, optional
Confidence intervals for the ACF. Returned if alpha is not None.
qstat : ndarray, optional
The Ljung-Box Q-Statistic. Returned if q_stat is True.
pvalues : ndarray, optional
The p-values associated with the Q-statistics. Returned if q_stat is
True.
Notes
-----
The acf at lag 0 (ie., 1) is returned.
For very long time series it is recommended to use fft convolution instead.
When fft is False uses a simple, direct estimator of the autocovariances
that only computes the first nlag + 1 values. This can be much faster when
the time series is long and only a small number of autocovariances are
needed.
If unbiased is true, the denominator for the autocovariance is adjusted
but the autocorrelation is not an unbiased estimator.
References
----------
.. [1] Parzen, E., 1963. On spectral analysis with missing observations
and amplitude modulation. Sankhya: The Indian Journal of
Statistics, Series A, pp.383-392.
'''
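# A worked example
r, q, p = sm.tsa.acf(price.values, nlags=10, fft=True, qstat=True)  # alpha is not set, so no confidence intervals are returned
# Array r holds the autocorrelation values
# Array q holds the Ljung-Box Q statistics
# Array p holds the corresponding p-values
The result has 11 values in r but only 10 in q and 10 in p. Why?
Besides the autocorrelations at lags 1 to 10, r also contains the autocorrelation at lag 0, which is the correlation of the series with itself and always equals 1.
That value needs no test, so q holds the test statistics for the autocorrelations at lags 1 to 10 only, and p holds the 10 matching p-values.
This mismatch in lengths can be confusing. Python is not dedicated time series software, so its raw output is less polished than what an econometrics package prints.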
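To see where the 11 and 10 come from, the same quantities can be sketched by hand with NumPy and SciPy. This is an illustration using the textbook formulas (divide-by-n autocovariance, Ljung-Box Q with a chi-square p-value), not a copy of the statsmodels source. The variable names d, lags, and nlags are chosen here for the example.

```python
import numpy as np
from scipy import stats

# Same simulated values as in the article
np.random.seed(100)
x = np.random.rand(20)
n = len(x)
nlags = 10

# Sample autocorrelation with the biased (divide-by-n) autocovariance:
# r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
d = x - x.mean()
r = np.array([np.sum(d[:n - k] * d[k:]) for k in range(nlags + 1)]) / np.sum(d * d)

# Ljung-Box Q for lags 1..nlags (lag 0 is excluded, since r_0 = 1 by construction):
# Q_k = n (n + 2) * sum_{j=1..k} r_j^2 / (n - j)
lags = np.arange(1, nlags + 1)
q = n * (n + 2) * np.cumsum(r[1:] ** 2 / (n - lags))

# p-value: upper tail of a chi-square distribution with k degrees of freedom
p = stats.chi2.sf(q, df=lags)

print(len(r), len(q), len(p))  # 11 10 10
```

The loop over k starts at 0, which produces the extra leading entry in r. The Q statistic is a cumulative sum over lags 1 to k, so it has no lag-0 entry.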




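Because the raw output is awkward to read, we rearrange it into a DataFrame whose layout is closer to the table a statistics package prints.
data = np.c_[range(1, 11), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
table.set_index('lag', inplace=True)  # with inplace=True, set_index returns None, so print the table in a separate step
print(table)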
The result:
table
Out[100]:
            AC         Q  Prob(>Q)
lag
1.0  -0.109869  0.279543  0.597001
2.0  -0.412310  4.435092  0.108876
3.0   0.109787  4.747058  0.191284
4.0  -0.052253  4.822142  0.306038
5.0  -0.174016  5.710402  0.335425
6.0   0.019037  5.721793  0.455065
7.0   0.003418  5.722188  0.572536
8.0   0.141850  6.459978  0.595850
9.0   0.190041  7.904599  0.543788
10.0 -0.142743  8.801124  0.551076
This layout is much easier to read.
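# What if we want to plot the autocorrelation function?
# We could draw the plot ourselves with matplotlib from the values computed above, but that takes more work.
# The statsmodels library provides a convenient interface for plotting the autocorrelation function
# The syntax is as follows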
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(111)
fig = sm.graphics.tsa.plot_acf(price.values, lags=10, ax=ax1) #自相关函数图
fig.savefig('d:\\acf.png')
help(sm.graphics.tsa.plot_acf)
Help on function plot_acf in module statsmodels.graphics.tsaplots:
plot_acf(x, ax=None, lags=None, *, alpha=0.05, use_vlines=True, unbiased=False, fft=False, missing='none', title='Autocorrelation', zero=True, vlines_kwargs=None, **kwargs)
Plot the autocorrelation function
Plots lags on the horizontal and the correlations on vertical axis.
Parameters
----------
x : array_like
Array of time-series values
ax : AxesSubplot, optional
If given, this subplot is used to plot in instead of a new figure being
created.
lags : {int, array_like}, optional
An int or array of lag values, used on horizontal axis. Uses
np.arange(lags) when lags is an int. If not provided,
``lags=np.arange(len(corr))`` is used.
alpha : scalar, optional
If a number is given, the confidence intervals for the given level are
returned. For instance if alpha=.05, 95 % confidence intervals are
returned where the standard deviation is computed according to
Bartlett's formula. If None, no confidence intervals are plotted.
use_vlines : bool, optional
If True, vertical lines and markers are plotted.
If False, only markers are plotted. The default marker is 'o'; it can
be overridden with a ``marker`` kwarg.
unbiased : bool
If True, then denominators for autocovariance are n-k, otherwise n
fft : bool, optional
If True, computes the ACF via FFT.
missing : str, optional
A string in ['none', 'raise', 'conservative', 'drop'] specifying how
the NaNs are to be treated.
title : str, optional
Title to place on plot. Default is 'Autocorrelation'
zero : bool, optional
Flag indicating whether to include the 0-lag autocorrelation.
Default is True.
vlines_kwargs : dict, optional
Optional dictionary of keyword arguments that are passed to vlines.
**kwargs : kwargs, optional
Optional keyword arguments that are directly passed on to the
Matplotlib ``plot`` and ``axhline`` functions.
Returns
-------
Figure
If `ax` is None, the created figure. Otherwise the figure to which
`ax` is connected.
See Also
--------
matplotlib.pyplot.xcorr
matplotlib.pyplot.acorr
Notes
-----
Adapted from matplotlib's `xcorr`.
Data are plotted as ``plot(lags, corr, **kwargs)``
kwargs is used to pass matplotlib optional arguments to both the line
tracing the autocorrelations and for the horizontal line at 0. These
options must be valid for a Line2D object.
vlines_kwargs is used to pass additional optional arguments to the
vertical lines connecting each autocorrelation to the axis. These options
must be valid for a LineCollection object.
With that, we have computed and plotted the autocorrelation function.

