Computing and plotting the autocorrelation function of a time series with Python's statsmodels library

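Time series analysis courses often require computing the autocorrelation function, which measures the relationship between a series' current values and its lagged values. Econometrics packages make this easy, but how can it be done in Python? The code is as follows: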

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Prepare the data: simulate a time series
np.random.seed(100)
price=pd.Series(np.random.rand(20),index=pd.date_range('2011-1-20',  freq='D',periods=20) )
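# Compute the autocorrelation function and its related statistics
# We use the function sm.tsa.acf for this
# Its first positional argument x is the time series whose autocorrelations are computed; the keyword argument unbiased controls whether the autocovariance denominators are n-k (True) or n (False)
# The keyword argument nlags sets the number of lags; the default is 40, or the largest number of lags that can be computed
# The keyword argument qstat controls whether the Ljung-Box Q statistic is reported for each autocorrelation coefficient; the default is False
# The keyword argument fft controls whether the ACF is computed via the fast Fourier transform (FFT); the signature below shows a default of None
# The keyword argument alpha sets the significance level; if it is given, the corresponding confidence intervals are also returned
# The keyword argument missing controls how missing values (NaN) are handled; the default is the string 'none'
# The function returns the autocorrelation coefficients, the confidence intervals (only if alpha is set), and the Q statistics with their p-values (only if qstat=True)
# Below is the help text for this function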
'''
help(sm.tsa.acf)
Help on function acf in module statsmodels.tsa.stattools:
acf(x, unbiased=False, nlags=40, qstat=False, fft=None, alpha=None, missing='none')
    Calculate the autocorrelation function.
    
    Parameters
    ----------
    x : array_like
       The time series data.
    unbiased : bool
       If True, then denominators for autocovariance are n-k, otherwise n.
    nlags : int, optional
        Number of lags to return autocorrelation for.
    qstat : bool, optional
        If True, returns the Ljung-Box q statistic for each autocorrelation
        coefficient.  See q_stat for more information.
    fft : bool, optional
        If True, computes the ACF via FFT.
    alpha : scalar, optional
        If a number is given, the confidence intervals for the given level are
        returned. For instance if alpha=.05, 95 % confidence intervals are
        returned where the standard deviation is computed according to
        Bartlett's formula.
    missing : str, optional
        A string in ['none', 'raise', 'conservative', 'drop'] specifying how the NaNs
        are to be treated.
    
    Returns
    -------
    acf : ndarray
        The autocorrelation function.
    confint : ndarray, optional
        Confidence intervals for the ACF. Returned if alpha is not None.
    qstat : ndarray, optional
        The Ljung-Box Q-Statistic.  Returned if q_stat is True.
    pvalues : ndarray, optional
        The p-values associated with the Q-statistics.  Returned if q_stat is
        True.
    
    Notes
    -----
    The acf at lag 0 (ie., 1) is returned.
    
    For very long time series it is recommended to use fft convolution instead.
    When fft is False uses a simple, direct estimator of the autocovariances
    that only computes the first nlag + 1 values. This can be much faster when
    the time series is long and only a small number of autocovariances are
    needed.
    
    If unbiased is true, the denominator for the autocovariance is adjusted
    but the autocorrelation is not an unbiased estimator.
    
    References
    ----------
    .. [1] Parzen, E., 1963. On spectral analysis with missing observations
       and amplitude modulation. Sankhya: The Indian Journal of
       Statistics, Series A, pp.383-392.
'''
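# Now an applied example
r, q, p = sm.tsa.acf(price.values, nlags=10, fft=True, qstat=True)  # alpha is not set, so no confidence intervals are returned
# Array r holds the autocorrelation function values
# Array q holds the Q statistics
# Array p holds the p-values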

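In the results, r has 11 values while q and p each have 10. Why?

This is because r contains not only the correlations between the series and its lags 1 to 10, but also the lag-0 correlation, i.e. the correlation of the series with itself, which is always 1.

The correlation of a series with itself obviously needs no test. q holds the Ljung-Box Q statistics for lags 1 to 10 (the statistic at lag k jointly tests the autocorrelations at lags 1 through k), and p holds the 10 corresponding p-values, not significance levels.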
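To make the 11-versus-10 count concrete, here is a minimal NumPy-only sketch that computes the sample autocorrelations and the Ljung-Box Q statistics by hand from the textbook formulas. The helper names `manual_acf` and `ljung_box_q` are made up for this illustration and are not part of statsmodels. Assuming the default n denominator (unbiased=False), the values should agree with r and q from sm.tsa.acf up to floating-point error, but treat that as something to check rather than a guarantee.

```python
import numpy as np


def manual_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags (denominator n at every lag)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()        # deviations from the sample mean
    denom = np.dot(d, d)    # lag-0 sum of squares
    return np.array([np.dot(d[:n - k], d[k:]) / denom for k in range(nlags + 1)])


def ljung_box_q(r, n):
    """Ljung-Box Q for lags 1..len(r)-1, given autocorrelations r (r[0] is lag 0)."""
    lags = np.arange(1, len(r))
    # Q_k = n (n + 2) * sum_{j=1..k} r_j^2 / (n - j)
    return n * (n + 2) * np.cumsum(r[1:] ** 2 / (n - lags))


# Same simulated data as in the example above
np.random.seed(100)
x = np.random.rand(20)

r_manual = manual_acf(x, nlags=10)
q_manual = ljung_box_q(r_manual, n=x.size)

print(len(r_manual), len(q_manual))  # 11 autocorrelations (lags 0-10), 10 Q statistics (lags 1-10)
print(r_manual[0])                   # lag 0: the series against itself
```

Because Q is a cumulative sum that starts at lag 1, there is simply no lag-0 entry to report, which is why q and p are one element shorter than r.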

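This is a little frustrating: Python is not dedicated time series software, so some of its output is not as polished as it could be.

Because the raw output above is not very readable, we rearrange it into a data frame whose layout is closer to the output tables of statistical software: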
data = np.c_[range(1,11), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
# set_index(..., inplace=True) returns None, so assign the result and print the table itself
table = table.set_index('lag')
print(table)

The result is as follows:


table
Out[100]: 
            AC         Q  Prob(>Q)
lag                               
1.0  -0.109869  0.279543  0.597001
2.0  -0.412310  4.435092  0.108876
3.0   0.109787  4.747058  0.191284
4.0  -0.052253  4.822142  0.306038
5.0  -0.174016  5.710402  0.335425
6.0   0.019037  5.721793  0.455065
7.0   0.003418  5.722188  0.572536
8.0   0.141850  6.459978  0.595850
9.0   0.190041  7.904599  0.543788
10.0 -0.142743  8.801124  0.551076

Isn't that easier to read?



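# What if we want to plot the autocorrelation function?
# We could draw it ourselves with matplotlib from the values computed above, but that is rather tedious.
# The statsmodels library itself provides a convenient interface for plotting the autocorrelation function
# The syntax is as follows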
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(111)
fig = sm.graphics.tsa.plot_acf(price.values, lags=10, ax=ax1)  # plot the autocorrelation function
fig.savefig('d:\\acf.png')
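For comparison, this is roughly what the do-it-yourself route with matplotlib alone could look like. It is only a sketch: the autocorrelations are computed by hand with NumPy, and the dashed lines are the rough white-noise bounds ±1.96/sqrt(n), which is a simplification and not the Bartlett-formula band that plot_acf draws. The output file name acf_manual.png is an arbitrary choice for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Same simulated data as in the example above
np.random.seed(100)
x = np.random.rand(20)
n, nlags = x.size, 10

# Sample autocorrelations for lags 0..nlags (denominator n at every lag)
d = x - x.mean()
acf_vals = np.array([np.dot(d[:n - k], d[k:]) for k in range(nlags + 1)]) / np.dot(d, d)

# Rough 95% bounds under a white-noise assumption; NOT the Bartlett band used by plot_acf
bound = 1.96 / np.sqrt(n)

fig, ax = plt.subplots(figsize=(12, 8))
lags = np.arange(nlags + 1)
ax.vlines(lags, 0, acf_vals)                   # vertical line from the axis to each value
ax.plot(lags, acf_vals, "o")                   # marker at each autocorrelation
ax.axhline(0, color="black", linewidth=0.8)    # zero line
ax.axhline(bound, linestyle="--", color="gray")
ax.axhline(-bound, linestyle="--", color="gray")
ax.set_xlabel("lag")
ax.set_ylabel("autocorrelation")
ax.set_title("Autocorrelation (manual plot)")
fig.savefig("acf_manual.png")
```

Even this small version needs a dozen plotting calls, which is the tedium that plot_acf saves.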

help(sm.graphics.tsa.plot_acf)
Help on function plot_acf in module statsmodels.graphics.tsaplots:
plot_acf(x, ax=None, lags=None, *, alpha=0.05, use_vlines=True, unbiased=False, fft=False, missing='none', title='Autocorrelation', zero=True, vlines_kwargs=None, **kwargs)
Plot the autocorrelation function

Plots lags on the horizontal and the correlations on vertical axis.

Parameters
----------
x : array_like
Array of time-series values
ax : AxesSubplot, optional
If given, this subplot is used to plot in instead of a new figure being
created.
lags : {int, array_like}, optional
An int or array of lag values, used on horizontal axis. Uses
np.arange(lags) when lags is an int. If not provided,
``lags=np.arange(len(corr))`` is used.
alpha : scalar, optional
If a number is given, the confidence intervals for the given level are
returned. For instance if alpha=.05, 95 % confidence intervals are
returned where the standard deviation is computed according to
Bartlett's formula. If None, no confidence intervals are plotted.
use_vlines : bool, optional
If True, vertical lines and markers are plotted.
If False, only markers are plotted. The default marker is 'o'; it can
be overridden with a ``marker`` kwarg.
unbiased : bool
If True, then denominators for autocovariance are n-k, otherwise n
fft : bool, optional
If True, computes the ACF via FFT.
missing : str, optional
A string in ['none', 'raise', 'conservative', 'drop'] specifying how
the NaNs are to be treated.
title : str, optional
Title to place on plot. Default is 'Autocorrelation'
zero : bool, optional
Flag indicating whether to include the 0-lag autocorrelation.
Default is True.
vlines_kwargs : dict, optional
Optional dictionary of keyword arguments that are passed to vlines.
**kwargs : kwargs, optional
Optional keyword arguments that are directly passed on to the
Matplotlib ``plot`` and ``axhline`` functions.

Returns
-------
Figure
If `ax` is None, the created figure. Otherwise the figure to which
`ax` is connected.

See Also
--------
matplotlib.pyplot.xcorr
matplotlib.pyplot.acorr

Notes
-----
Adapted from matplotlib's `xcorr`.

Data are plotted as ``plot(lags, corr, **kwargs)``

kwargs is used to pass matplotlib optional arguments to both the line
tracing the autocorrelations and for the horizontal line at 0. These
options must be valid for a Line2D object.

vlines_kwargs is used to pass additional optional arguments to the
vertical lines connecting each autocorrelation to the axis. These options
must be valid for a LineCollection object.

This completes the computation and plotting of the autocorrelation function.