ermutuxia

2020-12-02

Python

Computing and plotting the autocorrelation function of a time series with Python's statsmodels library


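Time series analysis courses often require computing the autocorrelation function, that is, the correlation between the current value of a series and its lagged values. Econometrics packages make this easy, but how can it be done in Python? The code is as follows: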

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Prepare the data: simulate a time series
np.random.seed(100)
price=pd.Series(np.random.rand(20),index=pd.date_range('2011-1-20',  freq='D',periods=20) )
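# Compute the autocorrelation function and its related statistics
# We use the function sm.tsa.acf for this.
# Its first positional argument x is the time series whose autocorrelations are computed. The keyword unbiased sets whether the autocovariance denominator is n-k (True) or n (False, the default); newer statsmodels releases rename this keyword to adjusted.
# The keyword nlags sets the number of lags. In the version shown here the default is 40, or the largest number of lags that can be computed.
# The keyword qstat sets whether to report the Ljung-Box Q statistic for each autocorrelation coefficient. The default is False.
# The keyword fft sets whether the ACF is computed via the fast Fourier transform (FFT). The signature below lists fft=None as the default, so it is clearest to pass True or False explicitly.
# The keyword alpha sets the significance level. If it is given, the corresponding confidence intervals are also returned.
# The keyword missing sets how missing values (NaN) are handled. The default is the string 'none'.
# The function returns the autocorrelation coefficients, the confidence intervals (only if alpha is set), and the Q statistics and p-values (only if qstat=True).
# Below is the help text for this function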
'''
help(sm.tsa.acf)
Help on function acf in module statsmodels.tsa.stattools:
acf(x, unbiased=False, nlags=40, qstat=False, fft=None, alpha=None, missing='none')
    Calculate the autocorrelation function.
    
    Parameters
    ----------
    x : array_like
       The time series data.
    unbiased : bool
       If True, then denominators for autocovariance are n-k, otherwise n.
    nlags : int, optional
        Number of lags to return autocorrelation for.
    qstat : bool, optional
        If True, returns the Ljung-Box q statistic for each autocorrelation
        coefficient.  See q_stat for more information.
    fft : bool, optional
        If True, computes the ACF via FFT.
    alpha : scalar, optional
        If a number is given, the confidence intervals for the given level are
        returned. For instance if alpha=.05, 95 % confidence intervals are
        returned where the standard deviation is computed according to
        Bartlett's formula.
    missing : str, optional
        A string in ['none', 'raise', 'conservative', 'drop'] specifying how the NaNs
        are to be treated.
    
    Returns
    -------
    acf : ndarray
        The autocorrelation function.
    confint : ndarray, optional
        Confidence intervals for the ACF. Returned if alpha is not None.
    qstat : ndarray, optional
        The Ljung-Box Q-Statistic.  Returned if q_stat is True.
    pvalues : ndarray, optional
        The p-values associated with the Q-statistics.  Returned if q_stat is
        True.
    
    Notes
    -----
    The acf at lag 0 (ie., 1) is returned.
    
    For very long time series it is recommended to use fft convolution instead.
    When fft is False uses a simple, direct estimator of the autocovariances
    that only computes the first nlag + 1 values. This can be much faster when
    the time series is long and only a small number of autocovariances are
    needed.
    
    If unbiased is true, the denominator for the autocovariance is adjusted
    but the autocorrelation is not an unbiased estimator.
    
    References
    ----------
    .. [1] Parzen, E., 1963. On spectral analysis with missing observations
       and amplitude modulation. Sankhya: The Indian Journal of
       Statistics, Series A, pp.383-392.
'''
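# Now let us apply it to an example
r,q,p=sm.tsa.acf(price.values,nlags=10, fft=True, qstat=True) # alpha is not set, so no confidence intervals are returned
# The array r holds the autocorrelation function values
# The array q holds the Q statistics
# The array p holds the p-values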

In the result, r contains 11 values, while q and p contain 10 values each. Why is that?

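This is because r holds not only the correlations between the series and its lags 1 to 10, but also the correlation at lag 0, which is the correlation of the series with itself and therefore always equals 1.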

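The correlation of the series with itself obviously needs no test. So q holds the test statistics for the autocorrelations at lags 1 to 10, and p holds the 10 corresponding p-values.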

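This can be frustrating at first. Python is not a dedicated time series package, so its raw output is less polished than what statistical software reports.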



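Because the raw output above is not very readable, we tidy it into a DataFrame so that the statistics are laid out more like the output table of a statistics package.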
data = np.c_[range(1,11), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
table = table.set_index('lag')  # set_index(..., inplace=True) returns None, so assign the result instead
print(table)

The result is as follows:


table
Out[100]: 
            AC         Q  Prob(>Q)
lag                               
1.0  -0.109869  0.279543  0.597001
2.0  -0.412310  4.435092  0.108876
3.0   0.109787  4.747058  0.191284
4.0  -0.052253  4.822142  0.306038
5.0  -0.174016  5.710402  0.335425
6.0   0.019037  5.721793  0.455065
7.0   0.003418  5.722188  0.572536
8.0   0.141850  6.459978  0.595850
9.0   0.190041  7.904599  0.543788
10.0 -0.142743  8.801124  0.551076

Laid out this way, the results are much easier to read.



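# How do we plot the autocorrelation function?
# We could draw the plot ourselves with matplotlib from the values computed above, but that is rather tedious.
# The statsmodels library itself provides a convenient interface for plotting the autocorrelation function.
# The syntax is as follows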
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(111)
fig = sm.graphics.tsa.plot_acf(price.values, lags=10, ax=ax1) # autocorrelation function plot
fig.savefig('d:\\acf.png')


help(sm.graphics.tsa.plot_acf)
Help on function plot_acf in module statsmodels.graphics.tsaplots:
plot_acf(x, ax=None, lags=None, *, alpha=0.05, use_vlines=True, unbiased=False, fft=False, missing='none', title='Autocorrelation', zero=True, vlines_kwargs=None, **kwargs)
    Plot the autocorrelation function
    
    Plots lags on the horizontal and the correlations on vertical axis.
    
    Parameters
    ----------
    x : array_like
        Array of time-series values
    ax : AxesSubplot, optional
        If given, this subplot is used to plot in instead of a new figure being
        created.
    lags : {int, array_like}, optional
        An int or array of lag values, used on horizontal axis. Uses
        np.arange(lags) when lags is an int.  If not provided,
        ``lags=np.arange(len(corr))`` is used.
    alpha : scalar, optional
        If a number is given, the confidence intervals for the given level are
        returned. For instance if alpha=.05, 95 % confidence intervals are
        returned where the standard deviation is computed according to
        Bartlett's formula. If None, no confidence intervals are plotted.
    use_vlines : bool, optional
        If True, vertical lines and markers are plotted.
        If False, only markers are plotted.  The default marker is 'o'; it can
        be overridden with a ``marker`` kwarg.
    unbiased : bool
        If True, then denominators for autocovariance are n-k, otherwise n
    fft : bool, optional
        If True, computes the ACF via FFT.
    missing : str, optional
        A string in ['none', 'raise', 'conservative', 'drop'] specifying how
        the NaNs are to be treated.
    title : str, optional
        Title to place on plot.  Default is 'Autocorrelation'
    zero : bool, optional
        Flag indicating whether to include the 0-lag autocorrelation.
        Default is True.
    vlines_kwargs : dict, optional
        Optional dictionary of keyword arguments that are passed to vlines.
    **kwargs : kwargs, optional
        Optional keyword arguments that are directly passed on to the
        Matplotlib ``plot`` and ``axhline`` functions.
    
    Returns
    -------
    Figure
        If `ax` is None, the created figure.  Otherwise the figure to which
        `ax` is connected.
    
    See Also
    --------
    matplotlib.pyplot.xcorr
    matplotlib.pyplot.acorr
    
    Notes
    -----
    Adapted from matplotlib's `xcorr`.
    
    Data are plotted as ``plot(lags, corr, **kwargs)``
    
    kwargs is used to pass matplotlib optional arguments to both the line
    tracing the autocorrelations and for the horizontal line at 0. These
    options must be valid for a Line2D object.
    
    vlines_kwargs is used to pass additional optional arguments to the
    vertical lines connecting each autocorrelation to the axis.  These options
    must be valid for a LineCollection object.


With that, we have completed both the calculation and the plotting of the autocorrelation function.
