2020-12-02
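Computing and plotting the autocorrelation function of a time series with Python's statsmodels library
Time series analysis courses often require computing the autocorrelation function, that is, the relationship between the current value of a series and its lagged values. Econometrics packages make this easy, but how can it be done in Python? The code is as follows: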
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Prepare the data: simulate a time series
np.random.seed(100)
price = pd.Series(np.random.rand(20),
                  index=pd.date_range('2011-1-20', freq='D', periods=20))

# Compute the autocorrelation statistics with sm.tsa.acf.
# The first positional argument x is the time series to analyse.
# unbiased: whether the autocovariances use the n-k denominator instead of n.
# nlags: number of lags; the default is 40, or the largest number of lags that can be computed.
# qstat: whether to report the Ljung-Box Q statistic for each autocorrelation coefficient; default False.
# fft: whether to compute the ACF via the fast Fourier transform (FFT); the signature below shows fft=None as the default.
# alpha: significance level; if it is set, the corresponding confidence intervals are returned.
# missing: how missing values (NaN) are handled; the default is the string 'none'.
# The function returns the autocorrelation coefficients, the confidence intervals
# (only if alpha is set), the Q statistics and the p-values (only if qstat=True).
# The help text of the function is reproduced below.
'''
help(sm.tsa.acf)
Help on function acf in module statsmodels.tsa.stattools:

acf(x, unbiased=False, nlags=40, qstat=False, fft=None, alpha=None, missing='none')
    Calculate the autocorrelation function.

    Parameters
    ----------
    x : array_like
        The time series data.
    unbiased : bool
        If True, then denominators for autocovariance are n-k, otherwise n.
    nlags : int, optional
        Number of lags to return autocorrelation for.
    qstat : bool, optional
        If True, returns the Ljung-Box q statistic for each autocorrelation
        coefficient. See q_stat for more information.
    fft : bool, optional
        If True, computes the ACF via FFT.
    alpha : scalar, optional
        If a number is given, the confidence intervals for the given level are
        returned. For instance if alpha=.05, 95 % confidence intervals are
        returned where the standard deviation is computed according to
        Bartlett's formula.
    missing : str, optional
        A string in ['none', 'raise', 'conservative', 'drop'] specifying how
        the NaNs are to be treated.

    Returns
    -------
    acf : ndarray
        The autocorrelation function.
    confint : ndarray, optional
        Confidence intervals for the ACF. Returned if alpha is not None.
    qstat : ndarray, optional
        The Ljung-Box Q-Statistic. Returned if q_stat is True.
    pvalues : ndarray, optional
        The p-values associated with the Q-statistics. Returned if q_stat is
        True.

    Notes
    -----
    The acf at lag 0 (ie., 1) is returned.

    For very long time series it is recommended to use fft convolution
    instead. When fft is False uses a simple, direct estimator of the
    autocovariances that only computes the first nlag + 1 values. This can be
    much faster when the time series is long and only a small number of
    autocovariances are needed.

    If unbiased is true, the denominator for the autocovariance is adjusted
    but the autocorrelation is not an unbiased estimator.

    References
    ----------
    .. [1] Parzen, E., 1963. On spectral analysis with missing observations
       and amplitude modulation. Sankhya: The Indian Journal of
       Statistics, Series A, pp.383-392.
'''

# A worked example
r, q, p = sm.tsa.acf(price.values, nlags=10, fft=True, qstat=True)
# alpha is not set, so no confidence intervals are returned
# r holds the autocorrelation values
# q holds the Q statistics
# p holds the p-values
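In the result, r contains 11 values while q and p contain 10 values each. Why?
Because r holds not only the correlations between the series and its lags 1 to 10, but also the correlation at lag 0, that is, the correlation of the series with itself.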
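To see where the 11 values come from, the sample autocorrelation can be computed by hand. The following is a minimal sketch using only NumPy. The helper name `manual_acf` is hypothetical and not part of statsmodels, and it assumes the estimator with denominator n (the `unbiased=False` case described in the help text).

```python
import numpy as np

def manual_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags, using denominator n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()                 # deviations from the sample mean
    c0 = np.dot(d, d) / n            # lag-0 autocovariance (the variance)
    acf = [1.0]                      # lag 0: the series against itself
    for k in range(1, nlags + 1):
        ck = np.dot(d[k:], d[:-k]) / n   # lag-k autocovariance
        acf.append(ck / c0)
    return np.array(acf)

np.random.seed(100)
x = np.random.rand(20)
r_manual = manual_acf(x, nlags=10)
print(len(r_manual))   # nlags + 1 entries, because lag 0 is included
print(r_manual.round(6))
```

The first entry is always 1, which is why it carries no information and needs no test.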
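The correlation of a series with itself obviously needs no test. q therefore holds the test statistics for the autocorrelations at lags 1 to 10, and p holds the 10 corresponding p-values (not significance levels).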
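The Q statistics can be rebuilt from the autocorrelations with the standard Ljung-Box formula Q_k = n(n+2) Σ_{j=1..k} r_j² / (n-j), which makes it clear why they start at lag 1. The sketch below uses a hypothetical helper `ljung_box_q` and takes the autocorrelation values from the article's own output table as input; it is an illustration of the formula, not a copy of the statsmodels implementation. The p-values would then come from a chi-square distribution with k degrees of freedom, for example via `scipy.stats.chi2.sf(q, lags)`.

```python
import numpy as np

def ljung_box_q(r, n):
    """Ljung-Box Q for lags 1..m, given r = [r_0, r_1, ..., r_m] and sample size n."""
    r = np.asarray(r, dtype=float)[1:]      # drop lag 0, it is never tested
    lags = np.arange(1, len(r) + 1)
    return n * (n + 2) * np.cumsum(r**2 / (n - lags))

# Autocorrelations for lags 0..10, copied from the table in this article
r_table = [1.0, -0.109869, -0.412310, 0.109787, -0.052253, -0.174016,
           0.019037, 0.003418, 0.141850, 0.190041, -0.142743]

q_manual = ljung_box_q(r_table, n=20)       # the simulated series has 20 observations
print(q_manual.round(6))
```

Because Q is a cumulative sum, each entry tests the joint hypothesis that all autocorrelations up to that lag are zero.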
This is a little frustrating. Python is not a dedicated time series package, so some of its output is less polished than one might like.
Because the raw output above is hard to read, we tidy it up into a DataFrame so that the layout is closer to the output table of a statistics package:

data = np.c_[range(1, 11), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
table = table.set_index('lag')  # use lag as the index; set_index(..., inplace=True) would return None
print(table)

The result:

table
Out[100]:
            AC         Q  Prob(>Q)
lag
1.0  -0.109869  0.279543  0.597001
2.0  -0.412310  4.435092  0.108876
3.0   0.109787  4.747058  0.191284
4.0  -0.052253  4.822142  0.306038
5.0  -0.174016  5.710402  0.335425
6.0   0.019037  5.721793  0.455065
7.0   0.003418  5.722188  0.572536
8.0   0.141850  6.459978  0.595850
9.0   0.190041  7.904599  0.543788
10.0 -0.142743  8.801124  0.551076

This is much easier to read.

# What if we want to plot the autocorrelation function?
# We could draw it ourselves with matplotlib from the values computed above, but that is tedious.
# statsmodels provides a convenient interface for plotting the autocorrelation function.
# The syntax is as follows:
fig = plt.figure(figsize=(12, 8))
ax1 = fig.add_subplot(111)
fig = sm.graphics.tsa.plot_acf(price.values, lags=10, ax=ax1)  # autocorrelation plot
fig.savefig('d:\\acf.png')
help(sm.graphics.tsa.plot_acf)
Help on function plot_acf in module statsmodels.graphics.tsaplots:

plot_acf(x, ax=None, lags=None, *, alpha=0.05, use_vlines=True, unbiased=False, fft=False, missing='none', title='Autocorrelation', zero=True, vlines_kwargs=None, **kwargs)
    Plot the autocorrelation function

    Plots lags on the horizontal and the correlations on vertical axis.

    Parameters
    ----------
    x : array_like
        Array of time-series values
    ax : AxesSubplot, optional
        If given, this subplot is used to plot in instead of a new figure
        being created.
    lags : {int, array_like}, optional
        An int or array of lag values, used on horizontal axis. Uses
        np.arange(lags) when lags is an int. If not provided,
        ``lags=np.arange(len(corr))`` is used.
    alpha : scalar, optional
        If a number is given, the confidence intervals for the given level are
        returned. For instance if alpha=.05, 95 % confidence intervals are
        returned where the standard deviation is computed according to
        Bartlett's formula. If None, no confidence intervals are plotted.
    use_vlines : bool, optional
        If True, vertical lines and markers are plotted.
        If False, only markers are plotted. The default marker is 'o'; it can
        be overridden with a ``marker`` kwarg.
    unbiased : bool
        If True, then denominators for autocovariance are n-k, otherwise n
    fft : bool, optional
        If True, computes the ACF via FFT.
    missing : str, optional
        A string in ['none', 'raise', 'conservative', 'drop'] specifying how
        the NaNs are to be treated.
    title : str, optional
        Title to place on plot. Default is 'Autocorrelation'
    zero : bool, optional
        Flag indicating whether to include the 0-lag autocorrelation.
        Default is True.
    vlines_kwargs : dict, optional
        Optional dictionary of keyword arguments that are passed to vlines.
    **kwargs : kwargs, optional
        Optional keyword arguments that are directly passed on to the
        Matplotlib ``plot`` and ``axhline`` functions.

    Returns
    -------
    Figure
        If `ax` is None, the created figure. Otherwise the figure to which
        `ax` is connected.

    See Also
    --------
    matplotlib.pyplot.xcorr
    matplotlib.pyplot.acorr

    Notes
    -----
    Adapted from matplotlib's `xcorr`.

    Data are plotted as ``plot(lags, corr, **kwargs)``

    kwargs is used to pass matplotlib optional arguments to both the line
    tracing the autocorrelations and for the horizontal line at 0. These
    options must be valid for a Line2D object.

    vlines_kwargs is used to pass additional optional arguments to the
    vertical lines connecting each autocorrelation to the axis. These options
    must be valid for a LineCollection object.
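For comparison, here is a rough sketch of the "draw it yourself with matplotlib" route mentioned earlier. It makes several assumptions: the autocorrelations are recomputed with NumPy rather than taken from statsmodels, the dashed band is the simple white-noise approximation ±1.96/√n (not the Bartlett-formula intervals that the help text says `plot_acf` uses), and the output file name `acf_manual.png` is only a placeholder.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

np.random.seed(100)
x = np.random.rand(20)
n, nlags = len(x), 10

# Sample autocorrelations for lags 0..nlags (denominator n cancels in the ratio)
d = x - x.mean()
r = np.array([np.dot(d[k:], d[:n - k]) for k in range(nlags + 1)]) / np.dot(d, d)
lags = np.arange(nlags + 1)
band = 1.96 / np.sqrt(n)  # approximate 95% band under a white-noise assumption

fig, ax = plt.subplots(figsize=(12, 8))
ax.vlines(lags, 0, r)                       # one vertical line per lag
ax.plot(lags, r, "o")                       # marker at the top of each line
ax.axhline(0, color="black", linewidth=1)   # zero reference line
ax.axhline(band, linestyle="--", color="gray")
ax.axhline(-band, linestyle="--", color="gray")
ax.set_xlabel("lag")
ax.set_ylabel("autocorrelation")
ax.set_title("Autocorrelation (manual sketch)")
fig.savefig("acf_manual.png")               # placeholder file name
```

This takes noticeably more code than the single `plot_acf` call, which is the point the article makes about the built-in interface being more convenient.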
That completes the calculation and plotting of the autocorrelation function.





