
Python - Time Series Forecast - 3. Stationarity


Time Series Forecast Steps:


1. Identification of Time Series Type

2. Seasonality Tests

3. Stationarity Tests

4. Decomposition

5. Regression

6. Deep Learning




3. Stationarity Tests


In this example we run the Augmented Dickey-Fuller (ADF) test and the KPSS test to decide whether the provided UBS monthly trading volume series is stationary. Both tests lead to the same conclusion: ADF rejects its unit-root null hypothesis and KPSS fails to reject its stationarity null hypothesis, so we treat the monthly series as stationary.


3.1. Stationarity Terms


A stationary series is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Observations from a non-stationary time series show seasonal effects, trends, and other structures that depend on the time index. Examples of series that are often stationary: stock trading volume (not the stock price), coin flips, white noise.

Weak stationarity only requires shift-invariance (in time) of the first moment and of the cross moment (the auto-covariance): the mean is constant and the covariance between two observations depends only on the lag between them. Strong stationarity requires shift-invariance (in time) of all finite-dimensional distributions of the stochastic process.

A unit root is a feature of some stochastic processes (such as random walks) that can cause problems in statistical inference involving time series models. Put simply, a process with a unit root is non-stationary, but it does not necessarily have a deterministic trend component.
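To make the difference concrete, here is a minimal sketch (not part of the original notebook) that simulates many white-noise paths and many random-walk paths with NumPy and compares the variance across paths at two points in time. The seed, the number of paths and the two time points are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 200

# Each row is one simulated path of independent standard-normal shocks.
shocks = rng.standard_normal((n_paths, n_steps))
white_noise = shocks                   # stationary: same distribution at every t
random_walk = shocks.cumsum(axis=1)    # unit root: variance grows with t


def variance_at(paths, t):
    """Sample variance across paths at time step t (1-based)."""
    return paths[:, t - 1].var(ddof=1)


# Theory: the ratio is 1 for white noise and 200 / 50 = 4 for a random walk.
wn_ratio = variance_at(white_noise, 200) / variance_at(white_noise, 50)
rw_ratio = variance_at(random_walk, 200) / variance_at(random_walk, 50)

print(f"white noise variance ratio t=200 vs t=50: {wn_ratio:.2f}")
print(f"random walk variance ratio t=200 vs t=50: {rw_ratio:.2f}")
```

The white-noise ratio stays near 1 because its distribution does not depend on time, while the random-walk ratio tracks the ratio of the time points, which is exactly the time dependence that stationarity rules out.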


The tests below report the following values:

Test statistic - the value computed by the test. We compare it to the critical values.

p-value - the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. If the p-value is less than a predefined alpha level (typically 0.05), we reject the null hypothesis.

Lags used - the number of lagged terms included in the test.

Number of observations used - the number of rows that enter the ADF regression. It can be smaller than the length of the series, because differencing and lagging consume the first observations.

Critical value 1% - the threshold for the test statistic at alpha = 0.01. At this level the null hypothesis is rejected 1 % of the time when it is in fact true.

Critical value 5% - the threshold for the test statistic at alpha = 0.05. At this level the null hypothesis is rejected 5 % of the time when it is in fact true.

Critical value 10% - the threshold for the test statistic at alpha = 0.1. At this level the null hypothesis is rejected 10 % of the time when it is in fact true.

If the series turns out not to be stationary, we must transform it before modelling, for example by detrending or differencing.
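How the test statistic, the critical values and the p-value are read together can be sketched in a few lines of plain Python. The helper names below are made up for this illustration. The numbers are the ADF values reported further down in this article, and the comparison direction assumes a left-tailed test such as ADF, where a more negative statistic counts against the null hypothesis.

```python
def compare_to_critical_values(test_statistic, critical_values):
    """Left-tailed rule: reject H0 at a level when statistic < critical value."""
    return {level: test_statistic < value for level, value in critical_values.items()}


def reject_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is below the chosen alpha level."""
    return p_value < alpha


# ADF values taken from the output shown later in this article.
adf_statistic = -3.719042
adf_critical_values = {"1%": -3.571472, "5%": -2.922629, "10%": -2.599336}
adf_p_value = 0.003853

decisions = compare_to_critical_values(adf_statistic, adf_critical_values)
print(decisions)                        # {'1%': True, '5%': True, '10%': True}
print(reject_by_p_value(adf_p_value))   # True
```

For KPSS the direction flips: the null hypothesis is rejected when the statistic is larger than the critical value.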


3.2. Stationarity Test Types


Dickey Fuller Test - DF - tests the null hypothesis that a unit root is present in an autoregressive time series model.


Augmented Dickey Fuller Test - ADF - tests the null hypothesis that a unit root is present in a time series sample. It is an augmented version of the Dickey–Fuller test for a larger and more complicated set of time series models.


ADF - GLS - a test for a unit root in an economic time series sample. It was developed by Elliott, Rothenberg and Stock (ERS) in 1992 as a modification of the augmented Dickey-Fuller test.


Phillips-Perron test - PP - a unit root test used in time series analysis to test the null hypothesis that a time series is integrated of order 1.


Kwiatkowski-Phillips-Schmidt-Shin - KPSS - tests the null hypothesis that a series is stationary around a constant level or a deterministic trend, against the alternative of a unit root. Note the reversed roles compared with the tests above: here stationarity is the null hypothesis.
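The idea behind the simplest Dickey-Fuller test can be sketched with an ordinary least squares regression of the first difference on the lagged level, diff(y)_t = a + b * y_(t-1) + e_t. Under a unit root b equals 0, and the t-statistic of b is the test statistic. This is a hand-rolled illustration with a hypothetical helper name `df_statistic`, not the statsmodels implementation. The statistic has to be compared with Dickey-Fuller critical values, not with standard t tables.

```python
import numpy as np


def df_statistic(series):
    """t-statistic of b in diff(y)_t = a + b * y_(t-1) + e_t (illustrative helper)."""
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)                                   # first differences
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)     # OLS estimates of a and b
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # covariance of the estimates
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(1)
shocks = rng.standard_normal(500)

stat_white_noise = df_statistic(shocks)            # stationary input
stat_random_walk = df_statistic(shocks.cumsum())   # unit-root input

print(f"white noise: {stat_white_noise:.2f}")
print(f"random walk: {stat_random_walk:.2f}")
```

The white-noise statistic comes out strongly negative, while the random-walk statistic stays much closer to zero. The augmented version adds lagged differences to the regression to absorb autocorrelation in the errors.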


# Install and import the needed libraries

pip install pandas-datareader

import pandas_datareader.data as web

import datetime

import pandas as pd

pd.set_option('display.max_columns', None)

pd.set_option('display.max_rows', None)

pip install seaborn

import seaborn as sns

sns.set()

pip install statsmodels

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

# re is the regular expression library; we use it to remove unwanted characters

import re

chars_to_remove = [',']

# Build a character class such as '[,]' that matches any of the characters to remove

regular_expression = '[' + re.escape(''.join(chars_to_remove)) + ']'
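As a quick check of what this pattern does, here is a small standalone sketch using only the standard library. The sample string "1,234,567" and the helper name `to_int` are invented for the illustration; the real data is cleaned with pandas further down.

```python
import re

chars_to_remove = [","]
# Character class that matches any single character listed in chars_to_remove.
pattern = "[" + re.escape("".join(chars_to_remove)) + "]"


def to_int(text):
    """Strip the unwanted characters and convert what is left to an integer."""
    return int(re.sub(pattern, "", text))


print(to_int("1,234,567"))  # 1234567
```

Wrapping the characters in `re.escape` keeps the pattern safe if a character with a special meaning in regular expressions is added to the list later.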


# We load the monthly stock trading volumes from a csv file into a data frame. For this example I have downloaded the monthly

# stock trading volumes of a bank, together with the open, high and low (OHL) prices. The open high low (OHL) strategy is an

# intraday trading strategy. Trading volume is considered a solid technical indicator of the overall market sentiment around a

# security or the market, and traders use it as one signal when deciding whether to buy a stock. Most Active (Volume) helps you

# identify the stocks with the highest trading volume during the month. The most active stocks offer high liquidity and lower

# bid-ask spreads.

# Volume - the current month's trading volume.

# Low - the current month's low price.

# High - the current month's high price.

# Open - the opening price for the specified month.

# File: https://www.kaggle.com/datasets/spribylova/monthly-ubs-stocks


UBS_volume_m = pd.read_csv("UBS_volume_m.csv", index_col=0, parse_dates=True)

UBS_volume_m.head()


UBS_volume_m=UBS_volume_m.groupby(['Date'])['Volume'].sum().reset_index()


UBS_volume_m


UBS_volume_m.dtypes


# Remove the thousands-separator commas with the regular expression, then convert the object (string) column to integer


UBS_volume_m['Volume_int'] = UBS_volume_m['Volume'].str.replace(regular_expression, '', regex=True)

UBS_volume_m['Volume_int2'] = UBS_volume_m['Volume_int'].astype(int)

# Convert the datetime values (stored with nanosecond precision) to a simple 7-character 'YYYY-MM' string

UBS_volume_m['Date_str'] = UBS_volume_m['Date'].astype(str)

UBS_volume_m['Date_str7'] = UBS_volume_m['Date_str'].str[:7]

UBS_volume_m.plot(figsize=(10,2),x='Date_str7', y='Volume_int2',color="tab:grey", label='Volume')



# Note that in March 2020 the stock price was extremely low because of the onset of Covid.



UBS_volume_m

UBS_volume_m.dtypes

UBS_volume_m.plot(figsize=(13,2),x='Date_str7',y='Volume_int2', kind="bar", color="tab:grey", legend=False);




import matplotlib.dates as mdates

from matplotlib.dates import DateFormatter

from statsmodels.tsa.seasonal import seasonal_decompose

UBS_volume_m_2 = UBS_volume_m[['Date','Volume_int2']]


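# The rolling function computes a statistic (here the mean and the variance) over a moving window of the given number of months.

# A roughly flat rolling mean and rolling variance are an informal visual hint that the series is stationary.

# We select the numeric column first, because recent pandas versions may raise an error when asked to aggregate the Date column.

UBS_volume_m_2[['Volume_int2']].rolling(window = 12).mean().plot(figsize=(8,2), color="tab:grey", title="Rolling mean over 12 month period");



UBS_volume_m_2[['Volume_int2']].rolling(window = 20).mean().plot(figsize=(8,2), color="tab:grey", title="Rolling mean over 20 month period");



UBS_volume_m_2[['Volume_int2']].rolling(window = 12).var().plot(figsize=(8,2), color="tab:grey", title="Rolling variance over 12 month period");

UBS_volume_m_2[['Volume_int2']].rolling(window = 20).var().plot(figsize=(8,2), color="tab:grey", title="Rolling variance over 20 month period");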
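To see what a rolling window actually computes, here is a tiny NumPy sketch on made-up numbers. The helper `rolling_stat` is hypothetical and written for clarity, not speed. It mirrors the pandas convention of returning NaN until the window is full, and it uses the sample variance (ddof=1), which is also the pandas default for rolling variance.

```python
import numpy as np


def rolling_stat(values, window, func):
    """Apply func to each trailing window; positions before a full window stay NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    for end in range(window, len(values) + 1):
        out[end - 1] = func(values[end - window:end])
    return out


data = [1, 2, 3, 4, 5]  # toy data, not the UBS volumes

rolling_mean = rolling_stat(data, 3, np.mean)
rolling_var = rolling_stat(data, 3, lambda w: np.var(w, ddof=1))

print(rolling_mean)  # [nan nan  2.  3.  4.]
print(rolling_var)   # [nan nan  1.  1.  1.]
```

On this toy series the rolling mean climbs steadily, which is what a trend looks like, while the rolling variance stays constant.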


import statsmodels.api as sm



# Augmented Dickey Fuller Test


# Null Hypothesis: The series has a unit root (series is not stationary).

# Alternate Hypothesis: The series has no unit root (series is stationary).

# If we fail to reject the null hypothesis, the test gives no evidence against a unit root and the series is treated as non-stationary.

# The p-value obtained, 0.0039, is less than the significance level (say 0.05), so we reject the null hypothesis.

# We conclude that the series is stationary: the data does not have a unit root.

# The printed test statistic is negative, -3.72; the more negative this value, the stronger the evidence against the null hypothesis.

# Our statistic of -3.72 is less than every critical value ( -3.57 at 1%, -2.92 at 5%, -2.60 at 10% ).

# This also means that we can reject the null hypothesis at the 1% significance level.

# At that level, the chance of rejecting a null hypothesis that is in fact true is only 1%.


from statsmodels.tsa.stattools import adfuller



def adf_test(timeseries):
    print("Results of Dickey-Fuller Test:")
    dftest = adfuller(timeseries, autolag="AIC")
    dfoutput = pd.Series(
        dftest[0:4],
        index=[
            "Test Statistic",
            "p-value",
            "#Lags Used",
            "Number of Observations Used",
        ],
    )
    for key, value in dftest[4].items():
        dfoutput["Critical Value (%s)" % key] = value
    print(dfoutput)



adf_test(UBS_volume_m_2["Volume_int2"])


Results of Dickey-Fuller Test:
Test Statistic                 -3.719042
p-value                         0.003853
#Lags Used                      0.000000
Number of Observations Used    49.000000
Critical Value (1%)            -3.571472
Critical Value (5%)            -2.922629
Critical Value (10%)           -2.599336
dtype: float64


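# KPSS Test


# Null Hypothesis: The process is stationary. With regression="c", as used below, this means stationary around a constant level;

# regression="ct" would test stationarity around a deterministic trend instead.

# Alternate Hypothesis: The series has a unit root (series is not stationary).

# A function is created to carry out the KPSS test on a time series.

# The p-value obtained, 0.1, is greater than the significance level (say 0.05), so we cannot reject the null hypothesis.

# statsmodels looks the KPSS p-value up in a table bounded by 0.01 and 0.1, so a reported 0.1 means the actual p-value is 0.1 or larger.

# The series is treated as stationary: we cannot reject the null hypothesis.

# The test statistic of 0.165 is lower than every critical value ( 0.347, 0.463, 0.574, 0.739 ).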
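For readers who want to see what sits behind the KPSS number, here is a rough NumPy sketch of the level-stationarity statistic: the squared partial sums of the demeaned series, scaled by a long-run variance estimate with Bartlett weights. The function names are made up for this sketch. The lag rule is, to my understanding, the older "legacy" rule from statsmodels, so treat it as an assumption. I have not checked that this sketch reproduces the 0.165 above on the UBS data.

```python
import numpy as np


def kpss_legacy_lags(nobs):
    """Assumed legacy lag rule: ceil(12 * (nobs / 100) ** (1 / 4))."""
    return int(np.ceil(12.0 * (nobs / 100.0) ** 0.25))


def kpss_level_statistic(series, nlags=None):
    """KPSS statistic for stationarity around a constant level (illustrative)."""
    y = np.asarray(series, dtype=float)
    nobs = len(y)
    if nlags is None:
        nlags = kpss_legacy_lags(nobs)
    resid = y - y.mean()                              # residuals from a constant-only fit
    eta = np.sum(np.cumsum(resid) ** 2) / nobs**2     # scaled squared partial sums
    long_run_var = np.sum(resid**2)
    for lag in range(1, nlags + 1):
        weight = 1.0 - lag / (nlags + 1.0)            # Bartlett kernel weight
        long_run_var += 2.0 * weight * np.dot(resid[lag:], resid[:-lag])
    long_run_var /= nobs
    return eta / long_run_var


print(kpss_legacy_lags(50))  # 11

rng = np.random.default_rng(2)
shocks = rng.standard_normal(500)
print(f"white noise: {kpss_level_statistic(shocks):.3f}")
print(f"random walk: {kpss_level_statistic(shocks.cumsum()):.3f}")
```

If the series has 50 monthly values (an inference from the 49 observations used by ADF plus the one lost to differencing), this lag rule gives 11, which matches the "Lags Used" line in the KPSS output below.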


from statsmodels.tsa.stattools import kpss



def kpss_test(timeseries):
    print("Results of KPSS Test:")
    kpsstest = kpss(timeseries, regression="c")
    kpss_output = pd.Series(
        kpsstest[0:3], index=["Test Statistic", "p-value", "Lags Used"]
    )
    for key, value in kpsstest[3].items():
        kpss_output["Critical Value (%s)" % key] = value
    print(kpss_output)


kpss_test(UBS_volume_m_2['Volume_int2'])


Results of KPSS Test:
Test Statistic            0.164824
p-value                   0.100000
Lags Used                11.000000
Critical Value (10%)      0.347000
Critical Value (5%)       0.463000
Critical Value (2.5%)     0.574000
Critical Value (1%)       0.739000
dtype: float64
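Because ADF and KPSS have opposite null hypotheses, their results are usually read together. Here is a small sketch of that decision logic, following the four cases commonly listed for this pair of tests. The function name `combine_adf_kpss` and the 0.05 threshold are my own assumptions, and the p-values are the ones printed above.

```python
def combine_adf_kpss(adf_p_value, kpss_p_value, alpha=0.05):
    """Combine the two test outcomes into one verdict (illustrative helper)."""
    adf_says_stationary = adf_p_value < alpha      # ADF rejects the unit-root null
    kpss_says_stationary = kpss_p_value >= alpha   # KPSS fails to reject stationarity

    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    if kpss_says_stationary:
        # KPSS sees stationarity, ADF still sees a unit root.
        return "trend stationary: remove the trend"
    # ADF sees stationarity, KPSS does not.
    return "difference stationary: difference the series"


verdict = combine_adf_kpss(adf_p_value=0.003853, kpss_p_value=0.100000)
print(verdict)  # stationary
```

With the values from this article both tests agree, so no detrending or differencing is needed before the next steps.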




