Time Series Forecast Steps:
1. Identification of Time Series Type
2. Seasonality Tests
3. Stationarity Tests
4. Decomposition
5. Regression
6. Deep Learning
3. Stationarity Tests
In this example we run the ADF and KPSS tests to decide whether the provided UBS monthly time series is stationary. Both the Augmented Dickey-Fuller test and the KPSS test indicate that the provided monthly time series is stationary.
3.1. Stationarity Terms
A stationary series is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Observations from a non-stationary time series show seasonal effects, trends, and other structures that depend on the time index. Examples of series that are often stationary: stock trading volume (not the stock price), coin flips, white noise.
Weak stationarity only requires shift-invariance (in time) of the first moment (the mean) and of the cross moment (the auto-covariance).
Strong stationarity requires shift-invariance (in time) of the finite-dimensional distributions of the stochastic process.
A unit root is a feature of some stochastic processes (such as random walks) that can cause problems in statistical inference involving time series models. In simple terms, a process with a unit root is non-stationary but does not always have a trend component.
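The difference between a stationary series and a unit-root process can be illustrated with simulated data. The sketch below is only an illustration under assumed settings (the seed, the series length, and the helper name `half_variances` are arbitrary choices, not part of the UBS example): it compares the variance of the two halves of a white-noise series and of a random walk, then shows that first-differencing the random walk returns its noise increments.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
n = 1000

# White noise: independent draws with constant mean and variance (stationary).
noise = rng.normal(loc=0.0, scale=1.0, size=n)

# Random walk: cumulative sum of the noise; it has a unit root (non-stationary).
walk = np.cumsum(noise)


def half_variances(series):
    """Return the sample variance of the first and second half of a series."""
    half = len(series) // 2
    return series[:half].var(), series[half:].var()


print("white noise:", half_variances(noise))
print("random walk:", half_variances(walk))

# First-differencing the random walk recovers the stationary noise increments.
diffed = np.diff(walk)
print("differenced:", half_variances(diffed))
```

For the white noise the two half-sample variances should be close to each other, while for the random walk they typically differ, because the distribution of a unit-root process depends on the time index.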
Test statistic - the value computed by the test. We compare it to the critical values.
p-value - the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. If the p-value is less than a predefined alpha level (typically 0.05), we reject the null hypothesis.
Lags used - the number of lags included in the test regression; the default upper bound depends on n, the length of the series.
Number of observations used - the number of rows that enter the test regression after the lags are taken.
Critical value 1% - alpha = 0.01 implies that the null hypothesis is rejected 1 % of the time when it is in fact true.
Critical value 5% - alpha = 0.05 implies that the null hypothesis is rejected 5 % of the time when it is in fact true.
Critical value 10% - alpha = 0.1 implies that the null hypothesis is rejected 10 % of the time when it is in fact true.
If the series is not stationary, we must make it stationary before modelling, for example by detrending or differencing it.
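The decision rule behind these terms can be written down in a few lines. The following is a minimal sketch, with a hypothetical helper name (`adf_decision`), that applies the rule to the ADF numbers reported in this article: the unit-root null is rejected at a given level when the test statistic is more negative than that level's critical value, or equivalently when the p-value is below alpha.

```python
def adf_decision(test_statistic, critical_values):
    """Compare an ADF test statistic with each critical value.

    The unit-root null hypothesis is rejected at a given level when the
    statistic is less than (more negative than) that level's critical value.
    """
    return {level: test_statistic < value for level, value in critical_values.items()}


# Values copied from the ADF output reported in this article.
statistic = -3.719042
critical = {"1%": -3.571472, "5%": -2.922629, "10%": -2.599336}
p_value = 0.003853
alpha = 0.05

decision = adf_decision(statistic, critical)
print(decision)  # {'1%': True, '5%': True, '10%': True}
print("reject null at alpha:", p_value < alpha)  # reject null at alpha: True
```

Note that the direction of the comparison is specific to the test: for KPSS the null hypothesis is stationarity, and it is rejected when the statistic is larger than the critical value.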
3.2. Stationarity Test Types
Dickey Fuller Test - DF - tests the null hypothesis that a unit root is present in an autoregressive time series model.
Augmented Dickey Fuller Test - ADF - tests the null hypothesis that a unit root is present in a time series sample. It is an augmented version of the Dickey–Fuller test for a larger and more complicated set of time series models.
ADF - GLS - test for a unit root in an economic time series sample. It was developed by Elliott, Rothenberg and Stock (ERS) in 1992 as a modification of the augmented Dickey–Fuller test.
Phillips–Perron test - PP - a unit root test. That is, it is used in time series analysis to test the null hypothesis that a time series is integrated of order 1.
Kwiatkowski-Phillips-Schmidt-Shin - KPSS - tests the null hypothesis that a given series is stationary around a deterministic trend (or a constant level), against the alternative of a unit root.
# Install and import the needed libraries
pip install pandas-datareader
import pandas_datareader.data as web
import datetime
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pip install seaborn
import seaborn as sns
sns.set()
pip install statsmodels
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# re builds the regular expression used to strip unwanted characters (here the comma thousands separator)
import re
chars_to_remove = [',']
regular_expression = '[' + re.escape(''.join(chars_to_remove)) + ']'
# Load the monthly stock trading volumes from a CSV file into a data frame. For this example I downloaded the monthly
# stock trading volumes of a bank, representing the OHL strategy. Open high low (OHL) refers to an intraday trading
# strategy. Trading volume is considered a solid technical indicator of the overall market sentiment around a
# security or the market. It can serve as a signal for a trader to buy a stock. Most Active (Volume) helps you identify
# the stocks with the highest trading volume during the month. The most active stocks offer high liquidity and lower
# bid-ask spreads.
# Volume: the current month's trading volume.
# Low: the current month's low price.
# High: the current month's high price.
# Open: the opening price for the specified month.
# File: https://www.kaggle.com/datasets/spribylova/monthly-ubs-stocks
UBS_volume_m = pd.read_csv("UBS_volume_m.csv", index_col=0, parse_dates=True)
UBS_volume_m.head()
UBS_volume_m=UBS_volume_m.groupby(['Date'])['Volume'].sum().reset_index()
UBS_volume_m
UBS_volume_m.dtypes
# Strip the comma thousands separators with the regular expression, then convert the object (string) column to integer
UBS_volume_m['Volume_int'] = UBS_volume_m['Volume'].str.replace(regular_expression, '', regex=True)
UBS_volume_m['Volume_int2'] = UBS_volume_m['Volume_int'].astype(int)
# Convert the datetime (nanosecond precision) to a simple 7-character 'YYYY-MM' string
UBS_volume_m['Date_str'] = UBS_volume_m['Date'].astype(str)
UBS_volume_m['Date_str7'] = UBS_volume_m['Date_str'].str[:7]
UBS_volume_m.plot(figsize=(10,2),x='Date_str7', y='Volume_int2',color="tab:grey", label='Volume')
# Note that in March 2020 the stock price was extremely low due to the onset of Covid.
UBS_volume_m
UBS_volume_m.dtypes
UBS_volume_m.plot(figsize=(13,2),x='Date_str7',y='Volume_int2', kind="bar", color="tab:grey", legend=False);
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
from statsmodels.tsa.seasonal import seasonal_decompose
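# Keep only the volume, indexed by date, so that the rolling statistics are computed on the numeric column only
# (recent pandas versions raise an error when rolling().mean() meets a datetime column)
UBS_volume_m_2 = UBS_volume_m[['Date','Volume_int2']].set_index('Date')
# rolling() computes a statistic over a sliding window of the given number of months, which smooths the series and exposes its trend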
UBS_volume_m_2.rolling(window = 12).mean().plot(y='Volume_int2',figsize=(8,2), color="tab:grey", title="Rolling Mean over 12 month period");
UBS_volume_m_2.rolling(window = 20).mean().plot(figsize=(8,2), color="tab:grey", title="Rolling mean over 20 month period");
UBS_volume_m_2.rolling(window = 12).var().plot(figsize=(8,2), color="tab:grey", title="Rolling Variance over 12 month period");
UBS_volume_m_2.rolling(window = 20).var().plot(figsize=(8,2), color="tab:grey", title="Rolling variance over 20 month period");
import statsmodels.api as sm
# Augmented Dickey Fuller Test
# Null Hypothesis: The series has a unit root (series is not stationary).
# Alternate Hypothesis: The series has no unit root (series is stationary).
# If the null hypothesis cannot be rejected, the test provides evidence that the series may be non-stationary.
# The p-value obtained, 0.0039, is less than the significance level (say 0.05), so we reject the null hypothesis:
# the data does not have a unit root and the series is stationary.
# The test statistic is -3.72. The more negative this value, the more likely we are to reject the null hypothesis.
# Its absolute value, 3.72, is higher than the absolute value of every critical value (3.57, 2.92, 2.60).
# In particular, -3.72 is less than the critical value of -3.57 at 1%,
# so we can reject the null hypothesis at the 1% significance level.
# At that level there is only a 1% chance of rejecting a null hypothesis that is in fact true.
from statsmodels.tsa.stattools import adfuller
def adf_test(timeseries):
    print("Results of Dickey-Fuller Test:")
    # autolag="AIC" lets statsmodels choose the number of lags by the Akaike information criterion
    dftest = adfuller(timeseries, autolag="AIC")
    dfoutput = pd.Series(
        dftest[0:4],
        index=[
            "Test Statistic",
            "p-value",
            "#Lags Used",
            "Number of Observations Used",
        ],
    )
    # dftest[4] is a dict of critical values keyed by significance level
    for key, value in dftest[4].items():
        dfoutput["Critical Value (%s)" % key] = value
    print(dfoutput)
adf_test(UBS_volume_m_2["Volume_int2"])
Results of Dickey-Fuller Test:
Test Statistic -3.719042
p-value 0.003853
#Lags Used 0.000000
Number of Observations Used 49.000000
Critical Value (1%) -3.571472
Critical Value (5%) -2.922629
Critical Value (10%) -2.599336
dtype: float64
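Because the output above reports "#Lags Used 0", the test regression contains no lagged difference terms, so the ADF test reduces here to the plain Dickey-Fuller regression with a constant: dy_t = alpha + gamma * y_(t-1) + e_t, where the test statistic is the t-statistic of gamma. The sketch below computes that statistic by hand with NumPy on simulated data. It is an illustration of the idea, not the statsmodels implementation; the function name `dickey_fuller_stat`, the seed and the series length are assumptions made for the example.

```python
import numpy as np


def dickey_fuller_stat(y):
    """t-statistic of gamma in dy_t = alpha + gamma * y_(t-1) + e_t (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # dy_t = y_t - y_(t-1)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)     # ordinary least squares fit
    resid = dy - X @ beta
    dof = len(dy) - X.shape[1]                        # residual degrees of freedom
    sigma2 = resid @ resid / dof                      # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)             # covariance of the coefficients
    return beta[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility only
noise = rng.normal(size=500)         # stationary white noise
walk = np.cumsum(noise)              # random walk with a unit root

t_noise = dickey_fuller_stat(noise)
t_walk = dickey_fuller_stat(walk)
print("white noise statistic:", t_noise)
print("random walk statistic:", t_walk)
```

The statistic for the white noise should be strongly negative, far below the critical values, while the one for the random walk should be much closer to zero. The statistic must be compared with Dickey-Fuller critical values such as those printed by `adfuller`, not with the usual Student t tables.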
# KPSS Test
# Null Hypothesis: The process is stationary (around a constant level, because regression="c" is used below).
# Alternate Hypothesis: The series has a unit root (series is not stationary).
# A function is created to carry out the KPSS test on a time series.
# The p-value obtained, 0.1, is more than the significance level (say 0.05), so we cannot reject the null hypothesis.
# statsmodels reports 0.1 as the upper bound of its lookup table, so the actual p-value may be larger.
# The series is stationary.
# The test statistic, 0.165, is lower than every critical value (0.347, 0.463, 0.574, 0.739).
from statsmodels.tsa.stattools import kpss
def kpss_test(timeseries):
    print("Results of KPSS Test:")
    # regression="c" tests stationarity around a constant level
    kpsstest = kpss(timeseries, regression="c")
    kpss_output = pd.Series(
        kpsstest[0:3], index=["Test Statistic", "p-value", "Lags Used"]
    )
    # kpsstest[3] is a dict of critical values keyed by significance level
    for key, value in kpsstest[3].items():
        kpss_output["Critical Value (%s)" % key] = value
    print(kpss_output)
kpss_test(UBS_volume_m_2['Volume_int2'])
Results of KPSS Test:
Test Statistic 0.164824
p-value 0.100000
Lags Used 11.000000
Critical Value (10%) 0.347000
Critical Value (5%) 0.463000
Critical Value (2.5%) 0.574000
Critical Value (1%) 0.739000
dtype: float64
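Since the two tests have opposite null hypotheses, their results are usually read together. The sketch below encodes one commonly used reading of the four possible outcomes; the function name `combine_adf_kpss` and the returned labels are assumptions made for this illustration, and the p-values are the ones reported above.

```python
def combine_adf_kpss(adf_p_value, kpss_p_value, alpha=0.05):
    """Interpret ADF (null: unit root) and KPSS (null: stationary) p-values together."""
    adf_rejects_unit_root = adf_p_value < alpha
    kpss_rejects_stationarity = kpss_p_value < alpha

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    if not adf_rejects_unit_root and not kpss_rejects_stationarity:
        # ADF finds no evidence against a unit root, KPSS none against stationarity
        return "trend-stationary (consider detrending)"
    # Both null hypotheses are rejected
    return "difference-stationary (consider differencing)"


# p-values copied from the ADF and KPSS outputs above.
print(combine_adf_kpss(adf_p_value=0.003853, kpss_p_value=0.100000))  # stationary
```

With the UBS monthly volume series, ADF rejects the unit root and KPSS does not reject stationarity, so both tests agree that the series is stationary.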