I'm attempting to replicate Ernie Chan's Example 2.7 from his seminal book Algorithmic Trading (page 55) in Python. There isn't much pertinent material online, but the statsmodels library is very helpful. However, the eigenvector my code produces looks incorrect, in that the values do not correlate with the test data. Here's the code, in several steps:
import pandas as pd
import yfinance as yf
from datetime import datetime
from dateutil.relativedelta import relativedelta
years = 5
today = datetime.today().strftime('%Y-%m-%d')
lastyeartoday = (datetime.today() - relativedelta(years=years)).strftime('%Y-%m-%d')
symbols = ['BTC-USD', 'BCH-USD','ETH-USD']
df = yf.download(symbols,
                 start=lastyeartoday,
                 end=today,
                 progress=False)
df = df.dropna()
data = pd.DataFrame()
for symbol in symbols:
    data[symbol] = df['Close'][symbol]
data.tail()
This produces the following output:
Let's plot the three series:
# Plot the price series
import matplotlib.pyplot as plt
%matplotlib inline
for symbol in symbols:
    data[symbol].plot(figsize=(10,8))
plt.show()
Graph:
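Because the yfinance data changes every day, the exact numbers in this post can't be reproduced later. One way to make the pipeline testable offline is to swap in synthetic prices with a known cointegrating structure. The helper below is a sketch of that idea; make_synthetic_prices, the tickers AAA/BBB/CCC and the loadings are all hypothetical, and the returned frame can be passed to the same coint_johansen call in place of the downloaded data.

```python
import numpy as np
import pandas as pd


def make_synthetic_prices(n_obs=1000, seed=0):
    """Hypothetical offline stand-in for the yfinance download.

    Three price-like series load on one shared random walk plus independent
    stationary noise, so by construction two linear combinations of them are
    stationary (cointegrating rank 2).
    """
    rng = np.random.default_rng(seed)
    common_trend = 100.0 + np.cumsum(rng.normal(0.0, 1.0, n_obs))  # I(1) factor
    loadings = np.array([1.0, 0.5, 2.0])        # hypothetical exposure of each series
    noise = rng.normal(0.0, 1.0, (n_obs, 3))    # I(0) idiosyncratic part
    prices = common_trend[:, None] * loadings + noise
    index = pd.date_range("2020-01-01", periods=n_obs, freq="D")
    return pd.DataFrame(prices, index=index, columns=["AAA", "BBB", "CCC"])


data = make_synthetic_prices()
print(data.tail())
```

Here BBB - 0.5 * AAA cancels the shared trend exactly, so that combination is pure noise and gives a known answer to compare the test output against.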
Now we run the Johansen cointegration test on the dataset:
import numpy as np
import pandas as pd
import statsmodels.api as sm
# data = pd.read_csv("http://web.pdx.edu/~crkl/ceR/data/usyc87.txt",index_col='YEAR',sep='\s+',nrows=66)
# y = data['Y']
# c = data['C']
from statsmodels.tsa.vector_ar.vecm import coint_johansen
"""
Johansen cointegration test of the cointegration rank of a VECM
Parameters
----------
endog : array_like (nobs_tot x neqs)
Data to test
det_order : int
* -1 - no deterministic terms - model1
* 0 - constant term - model3
* 1 - linear trend
k_ar_diff : int, nonnegative
Number of lagged differences in the model.
Returns
-------
result: Holder
An object containing the results which can be accessed using dot-notation. The object’s attributes are
eig: (neqs) - Eigenvalues.
evec: (neqs x neqs) - Eigenvectors.
lr1: (neqs) - Trace statistic.
lr2: (neqs) - Maximum eigenvalue statistic.
cvt: (neqs x 3) - Critical values (90%, 95%, 99%) for trace statistic.
cvm: (neqs x 3) - Critical values (90%, 95%, 99%) for maximum eigenvalue statistic.
method: str “johansen”
r0t: (nobs x neqs) - Residuals for ΔY.
rkt: (nobs x neqs) - Residuals for Y_{-1}.
ind: (neqs) - Order of eigenvalues.
"""
def joh_output(res):
    output = pd.DataFrame([res.lr2, res.lr1],
                          index=['max_eig_stat', 'trace_stat'])
    print(output.T, '\n')
    print("Critical values (90%, 95%, 99%) of max_eig_stat\n", res.cvm, '\n')
    print("Critical values (90%, 95%, 99%) of trace_stat\n", res.cvt, '\n')
# model with a constant (deterministic) term (det_order=0) and one lagged difference (k_ar_diff=1)
joh_model = coint_johansen(data,0,1) # k_ar_diff +1 = K
joh_output(joh_model)
As the test statistics are far greater than the critical values, we can reject the null hypothesis and conclude that there is very strong cointegration between the three crypto pairs.
Now let's print the eigenvalues (joh_model.eig):
array([0.02903038, 0.01993949, 0.00584357])
The first row of our eigenvectors should be considered the strongest in that it has the shortest half-life for mean reversion:
print('Eigenvector in scientific notation:\n{0}\n'.format(joh_model.evec[0]))
print('Eigenvector in decimal notation:')
for i, val in enumerate(joh_model.evec[0]):
    print('{0}: {1:.10f}'.format(i, val))
Result:
Eigenvector in scientific notation:
[ 2.21531848e-04 -1.70103937e-04 -9.40374745e-05]

Eigenvector in decimal notation:
0: 0.0002215318
1: -0.0001701039
2: -0.0000940375
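The post ranks eigenvectors by the half-life of mean reversion. A common estimate (and, as I understand it, the one used in Chan's book) regresses the change in the portfolio value on its lagged value, Δy[t] = λ·y[t-1] + μ + ε, and sets half-life = -ln(2)/λ. Below is a numpy-only sketch; half_life is a hypothetical helper, and the prices and weights are made up so that the answer is known in advance.

```python
import numpy as np
import pandas as pd


def half_life(spread):
    """Half-life of mean reversion from delta_y[t] = lam * y[t-1] + mu + eps.

    Returns -ln(2) / lam; only meaningful when lam < 0, i.e. the spread
    actually mean-reverts.
    """
    y = np.asarray(spread, dtype=float)
    y_lag = y[:-1]
    delta_y = np.diff(y)
    X = np.column_stack([y_lag, np.ones_like(y_lag)])  # lagged level plus intercept
    lam, _ = np.linalg.lstsq(X, delta_y, rcond=None)[0]
    return -np.log(2) / lam


# Hypothetical prices and weights, not taken from the post
rng = np.random.default_rng(4)
trend = np.cumsum(rng.normal(size=1000))
prices = pd.DataFrame(trend[:, None] * np.array([1.0, 0.5, 2.0])
                      + rng.normal(size=(1000, 3)))
weights = np.array([-0.5, 1.0, 0.0])  # BBB - 0.5 * AAA cancels the common trend
spread = prices.values @ weights      # portfolio market value over time
print("half-life (periods):", half_life(spread))
```

Feeding in the spread built from any candidate weight vector (a row or a column of evec, for instance) gives a direct way to check which one actually has the shortest half-life.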
And here's the problem I mentioned in my introduction. Per Ernie's description, these values should correspond to the hedge ratios for each of the crosses. However, they are a) way too small, b) two of them are negative (obviously incorrect for these three crypto pairs), and c) seemingly uncorrelated with the test data (e.g. BTC is obviously trading at a massive premium and should have the smallest value).
Now I'm no math genius and there's a good chance that I messed up somewhere, which is why I provided all the code/steps involved for replication. Any pointers and insights would be much appreciated. Many thanks in advance.
from Johansen Test Is Producing An Incorrect Eigenvector


