I can correlate two arrays of different length using this method:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = [25, 40, 62, 58, 53, 54]
df = pd.DataFrame(dict(x=a))
CORR_VALS = np.array(b)

def get_correlation(vals):
    # Pearson r between the current window of x and the whole of b
    return pearsonr(vals, CORR_VALS)[0]

# Slide a window as long as b along x; each row gets r for the window ending at that row
df['correlation'] = df['x'].rolling(window=len(CORR_VALS)).apply(get_correlation, raw=True)
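Under the hood, the rolling apply slides a window the length of b along x and computes one Pearson coefficient per complete window; the first len(b) - 1 rows stay NaN because their windows are incomplete. A NumPy-only sketch of the same computation, assuming NumPy 1.20 or later (for sliding_window_view); sliding_pearson is a hypothetical helper name, not a library function:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_pearson(longer, shorter):
    """Hypothetical helper: Pearson r of `shorter` against each equal-length window of `longer`."""
    longer = np.asarray(longer, dtype=float)
    shorter = np.asarray(shorter, dtype=float)
    # Each row of `windows` is one contiguous slice of `longer` with len(shorter) elements
    windows = sliding_window_view(longer, len(shorter))
    # Center every window and the short array on their own means
    wc = windows - windows.mean(axis=1, keepdims=True)
    sc = shorter - shorter.mean()
    # r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), evaluated for all windows at once
    num = wc @ sc
    den = np.sqrt((wc ** 2).sum(axis=1) * (sc ** 2).sum())
    return num / den

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = [25, 40, 62, 58, 53, 54]
r = sliding_pearson(a, b)
print(r)  # three values, one per complete window (rows 5-7 of the rolling result)
```

A constant window would make the denominator zero and yield NaN (with a NumPy RuntimeWarning) rather than a meaningful coefficient.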
I get a result like this:
In [1]: df
Out[1]:
     x  correlation
0  0.0          NaN
1  0.4          NaN
2  0.2          NaN
3  0.4          NaN
4  0.2          NaN
5  0.4     0.527932
6  0.2    -0.159167
7  0.5     0.189482
First of all, the Pearson coefficient should simply be the highest value in the correlation column (here 0.527932).
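Picking that value out is short once the column exists: Series.max() skips the NaN rows, and Series.idxmax() gives the row where the best window ends, from which the window's starting offset in a follows. A minimal sketch, assuming the goal is the single best alignment; it uses np.corrcoef instead of pearsonr only to avoid the SciPy dependency (both give the Pearson r):

```python
import numpy as np
import pandas as pd

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = np.array([25, 40, 62, 58, 53, 54], dtype=float)

df = pd.DataFrame({'x': a})
# Rolling Pearson r of each len(b)-long window of x against b
df['correlation'] = df['x'].rolling(window=len(b)).apply(
    lambda w: np.corrcoef(w, b)[0, 1], raw=True
)

best_r = df['correlation'].max()        # NaN rows are ignored
best_end = df['correlation'].idxmax()   # label of the last row in the best window
best_start = best_end - len(b) + 1      # offset of b within a for that window
print(best_r, best_start)
```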
Secondly, how could I do this for multiple sets of data? I would like output like that of df.corr(), with the index and columns labeled appropriately.
For example, say I have the following datasets:
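For comparison, df.corr() can be called directly on lists of different lengths if each one becomes a pd.Series: the DataFrame constructor aligns them on the index and pads the shorter ones with NaN, and corr() then drops missing values pairwise. That only correlates the rows that overlap at the same positions (here, b against the first six values of a), so it does not search over offsets the way the rolling approach does. A small sketch:

```python
import pandas as pd

a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = [25, 40, 62, 58, 53, 54]

# Series of different lengths are aligned on index 0..7; b gets NaN in rows 6 and 7
df = pd.DataFrame({'a': pd.Series(a), 'b': pd.Series(b)})
naive = df.corr()  # pairwise Pearson using only rows where both columns are present
print(naive)
```

Here the a/b entry equals the first rolling value (0.527932) only because the zero-offset window happens to be the best one for this data; in general it is not the maximum.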
a = [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5]
b = [25, 40, 62, 58, 53, 54]
c = [0, 0.4, 0.2, 0.4, 0.2, 0.45, 0.2, 0.52, 0.52, 0.4, 0.21, 0.2, 0.4, 0.51]
d = [0.4, 0.2, 0.5]
I want a correlation matrix of the Pearson coefficients for these four datasets.
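One way to get that matrix, assuming the intended entry for each pair is the highest Pearson coefficient found while sliding the shorter dataset along the longer one (the rule applied to a and b above): compute that value for every pair and place it in a DataFrame labeled like df.corr() output. With four datasets this gives a symmetric 4×4 matrix with six distinct off-diagonal values. The helper names max_sliding_pearson and sliding_corr_matrix are hypothetical, and the sketch assumes NumPy 1.20 or later:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def max_sliding_pearson(x, y):
    """Hypothetical helper: highest Pearson r of the shorter array against each window of the longer."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    shorter, longer = (x, y) if len(x) <= len(y) else (y, x)
    windows = sliding_window_view(longer, len(shorter))
    wc = windows - windows.mean(axis=1, keepdims=True)
    sc = shorter - shorter.mean()
    with np.errstate(invalid='ignore', divide='ignore'):
        # A constant window gives 0/0 -> NaN, which nanmax then skips
        r = (wc @ sc) / np.sqrt((wc ** 2).sum(axis=1) * (sc ** 2).sum())
    return float(np.nanmax(r))

def sliding_corr_matrix(data):
    """Hypothetical helper: symmetric DataFrame of max_sliding_pearson, labeled like df.corr()."""
    names = list(data)
    mat = pd.DataFrame(np.nan, index=names, columns=names)
    for i, p in enumerate(names):
        for q in names[i:]:
            # Fill both halves at once; the measure is symmetric in its arguments
            mat.loc[p, q] = mat.loc[q, p] = max_sliding_pearson(data[p], data[q])
    return mat

data = {
    'a': [0, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.5],
    'b': [25, 40, 62, 58, 53, 54],
    'c': [0, 0.4, 0.2, 0.4, 0.2, 0.45, 0.2, 0.52, 0.52, 0.4, 0.21, 0.2, 0.4, 0.51],
    'd': [0.4, 0.2, 0.5],
}
m = sliding_corr_matrix(data)
print(m.round(3))
```

Because each entry is a maximum over offsets, it tends to come out high when the longer series offers many windows or the shorter one is very short (for example, the three-point d slid along the fourteen-point c), so these values are not directly comparable to ordinary df.corr() coefficients. If strong anti-correlation should also count as a match, taking the maximum of the absolute values would be the corresponding design choice.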
from Numpy/Pandas correlate multiple arrays of different length