There's a paper that I found interesting and would like to use some of the methods in Python. Erb et al. 2020 implements partial correlation on compositional data and Jin et al. 2022 implements it in an R package called Propr.
I found the function bShrink
that I'm simplifying below:
library(corpcor)
# Load iris dataset (not compositional but will work for this case)
X = read.table("https://pastebin.com/raw/e3BSEZiK", sep = "\t", row.names = 1, header = TRUE, check.names = FALSE)
bShrink <- function(M){
# transform counts to log proportions
P <- M / rowSums(M)
B <- log(P)
# covariance shrinkage
D <- ncol(M)
Cb <- cov.shrink(B,verbose=FALSE)
G <- diag(rep(1,D))-matrix(1/D,D,D)
Cov <- G%*%Cb%*%G
# partial correlation
PC <- cor2pcor(Cov)
return(PC)
}
> bShrink(X)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.96409509 0.6647093 -0.23827651
[2,] 0.9640951 1.00000000 -0.4585507 0.02735205
[3,] 0.6647093 -0.45855072 1.0000000 0.85903005
[4,] -0.2382765 0.02735205 0.8590301 1.00000000
Now I'm trying to port this in Python. Getting some small differences between Cb
which is expected but major differences in PC
which is the partial correlation (cor2pcor function).
I tried using the answers from here but couldn't get it to work: Partial Correlation in Python
Here's my Python code:
import numpy as np
import pandas as pd
from sklearn.covariance import LedoitWolf
def bShrink(M:pd.DataFrame):
components = M.columns
M = M.values
P = M/M.sum(axis=1).reshape(-1,1)
B = np.log(P)
D = M.shape[1]
lw_model = LedoitWolf()
lw_model.fit(B)
Cb = lw_model.covariance_
G = np.eye(D) - np.ones((D, D)) / D
Cov = G @ Cb @ G
precision = np.linalg.inv(Cov)
diag = np.diag(precision)
Z = np.outer(diag, diag)
partial = -precision / Z
return pd.DataFrame(partial, index=components, columns=components)
X = pd.read_csv("https://pastebin.com/raw/e3BSEZiK", sep="\t", index_col=0)
bShrink(X)
# sepal_length sepal_width petal_length petal_width
# sepal_length -5.551115e-17 -5.551115e-17 -5.551115e-17 -5.551115e-17
# sepal_width -5.551115e-17 -5.551115e-17 -5.551115e-17 -5.551115e-17
# petal_length -5.551115e-17 -5.551115e-17 -5.551115e-17 -5.551115e-17
# petal_width -5.551115e-17 -5.551115e-17 -5.551115e-17 -5.551115e-17
I'm trying to avoid using Pingouin or any other packages than numpy, pandas, and sklearn.
How can I create a partial correlation matrix from a shrunken covariance matrix?
from Calculating (partial) correlation from a (shrunken) covariance matrix (Help porting R code to Python)
No comments:
Post a Comment