Sunday 20 August 2023

Calculating (partial) correlation from a (shrunken) covariance matrix (Help porting R code to Python)

There's a paper that I found interesting and would like to use some of the methods in Python. Erb et al. 2020 implements partial correlation on compositional data and Jin et al. 2022 implements it in an R package called Propr.

I found the function bShrink that I'm simplifying below:

library(corpcor)

# Load iris dataset (not compositional but will work for this case)
X = read.table("https://pastebin.com/raw/e3BSEZiK", sep = "\t", row.names = 1, header = TRUE, check.names = FALSE)

bShrink <- function(M){

  
  # transform counts to log proportions
  P <- M / rowSums(M)
  B <- log(P)

  # covariance shrinkage
  D  <- ncol(M)
  Cb <- cov.shrink(B,verbose=FALSE)  

  G   <- diag(rep(1,D))-matrix(1/D,D,D)
  Cov <- G%*%Cb%*%G

  # partial correlation
  PC <- cor2pcor(Cov)
    
  return(PC)
}

> bShrink(X)
           [,1]        [,2]       [,3]        [,4]
[1,]  1.0000000  0.96409509  0.6647093 -0.23827651
[2,]  0.9640951  1.00000000 -0.4585507  0.02735205
[3,]  0.6647093 -0.45855072  1.0000000  0.85903005
[4,] -0.2382765  0.02735205  0.8590301  1.00000000

Now I'm trying to port this in Python. Getting some small differences between Cb which is expected but major differences in PC which is the partial correlation (cor2pcor function).

I tried using the answers from here but couldn't get it to work: Partial Correlation in Python

Here's my Python code:

import numpy as np
import pandas as pd
from sklearn.covariance import LedoitWolf

def bShrink(M:pd.DataFrame):
    components = M.columns
    M = M.values
    P = M/M.sum(axis=1).reshape(-1,1)
    B = np.log(P)
    D = M.shape[1]
    lw_model = LedoitWolf()
    lw_model.fit(B)
    Cb = lw_model.covariance_
    G = np.eye(D) - np.ones((D, D)) / D
    Cov = G @ Cb @ G
    
    
    precision = np.linalg.inv(Cov)
    diag = np.diag(precision)
    Z = np.outer(diag, diag)
    partial = -precision / Z
    
    return pd.DataFrame(partial, index=components, columns=components)

X = pd.read_csv("https://pastebin.com/raw/e3BSEZiK", sep="\t", index_col=0)

bShrink(X)
#   sepal_length    sepal_width petal_length    petal_width
# sepal_length  -5.551115e-17   -5.551115e-17   -5.551115e-17   -5.551115e-17
# sepal_width   -5.551115e-17   -5.551115e-17   -5.551115e-17   -5.551115e-17
# petal_length  -5.551115e-17   -5.551115e-17   -5.551115e-17   -5.551115e-17
# petal_width   -5.551115e-17   -5.551115e-17   -5.551115e-17   -5.551115e-17

I'm trying to avoid using Pingouin or any other packages than numpy, pandas, and sklearn.

How can I create a partial correlation matrix from a shrunken covariance matrix?



from Calculating (partial) correlation from a (shrunken) covariance matrix (Help porting R code to Python)

No comments:

Post a Comment