Tuesday, 25 September 2018

Use 'groupby' on statistically processed R2 values - Python

For my research I use a specific calculation for R2 values. It is not the R2 value calculated directly with the linregress function.
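For reference, this is the R2 that the code below computes for each candidate subset of n points. The fit is a least-squares straight line \hat{y}_i = a x_i + b from numpy.polyfit(x, y, 1):

$$R^2 = 1 - \frac{\operatorname{Var}(\hat{y} - y)}{\operatorname{Var}(y)} = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

The second form follows because least-squares residuals from a fit with an intercept have zero mean, and numpy.var divides both sums by the same n. On a single subset this value therefore matches the squared correlation (rvalue**2) from scipy.stats.linregress. The extra processing behind 'best R2' is the search over subsets, which keeps the highest value.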

The code I am using computes a statistically processed R2 value (labelled 'best R2'). It gives me the R2 value for the entire x and y columns. However, the data contain multiple 'Test Events', so I need an R2 value for each individual 'Test Event'.

The code I have been using so far to calculate the R2 value is as follows (the output I need is shown further below):


import copy

import numpy
import pandas as pd

df = pd.read_excel("I:/Python/Excel.xlsx")
print(df.head())  # quick look at the data

xyDataPairs = df[['x', 'y']].values.tolist()

# each candidate subset leaves out exactly one data point
minDataPoints = len(xyDataPairs) - 1


# utility function: recursively yield every n-element combination of items
def UniqueCombinations(items, n):
    if n == 0:
        yield []
    else:
        for i in range(len(items)):
            for cc in UniqueCombinations(items[i + 1:], n - 1):
                yield [items[i]] + cc


bestR2 = 0.0
bestDataPairCombination = []
bestParameters = []

for pairs in UniqueCombinations(xyDataPairs, minDataPoints):
    x = []
    y = []
    for pair in pairs:
        x.append(pair[0])
        y.append(pair[1])
    fittedParameters = numpy.polyfit(x, y, 1)  # straight line
    modelPredictions = numpy.polyval(fittedParameters, x)
    absError = modelPredictions - y
    Rsquared = 1.0 - (numpy.var(absError) / numpy.var(y))
    # keep the subset with the highest R2 so far
    if Rsquared > bestR2:
        bestR2 = Rsquared
        bestDataPairCombination = copy.deepcopy(pairs)
        bestParameters = copy.deepcopy(fittedParameters)

print('best R2', bestR2)
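The recursive UniqueCombinations helper above yields the same subsets, in the same order, as the standard library's itertools.combinations, so either one works. A small check with made-up points shows this. The list points is illustrative only. The check also confirms that with minDataPoints = len(xyDataPairs) - 1 the search is leave-one-out, trying one subset per omitted point.

```python
import itertools


def UniqueCombinations(items, n):
    """Recursively yield every n-element combination of items, in order."""
    if n == 0:
        yield []
    else:
        for i in range(len(items)):
            for cc in UniqueCombinations(items[i + 1:], n - 1):
                yield [items[i]] + cc


# Illustrative (x, y) pairs, not real data.
points = [[1, 2], [2, 4], [3, 6], [4, 9]]

# Same subsets, same order; itertools yields tuples, so convert to lists.
recursive = list(UniqueCombinations(points, len(points) - 1))
builtin = [list(c) for c in itertools.combinations(points, len(points) - 1)]
print(recursive == builtin)  # True

# Each subset omits exactly one point, so there is one subset per point.
print(len(recursive))  # 4
```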

The best R2 value above is computed over the entire x and y columns. Suppose, however, that I split the data set into four events, each with its own R2 value. How do I get those values? I need the code above to give me 'bestR2' values grouped by 'Test Event' using 'groupby'. This R2 value is heavily processed to suit the results I need for my research project, so using linregress directly won't help, which is why I calculated bestR2 differently. In short: I need the best R2 value for each test event, calculated with the method above.
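One possible approach is to wrap the search in a function and apply it to each group from df.groupby. This is a sketch, not a tested solution for the real spreadsheet. The helper name best_r2, the event column name 'Test Event' and the sample data are all assumptions. The sketch also uses itertools.combinations in place of UniqueCombinations, since both yield the same subsets. Each event needs at least three rows so that every leave-one-out subset still has two or more points to fit.

```python
import itertools

import numpy as np
import pandas as pd


def best_r2(x, y):
    """Hypothetical helper: best leave-one-out R2 for a straight-line fit.

    Mirrors the original loop. Every subset that omits exactly one (x, y)
    pair gets a degree-1 polyfit, and the highest
    R2 = 1 - var(residuals) / var(y) is returned.
    """
    pairs = list(zip(x, y))
    best = 0.0
    for subset in itertools.combinations(pairs, len(pairs) - 1):
        xs = np.array([p[0] for p in subset], dtype=float)
        ys = np.array([p[1] for p in subset], dtype=float)
        params = np.polyfit(xs, ys, 1)  # straight line
        residuals = np.polyval(params, xs) - ys
        r2 = 1.0 - np.var(residuals) / np.var(ys)
        if r2 > best:
            best = r2
    return best


# Hypothetical stand-in for the Excel data; the column name 'Test Event'
# is an assumption, so adjust it to the real header.
df = pd.DataFrame({
    "Test Event": [1, 1, 1, 1, 2, 2, 2, 2],
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [2, 4, 6, 20, 1, 3, 2, 5],
})

# Select only x and y so apply() sees just the data columns of each event.
result = (
    df.groupby("Test Event")[["x", "y"]]
      .apply(lambda g: best_r2(g["x"], g["y"]))
      .rename("best R2")
      .reset_index()
)
print(result)
```

With the real data, replace the sample DataFrame with the one returned by pd.read_excel. The result then has one row per test event, with the columns 'Test Event' and 'best R2'.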


The result should look like this:

Test_Event  best R2
1           0.999
2           0.547
3           0.845
4           0.784

Thanks for reading!!


