For my research I use a specific calculation for R2 values. It is not the R2 value calculated directly with the scipy.stats.linregress function.
The code I am using produces a statistically processed R2 value (labelled 'best R2'). It gives me one R2 value for the entire x and y columns. However, the data contain multiple 'Test Events', so I need an R2 value for each individual 'Test Event'.
Here is the code I have been using so far to calculate the R2 value, followed by the output I need:
import copy
import numpy
import pandas as pd
df=pd.read_excel("I:/Python/Excel.xlsx")
df.head()
xyDataPairs = df[['x', 'y']].values.tolist()
minDataPoints = len(xyDataPairs) - 1
# utility function: yields every n-item combination of items, in index order
def UniqueCombinations(items, n):
    if n == 0:
        yield []
    else:
        for i in range(len(items)):
            for cc in UniqueCombinations(items[i+1:], n-1):
                yield [items[i]] + cc
bestR2 = 0.0
bestDataPairCombination = []
bestParameters = []
# try every subset that leaves out exactly one data point
for pairs in UniqueCombinations(xyDataPairs, minDataPoints):
    x = []
    y = []
    for pair in pairs:
        x.append(pair[0])
        y.append(pair[1])
    fittedParameters = numpy.polyfit(x, y, 1)  # straight line
    modelPredictions = numpy.polyval(fittedParameters, x)
    absError = modelPredictions - y  # residuals (prediction minus observation)
    Rsquared = 1.0 - (numpy.var(absError) / numpy.var(y))
    # keep the subset (and its fit) with the highest R2 seen so far
    if Rsquared > bestR2:
        bestR2 = Rsquared
        bestDataPairCombination = copy.deepcopy(pairs)
        bestParameters = copy.deepcopy(fittedParameters)
print('best R2', bestR2)
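The search above fits a straight line to every subset that leaves out exactly one point and keeps the subset with the highest R2. For a least-squares line with an intercept the residuals average to zero, so 1 - var(residuals)/var(y) is the ordinary coefficient of determination of that subset (the square of its Pearson correlation, i.e. what linregress reports as rvalue**2); as far as the code shows, the extra processing lies in choosing the best subset. As a minimal sketch, assuming the leave-one-out logic should stay exactly as written, the loop can be wrapped in a reusable function. The name best_r2 and the demo numbers are hypothetical, and itertools.combinations stands in for UniqueCombinations because it yields the same subsets in the same order:

```python
import itertools

import numpy as np


def best_r2(x, y):
    """Hypothetical helper: leave-one-out best straight-line R2.

    Fits a line to every subset that drops exactly one point and returns
    (best R2, data pairs used, polyfit coefficients), like the loop above.
    """
    pairs = list(zip(x, y))
    n_keep = len(pairs) - 1  # same role as minDataPoints
    best = (0.0, [], None)  # (R2, data pairs, [slope, intercept])
    for subset in itertools.combinations(pairs, n_keep):
        xs = np.array([p[0] for p in subset], dtype=float)
        ys = np.array([p[1] for p in subset], dtype=float)
        params = np.polyfit(xs, ys, 1)  # straight line
        residuals = np.polyval(params, xs) - ys
        r2 = 1.0 - np.var(residuals) / np.var(ys)
        if r2 > best[0]:
            best = (r2, list(subset), params)
    return best


# Made-up demo data with one obvious outlier at x = 4.
x = [0, 1, 2, 3, 4]
y = [0.1, 1.0, 2.1, 2.9, 10.0]
r2, pairs_used, params = best_r2(x, y)
print("best R2", r2, "using", pairs_used)
```

In this made-up example the winning subset is the one that drops the outlier. Each fit needs at least two points, so a data set (or a single test event) should have at least three rows; with exactly three, every two-point subset fits perfectly and the best R2 is trivially 1.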
The best R2 value above is for the entire x and y columns. However, suppose I split the data set into four events, each with its own R2 value. How do I get those values? I need the code above to give me 'bestR2' values grouped (with 'groupby') by 'Test Event'. This R2 value is heavily processed to suit the results I need for my research project, so using linregress directly won't help, and that is why I calculate bestR2 differently. In short: I need the best R2 value for each test event, calculated with the method above.
The result should look like this:
Test_Event   best R2
1            0.999
2            0.547
3            0.845
4            0.784
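Because each Test Event is just a subset of rows, one way to get a table like the one above is to run the same leave-one-out search on each group returned by DataFrame.groupby. The sketch below defines a hypothetical best_r2 helper so that it runs on its own, and assumes the event column is named 'Test_Event' with data columns 'x' and 'y' (change event_col if the spreadsheet header is 'Test Event'). The demo data are invented for illustration only.

```python
import itertools

import numpy as np
import pandas as pd


def best_r2(x, y):
    """Hypothetical helper: best straight-line R2 over all leave-one-out subsets."""
    pairs = list(zip(x, y))
    best = 0.0
    for subset in itertools.combinations(pairs, len(pairs) - 1):
        xs = np.array([p[0] for p in subset], dtype=float)
        ys = np.array([p[1] for p in subset], dtype=float)
        params = np.polyfit(xs, ys, 1)  # straight line
        residuals = np.polyval(params, xs) - ys
        r2 = 1.0 - np.var(residuals) / np.var(ys)
        if r2 > best:
            best = r2
    return best


def best_r2_by_event(df, event_col="Test_Event"):
    """Return one row per test event with its leave-one-out best R2."""
    rows = []
    # groupby sorts the event labels, so rows come out in event order
    for event, group in df.groupby(event_col):
        r2 = best_r2(group["x"].to_numpy(), group["y"].to_numpy())
        rows.append({event_col: event, "best R2": float(r2)})
    return pd.DataFrame(rows)


# Invented demo data: two events, each with one obvious outlier.
demo = pd.DataFrame({
    "Test_Event": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "x": [0, 1, 2, 3, 0, 1, 2, 3, 4],
    "y": [1.0, 3.0, 5.0, 20.0, 0.1, 1.0, 2.1, 2.9, 10.0],
})
result = best_r2_by_event(demo)
print(result.to_string(index=False))
```

For the real spreadsheet, something like print(best_r2_by_event(pd.read_excel("I:/Python/Excel.xlsx")).to_string(index=False)) should print one row per event in the same two-column layout. An explicit loop over the groups is used here instead of groupby(...).apply(...) to keep the output columns easy to control; both run the same search once per group.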
Thanks for reading!!