A colleague recommends that I use scipy.optimize.linprog to optimise my algorithm. I'm not sure though because I have a dataframe linkage problem, not a function. I'll explain.
I have two dataframes (df_1 & df_2) and some variables (A,B,C).
My algorithm uses A,B,C to calculate a score (S) for each row in df_2.
As A,B,C are varied, my algorithm finds the row in df_2 with the highest score (S).
For each value of column 'O' in df_1, I want the top scoring row in df_2 to ideally have the same value of column 'M' as df_1. To do that, I want to maximise p_hat, a measure of the similarity between them.
I want to find the values of A,B,C that give the maximum p_hat.
I can vary A,B or C to see which gives the maximum p_hat but I would like to use an optimisation algorithm to make sure that I get the maximum value please.
By varying A,B,C how can I get the maximum p_hat and what optimisation code can I do this with please? Does scipy.optimize.linprog work for this type of thing please?
df_1 = pd.DataFrame({'O' : [1,2,3],
'M' : [2,8,3]})
df_2 = pd.DataFrame({'O' : [1,1,1,
2,2,2,
3,3,3],
'M' : [9,2,4,
6,7,8,
5,3,4],
'X' : [2,4,6,
4,8,7,
3,1,9],
'Y' : [3,6,1,
4,6,5,
1,0,7],
'Z' : [2,4,8,
3,5,4,
7,5,1]})
df_1 = df_1.set_index('O')
M_G = df_1.M
# Variables
# Feature 1
F_min = -10
F_max = 10
F_ste = (F_max - F_min) / 300
L_1 = np.arange(F_min,F_max+F_ste,F_ste).tolist()
# Feature 2
L_2 = [0.0, 0.5, 1.0]
# Results
di_Re = {}
for F_2 in L_2:
# Results
df_Re = pd.DataFrame(data={'O':df_1.index})
df_Re = df_Re.set_index('O')
for F_1 in L_1:
A = F_1
B = F_2
C = 2
# Score
df_2['S'] = df_2['X']*A + df_2['Y']*B + df_2['Z']*C
# Top score
Ma_To = df_2.sort_values(['S', 'X', 'M'], ascending=[False, True, True])
Ma_Or = Ma_To.set_index('O')
M_Top = Ma_Or[~Ma_Or.index.duplicated(keep='first')].M
# Compare the top scoring Row for each T to df_1
M_Top = M_Top.sort_index()
M_G = M_G.sort_index()
R_G = M_G.reindex(M_Top.index)
T_N_T = M_Top == R_G
# Record the results
df_R_ = pd.DataFrame({'T_N_T':T_N_T})
df_R_.columns = [F_1]
df_Re = pd.concat([df_Re, df_R_], axis=1)
# p hat
df_Re.loc['p_hat'] = df_Re.sum()/len(df_Re.index)
df_RT = df_Re.T
di_Re[F_2] = df_RT
# Plot
ax = plt.gca()
# Line style
style = {
L_2[0]: ':',
L_2[1]: '--',
L_2[2]: '-'}
for k in di_Re:
di_Re[k].plot(ax=ax, y='p_hat', label=round(k, 2), linewidth=4, style=style[k], c='k')
plt.xlabel('A')
plt.ylabel('p\u0302')
ax.legend(title='B',
loc="center left",
bbox_to_anchor=(1, 0, 0.5, 1))
font = {'weight' : 'normal',
'size' : 30}
plt.rc('font', **font)
fig = plt.gcf()
fig.set_size_inches(6,6)
plt.xlim([F_min,F_max])
Tick_step = (F_max - F_min) / 2
plt.xticks(np.arange(F_min,F_max+Tick_step,Tick_step))
plt.minorticks_on()
plt.grid(b=True, which='major', color='b')
plt.grid(b=True, which='minor', color='b')
plt.show()
from Optimise a step function in Pandas using data
No comments:
Post a Comment