Saturday, 2 June 2018

Plot graph with multiple attributes similar to "hue" in Seaborn

I have the following sample data set called df, where each stage time is the number of days it took to reach that stage:

id stage1_time stage1_to_2_time stage2_time stage2_to_3_time stage3_time
a  1           3                4           3                7 
b  3   
c  2           3                5
d
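
For anyone who wants to reproduce this, here is a minimal sketch of the sample as a DataFrame. It assumes the blank cells in the table above are missing values (NaN); that reading is an assumption, not something stated in the data itself.

```python
import numpy as np
import pandas as pd

# Blank cells in the sample table are assumed to be missing values (NaN).
df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "stage1_time": [1, 3, 2, np.nan],
    "stage1_to_2_time": [3, np.nan, 3, np.nan],
    "stage2_time": [4, np.nan, 5, np.nan],
    "stage2_to_3_time": [3, np.nan, np.nan, np.nan],
    "stage3_time": [7, np.nan, np.nan, np.nan],
})

print(df)
```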

I wrote the following script to get a scatter plot of stage1_time against its empirical CDF (eCDF):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Create eCDF function
def ecdf(data):
    n = len(data)
    x = np.sort(data)
    y = np.arange(1.0, n+1) / n
    return x, y

def generate_scatter_plot(data, ax):
    # data: 1-D Series of times with NaNs already dropped
    x, y = ecdf(data)

    ax.plot(x, y, marker='.', linestyle='none')
    ax.axvline(x.mean(), color='gray', linestyle='dashed', linewidth=2)  # Add mean

    # Annotate the mean with its (value, percentile) position
    x_m = int(x.mean())
    y_m = stats.percentileofscore(data.values, x.mean()) / 100.0

    ax.annotate('(%s,%s)' % (x_m, int(y_m*100)), xy=(x_m, y_m), xytext=(10,-5), textcoords='offset points')

    percentiles = np.array([0,25,50,75,100])
    x_p = np.percentile(data, percentiles)
    y_p = percentiles / 100.0

    ax.plot(x_p, y_p, marker='D', color='red', linestyle='none')  # Overlay quartiles

    # Separate loop names so the eCDF arrays x and y are not shadowed
    for xq, yq in zip(x_p, y_p):
        ax.annotate('%s' % int(xq), xy=(xq, yq), xytext=(10,-5), textcoords='offset points')

# Data to plot
stage1_time = df['stage1_time'].dropna()

# Scatter plot
fig, ax = plt.subplots()
generate_scatter_plot(stage1_time, ax)
ax.set_title('Scatter Plot of Days to Stage1')
ax.set_xlabel('Days to Stage1')
ax.legend(('Days to Stage1', 'Mean', 'Quartiles'), loc='lower right')
ax.margins(0.02)

plt.show()

Currently the plot shows the days it took for everyone who reached stage1. What I am trying to achieve is a scatter with three colors: those who reached stage1 and stayed there, those who moved on to stage2, and those who moved on to stage3. I would also like the graph to show the count for each group: # in stage1, # in stage2 and # in stage3.

Can anyone assist with getting there please?

FYI, the intention is to use this as a base so that I can also create a graph for stage2_time, where those who reached stage3 are highlighted in a different color.
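
To make the goal concrete, here is a rough sketch of one possible approach, not a definitive solution. It labels each row with the furthest stage that has a time recorded, then plots one eCDF with a separate color and a count per group. The helpers furthest_stage and plot_ecdf_by_stage are hypothetical names made up for this sketch, and the sample data again assumes blank cells are NaN.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def furthest_stage(df):
    """Label each row with the last stage it has a time for (hypothetical helper)."""
    labels = pd.Series("none", index=df.index)
    for stage in ("stage1", "stage2", "stage3"):
        # A later stage overwrites an earlier one wherever its time is present.
        labels[df[stage + "_time"].notna()] = stage
    return labels


def plot_ecdf_by_stage(df, column, ax):
    """Plot the eCDF of `column`, colored by furthest stage reached."""
    data = df.dropna(subset=[column]).sort_values(column)
    # The eCDF is computed over every row with a value, so all groups
    # sit on one shared curve and only the marker color differs.
    data = data.assign(
        ecdf=np.arange(1.0, len(data) + 1) / len(data),
        furthest=furthest_stage(data),
    )
    counts = {}
    for stage, group in data.groupby("furthest"):
        counts[stage] = len(group)
        # Each plot call takes the next color from matplotlib's color cycle.
        ax.plot(group[column], group["ecdf"], marker=".", linestyle="none",
                label="%s (n=%d)" % (stage, len(group)))
    ax.legend(loc="lower right")
    return counts


# Sample data; blank cells in the question's table are assumed to be NaN.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "stage1_time": [1, 3, 2, np.nan],
    "stage1_to_2_time": [3, np.nan, 3, np.nan],
    "stage2_time": [4, np.nan, 5, np.nan],
    "stage2_to_3_time": [3, np.nan, np.nan, np.nan],
    "stage3_time": [7, np.nan, np.nan, np.nan],
})

fig, ax = plt.subplots()
counts = plot_ecdf_by_stage(df, "stage1_time", ax)
ax.set_xlabel("Days to Stage1")
ax.set_ylabel("eCDF")
```

Calling plt.show() afterwards displays the figure. Passing "stage2_time" as the column should give the stage2 version, with rows that reached stage3 in their own color, since rows without a stage2 time are dropped before grouping.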


