Hemant Vishwakarma: python - matplot lib sub-plot grid: where to insert row/column arguments

For context, I'm working with the script from scikit-learn's topic extraction documentation example, which displays the top words for a given fit. My actual issue, though, is with matplotlib.

How to reference sub-plot row/column locations?

Extracting subplot coordinates in Python

The question above asks about the coordinates of subplots, but I can't find a way to use that information in my for loop, which is supposed to plot the top words for each input in a list (fitting the model on different data at each iteration and plotting the results in a distinct subplot):

tf_list = [cm_array, xb_array, array_3, array_4, array_5, array_6, array_7]

for i in range(len(tf_list)):
    tf = tf_vectorizer.fit_transform(tf_list[i])
    n_components = 1
    lda.fit(tf)
    n_top_words = 20
    tf_feature_names = tf_vectorizer.get_feature_names_out()
    top_word_comparison(lda, tf_feature_names, n_top_words, "Topics in LDA model")

I think this should work in theory, but the trouble is I can't figure out how to change the documentation's plot function to incorporate different fits. The furthest I got (with the help of Alex):

def top_word_comparison(axes, model, feature_names, n_top_words, subplot_title):
    #column logic
    for j in range(len(tf_list)):
        top_features_ind = model.components_.argsort()[: -n_top_words - 1 : -1]
        top_features = [feature_names[i] for i in top_features_ind]
        weights = model.components_[top_features_ind]
        
        #print(len(model.components_))
        print(weights)
        ax = axes[j]
        ax.barh(top_features, weights, height=0.7)
        ax.set_title(subplot_title, fontdict={"fontsize": 30})
        ax.invert_yaxis()
        ax.tick_params(axis="both", which="major", labelsize=20)
        for i in "top right left".split():
            ax.spines[i].set_visible(False)

#tf_list = [cm_array, xb_array]
fig, axes = plt.subplots(2, 5, figsize=(30, 15), sharex=True)
fig.suptitle("Topics in LDA model", fontsize=40)

for i in range(len(tf_list)):
    tf = tf_vectorizer.fit_transform(tf_list[i])
    n_components = 1
    lda.fit(tf)
    n_top_words = 20
    tf_feature_names = tf_vectorizer.get_feature_names_out()
    top_word_comparison(axes[0], lda, tf_feature_names, n_top_words, sector_list[i])

plt.subplots_adjust(top=0.90, bottom=0.05, wspace=0.90, hspace=0.3)
plt.show()

Running this raises the error:

IndexError: index 735 is out of bounds for axis 0 with size 1

This leads me to think the problem was introduced when I changed:

for topic_idx, topic in enumerate(model.components_):
    top_features_ind = topic.argsort()[: -n_top_words - 1 : -1]
    top_features = [feature_names[i] for i in top_features_ind]
    weights = topic[top_features_ind]

to:

for j in range(len(tf_list)):
    top_features_ind = model.components_.argsort()[: -n_top_words - 1 : -1]
    top_features = [feature_names[i] for i in top_features_ind]
    weights = model.components_[top_features_ind]
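
A small NumPy sketch of why the second version fails, assuming `components_` has shape `(1, n_features)`, as it would for a single-topic fit. The `components` array below is an invented stand-in, not real model output; only the shapes matter.

```python
import numpy as np

# Hypothetical stand-in for lda.components_ with one topic and six features.
components = np.array([[0.2, 3.0, 0.5, 2.0, 0.1, 1.0]])
n_top_words = 3

# argsort on a 2-D array sorts within each row and returns a 2-D index array.
# The reversed slice then acts on axis 0 (the single row), so all six column
# indices survive instead of only the top three.
bad_ind = components.argsort()[: -n_top_words - 1 : -1]
print(bad_ind.shape)

# Those column indices are then applied to axis 0, which has size 1.
try:
    components[bad_ind]
except IndexError as err:
    print("IndexError:", err)

# Selecting the single topic row first restores the 1-D case from the docs.
topic = components[0]
top_ind = topic.argsort()[: -n_top_words - 1 : -1]
top_weights = topic[top_ind]
print(top_ind, top_weights)
```

In the original loop, `topic` was one row of `components_`, so the slice and the indexing both worked on a 1-D array. Replacing it with the whole 2-D array changes which axis they act on.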

Conclusion

Even though each fit has only one row in `components_`, it seems that I can't simply replace `topic` with `model.components_` wherever it appears. So, the trouble is:

My LDA model has just one component per run, so we are not plotting one subplot per component as in the documentation.
Instead, we are trying to plot one subplot per entirely new model fit, so it would make sense to loop over the number of fits/data elements in tf_list. However, when we do so, the matrix algebra seems to collapse.
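
One possible way to get one subplot per fit is to keep the loop over datasets outside the plotting function and hand the function a single Axes each time. This is a minimal sketch under stated assumptions, not a confirmed fix: `plot_top_words`, `fits`, and `vocab` are hypothetical names, and the weights are random stand-ins for `lda.components_` so the example runs without scikit-learn.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def plot_top_words(ax, components, feature_names, n_top_words, title):
    """Draw the top words of a single-topic fit on one Axes."""
    topic = components[0]  # the only row of a (1, n_features) array
    top_ind = topic.argsort()[: -n_top_words - 1 : -1]
    top_features = [feature_names[i] for i in top_ind]
    ax.barh(top_features, topic[top_ind], height=0.7)
    ax.set_title(title)
    ax.invert_yaxis()
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    return top_features


# Hypothetical stand-ins for the per-dataset fits (names and weights invented).
rng = np.random.default_rng(0)
vocab = np.array(["alpha", "beta", "gamma", "delta", "epsilon", "zeta"])
fits = [(f"dataset {k}", rng.random((1, vocab.size))) for k in range(7)]

fig, axes = plt.subplots(2, 5, figsize=(30, 15))
flat_axes = axes.ravel()  # row-major: flat index i sits at (i // 5, i % 5)

# One fit per subplot: the loop over datasets lives outside the function.
for ax, (title, components) in zip(flat_axes, fits):
    plot_top_words(ax, components, vocab, n_top_words=3, title=title)

# Hide the grid cells that have no data behind them.
for ax in flat_axes[len(fits):]:
    ax.set_visible(False)
```

In the loop from the question, the equivalent call would presumably be `plot_top_words(flat_axes[i], lda.components_, tf_feature_names, n_top_words, sector_list[i])`. For interactive use, drop the `matplotlib.use("Agg")` line and finish with `plt.show()`.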


Thursday, 27 January 2022
