I have trained a random forest (RF) model on a binary classification target. Because my data is imbalanced, I'm using class_weight='balanced'
(I read that this might be the cause, but I could not find a solution). When I plot a tree from the model with the code below, the value shown for each node is not an integer.
code:
import graphviz
from sklearn import tree

# Assuming your fitted Random Forest model is named 'model'
trees = model.estimators_

# Plot the first tree (optionally pass feature_names=X_rf.columns to label the splits)
dot_data = tree.export_graphviz(
    trees[0],
    out_file=None,
    filled=True,
    rounded=True,
    special_characters=True,
)
graph = graphviz.Source(dot_data)
graph  # displays the tree when run in a notebook cell
When I'm not using class_weight='balanced' in the model training, I get the expected behaviour.
I expect to see integer values in the value attribute, like value = [124, 2145], for each node.
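To illustrate why weighting produces non-integer values, here is a minimal sketch using the example counts [124, 2145] from above. It assumes the value shown in a node is a class-weighted count, which is my reading of the behaviour rather than something I have confirmed for every scikit-learn version. The array y_example is made up for the illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels with the class counts from the example: 124 zeros, 2145 ones.
y_example = np.repeat([0, 1], [124, 2145])
class_counts = np.bincount(y_example)

# 'balanced' weights are n_samples / (n_classes * class_count).
balanced_weights = compute_class_weight(
    "balanced", classes=np.array([0, 1]), y=y_example
)

# Weighted counts: each raw count multiplied by its class weight.
weighted_counts = balanced_weights * class_counts

print(class_counts)       # raw integer counts
print(balanced_weights)   # one weight per class
print(weighted_counts)    # no longer integers
```

With these weights both classes end up with the same weighted total (n_samples / n_classes), which is why the raw integer counts are no longer visible.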
Edit
I have tried adding proportion=True, as suggested in the comments (the same suggestion appears in the referenced GitHub issue).
It only changes the value from a count to a proportion, so it doesn't solve the issue.
It seems that only multiplying the value by the inverse class weight can solve the problem, but I couldn't find an implementation for that (reminder: the problem is only in visualizing the results, not in the performance of the model).
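One possible workaround, sketched below, avoids inverting the weights altogether: pass the data back through a single tree with decision_path and count, per class, how many rows reach each node. The helper name node_class_counts and the synthetic data are assumptions made for this illustration. Note also that these are counts of whatever data you pass in, not of the bootstrap sample the tree was actually trained on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def node_class_counts(fitted_tree, X, y, classes):
    """Integer class counts of the rows of X that reach each node of one tree."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Sparse (n_samples, n_nodes) matrix: entry is 1 if the sample passes through the node.
    indicator = fitted_tree.decision_path(X)
    # Dense (n_samples, n_classes) one-hot encoding of the labels.
    one_hot = (y[:, None] == np.asarray(classes)[None, :]).astype(np.int64)
    # (n_nodes, n_classes): number of samples of each class reaching each node.
    return np.asarray(indicator.T @ one_hot, dtype=np.int64)


# Synthetic imbalanced data, only for demonstration.
X_demo, y_demo = make_classification(
    n_samples=200, weights=[0.9, 0.1], random_state=0
)
model_demo = RandomForestClassifier(
    n_estimators=5, class_weight="balanced", random_state=0
).fit(X_demo, y_demo)

counts = node_class_counts(
    model_demo.estimators_[0], X_demo, y_demo, model_demo.classes_
)
print(counts[0])  # root node: class counts of the full data set
```

Row i of counts corresponds to node i of the tree, so the values could be used to build custom node labels for the plot in place of the weighted value.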
from Random Forest classifier value is not integer