How can I provide probabilities/weights for whether or not a sample will be chosen during Scikit-Learn's RandomForest base model creation?
For example, let's say I'm modeling 2 classes: 1) control and 2) treatment. I also have a separate grouping that assigns each sample to a sub-category (A, B, or C).
During base model creation, I want to preferentially draw samples according to a probability (e.g., based on group_2 in the table).
Can I achieve this with either the sample_weight or class_weight option? If not, is there another way to do this?
For example, either this: sample_weight={"sample_1":0.125, "sample_2":0.125, "sample_3":0.25, "sample_4":0.25, "sample_5":0.125, "sample_6":0.125} or (probably less likely, but just in case) this: class_weight={"A":0.125, "B":0.75, "C":0.125}.
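For reference, my current understanding of the built-in options is that sample_weight is a per-row array passed to fit (not a dict keyed by sample name), and class_weight is a dict keyed by the class labels (control/treatment), not by my groups. A minimal sketch of how I believe they are normally used (the group-to-weight mapping and the toy data below are made up):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical mapping from my group_2 sub-categories to weights.
group_to_weight = {"A": 0.125, "B": 0.75, "C": 0.125}
groups = np.array(["A", "B", "B", "C", "A", "C"])

# sample_weight is a per-row array aligned with X, not a dict keyed by sample name.
sample_weight = np.array([group_to_weight[g] for g in groups])

# Toy data: 6 samples, 4 features, 2 classes (control / treatment).
X = np.random.rand(6, 4)
y = np.array(["control", "treatment", "control", "treatment", "control", "treatment"])

# class_weight is keyed by the *class* labels, not by my A/B/C groups.
clf = RandomForestClassifier(
    n_estimators=100,
    class_weight={"control": 1.0, "treatment": 2.0},
)
clf.fit(X, y, sample_weight=sample_weight)
```

As far as I can tell, though, these weights re-weight samples when the trees are grown; they do not appear to change which rows are drawn into each bootstrap sample, which is what I actually want.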
Are there any implementations that allow this? If not, how can I use class inheritance to build a custom RandomForestClassifier that can take in a base_sample_probability argument?
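To make the question concrete, here is a rough sketch of the behaviour I am after. Rather than subclassing RandomForestClassifier's internals, this just re-implements the bagging loop by hand with DecisionTreeClassifier; the class name ProbabilisticBaggingForest and its sample_probability argument are placeholders I made up, not an existing API:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class ProbabilisticBaggingForest:
    """Bagged decision trees where each bootstrap sample is drawn with
    user-defined per-row probabilities (placeholder name, not a sklearn class)."""

    def __init__(self, n_estimators=100, sample_probability=None, random_state=None):
        self.n_estimators = n_estimators
        self.sample_probability = sample_probability  # per-row probabilities summing to 1
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        n_samples = X.shape[0]
        self.classes_ = np.unique(y)
        self.estimators_ = []
        for _ in range(self.n_estimators):
            # Bootstrap draw that prefers rows with higher probability.
            idx = rng.choice(n_samples, size=n_samples, replace=True,
                             p=self.sample_probability)
            tree = DecisionTreeClassifier(
                max_features="sqrt",  # random-forest-style feature subsampling
                random_state=int(rng.integers(2**31)),
            )
            tree.fit(X[idx], y[idx])
            self.estimators_.append(tree)
        return self

    def predict_proba(self, X):
        # Average class probabilities across trees, aligning each tree's
        # (possibly smaller) class set with the global class order.
        proba = np.zeros((np.asarray(X).shape[0], len(self.classes_)))
        for tree in self.estimators_:
            cols = np.searchsorted(self.classes_, tree.classes_)
            proba[:, cols] += tree.predict_proba(X)
        return proba / len(self.estimators_)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Usage would then look like forest = ProbabilisticBaggingForest(n_estimators=200, sample_probability=p).fit(X, y), where p is an array that sums to 1 over the rows of X. Ideally I'd like the same effect from RandomForestClassifier itself, or from a subclass of it.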
Extra Info:
I've found some resources describing the class_weight parameter in practice. It appears to be used for imbalanced classes, for example:
This modification of random forest is referred to as Weighted Random Forest.
Another approach to make random forest more suitable for learning from extremely imbalanced data follows the idea of cost sensitive learning. Since the RF classifier tends to be biased towards the majority class, we shall place a heavier penalty on misclassifying the minority class.
https://machinelearningmastery.com/bagging-and-random-forest-for-imbalanced-classification/
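Based on that article, class_weight looks like it is meant for this kind of cost-sensitive re-weighting of imbalanced classes rather than for controlling the bootstrap draw, e.g.:

```python
from sklearn.ensemble import RandomForestClassifier

# Heavier penalty for misclassifying the minority class,
# either via an explicit dict or the built-in "balanced" heuristics.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced")
# class_weight="balanced_subsample" recomputes the weights within each bootstrap sample
```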
from How to select base-model samples in Random Forest based on some user-defined probability (Scikit-Learn)?
