For tf.estimator.BoostedTreesClassifier, why are all feature columns required to be bucketized or indicator columns?
What is the best way to handle both the numerical and the categorical data used by the classifier?
As it stands, it seems impossible to feed in raw numerical data. Decision trees should be a perfect fit here, since I don't even need to scale my data.
My code is as follows:
def _parse_record():
    # do something
    return {'feature_1': array[0], 'feature_2': array[190.98]}, label

def input_fn():
    # parse record
    return dataset

feature_cols = []
for _ in numerical_features:
    feature_cols.append(tf.feature_column.numeric_column(key=_))
for _ in cat:
    c = tf.feature_column.categorical_column_with_hash_bucket(key=_, hash_bucket_size=100)
    ind = tf.feature_column.indicator_column(c)
    feature_cols.append(ind)

classifier = tf.estimator.BoostedTreesClassifier(
    feature_columns=feature_cols,
    n_batches_per_layer=100,
    n_trees=100,
)
f=lambda: input_fn()
classifier.train(input_fn=f)
However, this gives me:
ValueError: For now, only bucketized_column and indicator column are supported but got: _NumericColumn(key='active_time', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)
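Going by the error message, one possible workaround is to wrap each numeric column in a bucketized column before passing it to the classifier. Below is a minimal sketch of that idea, not a definitive fix. It assumes the bucket boundaries are taken from quantiles of a sample of the training values, and that the installed TensorFlow version still ships tf.feature_column. The helper names quantile_boundaries and make_bucketized_column are hypothetical and not part of TensorFlow.

```python
import statistics


def quantile_boundaries(values, n_buckets=10):
    """Return sorted, de-duplicated quantile cut points for a list of floats.

    n_buckets buckets need at most n_buckets - 1 boundaries; duplicates are
    dropped because bucket boundaries should be strictly increasing.
    """
    cuts = statistics.quantiles(values, n=n_buckets)
    return sorted(set(cuts))


def make_bucketized_column(key, boundaries):
    """Wrap a numeric column in a bucketized column (hypothetical helper).

    TensorFlow is imported lazily so that the boundary computation above can
    be run and checked without TensorFlow installed.
    """
    import tensorflow as tf

    numeric = tf.feature_column.numeric_column(key=key)
    return tf.feature_column.bucketized_column(numeric, boundaries=list(boundaries))


# Boundary computation on made-up sample values (assumption: a sample of the
# feature fits in memory).
sample = [float(i) for i in range(1, 101)]
boundaries = quantile_boundaries(sample, n_buckets=4)
```

With boundaries like these, the numeric loop in the code above could append make_bucketized_column(name, boundaries) instead of the bare numeric column. More buckets give the trees finer candidate split points at the cost of more work per layer, so the bucket count is something to tune rather than a fixed choice.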
from Why doesn't TF Boosted Trees accept numerical data as input?