Friday, 22 February 2019

Why doesn't TF Boosted Trees accept numerical data as input?

For tf.estimator.BoostedTreesClassifier, why are all feature columns required to be bucketized or indicator columns?

What is the best way to handle both the numerical and the categorical data used by the classifier?

As it stands, it seems impossible to feed numerical data in directly. That is surprising, because decision trees are appealing precisely because I don't even need to scale my data.

My code is as follows:

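import numpy as np
import tensorflow as tf

def _parse_record():
    # do something (parsing elided); illustrative shape of the parsed result:
    return {'feature_1': np.array([0]), 'feature_2': np.array([190.98])}, label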

def input_fn():
    # parse record
    return dataset

feature_cols = []
for name in numerical_features:
    feature_cols.append(tf.feature_column.numeric_column(key=name))
for name in cat:
    c = tf.feature_column.categorical_column_with_hash_bucket(key=name, hash_bucket_size=100)
    ind = tf.feature_column.indicator_column(c)
    feature_cols.append(ind)

classifier = tf.estimator.BoostedTreesClassifier(
    feature_columns=feature_cols,
    n_batches_per_layer=100,
    n_trees=100,
)

classifier.train(input_fn=input_fn)

However, this gives me:

ValueError: For now, only bucketized_column and indicator column are supported but got: _NumericColumn(key='active_time', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)
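Going by the error message, one possible workaround is to wrap each numeric column in a bucketized column, which needs a list of boundaries. The sketch below is only an illustration of how such boundaries might be derived and how values fall into buckets. The sample values and the helper names quantile_boundaries and bucket_index are hypothetical. The TensorFlow calls are left as comments and assume a TF 1.x install where tf.feature_column.bucketized_column(source_column, boundaries) is available.

```python
import bisect
import statistics


def quantile_boundaries(values, n_buckets):
    """Return sorted, de-duplicated cut points that split values into roughly equal buckets."""
    cuts = statistics.quantiles(values, n=n_buckets, method="inclusive")
    return sorted(set(cuts))


def bucket_index(value, boundaries):
    """Map a value to a bucket index using left-inclusive boundaries.

    With boundaries [0., 1., 2.] the buckets are
    (-inf, 0.), [0., 1.), [1., 2.), [2., +inf).
    """
    return bisect.bisect_right(boundaries, value)


# Hypothetical sample of the 'active_time' feature named in the error message.
sample_active_time = [0.5, 3.0, 12.0, 45.5, 190.98, 200.0, 350.0, 1000.0]
boundaries = quantile_boundaries(sample_active_time, n_buckets=4)

# Assumed usage with TensorFlow 1.x (not executed here):
# numeric = tf.feature_column.numeric_column(key='active_time')
# bucketized = tf.feature_column.bucketized_column(numeric, boundaries=boundaries)
# feature_cols.append(bucketized)
```

Because a tree only ever splits on thresholds, bucketizing at reasonably fine quantiles loses little information, and the raw values still do not need to be scaled.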


