Hemant Vishwakarma: Why does Tensorflow Bernoulli distribution always return 0?

Tuesday 29 June 2021

I am working on classifying texts based on word occurrences. One of the steps is to estimate the probability of a particular text for each possible class. To do this, I am given NSAMPLES of texts from a vocabulary of NFEATURES words, each labelled with one of NLABELS class labels. From this, I construct a binary occurrence matrix where entry(sample,feature) is 1 iff text "sample" contains the word encoded by "feature".
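As a small illustration of that construction, here is a minimal sketch of building a binary occurrence matrix. The vocabulary, the tokenized texts, and the names `vocabulary`, `texts` and `word_index` are hypothetical and only serve the example; the real data comes from the corpus.

```python
import numpy as np

# Hypothetical toy vocabulary and already-tokenized texts (illustration only)
vocabulary = ["cat", "dog", "fish"]
texts = [["cat", "dog", "cat"], ["fish"], []]

# Map each word to its column (feature) index
word_index = {word: j for j, word in enumerate(vocabulary)}

# entry (sample, feature) is 1 iff the text contains the word
binary_data = np.zeros((len(texts), len(vocabulary)), dtype="int32")
for i, tokens in enumerate(texts):
    for token in tokens:
        if token in word_index:
            binary_data[i, word_index[token]] = 1

print(binary_data)
```

Repeated words do not change the result, since the matrix records presence rather than counts.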

From the occurrence matrix, we can construct a matrix of conditional probabilities and then smooth it so that no probability is exactly 0.0 or 1.0, using the following code (copied from a Coursera notebook):

def laplace_smoothing(labels, binary_data, n_classes):
    # Compute the parameter estimates (adjusted fraction of documents in class that contain word)
    n_words = binary_data.shape[1]
    alpha = 1 # parameter for Laplace smoothing
    theta = np.zeros([n_classes, n_words]) # stores parameter values - prob. word given class
    for c_k in range(n_classes): # 0, 1, ..., n_classes - 1
        class_mask = (labels == c_k)
        N = class_mask.sum() # number of articles in class
        theta[c_k, :] = (binary_data[class_mask, :].sum(axis=0) + alpha)/(N + alpha*2)
    return theta
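To make the smoothing formula concrete, here is a hand-checkable sketch on a toy input. It applies the same expression, (count + alpha) / (N + 2 * alpha), in vectorized form; the toy `labels` and `binary_data` values are made up for illustration.

```python
import numpy as np

# Toy input (illustration only): 3 documents, 2 words, 2 classes
labels = np.array([0, 0, 1])
binary_data = np.array([[1, 0],
                        [1, 0],
                        [0, 0]])
n_classes = 2
alpha = 1  # Laplace smoothing parameter

theta = np.zeros((n_classes, binary_data.shape[1]))
for c_k in range(n_classes):
    class_mask = (labels == c_k)
    N = class_mask.sum()  # number of documents in this class
    counts = binary_data[class_mask, :].sum(axis=0)  # documents containing each word
    theta[c_k, :] = (counts + alpha) / (N + 2 * alpha)

print(theta)
```

For class 0 there are two documents, both containing word 0 and neither containing word 1, so the estimates are (2 + 1) / (2 + 2) = 0.75 and (0 + 1) / (2 + 2) = 0.25. For class 1, with a single document and no words, both estimates are 1/3. No entry is exactly 0 or 1.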

To see the problem, here is code to mock up inputs and call for the result:

import numpy as np
import tensorflow_probability as tfp
tfd = tfp.distributions

NSAMPLES = 2000   # Size of corpus
NFEATURES = 10000 # Number of words in corpus
NLABELS = 10      # Number of classes
ONE_PROB = 0.02   # Probability that binary_datum will be 1

def mock_binary_data( nsamples, nfeatures, one_prob ):
    binary_data = ( np.random.uniform( 0, 1, ( nsamples, nfeatures ) ) < one_prob ).astype( 'int32' )
    return binary_data

def mock_labels( nsamples, nlabels ):
    labels = np.random.randint( 0, nlabels, nsamples )
    return labels

binary_data = mock_binary_data( NSAMPLES, NFEATURES, ONE_PROB )
labels = mock_labels( NSAMPLES, NLABELS )
smoothed_data = laplace_smoothing( labels, binary_data, NLABELS )

bernoulli = tfd.Independent( tfd.Bernoulli( probs = smoothed_data ), reinterpreted_batch_ndims = 1 )

test_random_data = mock_binary_data( 1, NFEATURES, ONE_PROB )[ 0 ]
bernoulli.prob( test_random_data )

When I execute this, I get:

<tf.Tensor: shape=(10,), dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>

That is, all the probabilities are zero. Some step here must be incorrect. Can you please help me find it?
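One plausible explanation, offered as a hedged sketch rather than a confirmed diagnosis, is numerical underflow: the joint probability is a product of 10000 per-feature factors, each below 1, and such a product is far smaller than float32 can represent. The NumPy-only example below mimics this with a single hypothetical class whose parameters are all 0.02; `theta` and `x` are stand-ins, not the original variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10000
one_prob = 0.02

# Hypothetical stand-ins: one class's parameters and one binary test vector
theta = np.full(n_features, one_prob)
x = (rng.uniform(0, 1, n_features) < one_prob).astype("int32")

# Per-feature Bernoulli likelihood: theta where x == 1, (1 - theta) where x == 0
per_feature = np.where(x == 1, theta, 1.0 - theta).astype("float32")

# Multiplying 10000 factors below 1 underflows to zero in float32
direct = np.prod(per_feature)

# Summing logarithms keeps the same information in a representable range
log_prob = np.sum(np.log(per_feature.astype("float64")))

print(direct, log_prob)
```

If this is indeed the cause, calling `bernoulli.log_prob( test_random_data )` instead of `bernoulli.prob( test_random_data )` should return finite, large negative values that can still be compared across classes.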
