Thursday, 22 November 2018

Stacking RBMs to create Deep belief network in sklearn

According to this website, a deep belief network is just multiple RBMs stacked together, using the output of the previous RBM as the input of the next one.

In the scikit-learn documentation, there is an example of using an RBM to classify the MNIST dataset. They put a BernoulliRBM and a LogisticRegression in a pipeline to achieve better accuracy.

Therefore I wonder whether I can stack multiple RBMs in that pipeline to create a deep belief network, as shown in the following code.

from sklearn.neural_network import BernoulliRBM
import numpy as np
from sklearn import linear_model, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

digits = datasets.load_digits()
X = np.asarray(digits.data, 'float32')
Y = digits.target
X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                    test_size=0.2,
                                                    random_state=0)

logistic = linear_model.LogisticRegression(C=100)

# Three RBMs of decreasing width, each trained on the output of the previous one
rbm1 = BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm2 = BernoulliRBM(n_components=80, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm3 = BernoulliRBM(n_components=60, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)

# Stack the RBMs and put a logistic regression classifier on top
DBN3 = Pipeline(steps=[('rbm1', rbm1), ('rbm2', rbm2), ('rbm3', rbm3), ('logistic', logistic)])

DBN3.fit(X_train, Y_train)

print("Logistic regression using RBM features:\n%s\n" % (
    metrics.classification_report(
        Y_test,
        DBN3.predict(X_test))))
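If my understanding is right, the pipeline above trains the layers greedily: each RBM is fit on the transformed output of the previous step. A minimal hand-rolled sketch of the same idea (for illustration only; it would refit the same estimator objects):

# Greedy layer-wise training done by hand, equivalent to what the
# Pipeline does during fit: each RBM sees the previous hidden layer.
H1 = rbm1.fit_transform(X_train)   # hidden activations of layer 1
H2 = rbm2.fit_transform(H1)        # layer 2 trained on layer-1 features
H3 = rbm3.fit_transform(H2)        # layer 3 trained on layer-2 features
logistic.fit(H3, Y_train)          # classifier on the top-level features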

However, I find that the more RBMs I add to the pipeline, the lower the accuracy becomes.

1 RBM in pipeline --> 95%

2 RBMs in pipeline --> 93%

3 RBMs in pipeline --> 89%
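To see where the accuracy is lost, one diagnostic I can run (a sketch that reuses the fitted DBN3 pipeline from above) is to train a fresh logistic regression on the representation coming out of each RBM layer:

from sklearn.base import clone

# Fit a fresh classifier on the features after each RBM layer,
# chaining the transforms of the already-fitted DBN3 pipeline.
Xtr, Xte = X_train, X_test
for name, step in DBN3.steps[:-1]:      # all steps except the final classifier
    Xtr = step.transform(Xtr)
    Xte = step.transform(Xte)
    clf = clone(logistic).fit(Xtr, Y_train)
    print("%s features -> test accuracy: %.3f" % (name, clf.score(Xte, Y_test)))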

The training curve below shows that 100 iterations is just right for convergence. More iterations cause over-fitting, and the likelihood goes down again.

Batch size = 10

[Figure: training curves of the three RBMs, batch size = 10]
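The curves above come from the verbose=1 output. An alternative way to check convergence (a sketch; partial_fit performs a single gradient update on the batch it is given, and score_samples returns the pseudo-likelihood) is:

# Track convergence by hand: mini-batch updates via partial_fit,
# then record the mean pseudo-likelihood after each epoch.
rng = np.random.RandomState(101)
rbm = BernoulliRBM(n_components=100, learning_rate=0.06, random_state=101)
history = []
for epoch in range(100):
    order = rng.permutation(len(X_train))
    for batch in np.array_split(order, len(order) // 10):
        rbm.partial_fit(X_train[batch])               # one CD-1 update per mini-batch
    history.append(rbm.score_samples(X_train).mean()) # mean pseudo-likelihood
print("pseudo-likelihood after 100 epochs: %.2f" % history[-1])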

Batch size = 256 or above

I have noticed one interesting thing: if I use a larger batch size, the performance of the network deteriorates a lot. When the batch size is above 256, the accuracy drops to less than 10%. The training curve also doesn't make sense to me: the first and second RBMs don't learn much, but the third RBM suddenly learns quickly.

[Figure: training curves of the three RBMs, batch size = 256]
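A quick way to reproduce the effect (a sketch with a single-RBM pipeline, reusing the data split from above) is to sweep the batch_size parameter of BernoulliRBM. One thing I notice: with n_iter fixed, a larger batch_size means far fewer gradient updates in total, which might be part of the story.

# Sweep batch_size and measure downstream test accuracy.
for bs in [10, 64, 256, 512]:
    rbm = BernoulliRBM(n_components=100, learning_rate=0.06,
                       n_iter=100, batch_size=bs, random_state=101)
    pipe = Pipeline(steps=[('rbm', rbm),
                           ('logistic', linear_model.LogisticRegression(C=100))])
    pipe.fit(X_train, Y_train)
    print("batch_size=%4d -> test accuracy %.3f" % (bs, pipe.score(X_test, Y_test)))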

It looks like 89% is somehow a ceiling for a network with 3 RBMs.

I wonder if I am doing anything wrong here. Is my understanding of deep belief networks correct?



from Stacking RBMs to create Deep belief network in sklearn
