According to this website, a deep belief network is just several RBMs stacked together, with the output of each RBM used as the input of the next.
In the scikit-learn documentation, there is an example of using an RBM to classify handwritten digits. They put an RBM and a LogisticRegression in a pipeline to achieve better accuracy.
So I wonder if I can add multiple RBMs to that pipeline to create a Deep Belief Network, as shown in the following code.
from sklearn.neural_network import BernoulliRBM
import numpy as np
from sklearn import linear_model, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Load the digits dataset and scale the features to [0, 1]
digits = datasets.load_digits()
X = np.asarray(digits.data, 'float32')
Y = digits.target
X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                    test_size=0.2,
                                                    random_state=0)

logistic = linear_model.LogisticRegression(C=100)

# Three RBM layers with progressively fewer hidden units
rbm1 = BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm2 = BernoulliRBM(n_components=80, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm3 = BernoulliRBM(n_components=60, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)

# Stack the RBMs and the classifier into a single pipeline
DBN3 = Pipeline(steps=[('rbm1', rbm1), ('rbm2', rbm2), ('rbm3', rbm3), ('logistic', logistic)])
DBN3.fit(X_train, Y_train)

print("Logistic regression using RBM features:\n%s\n" % (
    metrics.classification_report(
        Y_test,
        DBN3.predict(X_test))))
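As far as I understand, Pipeline.fit trains the steps greedily: each RBM is fit on the transformed output of the step before it. So the pipeline above should be roughly equivalent to this manual stacking (a sketch reusing the rbm1, rbm2, rbm3 and logistic objects defined above):

# Rough manual equivalent of DBN3.fit(X_train, Y_train):
# each RBM is trained on the hidden activations of the layer below it
# (greedy layer-wise training), and the classifier only ever sees the
# top layer's features.
H1 = rbm1.fit_transform(X_train)  # layer 1 trained on the scaled pixels
H2 = rbm2.fit_transform(H1)       # layer 2 trained on layer-1 activations
H3 = rbm3.fit_transform(H2)       # layer 3 trained on layer-2 activations
logistic.fit(H3, Y_train)         # classifier fit on top-layer features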
However, I find that the more RBMs I add to the pipeline, the lower the accuracy gets:
1 RBM in pipeline --> 95%
2 RBMs in pipeline --> 93%
3 RBMs in pipeline --> 89%
The training curves below show that 100 iterations is just about right for convergence. More iterations cause over-fitting, and the pseudo-likelihood goes down again.
[Training curve: batch size = 10]
[Training curve: batch size = 256 or above]
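These curves come from the pseudo-likelihood that verbose=1 prints at every iteration. For a single layer, here is a sketch of how the same kind of curve could be traced by hand, using partial_fit over mini-batches and score_samples (its pseudo-likelihood estimate is stochastic, so the curve will be noisy):

# Sketch: trace one RBM's training curve manually.
# score_samples returns a per-sample pseudo-likelihood estimate.
rbm = BernoulliRBM(n_components=100, learning_rate=0.06, random_state=101)
n_batches = max(1, len(X_train) // 10)  # mimic the default batch_size=10
for epoch in range(100):
    for batch in np.array_split(X_train, n_batches):
        rbm.partial_fit(batch)  # one gradient update per mini-batch
    print(epoch, rbm.score_samples(X_train).mean())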
I have noticed one interesting thing: if I use a larger batch size, the performance of the network deteriorates a lot. When the batch size is above 256, the accuracy drops to less than 10%. The training curve doesn't make sense to me either: the first and second RBMs don't learn much, but then the third RBM suddenly learns very quickly.
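For concreteness, the batch size here means BernoulliRBM's batch_size parameter, which the code above leaves at its default of 10. The large-batch variant of the first layer would look like this (only batch_size differs):

# Large-batch variant of the first RBM layer; everything except
# batch_size matches the rbm1 defined above (default batch_size=10).
rbm1_big = BernoulliRBM(n_components=100, learning_rate=0.06,
                        batch_size=256, n_iter=100,
                        verbose=1, random_state=101)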
It looks like 89% is somehow the ceiling for a network with 3 RBMs.
I wonder if I am doing something wrong here. Is my understanding of deep belief networks correct?
from Stacking RBMs to create Deep belief network in sklearn
