Hemant Vishwakarma: Why does single-layer perceptron converge so slow without normalization, even when the margin is large?

Tuesday, 17 December 2019

Why does single-layer perceptron converge so slow without normalization, even when the margin is large?

This question is totally re-written after I confirmed my results (the Python Notebook can be found here) with a piece of code written by someone else (can be found here). Here is that code instrumented by me to work with my data and to count epochs till convergence:

import numpy as np
from matplotlib import pyplot as plt

class Perceptron(object):
    """Implements a perceptron network"""
    def __init__(self, input_size, lr=0.1, epochs=1000000):
        self.W = np.zeros(input_size+1)
        #self.W = np.random.randn(input_size+1)
        # add one for bias
        self.epochs = epochs
        self.lr = lr

    def predict(self, x):
        z = self.W.T.dot(x)
        return [1 if self.W.T.dot(x) >=0 else 0]

    def fit(self, X, d):
        errors = []
        for epoch in range(self.epochs):
            if (epoch + 1) % 10000 == 0: print('Epoch',epoch + 1)
            total_error = 0
            for i in range(d.shape[0]):
                x = np.insert(X[i], 0, 1)
                y = self.predict(x)
                e = d[i] - y
                total_error += np.abs(e)
                self.W = self.W + self.lr * e * x
                #print('W: ', self.W)
            errors += [total_error]
            if (total_error == 0):
                print('Done after', epoch, 'epochs')
                nPlot = 100
                plt.plot(list(range(len(errors)-nPlot, len(errors))), errors[-nPlot:])
                plt.show()
                break

if __name__ == '__main__':
    trainingSet = np.array([[279.25746446, 162.44072328,   1.        ],
                            [306.23240054, 128.3794866 ,   1.        ],
                            [216.67811217, 148.58167262,   1.        ],
                            [223.64431813, 197.75745016,   1.        ],
                            [486.68209275,  96.09115377,   1.        ],
                            [400.71323154, 125.18183395,   1.        ],
                            [288.87299305, 204.52217766,   1.        ],
                            [245.1492875 ,  55.75847006,  -1.        ],
                            [ 14.95991122, 185.92681911,   1.        ],
                            [393.92908798, 193.40527965,   1.        ],
                            [494.15988362, 179.23456285,   1.        ],
                            [235.59039363, 175.50868526,   1.        ],
                            [423.72071607,   9.50166894,  -1.        ],
                            [ 76.52735621, 208.33663341,   1.        ],
                            [495.1492875 ,  -7.73818431,  -1.        ]])
    X = trainingSet[:, :2]
    d = trainingSet[:, -1]
    d = np.where(d == -1, 1, 0)
    perceptron = Perceptron(input_size=2)
    perceptron.fit(X, d)
    print(perceptron.W)

The training set consists of 15 points, with a large separation margin. The Perceptron algorithm finds a separator as shown below, but after as many as 122,346 epochs:

As the Wikipedia article explains, the number of epochs needed by the Perceptron to converge is proportional to the square of the size of the vectors and inverse-proportional to the square of the margin. In my data, the size of the vectors is large, but the margin is large as well.

I seek to understand why so many epochs are required.

Update: As per the request in the comments, I updated the code to plot the total errors of the last 100 epochs. Here is the plot:

P.S.:After scaling the features to be distributed as N(0,1), the algorithm converges after two epochs. However, I do not grasp why the algorithm would not converge in a reasonable amount of time even without such scaling.

from Why does single-layer perceptron converge so slow without normalization, even when the margin is large?

Hemant Vishwakarma

Tuesday, 17 December 2019

Why does single-layer perceptron converge so slow without normalization, even when the margin is large?

No comments:

Post a Comment