This question was completely rewritten after I confirmed my results (the Python notebook can be found here) against a piece of code written by someone else (can be found here). Here is that code, adapted by me to work with my data and to count the epochs until convergence:
import numpy as np
from matplotlib import pyplot as plt


class Perceptron(object):
    """Implements a perceptron network"""

    def __init__(self, input_size, lr=0.1, epochs=1000000):
        # add one for bias
        self.W = np.zeros(input_size + 1)
        #self.W = np.random.randn(input_size+1)
        self.epochs = epochs
        self.lr = lr

    def predict(self, x):
        # x already carries the constant 1 for the bias in position 0
        z = self.W.T.dot(x)
        return 1 if z >= 0 else 0

    def fit(self, X, d):
        errors = []  # total number of misclassifications per epoch
        for epoch in range(self.epochs):
            if (epoch + 1) % 10000 == 0:
                print('Epoch', epoch + 1)
            total_error = 0
            for i in range(d.shape[0]):
                x = np.insert(X[i], 0, 1)
                y = self.predict(x)
                e = d[i] - y
                total_error += np.abs(e)
                self.W = self.W + self.lr * e * x
                #print('W: ', self.W)
            errors += [total_error]
            if total_error == 0:
                print('Done after', epoch, 'epochs')
                nPlot = 100
                # plot at most the last nPlot epochs (fewer if training was shorter)
                start = max(0, len(errors) - nPlot)
                plt.plot(list(range(start, len(errors))), errors[start:])
                plt.show()
                break


if __name__ == '__main__':
    trainingSet = np.array([[279.25746446, 162.44072328,   1.        ],
                            [306.23240054, 128.3794866 ,   1.        ],
                            [216.67811217, 148.58167262,   1.        ],
                            [223.64431813, 197.75745016,   1.        ],
                            [486.68209275,  96.09115377,   1.        ],
                            [400.71323154, 125.18183395,   1.        ],
                            [288.87299305, 204.52217766,   1.        ],
                            [245.1492875 ,  55.75847006,  -1.        ],
                            [ 14.95991122, 185.92681911,   1.        ],
                            [393.92908798, 193.40527965,   1.        ],
                            [494.15988362, 179.23456285,   1.        ],
                            [235.59039363, 175.50868526,   1.        ],
                            [423.72071607,   9.50166894,  -1.        ],
                            [ 76.52735621, 208.33663341,   1.        ],
                            [495.1492875 ,  -7.73818431,  -1.        ]])
    X = trainingSet[:, :2]
    d = trainingSet[:, -1]
    # map the labels from {-1, 1} to {1, 0}
    d = np.where(d == -1, 1, 0)
    perceptron = Perceptron(input_size=2)
    perceptron.fit(X, d)
    print(perceptron.W)
The training set consists of 15 points, with a large separation margin. The Perceptron algorithm finds a separator as shown below, but after as many as 122,346 epochs:
As the Wikipedia article explains, the number of updates the Perceptron needs to converge is bounded by a quantity proportional to the square of the norm of the largest input vector and inversely proportional to the square of the margin. In my data, the vectors are large, but the margin is large as well.
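To make the bound concrete, here is a minimal sketch of how the two quantities could be computed for a given separating weight vector. It assumes labels in {-1, +1} and a bias folded into the weights as a constant first input, the same way the code above does with np.insert(X[i], 0, 1). The function name mistake_bound and the two-point toy data are hypothetical, chosen only for illustration. Note that with the bias folded in, both the vector size and the margin are measured on the augmented vectors, so they need not match what the eye sees in the 2D plot.

```python
import numpy as np


def mistake_bound(X, y, w_aug):
    """Return (R, gamma, (R / gamma) ** 2) for a separating weight vector.

    X     : (n, d) array of raw features
    y     : (n,) array of labels in {-1, +1}
    w_aug : (d + 1,) weights with the bias first, matching np.insert(x, 0, 1)
    """
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend the constant 1
    R = np.linalg.norm(X_aug, axis=1).max()           # largest augmented input norm
    # Smallest signed distance to the hyperplane through the augmented origin
    gamma = (y * (X_aug @ w_aug)).min() / np.linalg.norm(w_aug)
    if gamma <= 0:
        raise ValueError("w_aug does not separate the data")
    return R, gamma, (R / gamma) ** 2


# Hypothetical toy data: two points 4 units apart, split by a horizontal line.
y = np.array([1, -1])
X_near = np.array([[0.0, 2.0], [0.0, -2.0]])  # separating line passes through the origin
X_far = X_near + np.array([0.0, 100.0])       # same 2D gap, shifted up by 100

_, _, bound_near = mistake_bound(X_near, y, np.array([0.0, 0.0, 1.0]))
_, _, bound_far = mistake_bound(X_far, y, np.array([-100.0, 0.0, 1.0]))
print(bound_near, bound_far)
```

Any separating weight vector gives a valid (if loose) bound, so the hand-picked vectors here are not claimed to be the best possible ones. The point of the toy comparison is only that two data sets with the same gap in the plane can yield very different values of (R / gamma) ** 2 once the bias is treated as a weight.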
I seek to understand why so many epochs are required.
Update: As per the request in the comments, I updated the code to plot the total errors of the last 100 epochs. Here is the plot:
P.S.: After scaling the features to be distributed as N(0,1), the algorithm converges after two epochs. However, I do not understand why the algorithm does not converge in a reasonable amount of time without such scaling.
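For reference, the kind of scaling meant here can be sketched as follows, assuming it is plain per-feature standardization (subtract the column mean, divide by the column standard deviation). The helper name standardize is hypothetical, and the exact method used in the notebook may differ.

```python
import numpy as np


def standardize(X):
    """Shift each column to zero mean and scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std


# First four feature rows of the training set above
X = np.array([[279.25746446, 162.44072328],
              [306.23240054, 128.3794866],
              [216.67811217, 148.58167262],
              [223.64431813, 197.75745016]])
X_scaled = standardize(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

The scaled array would then be passed to fit in place of X, with the labels unchanged.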
from Why does single-layer perceptron converge so slow without normalization, even when the margin is large?

