I need to write a simple neural network that consists of 1 output node, one hidden layer of 3 nodes, and 1 input layer (variable size). For now I am just trying to train on the xor data so lets presume that there are 3 input nodes (one node represents the bias and is always 1). The data is labeled 0,1.
I did out the equations for backpropogation and found that despite being so simple, my code does not converge to the xor data being correct.
Let W be the 3x3 matrix of weights connecting the input and hidden layer, and w be the 1x3 matrix that connects the hidden to output layer. Here are some helper functions for my method
def feed_forward_predict(x, W, w):
sigmoid = lambda x: 1/(1+np.exp(-x))
z = np.array(list(map(sigmoid, np.matmul(W, x))))
L = sigmoid(np.matmul(w, z))
return [L, z, x]
this just takes in a value and makes a prediction using the formula sig(w*sig(W*x)). We also have
def calculate_objective(data, labels, W, w):
obj = 0
for point, label in zip(data, labels):
L, z, x = feed_forward_predict(point, W, w)
obj += (label - L)**2
return obj
which calculates the Mean Squared Error for a bunch of given data points. Both of these functions should work as I checked them by hand. Now the problem comes in for the back propogation algorithm
def back_prop(traindata, trainlabels):
sigmoid = lambda x: 1/(1+np.exp(-x))
sigmoid_prime = lambda x: np.exp(-x)/((1+np.exp(-x))**2)
W = np.random.rand(3, len(traindata[0]))
w = np.random.rand(1, 3)
obj = calculate_objective(traindata, trainlabels, W, w)
print(obj)
epochs = 10_000
eta = .01
prevobj = np.inf
i=0
while(i < epochs):
prevobj = obj
dellw = np.zeros((1,3))
for point, label in zip(traindata, trainlabels):
y, z, x = feed_forward_predict(point, W, w)
dellw += 2*(y - label) * sigmoid_prime(np.dot(w, z)) * z
w -= eta * dellw
for point, label in zip(traindata, trainlabels):
y, z, x = feed_forward_predict(point, W, w)
temp = 2 * (y - label) * sigmoid_prime(np.dot(w, z))
# Note that s,u,v represent the hidden node weights. My professor required it this way
dells = temp * w[0][0] * sigmoid_prime(np.matmul(W[0,:], x)) * x
dellu = temp * w[0][1] * sigmoid_prime(np.matmul(W[1,:], x)) * x
dellv = temp * w[0][2] * sigmoid_prime(np.matmul(W[2,:], x)) * x
dellW = np.array([dells, dellu, dellv])
W -= eta*dellW
obj = calculate_objective(traindata, trainlabels, W, w)
i = i + 1
print("i=", i, " Objective=",obj)
return [W, w]
However this code, despite seemingly being correct in terms of the matrix multiplications and derivatives I took, does not converge to anything. In fact the error consistantly bounces: it will fall, then rise, then fall back to the same spot, then rise again. I believe that the problem lies with the W matrix gradient but I do not know what exactly it is.
If you'd like to see for yourself what is happening, the input data I used is
0: 0 0 1
0: 1 1 1
1: 1 0 1
1: 0 1 1
where the first number represents the label. I also set the random seed to np.random.seed(0) just so that I could be consistant with my matrices I'm dealing with.
from Neural Network Backpropogation code not working
No comments:
Post a Comment