Wednesday, 3 March 2021

Neural Network Backpropagation code not working

I need to write a simple neural network that consists of one output node, one hidden layer of 3 nodes, and one input layer of variable size. For now I am just trying to train on the XOR data, so let's assume there are 3 input nodes (one of which represents the bias and is always 1). The data is labeled 0 or 1.

I worked out the equations for backpropagation by hand, but despite the network being so simple, my code does not converge to correct predictions on the XOR data.

Let W be the 3x3 matrix of weights connecting the input layer to the hidden layer, and w be the 1x3 matrix that connects the hidden layer to the output. Here are some helper functions for my method:

import numpy as np

def feed_forward_predict(x, W, w):
    # Forward pass: hidden activations z = sig(W x), prediction L = sig(w z)
    sigmoid = lambda x: 1/(1+np.exp(-x))
    z = np.array(list(map(sigmoid, np.matmul(W, x))))
    L = sigmoid(np.matmul(w, z))
    return [L, z, x]

This just takes an input vector and makes a prediction using the formula sig(w * sig(W * x)).
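For example, calling it on one of the XOR points looks like this (a quick sanity check; the weights here are random and purely illustrative):

x = np.array([1, 0, 1])  # input point, with the bias as the last entry
W = np.random.rand(3, 3)
w = np.random.rand(1, 3)
L, z, _ = feed_forward_predict(x, W, w)
print(L)  # a single value in (0, 1)

We also have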

def calculate_objective(data, labels, W, w):
    # Total squared error over the whole data set
    obj = 0
    for point, label in zip(data, labels):
        L, z, x = feed_forward_predict(point, W, w)
        obj += (label - L)**2

    return obj

which calculates the squared-error objective over a set of given data points (the sum of squared errors; I call it the Mean Squared Error even though I never divide by the number of points). Both of these functions should work, as I checked them by hand.
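For reference, the chain rule gives the gradient of a single point's error with respect to the output weights (writing a = w*z for the output pre-activation, so the prediction is L = sig(a)):

d/dw (label - L)^2 = 2*(L - label) * sig'(a) * z

which is exactly what the dellw accumulation below computes, summed over the training points. Now the problem comes in with the backpropagation algorithm: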

def back_prop(traindata, trainlabels):

    sigmoid = lambda x: 1/(1+np.exp(-x))
    sigmoid_prime = lambda x: np.exp(-x)/((1+np.exp(-x))**2)

    # Randomly initialize the hidden-layer and output-layer weights
    W = np.random.rand(3, len(traindata[0]))
    w = np.random.rand(1, 3)

    obj = calculate_objective(traindata, trainlabels, W, w)
    print(obj)

    epochs = 10_000
    eta = .01
    prevobj = np.inf
    i = 0
    while i < epochs:

        # remember the previous objective value
        prevobj = obj

        # Gradient of the objective with respect to the output weights w,
        # accumulated over all training points
        dellw = np.zeros((1, 3))
        for point, label in zip(traindata, trainlabels):
            y, z, x = feed_forward_predict(point, W, w)
            dellw += 2*(y - label) * sigmoid_prime(np.dot(w, z)) * z

        w -= eta * dellw

        # Gradient of the objective with respect to the hidden weights W.
        # Note that s, u, v represent the hidden node weights; my professor
        # required it this way.
        for point, label in zip(traindata, trainlabels):
            y, z, x = feed_forward_predict(point, W, w)
            temp = 2 * (y - label) * sigmoid_prime(np.dot(w, z))
            dells = temp * w[0][0] * sigmoid_prime(np.matmul(W[0,:], x)) * x
            dellu = temp * w[0][1] * sigmoid_prime(np.matmul(W[1,:], x)) * x
            dellv = temp * w[0][2] * sigmoid_prime(np.matmul(W[2,:], x)) * x

        dellW = np.array([dells, dellu, dellv])
        W -= eta * dellW

        obj = calculate_objective(traindata, trainlabels, W, w)

        i = i + 1
        print("i=", i, " Objective=", obj)

    return [W, w]

However, despite the matrix multiplications and derivatives seeming correct to me, this code does not converge to anything. In fact, the error consistently bounces: it falls, then rises, then falls back to the same spot, then rises again. I believe the problem lies with the gradient of the W matrix, but I do not know what exactly it is.
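One way to narrow this down would be a numerical gradient check (a rough sketch; numerical_grad_W and eps are my own names, not part of the assignment):

def numerical_grad_W(data, labels, W, w, eps=1e-5):
    # Central-difference approximation of d(objective)/dW, entry by entry
    grad = np.zeros_like(W)
    for r in range(W.shape[0]):
        for c in range(W.shape[1]):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[r, c] += eps
            W_minus[r, c] -= eps
            diff = (calculate_objective(data, labels, W_plus, w)
                    - calculate_objective(data, labels, W_minus, w))
            # calculate_objective returns a length-1 array, so squeeze it
            grad[r, c] = np.squeeze(diff) / (2 * eps)
    return grad

If the dellW computed in back_prop disagrees with this estimate, the bug is in the hand-derived W gradient rather than in the forward pass.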

If you'd like to see for yourself what is happening, the input data I used is:

0: 0 0 1
0: 1 1 1
1: 1 0 1
1: 0 1 1

where the first number represents the label. I also set the random seed with np.random.seed(0) so that the matrices I'm dealing with stay consistent between runs.
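Putting it together, the call I'm running looks roughly like this (a sketch; traindata and trainlabels are just the names I pass to back_prop):

import numpy as np

np.random.seed(0)
traindata = np.array([[0, 0, 1],
                      [1, 1, 1],
                      [1, 0, 1],
                      [0, 1, 1]])
trainlabels = np.array([0, 0, 1, 1])

W, w = back_prop(traindata, trainlabels)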


