Monday, 22 March 2021

Tensorflow - numpy gradient check doesnt work

I'm trying to estimate the gradient of a function by the finite difference method : finite difference method for estimating gradient

TLDR:

grad f(x) = [f(x+h)-f(x-h)]/(2h) for sufficiently small h.

this is also used in the gradient check phase to check your backpropagation in AI as you might know.

This is my network :

def defineModel():
  global model
  model = Sequential()
  model.add(keras.Input(shape=input_shape))
  model.add(layers.Conv2D(32, kernel_size=(3, 3), activation="relu"))
  model.add(layers.MaxPooling2D(pool_size=(2, 2)))
  model.add(layers.Conv2D(64, kernel_size=(3, 3), activation="relu"))
  model.add(layers.MaxPooling2D(pool_size=(2, 2)))
  model.add( layers.Flatten())
  model.add(layers.Dropout(0.5))
  model.add(layers.Dense(num_classes, activation="softmax"))
  model.build()
  model.summary()

This part is fine and has no bugs. I just mentioned it here so you have a sense of my model. I work on MNIST so everything is pretty straightforward. With 1 epoch and a few lines of the TF code, I reached +98% accuracy which is quite good for a makeshift model.

Since I'm doing adversarial training, I want the gradient of my loss with respect to the input data:

by the way, I used the tiling idea:

if you cover the input image with square tiles of (tile*tile) dimension with no overlapping, you can assume the gradient of the image within tiles is pretty much constant so it's a reasonable approximation. But as a matter of debugging, in my code tile=1 so we're calculating the pixel-wise gradient.

and this is where the problem lies, but I cannot figure out where! I controlled the loss for loss(x+h) and loss(x-h) and loss(x) are pretty much in a close range so I know it is fine. also my TF automatic backpropagation works fine I have tested it. the problem must be with the way of computing manual gradient.

tile=1
h=1e-4 #also tried 1e-5, 1e-6 but did not work

#A dummy function to wait
def w():
  print()
  ww=input('wait')
  print()

#This function works fine.
def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

#the problem is here. given an index suah is idx, I'm going to manipulate
#x_test[idx] and compute loss gradient with respect to this input
def estimateGrad(idx):
  y=model(np.expand_dims(x_test[idx],axis=0))
  y=np.expand_dims(y_train[idx],axis=0)
  x=np.squeeze(x_test[idx])

  

  cce = tf.keras.losses.CategoricalCrossentropy()
  grad=np.zeros((28,28))
  num=int(28/tile)

  #MNIST pictures are 28*28 pixels. Now grad is 28*28 nd.array
  #and x is input image converted to 28*28 nd.array
  for i in range(num):
    for j in range(num):

      plus=copy.deepcopy(x)
      minus=copy.deepcopy(x)
      #Now plus is x+h and minus is x-h
      plus[i*tile:(i+1)*tile , j*tile:(j+1)*tile]=  plus[i*tile:(i+1)*tile , j*tile:(j+1)*tile]+h
      minus[i*tile:(i+1)*tile , j*tile:(j+1)*tile]=  minus[i*tile:(i+1)*tile , j*tile:(j+1)*tile]-h

       
      plus=np.expand_dims(plus,axis=(0,-1))
      minus=np.expand_dims(minus,axis=(0,-1))

      #Now we pass plus and minus to model prediction in the next two lines
      plus=model(plus)
      minus=model(minus)
      
      #Since I want to find gradient of loss with respect to x, in the next 
      #two lines I will set plus=loss(x+h) and minus=loss(x-h)
      #In other words, here our finction f which we want to cumpute its grads
      #is the loss function

      plus=cce(y,plus).numpy()
      minus=cce(y,minus).numpy()

      #Applying the formola : grad=(loss(x+h)-loss(x-h))/2/h
      grad[i*tile:(i+1)*tile , j*tile:(j+1)*tile]=(plus-minus)/2/h

 #Ok now lets check our grad with TF autograd module
  x= tf.convert_to_tensor(np.expand_dims(x_test[idx], axis=0)) 
  with tf.GradientTape() as tape:
    tape.watch(x) 
    y=model(x)
    y_expanded=np.expand_dims(y_train[idx],axis=0)  
    loss=cce(y_expanded,y)

     
  
  delta=tape.gradient(loss,x)
  delta=delta.numpy()
  delta=np.squeeze(delta)

  #delta is gradients returned by TF via auto-differentiation.
  #We calculate the error and unfortunately its large
  diff=rel_error(grad,delta)

  print('diff ',diff)
  w()
  #problem : diff is very large. it should be less than 1e-4

you can refer here for my full code.



from Tensorflow - numpy gradient check doesnt work

No comments:

Post a Comment