I am currently trying to implement a custom optimization scheme for a custom TensorFlow layer. Without going into too much detail, I have added a small code sample below which illustrates how my current code works. The important part is that calculate_gradients(variables, gradients, momentum)
is a function that requires the values and gradients of all the variables in the layer. Furthermore, this calculation produces intermediate results which have to be stored across optimization steps; this is what the illustrative momentum
variable stands for. To me this behaviour rules out @custom_gradient,
since it gives me no way to propagate these intermediate results to the optimizer, which would then have to return them to the custom gradient function for use in the calculation of the next set of gradients. Unless someone knows how this would work (question one), I have not found a way around this.
model = build_model()

for data, target in dataset:
    with tf.GradientTape() as tape:
        loss_value = loss(model(data), target)  # forward pass recorded on the tape
    gradients = tape.gradient(loss_value, model.trainable_variables)

    for layer in model.layers:
        layer_gradients = gradients[indices]  # actual indexing is not important
        new_gradients = calculate_gradients(layer.variables, layer_gradients, momentum)
        for variable, grad in zip(layer.variables, new_gradients):
            variable.assign(grad)
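For context, this is roughly the @tf.custom_gradient pattern I am referring to above (the op and shapes are purely illustrative): the inner grad function can only return gradients for the inputs, so as far as I can tell there is no channel through which intermediate state such as momentum could be handed to the optimizer and fed back into the next gradient calculation.

import tensorflow as tf

@tf.custom_gradient
def custom_op(x, w):
    y = tf.matmul(x, w)

    def grad(upstream):
        # Only gradients with respect to the inputs can be returned here;
        # any intermediate state (e.g. momentum) computed in this function
        # cannot be passed out to the optimizer and back in on the next step.
        dx = tf.matmul(upstream, w, transpose_b=True)
        dw = tf.matmul(x, upstream, transpose_a=True)
        return dx, dw

    return y, grad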
Trying to implement this in a TensorFlow optimizer, in particular by overriding _resource_apply_dense
as shown in the documentation [1], I am running into some trouble with the layer-wise behaviour, since _resource_apply_dense
only ever receives a single variable and its gradient. The second code snippet illustrates what I am trying to do, but I have currently not found a way to implement the get_other_variables_and_gradients(var)
behaviour. Furthermore, this solution would calculate the gradients three times per layer (once for each of its variables), which is very suboptimal.
def _resource_apply_dense(self, grad, var, apply_state=None):
    # Somehow collect the other (variable, gradient) pairs of the same layer.
    other_vars, other_grads = get_other_variables_and_gradients(var)
    new_gradients = calculate_gradients([var] + other_vars, [grad] + other_grads, momentum)
    var.assign(new_gradients[0])
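For comparison, the documented way (see [1], assuming the OptimizerV2-style base class) to keep per-variable state such as momentum inside an optimizer is via slots, roughly as sketched below with a made-up update rule. This would cover the intermediate results, but _resource_apply_dense still only sees one (grad, var) pair at a time, so it does not solve the problem of accessing the other variables and gradients of the same layer.

import tensorflow as tf

class MyOptimizer(tf.keras.optimizers.Optimizer):
    def __init__(self, learning_rate=0.01, name="MyOptimizer", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)

    def _create_slots(self, var_list):
        # One "momentum" slot per variable to hold intermediate results.
        for var in var_list:
            self.add_slot(var, "momentum")

    def _resource_apply_dense(self, grad, var, apply_state=None):
        lr = self._get_hyper("learning_rate", var.dtype.base_dtype)
        momentum = self.get_slot(var, "momentum")
        # Made-up per-variable update; the real calculate_gradients would need
        # the whole layer's variables and gradients, which are not available here.
        momentum.assign(0.9 * momentum + grad)
        return var.assign_sub(lr * momentum)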
In short, my second question is: does anyone have an idea how to implement this behaviour, ideally without redundant calculations, or perhaps even a better approach altogether? Currently the optimization works when I do everything in a training loop as shown in the first code snippet, so this is merely a question of integration with the TensorFlow optimizer paradigm and of performance, since doing everything very 'pythony' with lists in a large for loop is slow.
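For what it is worth, I assume the per-step Python overhead of the loop in the first snippet could be reduced by tracing it into a tf.function, as sketched below (loss, calculate_gradients, the per-layer indexing and the momentum state are my placeholders from above); but that still leaves the actual question of how to express this inside an optimizer.

@tf.function  # trace the per-step work into a graph to cut Python overhead
def train_step(model, data, target, layer_indices, momenta):
    with tf.GradientTape() as tape:
        loss_value = loss(model(data), target)
    gradients = tape.gradient(loss_value, model.trainable_variables)
    for layer, indices, momentum in zip(model.layers, layer_indices, momenta):
        layer_gradients = [gradients[i] for i in indices]
        new_gradients = calculate_gradients(layer.variables, layer_gradients, momentum)
        for variable, grad in zip(layer.variables, new_gradients):
            variable.assign(grad)
    return loss_value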
[1] https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer