I have a very large TensorFlow graph and two sets of variables: A (var_list_1) and B (var_list_2). I create two optimizers:
import tensorflow as tf

learning_rate = 1e-3
# loss_1, loss_2, var_list_1, and var_list_2 are defined elsewhere in the graph.
# loss_1 depends on both variable sets; loss_2 depends only on var_list_2.
optimizer1 = tf.train.AdamOptimizer(learning_rate).minimize(loss_1, var_list=var_list_1)
optimizer2 = tf.train.AdamOptimizer(learning_rate).minimize(loss_2, var_list=var_list_2)
The goal is to alternate between optimizing the two sets of variables, as sketched below. The weights in var_list_2 are used in the computation of loss_1, but they are not trainable when optimizing loss_1. Meanwhile, the weights in var_list_1 are not used at all in computing loss_2 (this is a key asymmetry).
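For context, the training loop looks roughly like the following sketch (the session setup and the feed_dict_1 / feed_dict_2 / num_steps names are placeholders, not my exact code):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        # Update var_list_1 against loss_1 (var_list_2 is held fixed here)
        sess.run(optimizer1, feed_dict=feed_dict_1)
        # Update var_list_2 against loss_2 (var_list_1 is not involved at all)
        sess.run(optimizer2, feed_dict=feed_dict_2)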
Strangely, I am finding that running optimizer2 inside the full graph is about 2x slower than optimizing that subgraph on its own. I'm not running any summaries.
Why would this phenomenon happen? How could I fix it? I can provide more details if necessary.
from Optimizing subgraph of large graph - slower than optimizing subgraph by itself