Sunday, 20 January 2019

Optimizing subgraph of large graph - slower than optimizing subgraph by itself

I have a very large TensorFlow graph and two disjoint sets of variables, var_list_1 and var_list_2. I create two optimizers:

learning_rate = 1e-3
optimizer1 = tf.train.AdamOptimizer(learning_rate).minimize(loss_1, var_list=var_list_1)
optimizer2 = tf.train.AdamOptimizer(learning_rate).minimize(loss_2, var_list=var_list_2)

The goal here is to alternately optimize the two variable sets. The weights in var_list_2 are used in the computation of loss_1, but they are not trainable when optimizing loss_1. Meanwhile, the weights in var_list_1 are not used at all when optimizing loss_2 (I would say this is a key asymmetry).
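To make the alternating scheme concrete, here is a minimal NumPy sketch (my own illustration, not the actual model) with two scalar "variable sets": loss_1 depends on both w1 and w2 but each step only updates its own variable, and loss_2 never touches w1 at all, mirroring the asymmetry described above.

```python
import numpy as np

# Hypothetical stand-ins for the two losses; the real losses are unknown.
def loss_1(w1, w2):
    # w2 enters loss_1 but is held fixed when updating w1
    return (w1 - w2) ** 2

def loss_2(w2):
    # loss_2 does not involve w1 at all (the key asymmetry)
    return (w2 - 3.0) ** 2

def train(steps=200, lr=0.1):
    w1, w2 = 0.0, 0.0
    for _ in range(steps):
        # "optimizer1" step: gradient of loss_1 w.r.t. w1 only
        w1 -= lr * 2.0 * (w1 - w2)
        # "optimizer2" step: gradient of loss_2 w.r.t. w2 only
        w2 -= lr * 2.0 * (w2 - 3.0)
    return w1, w2
```

In this toy version, w2 converges to its own minimum independently, and w1 then tracks it; in the TensorFlow graph the same separation is expressed through the var_list argument to each minimize call.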

I am finding, weirdly, that running optimizer2 in this setting is roughly 2x slower than optimizing that part of the graph by itself. I'm not running any summaries.

Why would this phenomenon happen? How could I fix it? I can provide more details if necessary.

