Monday, 30 August 2021

Why does this gRPC call from the Google Secret Manager API hang when run by Apache?

In short:

I have a Django application being served up by Apache on a Google Compute Engine VM.

I want to access a secret from Google Secret Manager in my Python code (when the Django app is initialising).

When I do 'python manage.py runserver', the secret is successfully retrieved. However, when I get Apache to run my application, it hangs when it sends a request to the secret manager.

Too much detail:

I followed the answer to this question GCP VM Instance is not able to access secrets from Secret Manager despite of appropriate Roles. I have created a service account (not the default), and have given it the 'cloud-platform' scope. I also gave it the 'Secret Manager Admin' role in the web console.

After initially running into trouble, I downloaded the a json key for the service account from the web console, and set the GOOGLE_APPLICATION_CREDENTIALS env-var to point to it.

When I run the django server directly on the VM, everything works fine. When I let Apache run the application, I can see from the logs that the service account credential json is loaded successfully.

However, when I make my first API call, via google.cloud.secretmanager.SecretManagerServiceClient.list_secret_versions , the application hangs. I don't even get a 500 error in my browser, just an eternal loading icon. I traced the execution as far as:

grpc._channel._UnaryUnaryMultiCallable._blocking, line 926 : 'call = self._channel.segregated_call(...'

It never gets past that line. I couldn't figure out where that call goes so I couldnt inspect it any further than that.

Thoughts

I don't understand GCP service accounts / API access very well. I can't understand why this difference is occurring between the django dev server and apache, given that they're both using the same service account credentials from json. I'm also surprised that the application just hangs in the google library rather than throwing an exception. There's even a timeout option when sending a request, but changing this doesn't make any difference.

I wonder if it's somehow related to the fact that I'm running the django server under my own account, but apache is using whatever user account it uses?

Update

I tried changing the user/group that apache runs as to match my own. No change.

I enabled logging for gRPC itself. There is a clear difference between when I run with apache vs the django dev server.

On Django:

secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x17cfda0, target=secretmanager.googleapis.com:443, args=0x7fe254620f20, reserved=(nil))
init.cc:167]                grpc_init(void)
client_channel.cc:1099]     chand=0x2299b88: creating client_channel for channel stack 0x2299b18
...
timer_manager.cc:188]       sleep for a 1001 milliseconds
...
client_channel.cc:1879]     chand=0x2299b88 calld=0x229e440: created call
...
call.cc:1980]               grpc_call_start_batch(call=0x229daa0, ops=0x20cfe70, nops=6, tag=0x7fe25463c680, reserved=(nil))
call.cc:1573]               ops[0]: SEND_INITIAL_METADATA...
call.cc:1573]               ops[1]: SEND_MESSAGE ptr=0x21f7a20
...

So, a channel is created, then a call is created, and then we see gRPC start to execute the operations for that call (as far as I read it).

On Apache:

secure_channel_create.cc:178] grpc_secure_channel_create(creds=0x7fd5bc850f70, target=secretmanager.googleapis.com:443, args=0x7fd583065c50, reserved=(nil))
init.cc:167]                grpc_init(void)
client_channel.cc:1099]     chand=0x7fd5bca91bb8: creating client_channel for channel stack 0x7fd5bca91b48
...
timer_manager.cc:188]       sleep for a 1001 milliseconds
...
timer_manager.cc:188]       sleep for a 1001 milliseconds
...

So, we a channel is created... and then nothing. No call, no operations. So the python code is sitting there waiting for gRPC to make this call, which it never does.



from Why does this gRPC call from the Google Secret Manager API hang when run by Apache?

No comments:

Post a Comment