We host elatsicsearch cluster on Elastic Cloud and call it from dataflow (GCP). Job works fine in dev but when we deploy to prod we're seeing lots of connection timeout on the client side.
Traceback (most recent call last):
File "apache_beam/runners/common.py", line 1213, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 570, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "main.py", line 159, in process
File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 152, in _wrapped
return func(*args, params=params, headers=headers, **kwargs)
File "/usr/local/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 1617, in search
body=body,
File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 390, in perform_request
raise e
File "/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py", line 365, in perform_request
timeout=timeout,
File "/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 258, in perform_request
raise ConnectionError("N/A", str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPSConnection object at 0x7fe5d04e5690>: Failed to establish a new connection: [Errno 110] Connection timed out) caused by: NewConnectionError(<urllib3.connection.HTTPSConnection object at 0x7fe5d04e5690>: Failed to establish a new connection: [Errno 110] Connection timed out)
I increased timeout setting in elasticsearch client to 300s like below but it didn't seem to help.
self.elasticsearch = Elasticsearch([es_host], http_auth=http_auth, timeout=300)
Looking at deployment at https://cloud.elastic.co/deployments//metrics CPU and memory usage are very low (below 10%) and search response time is also order of 200ms. What could be the bottleneck here and how we can we avoid such timeouts?
As seen in below log most of requests are failing with connection timeout while successful request receives response very quick:
I tried ssh into the VM where we experience the connection error. netstat
showed there were about 60 ESTABLISHED connections to the elastic search IP address. When I curl from the VM to elasticsearch address I was able to reproduce timeout. I can curl fine to other URLs. Also I can curl fine to elasticsearch from my local so issue is only connection between VM and elasticsaerch server.
Does dataflow (compute engine) or ElasticSearch has limitation on number of concurrent connection? I could not find any information online.
from Elasticsearch/dataflow - connection timeout after ~60 concurrent connection
No comments:
Post a Comment