I'm facing a very strange issue and am looking for advice on how to debug it more than I am on a simple fix, since I've been unable to create a simple reproducible case.
Over the course of a few hours I'm opening 10,000-100,000 async requests to remote web domains with httpx. Specifically, I'm using a shared pool of client context managers so that TCP sockets and other resources are reused across requests. I'll only have a few thousand requests pending at any one time. My code at its core is doing the following:
import asyncio
from asyncio import gather
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from random import choice

from httpx import AsyncClient

# Shared pool of clients so connections are reused across requests
clients = []
for _ in range(200):
    client = AsyncClient()
    clients.append(await client.__aenter__())

async def run_request(url):
    try:
        # Pick a pooled client at random and issue the request
        client = choice(clients)
        response = await client.get(url, timeout=15)
    except Exception as e:
        return None

with ProcessPoolExecutor() as executor:
    await gather(
        *[
            asyncio.get_event_loop().run_in_executor(
                executor,
                partial(run_request, url=url),
            )
            for url in urls
        ]
    )
Sometimes the exception handler fires, in the case of a timeout or an unreachable host.
At some point my whole machine will hang when trying to create new connections. Chrome freezes, a locally hosted Postgres instance freezes, even lsof -i -a freezes. Yet none of them actually time out; they just spin forever. It seems as if the OS is unable to allocate new sockets to communicate with remote hosts, but I'm not sure whether that explains the Postgres or lsof behavior.
Is it possible sockets are being leaked and never released, despite the context managers? Has anyone seen something similar? What profiling methods should I explore to determine the root cause?
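For reference, this is the kind of in-process check I'm imagining for watching descriptor usage while the job runs (a rough sketch, not part of my actual code; it assumes psutil is installed):

import asyncio
import resource

import psutil  # assumed available; only for illustration


async def log_fd_usage(interval: float = 30.0):
    # Periodically compare the process's open file descriptor count
    # against the soft RLIMIT_NOFILE limit, to see whether the hang
    # coincides with descriptor exhaustion.
    proc = psutil.Process()
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    while True:
        print(f"open fds: {proc.num_fds()} / soft limit: {soft}")
        await asyncio.sleep(interval)

Running that as a background task (asyncio.create_task(log_fd_usage())) alongside the gather would at least tell me whether the soft limit is being approached before everything freezes, but I'm not sure it explains the system-wide symptoms.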
from Mac web requests hanging after thousands of requests