I'm trying not to show/get some error thrown by scrapy within process_response
in RetryMiddleware
. The error the script encounters when max retry limit is crossed. I used proxies within middleware. The weird thing is that the exception the script throws is already within the EXCEPTIONS_TO_RETRY
list. It is completely okay that the script may sometimes cross the number of max retries without any success. However, I just do not wish to see that error even when it is there, meaning suppress or bypass it.
The error is like:
Traceback (most recent call last):
File "middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.internet.error.TCPTimedOutError: TCP connection timed out: 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond..
This is how process_response
within RetryMiddleware
looks like:
class RetryMiddleware(object):
cus_retry = 3
EXCEPTIONS_TO_RETRY = (defer.TimeoutError, TimeoutError, DNSLookupError, \
ConnectionRefusedError, ConnectionDone, ConnectError, \
ConnectionLost, TCPTimedOutError, TunnelError, ResponseFailed)
def process_exception(self, request, exception, spider):
if isinstance(exception, self.EXCEPTIONS_TO_RETRY) \
and not request.meta.get('dont_retry', False):
return self._retry(request, exception, spider)
def _retry(self, request, reason, spider):
retries = request.meta.get('cus_retry',0) + 1
if retries<=self.cus_retry:
r = request.copy()
r.meta['cus_retry'] = retries
r.meta['proxy'] = f'https://{ip:port}'
r.dont_filter = True
return r
else:
print("done retrying")
How can I get rid of the errors in EXCEPTIONS_TO_RETRY
?
from Unable to get rid of some error raised by process_exception
No comments:
Post a Comment