I'm trying to use https
proxy within async requests making use of asyncio library. When it comes to use http
proxy, there is a clear instruction here but I get stuck in case of using https
proxy. Moreover, I would like to reuse the same session, not creating a new session every time I send a requests.
I've tried so far (proxies used within the script are directly taken from a free proxy site, so consider them as placeholders
):
import asyncio
import aiohttp
from bs4 import BeautifulSoup
proxies = [
'http://89.22.210.191:41258',
'http://91.187.75.48:39405',
'http://103.81.104.66:34717',
'http://124.41.213.211:41828',
'http://93.191.100.231:3128'
]
async def get_text(url):
global proxies,proxy_url
while True:
check_url = proxy_url
proxy = f'http://{proxy_url}'
print("trying using:",check_url)
async with aiohttp.ClientSession() as session:
try:
async with session.get(url,proxy=proxy,ssl=False) as resp:
return await resp.text()
except Exception:
if check_url == proxy_url:
proxy_url = proxies.pop()
async def field_info(field_link):
text = await get_text(field_link)
soup = BeautifulSoup(text,'lxml')
for item in soup.select(".summary .question-hyperlink"):
print(item.get_text(strip=True))
if __name__ == '__main__':
proxy_url = proxies.pop()
links = ["https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page={}&pagesize=50".format(page) for page in range(2,5)]
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(asyncio.gather(*(field_info(url) for url in links)))
loop.run_until_complete(future)
loop.close()
How can I use https
proxies within the script along with reusing the same session
?
from Can't use https proxies along with reusing the same session within a script built upon asyncio
No comments:
Post a Comment