Tuesday 24 November 2020

Why does requests stop?

I am making an app using tkinter and requests that is supposed to work like a download manager. I recently found out about the stream keyword argument of requests.get(url), which lets me write the content to disk while it is being downloaded. My problem is that when the user downloads multiple files, or just big files, requests simply seems to stop. The weird part is that it does not raise an error, as if this were expected behavior. Why does this happen, and how can I resolve it? Here is a simplified version of the download code without the GUI (I found that it has a bit of a problem with this specific URL):

import requests
import time

url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
headers = requests.head(url, headers={'accept-encoding': ''}).headers
print(headers)
r = requests.get(url, allow_redirects=True, stream=True)
# headers = r.headers
name = url.split('/')[-1].split('.')[0]
print(name)
format_name = '.' + headers['Content-Type'].split('/')[1]
file_size = int(headers['Content-Length'])
downloaded = 0
print(name + format_name)
start = last_print = time.time()
with open(name + format_name, 'wb') as fp:
    for chunk in r.iter_content(chunk_size=4096):
        downloaded += fp.write(chunk)
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} KB/s")
            last_print = time.time()
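
One thing worth checking (an assumption on my part, not confirmed by any traceback): by default requests.get has no timeout, so a stalled socket can block forever instead of raising. Below is a sketch of the same loop with a connect/read timeout added; when streaming, the read timeout applies between chunks, so a stalled transfer raises requests.exceptions.Timeout instead of hanging silently. The names download and format_progress are hypothetical, introduced only for this sketch:

```python
import time

import requests


def format_progress(downloaded, total, elapsed):
    """Build the progress line; a pure function so it is easy to test."""
    pct_done = round(downloaded / total * 100)
    speed = round(downloaded / elapsed / 1024)  # bytes/s divided by 1024 -> KB/s
    return f"Download {pct_done} % done, avg speed {speed} KB/s"


def download(url, path, chunk_size=4096):
    # timeout=(connect, read): with stream=True the read timeout applies to
    # each chunk read, so a dead connection raises instead of blocking.
    with requests.get(url, stream=True, timeout=(5, 30)) as r:
        r.raise_for_status()
        total = int(r.headers.get('Content-Length', 0))
        downloaded = 0
        start = last_print = time.time()
        with open(path, 'wb') as fp:
            for chunk in r.iter_content(chunk_size=chunk_size):
                downloaded += fp.write(chunk)
                now = time.time()
                if total and now - last_print >= 1:
                    print(format_progress(downloaded, total, now - start))
                    last_print = now
    return downloaded


# Usage (the stall now surfaces as an exception you can handle):
# try:
#     download(url, "video.mp4")
# except requests.exceptions.Timeout:
#     print("transfer stalled -- retry or resume")
```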

UPDATE: I checked two other Stack Overflow questions that might have had an answer, but apparently those questions remained unanswered as well (links: Streaming download large file with python-requests interrupting and What exactly is Python's file.flush() doing?). I tried both functions suggested there as a solution, yet some of the downloads still stop. The new version of the code:

import requests
import time
import os

url = "https://aspb2.cdn.asset.aparat.com/aparat-video/a5e07b7f62ffaad0c104763c23d7393215613675-1080p.mp4?wmsAuthSign=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ0b2tlbiI6Ijg0ZTVmNjhhMGJkNDJlMmM0MWFjZjgyNzY5YWU4NmMzIiwiZXhwIjoxNjA1NzM3NjIxLCJpc3MiOiJTYWJhIElkZWEgR1NJRyJ9.eaqnWYevFhe-CHG1TGR3SuoTbnVNBEJmLj-ZSxjtNbY"
headers = requests.head(url, headers={'accept-encoding': ''}).headers
print(headers)
r = requests.get(url, allow_redirects=True, stream=True)
name = url.split('/')[-1].split('.')[0]
print(name)
format_name = '.' + headers['Content-Type'].split('/')[1]
file_size = int(headers['Content-Length'])
downloaded = 0
print(name + format_name)
start = last_print = time.time()
with open(name + format_name, 'wb') as fp:
    for chunk in r.iter_content(chunk_size=4096):
        downloaded += fp.write(chunk)
        # Added 'flush' and 'fsync' as suggested in the linked questions
        fp.flush()
        os.fsync(fp.fileno())
        now = time.time()
        if now - last_print >= 1:
            pct_done = round(downloaded / file_size * 100)
            speed = round(downloaded / (now - start) / 1024)
            print(f"Download {pct_done} % done, avg speed {speed} KB/s")
            last_print = time.time()

Even after adding these two calls, requests still seems to stop. I suspect that requests sometimes fails to keep the connection alive, because the problem occurs most often at the times of day when my internet connection is weakest, but again I don't understand why it does not raise an error the way urllib does. If this is not the case, then how can I solve this?
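
If dropped connections really are the cause, one possible workaround is to resume from the bytes already on disk using an HTTP Range request. This is only a sketch, under the assumption that the CDN honours Range headers (answers 206 Partial Content); the signed wmsAuthSign token may also expire between retries. resumable_download and range_header are hypothetical names:

```python
import os
import time

import requests


def range_header(offset):
    """HTTP header asking the server to resume the body at byte `offset`."""
    return {'Range': f'bytes={offset}-'}


def resumable_download(url, path, chunk_size=4096, max_retries=5):
    """Restart a stalled stream from where the file on disk left off."""
    for attempt in range(max_retries):
        offset = os.path.getsize(path) if os.path.exists(path) else 0
        headers = range_header(offset) if offset else {}
        try:
            with requests.get(url, stream=True, timeout=(5, 30),
                              headers=headers) as r:
                r.raise_for_status()
                # 206 means the server resumed; anything else restarts the file.
                mode = 'ab' if r.status_code == 206 else 'wb'
                with open(path, mode) as fp:
                    for chunk in r.iter_content(chunk_size=chunk_size):
                        fp.write(chunk)
            return  # finished without stalling
        except requests.exceptions.RequestException:
            time.sleep(2 ** attempt)  # back off before retrying
    raise RuntimeError(f"gave up after {max_retries} attempts")
```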



from Why does requests stop?
