Tuesday, 6 October 2020

Download "401 Unauthorized" video with selenium

I'm trying to create a bot that downloads videos from a site named "Sdarot" using Selenium and Python 3.

Each video (or episode) on the site has its own page and URL. When you load an episode, you have to wait 30 seconds for the episode to "load", and only then does the <video> tag appear in the page's HTML.

The problem is that the request for the video seems to be protected in some way (I don't really understand how it works). When I simply wait for the video tag to appear and then try to download the video with the urllib library (see the code below), I get the following error: urllib.error.HTTPError: HTTP Error 401: Unauthorized

I should note that when I open the video's URL in the Selenium-driven browser, it loads completely fine and I can download it manually.

How can I download the videos automatically? Thanks in advance!

Code:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import urllib.request


def load(driver, url, retries=3):

    for _ in range(retries):
        driver.get(url)  # open the page in the browser

        try:
            # wait for the episode to "load"
            # if the "proceed" button is not clickable after 45 seconds,
            # reload the page and try again (up to `retries` times).
            WebDriverWait(driver, 45).until(
                EC.element_to_be_clickable((By.ID, "proceed"))
            )
            return
        except TimeoutException:
            continue

    raise TimeoutException("episode did not load after %d attempts" % retries)


def save_video(driver, filename):

    # get the video element (find_element_by_tag_name was removed in Selenium 4)
    video_element = driver.find_element(By.TAG_NAME, "video")
    video_url = video_element.get_property('src')  # get the video url
    # trying to download the video
    urllib.request.urlretrieve(video_url, filename)
    # ERROR: "urllib.error.HTTPError: HTTP Error 401: Unauthorized"


def main():

    URL = r'https://www.sdarot.dev/watch/339-%D7%94%D7%A4%D7%99%D7%92-%D7%9E%D7%95%D7%AA-ha-pijamot/season/1/episode/23'

    DRIVER = webdriver.Chrome()
    load(DRIVER, URL)
    save_video(DRIVER, "video.mp4")  # save_video returns nothing


if __name__ == "__main__":
    main()
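
One possible explanation, given that the same link opens fine inside the Selenium-driven browser: urllib sends a bare request, without the cookies, User-Agent, or Referer that the browser session carries. Below is a minimal sketch of reusing those values for the download. It assumes the server authorizes the request based on these headers, which has not been confirmed for this site, and the names build_headers and save_video_with_session are hypothetical helpers, not part of Selenium or urllib. Note also that driver.get_cookies() only returns cookies visible to the current page, so this may not help if the video is served from a different domain.

```python
import shutil
import urllib.request


def build_headers(cookies, user_agent, referer):
    """Build request headers that mimic the browser session.

    `cookies` is a list of dicts shaped like the ones returned by
    driver.get_cookies(): each has at least "name" and "value" keys.
    """
    cookie_header = "; ".join(
        "{}={}".format(c["name"], c["value"]) for c in cookies
    )
    return {
        "Cookie": cookie_header,
        "User-Agent": user_agent,
        "Referer": referer,
    }


def save_video_with_session(driver, video_url, filename):
    """Download video_url with the browser's cookies and headers (assumption:
    this is what the server checks before answering 401)."""
    headers = build_headers(
        driver.get_cookies(),
        driver.execute_script("return navigator.userAgent"),
        driver.current_url,  # the episode page acts as the Referer
    )
    request = urllib.request.Request(video_url, headers=headers)
    # stream the response to disk instead of holding it in memory
    with urllib.request.urlopen(request) as response, \
            open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
```

In save_video above, the urllib.request.urlretrieve(video_url, filename) call could then be swapped for save_video_with_session(driver, video_url, filename). If the 401 persists, the protection is probably tied to something other than these three headers.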

