Thursday, 9 September 2021

Extract media files from cache using Selenium

I'm trying to download some videos from a website using Selenium.

Unfortunately, I can't download the videos directly from the source because they are stored in a directory with restricted access. Retrieving them with urllib, requests or ffmpeg returns a 403 Forbidden error, even after supplying my login credentials and session cookies.

I was thinking of playing each video in its entirety and then saving the media file from the browser cache.
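A minimal sketch of the setup for that plan, assuming Chrome driven through Selenium: Chromium accepts a `--disk-cache-dir` switch, so setting it next to `--user-data-dir` makes the cache location predictable instead of something to hunt for. The helper names and directory names are hypothetical. Without `--disk-cache-dir`, the cache typically sits under `<user-data-dir>/Default/Cache` (newer versions add a `Cache_Data` subfolder), but that layout varies by Chrome version and OS, so `candidate_cache_dirs` only reports directories that actually exist.

```python
from __future__ import annotations

from pathlib import Path


def chrome_cache_args(profile_dir: str, cache_dir: str) -> list[str]:
    """Build Chrome switches that pin both the profile and the disk cache.

    Hypothetical helper: pass each string to ChromeOptions.add_argument(...).
    """
    return [
        f"--user-data-dir={Path(profile_dir).resolve()}",
        f"--disk-cache-dir={Path(cache_dir).resolve()}",
    ]


def candidate_cache_dirs(profile_dir: str) -> list[Path]:
    """Typical cache folders inside a profile when --disk-cache-dir is NOT set.

    These layouts are assumptions that vary by Chrome version and OS;
    only directories that exist on disk are returned.
    """
    base = Path(profile_dir) / "Default" / "Cache"
    return [p for p in (base / "Cache_Data", base) if p.is_dir()]


# Usage with Selenium (requires selenium and ChromeDriver, not executed here):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# for arg in chrome_cache_args("selenium_profile", "selenium_cache"):
#     options.add_argument(arg)
# driver = webdriver.Chrome(options=options)
```

Pinning the cache directory also keeps it separate from your everyday browser profile, which makes it easier to tell which entries came from the scripted session.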

Is this possible? Where can I find the cache folder of a custom profile? And how do I tell which cached files belong to the video?
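On telling the cached files apart: cache entries usually have hashed names, so the filename will not identify the video. A rough heuristic, sketched here under stated assumptions, is to look for container signatures (`ftyp` for MP4, the EBML header for WebM/Matroska) near the start of each file and rank the matches by size. The function name and the 8 KB scan window are arbitrary choices. A match is a hint, not proof: a cache backend may store its own header before the response body, and a player that fetches the video with range requests or in segments may leave it split across several entries.

```python
from pathlib import Path
from typing import List, Tuple

# Well-known magic bytes of common video containers.
SIGNATURES = {
    b"ftyp": "mp4",                  # ISO base media (MP4/MOV): 'ftyp' box near the start
    b"\x1a\x45\xdf\xa3": "webm/mkv",  # EBML header used by WebM/Matroska
}


def find_video_candidates(
    cache_dir: str, scan_bytes: int = 8192
) -> List[Tuple[Path, str, int]]:
    """Return (path, guessed_format, size) for cache files that look like video.

    Searches the first `scan_bytes` bytes rather than offset 0, because the
    cache may prepend its own header. Sorted largest first, on the assumption
    that the complete video is the biggest matching entry.
    """
    hits = []
    for path in Path(cache_dir).rglob("*"):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as f:
                head = f.read(scan_bytes)
        except OSError:
            continue  # e.g. a file locked by the running browser
        for magic, fmt in SIGNATURES.items():
            if magic in head:
                hits.append((path, fmt, path.stat().st_size))
                break
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits
```

For an MP4 hit, the playable data would begin 4 bytes before `ftyp` (that box starts with a 4-byte size field), so anything before that offset is cache metadata. The entry may also carry trailing metadata, so check the extracted file with a player or `ffprobe`.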

EDIT: This is what I tried using requests:

import requests


def main():

    s = requests.Session()

    login_page = '<<login_page>>'
    login_data = dict()
    login_data['username'] = '<<username>>'
    login_data['password'] = '<<psw>>'

    # Send the credentials as form data; the Session keeps the cookies it receives
    login_r = s.post(login_page, data=login_data)

    video_src = '<<video_src>>'

    cookies = dict(login_r.cookies)  # contains the session cookie

    # static cookies for every session
    cookies['_fbp'] = 'fb.1.1630500067415.734723547'
    cookies['_ga'] = 'GA1.2.823223936.1630500067'
    cookies['_gat'] = '1'
    cookies['_gid'] = 'GA1.2.1293544716.1631011551'
    cookies['user'] = '66051'

    # The Session already sends its stored cookies; these are merged on top
    video_r = s.get(video_src, cookies=cookies)
    print(video_r.status_code)


if __name__ == '__main__':
    main()

The print() call outputs:

403
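One common reason for a 403 on a URL that plays fine in the browser is that the server checks something the script does not send, such as the `Referer` or `User-Agent` header, or cookies the player page sets after login. That is only a guess here; comparing the browser's request headers in DevTools with the script's request would confirm it. A minimal sketch, assuming the video page is already open and logged in under Selenium, copies the live browser state instead of hard-coding cookies. `driver.get_cookies()` returns a list of dicts with `name` and `value` keys; `build_headers_from_selenium` is a hypothetical helper that only formats them.

```python
from typing import Any, Dict, List


def build_headers_from_selenium(
    selenium_cookies: List[Dict[str, Any]], user_agent: str, referer: str
) -> Dict[str, str]:
    """Turn the output of Selenium's driver.get_cookies() into request headers.

    Assumption: the server accepts the video request if it sees the same
    cookies, User-Agent and Referer as the browser that plays the video.
    """
    cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)
    return {
        "Cookie": cookie_header,
        "User-Agent": user_agent,
        "Referer": referer,
    }


# Usage inside the Selenium script (not executed here):
# headers = build_headers_from_selenium(
#     driver.get_cookies(),
#     driver.execute_script("return navigator.userAgent"),
#     driver.current_url,  # the page that embeds the video
# )
# video_r = requests.get(video_src, headers=headers, stream=True)
```

`get_cookies()` only returns cookies visible to the current page, so if the video is served from a different host, the cookies that host expects may still be missing.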

This is the network tab for the video:

[Screenshot: the browser's Network tab entry for the video request]


