Wednesday, 13 February 2019

Can't store downloaded files in their corresponding folders

I've written a script in Python, using Selenium, to download a few document files (ending in .doc) from a webpage. I don't want to use the requests or urllib modules because the website I'm actually working with does not expose a real URL for each file; the links are JavaScript-encrypted. For this example, though, I've used a different link in my script that mimics the same behaviour.

What my script does at this moment:

  1. Creates a master folder on the desktop
  2. Creates subfolders within the master folder, named after the files to be downloaded
  3. Downloads the files by clicking on their links and puts them in the master folder (this is the part I need fixed)

How can I modify my script so that it downloads the files by clicking on their links and puts each downloaded file in its corresponding subfolder?

This is my attempt so far:

import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

link = 'https://www.online-convert.com/file-format/doc'

# master folder on the desktop; also used as Chrome's download directory
dirf = os.path.expanduser('~')
desk_location = os.path.join(dirf, 'Desktop', 'file_folder')
os.makedirs(desk_location, exist_ok=True)

def download_files():
    driver.get(link)
    for item in driver.find_elements(By.CSS_SELECTOR, "a[href$='.doc']")[:2]:
        filename = item.get_attribute("href").split("/")[-1]
        # name the subfolder after the file (without its extension)
        folder_name = filename.split(".")[0]
        # create the subfolder inside the master folder
        new_location = os.path.join(desk_location, folder_name)
        os.makedirs(new_location, exist_ok=True)
        # path where the downloaded file is expected to end up
        file_location = os.path.join(new_location, filename)
        item.click()

        # wait up to roughly 10 seconds for the file to show up at file_location
        time_to_wait = 10
        time_counter = 0
        while not os.path.exists(file_location):
            time.sleep(1)
            time_counter += 1
            if time_counter > time_to_wait:
                break

if __name__ == '__main__':
    chromeOptions = webdriver.ChromeOptions()
    prefs = {
        'download.default_directory': desk_location,
        'profile.default_content_setting_values.automatic_downloads': 1,
    }
    chromeOptions.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(options=chromeOptions)
    download_files()

The following image shows how the downloaded files are currently stored (the files end up outside their corresponding folders):

[Image: the master folder, with the downloaded .doc files sitting next to their subfolders instead of inside them]
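
One possible direction for step 3, offered as a minimal sketch rather than a confirmed fix: leave Chrome's download directory pointing at the master folder, wait for each file to appear there, and then move it into its subfolder. The helper name `move_when_downloaded` and its parameters (`timeout`, `poll`) are hypothetical, and the sketch assumes the downloaded file keeps the name taken from the link.

```python
import os
import shutil
import time


def move_when_downloaded(download_dir, filename, target_dir, timeout=10.0, poll=0.5):
    """Wait for `filename` to appear in `download_dir`, then move it into `target_dir`.

    Returns the new path, or None if the file did not show up within `timeout` seconds.
    """
    source = os.path.join(download_dir, filename)
    deadline = time.monotonic() + timeout
    # Assumption: the browser writes to a temporary name while downloading
    # (Chrome normally uses a ".crdownload" suffix) and renames the file when
    # it is complete, so seeing the final name means the download has finished.
    while not os.path.exists(source):
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
    os.makedirs(target_dir, exist_ok=True)
    # shutil.move returns the destination path
    return shutil.move(source, os.path.join(target_dir, filename))
```

In the script above, this could be called right after `item.click()` as `move_when_downloaded(desk_location, filename, new_location)`, in place of waiting for the file to appear inside the subfolder.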


