Friday 6 November 2020

Using Python Selenium to download a file in memory, not on disk

I have a bunch of scripts that do web scraping, download files, and read them with pandas. This process has to be deployed in a new architecture where downloading files to disk is not appropriate; instead, it is preferable to keep the file in memory and read it with pandas from there. For demonstration purposes, here is a web scraping script that downloads an Excel file from a random website:

import pandas as pd
from io import StringIO, BytesIO
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


pathDriver = '/path/to/chromedriver'  # Path to chromedriver (placeholder)

# Selenium 4 style; with Selenium 3 use webdriver.Chrome(executable_path=pathDriver)
driver = webdriver.Chrome(service=Service(executable_path=pathDriver))

url = 'https://file-examples.com/index.php/sample-documents-download/sample-xls-download/'

driver.get(url)

# Wait until the first download link is clickable instead of a fixed time.sleep(1)
file_link = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="table-files"]/tbody/tr[1]/td[5]/a[1]'))
)
file_link.click()  # Chrome saves the file to its default download folder

This script does download the file, into my Downloads folder. What I've tried is to create a StringIO() or BytesIO() stream before and after the click() call and read the object, similar to this:

file_object = StringIO()
df = pd.read_excel(file_object.read())

But file_object does not capture the file, and the file is still downloaded to my disk.
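The streams most likely stay empty because click() makes Chrome, a separate process, perform the download and write it to its download folder; a StringIO or BytesIO created in the Python script is never connected to that transfer. One common workaround is to let Selenium locate the link but fetch the bytes with Python itself. The sketch below assumes the link's href points directly at the file and that the site accepts the browser's cookies and User-Agent replayed from urllib; the helper names cookie_header and fetch_into_memory are hypothetical. It returns a BytesIO rather than a StringIO because an Excel file is binary.

```python
import io
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium's driver.get_cookies() output.

    Each item is a dict with at least 'name' and 'value' keys.
    """
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def fetch_into_memory(url, cookies=(), user_agent=None, timeout=30):
    """Download url with Python itself and return the body as an in-memory BytesIO."""
    headers = {}
    if cookies:
        headers["Cookie"] = cookie_header(cookies)  # replay the browser session
    if user_agent:
        headers["User-Agent"] = user_agent  # some sites reject urllib's default agent
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return io.BytesIO(response.read())  # nothing is written to disk


# Usage with the script above (not run here; needs a live driver and pandas):
#   href = file_link.get_attribute("href")
#   buf = fetch_into_memory(href, driver.get_cookies(),
#                           driver.execute_script("return navigator.userAgent"))
#   df = pd.read_excel(buf)  # .xls files also need the xlrd package
```

With this route the click() is no longer needed, so Chrome never starts its own download.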

Any suggestions on how to do this?
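Another option, sketched here under assumptions, is to download the file inside the browser session with JavaScript's fetch() via driver.execute_async_script() and hand it back to Python as a base64 data: URL, so the bytes never touch the disk. This only works if the file URL is same-origin with the page (or allows CORS), and a large file becomes a large string crossing the WebDriver connection. FETCH_AS_DATA_URL_JS and decode_data_url are hypothetical names.

```python
import base64

# JavaScript run inside the page via driver.execute_async_script(). Selenium
# passes a completion callback as the last argument; the script fetches the
# file with the page's own cookies and returns it as a base64 data: URL.
FETCH_AS_DATA_URL_JS = """
var url = arguments[0];
var done = arguments[arguments.length - 1];
fetch(url, {credentials: 'include'})
  .then(function (r) { return r.blob(); })
  .then(function (blob) {
    var reader = new FileReader();
    reader.onloadend = function () { done(reader.result); };
    reader.readAsDataURL(blob);
  })
  .catch(function (err) { done('ERROR: ' + err); });
"""


def decode_data_url(data_url):
    """Return the raw bytes carried by a 'data:<mime>;base64,<payload>' string."""
    if data_url.startswith("ERROR: "):
        raise RuntimeError(data_url)  # the in-page fetch() failed
    header, _, payload = data_url.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)


# Usage with the script above (not run here; needs a live driver and pandas):
#   driver.set_script_timeout(60)
#   href = file_link.get_attribute("href")
#   data_url = driver.execute_async_script(FETCH_AS_DATA_URL_JS, href)
#   df = pd.read_excel(BytesIO(decode_data_url(data_url)))
```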


