Saturday, 14 August 2021

How to stay undetected while scraping the latest post from a social media site that requires login?

I've created a script using Python and Selenium, with proxies implemented in it, to log in to Facebook and scrape the name of the user whose post is at the top of my feed. I would like the script to do this every five minutes, indefinitely.

Since logging in this often may get my account banned, I decided to use proxies within the script so that everything is done anonymously.

This is what I've written so far:

import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_first_user(random_proxy):
    options = webdriver.ChromeOptions()
    # Block the browser's notification permission prompt
    prefs = {"profile.default_content_setting_values.notifications": 2}
    options.add_experimental_option("prefs", prefs)
    # Route all browser traffic through the chosen proxy
    options.add_argument(f'--proxy-server={random_proxy}')

    with webdriver.Chrome(options=options) as driver:
        wait = WebDriverWait(driver, 10)
        driver.get("https://www.facebook.com/")
        # find_element(By.ID, ...) replaces the deprecated find_element_by_id
        driver.find_element(By.ID, "email").send_keys("username")
        driver.find_element(By.ID, "pass").send_keys("password", Keys.RETURN)
        # Wait for the author link of the first post in the feed
        user = wait.until(EC.presence_of_element_located((By.XPATH, "//h4[@id][@class][./span[./a]]/span/a"))).text
        return user

if __name__ == '__main__':
    proxies = [...]  # placeholder: list of proxies

    while True:
        # Pick a proxy without removing it, so the list is never exhausted
        random_proxy = random.choice(proxies)
        print(get_first_user(random_proxy))
        # time.sleep takes seconds, so five minutes is 60 * 5
        time.sleep(60 * 5)
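
The proxy rotation and the five-minute wait from the loop above can also be sketched on their own, without the browser. This is only a minimal illustration under a few assumptions: the helper names proxy_rotation and next_delay, the proxy addresses, and the idea of adding random jitter to the interval are all hypothetical and not part of the script above.

```python
import random


def proxy_rotation(proxies, seed=None):
    """Yield proxies forever, reshuffling the order on every pass.

    Hypothetical helper. The ValueError for an empty list is raised
    on the first next() call, because generator bodies run lazily.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    rng = random.Random(seed)
    while True:
        batch = list(proxies)  # copy, so the caller's list is never emptied
        rng.shuffle(batch)
        yield from batch


def next_delay(base_seconds=300, jitter_seconds=30):
    """Return a pause in seconds: five minutes plus or minus some jitter.

    The jitter is an assumption for illustration; the original loop
    uses a fixed interval.
    """
    return base_seconds + random.uniform(-jitter_seconds, jitter_seconds)


if __name__ == "__main__":
    # Placeholder addresses, not real proxies
    rotation = proxy_rotation(["proxy-a:8080", "proxy-b:8080", "proxy-c:8080"])
    for _ in range(5):
        print(next(rotation), round(next_delay(), 1))
```

In the main loop, next(rotation) would take the place of the random pick and time.sleep(next_delay()) the place of the fixed sleep. Every proxy is used once per pass before any is reused.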

How can I stay undetected while continuously scraping data from a site that requires authentication?


