Saturday, 29 December 2018

Unable to rectify the logic within my script to make it stop when it's done

I've written a script in Python that uses proxies to scrape the links of different posts across several pages of a website. The script is supposed to pick random proxies from a list, send requests to the website through them, and finally parse the items. If a proxy is not working, it should be removed from the list.

My script does its job in a faulty way: it keeps parsing on and on until all the proxies in the list are exhausted, even though the links have already been parsed.

What I'd like is to change the script so that it breaks as soon as the links are parsed, regardless of whether there are still proxies left in the list. Otherwise the script keeps scraping the same items repeatedly.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from multiprocessing.pool import ThreadPool
from itertools import cycle

base_url = 'https://stackoverflow.com/questions/tagged/web-scraping'
lead_url = ["https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page={}&pagesize=15".format(page) for page in range(1,6)]

proxyVault = ['104.248.159.145:8888', '113.53.83.252:54356', '206.189.236.200:80', '218.48.229.173:808', '119.15.90.38:60622', '186.250.176.156:42575']

def make_requests(lead_url):
    while len(proxyVault)>0:   
        pitem = cycle(proxyVault)
        proxy = {'https':'http://{}'.format(next(pitem))}
        try:
            res = requests.get(lead_url,proxies=proxy)
            soup = BeautifulSoup(res.text,"lxml")
            [get_title(proxy,urljoin(base_url,item.get("href"))) for item in soup.select(".summary .question-hyperlink")]
        except Exception: 
            proxyVault.pop(0)

def get_title(proxy,itemlink):
    res = requests.get(itemlink,proxies=proxy)
    soup = BeautifulSoup(res.text,"lxml")
    print(soup.select_one("h1[itemprop='name'] a").text)

if __name__ == '__main__':
    ThreadPool(10).map(make_requests, lead_url)

Btw, the proxies used above are just placeholders.
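
A minimal sketch of one way to get the stopping behaviour described above, under stated assumptions: `fetch_with_proxies` and its `fetch` parameter are hypothetical names, with `fetch(url, proxy)` standing in for the `requests.get` call plus the parsing. The idea is to return as soon as one proxy succeeds, and to remove the proxy that actually failed rather than whichever one sits at index 0.

```python
import random


def fetch_with_proxies(url, proxies, fetch):
    """Try random proxies until one works, then stop.

    `fetch` is a hypothetical callable taking (url, proxy); it should raise
    on failure and return the parsed result on success.
    """
    while True:
        try:
            proxy = random.choice(proxies)
        except IndexError:
            # The list is empty: every proxy has been discarded.
            return None
        try:
            result = fetch(url, proxy)
        except Exception:
            # Discard the proxy that failed; another thread may have
            # removed it already, so a missing entry is not an error.
            try:
                proxies.remove(proxy)
            except ValueError:
                pass
            continue
        # Success: return immediately instead of looping again.
        return result
```

In the script above, `fetch` would wrap the body of the `try` block in `make_requests`. The key change is leaving the loop (here via `return`) after a successful parse, since the original `while` loop only ends when the proxy list is empty.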




Convert an equation to Python

I have several equations that I need to convert into Python. I tried to plot a graph from the equation, but the graph I get is not the same as the original one.

In the paper, the error probability for the MIM attack is given by:

First Image

Screen Shot

Second Image

Screen Shot

The error probability of the PNS attack is given by:

Screen Shot

where the region condition is satisfied:

Screen Shot

The error probability of the PNS attack should be plotted like this:

Screen Shot

My question: how do I insert equation 8.1 into equation 8.5?

This is my Python code for equation 8.5:

import matplotlib.pyplot as plt
import math
import numpy as np
from scipy.special import iv,modstruve


x = np.array([0, 5, 10, 15, 20])
t = 0.9
y = (np.exp(x*t/2)*(iv(0, x*t/2) - modstruve(0, x*t/2)) - 1)/(np.exp(x*t/2 - 1))

plt.plot(x, y, label='Normal')
plt.xlabel('Mean photon number N')
plt.ylabel('Error probability')
plt.scatter(x,y)
plt.title('N/2')
plt.ylim([0, 0.5])
plt.legend()
plt.show()

Please help me regarding this matter.

Thank you.




Get the format in dateutil.parse

Is there a way to get the "format" after parsing a date in dateutil? For example, something like:

>>> x = parse("2014-01-01 00:12:12")
datetime.datetime(2014, 1, 1, 0, 12, 12)

x.get_original_string_format()
YYYY-MM-DD HH:MM:SS # %Y-%m-%d %H:%M:%S

# Or, passing the date-string directly
get_original_string_format("2014-01-01 00:12:12")
YYYY-MM-DD HH:MM:SS # %Y-%m-%d %H:%M:%S


Update: I'd like to add a bounty to this question to see if someone can add an answer that does the equivalent of getting the string format of a common date string. It can use dateutil if you want, but it doesn't have to. Hopefully we'll get some creative solutions here.
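
As far as I know, dateutil does not expose the format it matched, so one possible approach is to try a list of candidate `strptime` formats and return the first one that parses the string. The sketch below uses only the standard library. `guess_format` and `CANDIDATE_FORMATS` are hypothetical names, and the list is an assumption that would need extending for real data. Ambiguous strings such as "01/02/2014" simply resolve to whichever candidate comes first.

```python
from datetime import datetime

# Hypothetical candidate list; order decides how ambiguous strings resolve.
CANDIDATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y",
    "%m/%d/%Y",
]


def guess_format(date_string, candidates=CANDIDATE_FORMATS):
    """Return the first strptime format that parses date_string, else None."""
    for fmt in candidates:
        try:
            datetime.strptime(date_string, fmt)
        except ValueError:
            # This format does not match the whole string; try the next one.
            continue
        return fmt
    return None


print(guess_format("2014-01-01 00:12:12"))
```

This only recognizes formats that are in the list, so it trades generality for predictability compared with dateutil's fuzzy parsing.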


