Friday, 22 March 2019

Given list of websites, search and return information in Python

I created a function that returns a list of URLs for a given company name. I now want to search through this list of URLs and find out whether the company is owned by another company.

Example: The company "Marketo" was acquired by Adobe.

I want to return whether a given company was acquired and, if so, by whom.
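A simple first pass is to look for the phrase "acquired by" in a page's text and take the capitalized words that follow it. The sketch below does that with a regular expression. `find_acquirer` and the pattern are hypothetical, and the heuristic will miss other phrasings (for example "Adobe bought Marketo") and can pick up unrelated acquisitions mentioned on the same page.

```python
import re

# Hypothetical heuristic: "acquired by" (or "Acquired by") followed by
# one or more capitalized words, e.g. "acquired by Adobe Systems".
ACQUIRED_BY = re.compile(
    r"\b[Aa]cquired by\s+([A-Z][\w&.\-]*(?:\s+[A-Z][\w&.\-]*)*)"
)


def find_acquirer(text):
    """Return the name following 'acquired by', or None if the phrase is absent."""
    match = ACQUIRED_BY.search(text)
    if match is None:
        return None
    # Drop sentence punctuation captured at the end of the name
    return match.group(1).rstrip(".,;:")


print(find_acquirer('The company "Marketo" was acquired by Adobe.'))  # Adobe
print(find_acquirer("Marketo is a marketing automation company."))    # None
```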

Here is what I have so far:

import requests
from googlesearch import search
from bs4 import BeautifulSoup as BS


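def get_url(company_name):
    # Collect up to 10 search-result URLs for the company name
    return list(search(company_name, stop=10))


test1 = get_url('Marketo')
print(test1[7])  # the Wikipedia article in the run shown below


# Download one result page and parse it
r = requests.get(test1[7], timeout=10)
r.raise_for_status()  # stop early on 4xx/5xx responses
soup = BS(r.text, 'lxml')
stuff = soup.find_all('a')  # all <a> (link) tags on the page


print(stuff)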
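`soup.find_all('a')` returns only the link tags, so it will not show sentences such as "Marketo was acquired by Adobe". To search a page's prose, extract its visible text first. A minimal sketch using only the standard library's `html.parser` (so it needs neither `lxml` nor BeautifulSoup); `TextExtractor` and `page_text` are hypothetical names. If you stay with BeautifulSoup, `soup.get_text(" ", strip=True)` is the closest equivalent.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-empty text chunks that are not script/style content
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = ("<html><head><style>p{}</style></head><body>"
          "<p>Marketo was <b>acquired by Adobe</b>.</p></body></html>")
print(page_text(sample))
```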

I am new to web scraping and don't know how to search through each URL (assuming that is possible) and extract the information I'm looking for.
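To search through every URL rather than just `test1[7]`, one approach is to loop over the list, download each page, and stop at the first one that mentions an acquisition. This is a sketch under several assumptions: it uses `urllib.request` from the standard library instead of `requests`, strips tags with a crude regular expression, and applies a hypothetical "acquired by" pattern. `fetch` and `find_acquisition` are illustrative names, and the page-fetching function is a parameter so it can be swapped for `requests` or a stub.

```python
import re
import urllib.request

# Hypothetical heuristic: "acquired by" followed by capitalized words.
ACQUIRED_BY = re.compile(r"\b[Aa]cquired by\s+([A-Z][\w&.\-]*(?:\s+[A-Z][\w&.\-]*)*)")
# Crude tag stripper: good enough for a sketch, but it keeps <script>/<style> text.
TAG = re.compile(r"<[^>]+>")


def fetch(url, timeout=10):
    """Download a page with the standard library; return '' on any network error."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return ""


def find_acquisition(urls, get_page=fetch):
    """Visit URLs in order; return (url, acquirer) for the first match, else None."""
    for url in urls:
        # Replace tags with spaces, then collapse runs of whitespace
        text = " ".join(TAG.sub(" ", get_page(url)).split())
        match = ACQUIRED_BY.search(text)
        if match:
            return url, match.group(1).rstrip(".,;:")
    return None
```

For example, `find_acquisition(get_url('Marketo'))` would return a `(url, acquirer)` tuple or `None`; what it finds depends on the live pages and on which search results come back.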

The value of test1 is the following list:

['https://www.marketo.com/', 'https://www.marketo.com/software/marketing-automation/', 'https://blog.marketo.com/', 'https://www.marketo.com/software/', 'https://www.marketo.com/company/', 'https://www.marketo.com/solutions/pricing/', 'https://www.marketo.com/solutions/', 'https://en.wikipedia.org/wiki/Marketo', 'https://www.linkedin.com/company/marketo', 'https://www.cmswire.com/digital-marketing/what-is-marketo-a-marketers-guide/']
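Of these results, the Wikipedia article is the one most likely to state ownership in a structured form: Wikipedia company infoboxes typically have a "Parent" or "Owner" row. A sketch of reading such rows with the standard library, assuming the infobox is an HTML table whose rows look like `<tr><th>label</th><td>value</td></tr>`. `RowParser` and `owner_from_infobox` are hypothetical names, and the parser collects rows from every table on the page, not only the infobox.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect <tr><th>label</th><td>value</td></tr> pairs from every table on a page."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._cell = None  # "th" or "td" while inside a cell
        self._label = []
        self._value = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._label, self._value = [], []
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr":
            # Normalize whitespace and store the row if it has a label
            label = " ".join("".join(self._label).split())
            if label:
                self.rows[label] = " ".join("".join(self._value).split())

    def handle_data(self, data):
        # Text inside nested tags such as <a> still belongs to the current cell
        if self._cell == "th":
            self._label.append(data)
        elif self._cell == "td":
            self._value.append(data)


def owner_from_infobox(html, labels=("Parent", "Owner")):
    """Return the first non-empty value whose row label is in `labels`, else None."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    for label in labels:
        if parser.rows.get(label):
            return parser.rows[label]
    return None
```

With the question's code, `owner_from_infobox(requests.get(test1[7], timeout=10).text)` would run it against the Wikipedia result; whether it finds a value depends on how that article's infobox is labeled at the time.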


