Friday 20 July 2018

Getting data from hidden html (popup) using BS4

I am trying to scrape the name of a link shown in a popup on Wikipedia. When you hover over a link on Wikipedia, a popup shows a short snippet from the intro of the linked article. I need to scrape that information, but I am unsure where it would be in the source. When I inspect the element while the popup is showing, this is the HTML (for this example I am hovering over the link "Greek"):

<a dir="ltr" lang="en" class="mwe-popups-extract" href="/wiki/Ancient_Greek"> 
<p>The <b>Ancient Greek</b> language includes the forms of Greek...(a bunch more text)...</p></a> 

What I need to extract is the href, which is "/wiki/Ancient_Greek", but this piece of HTML disappears when I am not hovering over the link. Is there a way (with BS4 and Python) to extract this information from the source HTML I am scraping?

EDIT: I can't afford to make additional calls to web pages because the project already takes a long time to run. If there is any way to change how I am retrieving the source so that I can get the popup information, that would be helpful. This project is giant, and getting this popup information is crucial.

Any suggestions at all that don't require a complete rebuild of the project are extremely appreciated. I am using urllib to pull the source (with requests) and bs4 to scrape through it.
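For reference, here is a minimal sketch of one approach that needs no extra requests. It rests on two assumptions: that the popup is built by JavaScript after the page loads (so it never appears in the fetched source), and that the hovered link's own href matches the href shown in the popup, as it does for the "Greek" example above. That second assumption may not hold for links that go through a redirect. The sample HTML and the name link_targets are hypothetical stand-ins, not part of the actual project.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page source the existing scraper has already fetched.
SAMPLE_HTML = """
<p>The word comes from <a href="/wiki/Ancient_Greek" title="Ancient Greek">Greek</a>
and was later borrowed into <a href="/wiki/Latin" title="Latin">Latin</a>.</p>
"""


def link_targets(html):
    """Map each link's visible text to its href, using only the static source."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute at all.
    return {a.get_text(strip=True): a["href"] for a in soup.find_all("a", href=True)}


targets = link_targets(SAMPLE_HTML)
print(targets["Greek"])  # /wiki/Ancient_Greek
```

Because this only reads attributes from HTML that is already in memory, it adds no network calls. If two links share the same visible text, the dictionary keeps only the last one, so a list of (text, href) pairs may be a better fit in that case.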



