Hemant Vishwakarma: Getting data from hidden html (popup) using BS4

Friday, 20 July 2018

Getting data from hidden html (popup) using BS4

I am trying to scrape the name of a link shown in a popup on Wikipedia. When you hover over a link on Wikipedia, a small popup appears with a snippet from the introduction of the linked article. I need to scrape that information, but I am unsure where it would be in the source. When I inspect the element while the popup is showing, this is the HTML (for this example I am hovering over the link "Greek"):

<a dir="ltr" lang="en" class="mwe-popups-extract" href="/wiki/Ancient_Greek"> 
<p>The <b>Ancient Greek</b> language includes the forms of Greek...(a bunch more text)...</p></a>

What I need to extract is the href, which is "/wiki/Ancient_Greek", but this piece of HTML disappears when I am not hovering over the link. Is there a way (with BS4 and Python) to extract this information from the source HTML I am scraping?
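For reference, here is a minimal sketch of how the href could be read with BS4 if the popup markup were present in the HTML being parsed. The string `popup_html` is simply the snippet above pasted in by hand (with the paragraph text shortened), and the variable names are only illustrative. It does not show that the popup is actually in the downloaded source.

```python
from bs4 import BeautifulSoup

# Popup markup copied from the inspected element above; the paragraph
# text is shortened here for readability.
popup_html = """
<a dir="ltr" lang="en" class="mwe-popups-extract" href="/wiki/Ancient_Greek">
<p>The <b>Ancient Greek</b> language includes the forms of Greek...</p></a>
"""

soup = BeautifulSoup(popup_html, "html.parser")

# Look up the popup anchor by its CSS class; find() returns None when
# no matching tag exists, so guard before reading the attribute.
popup_link = soup.find("a", class_="mwe-popups-extract")
popup_href = popup_link["href"] if popup_link is not None else None

print(popup_href)
```

If `find()` returns `None` when run against the real downloaded page, that would suggest the popup markup is added by the browser after the page loads rather than shipped in the source.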

EDIT: I can't afford to make additional calls to web pages because the project already takes a long time to run. If there is any way to change how I am retrieving the source so that I can get the popup information, that would be helpful. This project is giant, and getting this popup information is crucial.

Any suggestions at all that don't require a complete rebuild of the project are extremely appreciated. I am using urllib to pull the source (with requests) and bs4 to scrape through it.
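One possibility that needs no extra requests, sketched here under assumptions: the link being hovered is itself an ordinary anchor in the page body, so its href may already be in the source that was downloaded. The fragment `page_html` and the helper `href_for_link_text` below are hypothetical stand-ins, not taken from the real page or project. Note also that the href on the body link may point to a redirect, in which case it could differ from the href shown in the popup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for page source that has already
# been downloaded; the real markup may differ.
page_html = """
<p>The word comes from
<a href="/wiki/Ancient_Greek" title="Ancient Greek">Greek</a>
and was later borrowed into
<a href="/wiki/Latin" title="Latin">Latin</a>.</p>
"""


def href_for_link_text(html, link_text):
    """Return the href of the first anchor whose visible text matches
    link_text exactly, or None if there is no such anchor."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        if anchor.get_text(strip=True) == link_text:
            return anchor["href"]
    return None


greek_href = href_for_link_text(page_html, "Greek")
print(greek_href)
```

Matching on the visible text is only one way to pick the anchor. In an existing scraper that already iterates over the links on a page, reading `anchor["href"]` at that point would avoid the extra lookup.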
