Sunday, 4 August 2019

Script grabs only a few records out of many

I'm trying to get the names of different colleges and their rankings from a webpage. The script I've tried parses only the first few names and their rankings.

However, the page lists 233 names and rankings, and the rest become visible only when the page is scrolled down. The URL stays the same while the page is scrolled, so I can't build any pagination logic around it.

Website address

I don't want to use selenium, which is why I'm posting this: I'd like to solve it using requests.

What I've written so far (grabs the first few records):

import requests
from bs4 import BeautifulSoup

url = 'https://www.usnews.com/best-colleges/rankings/national-liberal-arts-colleges'

# Send a browser-like User-Agent with the request
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(r.text, "lxml")

# Each school card has an id starting with "school-"; only the cards
# present in the initial HTML are found here
for item in soup.select("[id^='school-']"):
    name = item.select_one("[class^='DetailCardColleges__StyledAnchor']").text
    rank = item.select_one("[class^='ranklist-ranked-item'] > strong").text
    print(name, rank)

How can I parse all the names and their rankings using requests?
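
For context, here is a rough sketch of the kind of loop I imagine, not a working solution. Pages that load more items on scroll usually request them in the background from an endpoint that takes a page number, and that endpoint can be spotted in the browser's network tab. I have not confirmed what this site uses, so `api_url`, `page_param`, and the `"items"` key below are all hypothetical placeholders, and `collect_records` and `fetch_page_from_api` are names made up for illustration.

```python
def collect_records(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    records = []
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            # An empty page is treated as the end of the listing
            break
        records.extend(items)
    return records


def fetch_page_from_api(page, api_url, page_param="page"):
    """Fetch one page of items from a HYPOTHETICAL JSON endpoint.

    api_url, page_param and the "items" key are assumptions; the real
    values would have to be read off the browser's network tab.
    """
    import requests  # imported here so collect_records works without it

    r = requests.get(
        api_url,
        params={page_param: page},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json().get("items", [])


if __name__ == "__main__":
    # Offline demo with made-up data standing in for the real responses
    fake_pages = {
        1: [("College A", 1), ("College B", 2)],
        2: [("College C", 3)],
    }
    for name, rank in collect_records(lambda page: fake_pages.get(page, [])):
        print(name, rank)
```

With a real endpoint, the call would look like `collect_records(lambda p: fetch_page_from_api(p, api_url))`. The `max_pages` cap is only there so the loop cannot run forever if the endpoint never returns an empty page.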


