I'm trying to scrape the following site.
I get a response, but I don't know how to access the inner data of the items below in order to scrape it:
I noticed that opening each item, as well as the pagination, is handled by JavaScript.
What should I do in this case?
Below is my code:
import scrapy
from scrapy_splash import SplashRequest


class NmpaSpider(scrapy.Spider):
    name = 'nmpa'
    http_user = 'hidden'  # credentials hidden; I am using a cloud-hosted Splash instance
    allowed_domains = ['nmpa.gov.cn']

    def start_requests(self):
        # Render the page through Splash and wait 5 seconds for the JavaScript to run
        yield SplashRequest(
            'http://app1.nmpa.gov.cn/data_nmpa/face3/base.jsp?tableId=27&tableName=TABLE27&title=%E8%BF%9B%E5%8F%A3%E5%8C%BB%E7%96%97%E5%99%A8%E6%A2%B0%E4%BA%A7%E5%93%81%EF%BC%88%E6%B3%A8%E5%86%8C&bcId=152904442584853439006654836900',
            args={'wait': 5},
        )

    def parse(self, response):
        # Collect the href of every link inside the element with id "content"
        goal = response.xpath("//*[@id='content']//a/@href").getall()
        print(goal)
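One direction that might be worth trying, sketched here under assumptions: if the hrefs collected in `parse` turn out to be `javascript:` calls that carry a relative URL as a quoted argument, that URL could be pulled out and requested directly instead of simulating the click. The helper name `extract_js_target` and the `openDetail(...)` call shape in the usage note are hypothetical; the real format of the site's hrefs would need to be checked in the rendered response first.

```python
import re
from urllib.parse import urljoin

# Matches the first single- or double-quoted argument in a string.
_QUOTED_ARG = re.compile(r"""["']([^"']+)["']""")


def extract_js_target(href, base_url):
    """Return an absolute URL built from the first quoted argument of a
    javascript: href, or None if the href is not of that form.

    Assumption: the quoted argument is a URL relative to base_url.
    """
    if not href or not href.strip().lower().startswith("javascript:"):
        return None
    match = _QUOTED_ARG.search(href)
    if match is None:
        return None
    # Resolve the relative path against the page the link was found on.
    return urljoin(base_url, match.group(1))
```

Inside `parse`, this could be called as `extract_js_target(href, response.url)` for each entry in `goal`, and every non-`None` result passed to a new `SplashRequest`. For a hypothetical href such as `javascript:openDetail('content.jsp?id=1')` found on `http://example.com/face3/base.jsp`, the helper resolves the argument to `http://example.com/face3/content.jsp?id=1`.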
from Scrapy Splash, How to deal with onclick?