Monday 30 August 2021

Scrapy Splash, How to deal with onclick?

I'm trying to scrape the following site (the URL is in start_requests in the code below).

I'm able to receive a response, but I don't know how to access the inner data of the items listed on the page in order to scrape it.

I noticed that opening an item is handled by JavaScript (an onclick-style handler), and so is the pagination.

What should I do in this case?
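One option that is often suggested for this kind of page is Splash's execute endpoint with a small Lua script that clicks the element and returns the rendered HTML. The following is only a sketch under assumptions: the helper name build_click_request_kwargs and the CSS selector "#content a" are hypothetical (the selector is guessed from the id used in the XPath further down), and whether a click actually loads the item on this site has not been verified.

```python
# Lua script for Splash's "execute" endpoint: load the page, click the first
# element matching a CSS selector, wait, then return the rendered HTML.
CLICK_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  local link = splash:select(args.selector)
  if link then
    link:mouse_click()
    assert(splash:wait(args.wait))
  end
  return {html = splash:html()}
end
"""


def build_click_request_kwargs(selector, wait=5.0):
    """Return keyword arguments for a SplashRequest that runs CLICK_SCRIPT.

    Hypothetical helper: `selector` is whatever CSS selector matches the
    clickable item on the rendered page.
    """
    return {
        "endpoint": "execute",
        "args": {
            "lua_source": CLICK_SCRIPT,
            "selector": selector,
            "wait": wait,
        },
    }


# Possible usage inside a spider's start_requests (assumes scrapy_splash):
#   yield SplashRequest(url, callback=self.parse,
#                       **build_click_request_kwargs("#content a"))
```

With scrapy-splash, the "html" key of the returned table becomes the response body, so the usual response.xpath calls should keep working in the callback.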


Below is my code:

import scrapy
from scrapy_splash import SplashRequest


class NmpaSpider(scrapy.Spider):
    name = 'nmpa'
    http_user = 'hidden'  # credentials hidden; I'm using a cloud-hosted Splash instance
    allowed_domains = ['nmpa.gov.cn']

    def start_requests(self):
        # Render the page in Splash and wait 5 seconds for the JavaScript to run.
        yield SplashRequest(
            'http://app1.nmpa.gov.cn/data_nmpa/face3/base.jsp?tableId=27&tableName=TABLE27&title=%E8%BF%9B%E5%8F%A3%E5%8C%BB%E7%96%97%E5%99%A8%E6%A2%B0%E4%BA%A7%E5%93%81%EF%BC%88%E6%B3%A8%E5%86%8C&bcId=152904442584853439006654836900',
            callback=self.parse,
            args={'wait': 5},
        )

    def parse(self, response):
        # Collect the href of every link inside the element with id="content".
        goal = response.xpath("//*[@id='content']//a/@href").getall()
        print(goal)
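If the hrefs collected in parse turn out to be javascript: calls that wrap a relative URL, the target can sometimes be recovered without clicking at all. Below is a minimal sketch assuming that shape; the function name js_href_to_url and the example href in the comment are hypothetical, and the real markup has to be checked in the rendered HTML.

```python
import re
from urllib.parse import urljoin

# Matches the first single- or double-quoted argument inside the href.
QUOTED_ARG_RE = re.compile(r"""['"]([^'"]+)['"]""")


def js_href_to_url(href, base_url):
    """Turn a link href into an absolute URL.

    Plain hrefs are resolved against base_url. For "javascript:" hrefs the
    first quoted argument is assumed to be the target path (an assumption
    about the page, e.g. javascript:openItem('content.jsp?Id=1')).
    Returns None when a javascript: href has no quoted argument.
    """
    if not href.lower().startswith("javascript:"):
        return urljoin(base_url, href)
    match = QUOTED_ARG_RE.search(href)
    if match is None:
        return None
    return urljoin(base_url, match.group(1))
```

Each URL recovered this way could then be requested with another SplashRequest, or a plain scrapy.Request if the detail page does not need JavaScript.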

