Monday, 17 December 2018

Form Request Using Scrapy + Splash

I am trying to login to a website using the following code (slightly modified for this post):

import scrapy
from scrapy_splash import SplashRequest
from scrapy.crawler import CrawlerProcess

class Login_me(scrapy.Spider):
    name = 'espn'
    allowed_domains = ['games.espn.com']
    start_urls = ['http://games.espn.com/ffl/leaguerosters?leagueId=774630']

    def start_requests(self):
        script = """
        function main(splash)
                local url = splash.args.url

                assert(splash:go(url))
                assert(splash:wait(10))

                local search_input = splash:select('input[type=email]')   
                search_input:send_text("user email")

                local search_input = splash:select('input[type=password]')
                search_input:send_text("user password!")

                assert(splash:wait(10))
                local submit_button = splash:select('input[type=submit]')
                submit_button:click()

                assert(splash:wait(10))

                return html = splash:html()
              end
            """

        yield SplashRequest(
            'http://games.espn.com/ffl/leaguerosters?leagueId=774630',
            callback=self.after_login,
            endpoint='execute',
            args={'lua_source': script}
            )
        def after_login(self, response):
            table = response.xpath('//table[@id="playertable_0"]')
            for player in table.css('tr[id]'):
                 item = {
                         'id': player.css('::attr(id)').extract_first(),
                        }    
                 yield item
            print(item)

I am getting the error:

<GET http://games.espn.com/ffl/signin?redir=http%3A%2F%2Fgames.espn.com%2Fffl%2Fleaguerosters%3FleagueId%3D774630> from <GET http://games.espn.com/ffl/leaguerosters?leagueId=774630>
2018-12-14 16:49:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://games.espn.com/ffl/signin?redir=http%3A%2F%2Fgames.espn.com%2Fffl%2Fleaguerosters%3FleagueId%3D774630> (referer: None)
2018-12-14 16:49:04 [scrapy.core.scraper] ERROR: Spider error processing <GET http://games.espn.com/ffl/signin?redir=http%3A%2F%2Fgames.espn.com%2Fffl%2Fleaguerosters%3FleagueId%3D774630> (referer: None)

I am still not able to login, for some reason. I have bounced around many different posts on here, and have tried many different variation of "splash:select", but I can't seem to find my issue. When i inspect the webpage with chrome, I see this (with a similar html for the password):

 <input type="email" placeholder="Username or Email Address" autocapitalize="none" autocomplete="on" autocorrect="off" spellcheck="false" ng-model="vm.username" 
ng-pattern="/^[^<&quot;>]*$/" ng-required="true" did-disable-validate="" ng-focus="vm.resetUsername()" class="ng-pristine ng-invalid ng-invalid-required 
ng-valid-pattern ng-touched" tabindex="0" required="required" aria-required="true" aria-invalid="true">

The above html, I believe is written in JS though. So I am not able to grab it with Scrapy, so, I viewed the source of the page and I think the relevant JS code to use with Splash is this (not sure though):

function authenticate(params) {
        return makeRequest('POST', '/guest/login', {
            'loginValue': params.loginValue,
            'password': params.password
        }, {
            'Authorization': params.authorization,
            'correlation-id': params.correlationId,
            'conversation-id': params.conversationId,
            'oneid-reporting': buildReportingHeader(params.reporting)
        }, {
            'langPref': getLangPref()
        });
    }

Can someone nudge me in the right direction?



from Form Request Using Scrapy + Splash

No comments:

Post a Comment