Wednesday, 10 July 2019

Why doesn't the Python interpreter return the explicit SyntaxError message?

Looking at CPython's tokenizer.c, I can see that the tokenizer reports specific syntax error messages.

For example, look at the part where the tokenizer parses a decimal number. Parsing 5_6 should work fine, but parsing 5__6 should make the tokenizer raise a SyntaxError with the message "invalid decimal literal", because an underscore must be followed directly by a digit:

static int
tok_decimal_tail(struct tok_state *tok)
{
    int c;

    while (1) {
        do {
            c = tok_nextc(tok);
        } while (isdigit(c));
        if (c != '_') {
            break;
        }
        c = tok_nextc(tok);
        if (!isdigit(c)) {
            tok_backup(tok, c);
            syntaxerror(tok, "invalid decimal literal");
            return 0;
        }
    }
    return c;
}
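To make the control flow easier to follow, here is a rough Python transliteration of tok_decimal_tail. It is only a simplified model: it works on a plain string instead of a tok_state, and scan_decimal and ASCII_DIGITS are hypothetical names, not part of CPython. It returns how many characters form the literal, or raises SyntaxError on the same branch where the C code calls syntaxerror():

```python
# Hypothetical names; a simplified model of tok_decimal_tail, not CPython code.
# C's isdigit() only matches '0'-'9', whereas str.isdigit() would also accept e.g. '²'.
ASCII_DIGITS = "0123456789"

def scan_decimal(text: str) -> int:
    """Return how many leading characters of `text` form a decimal literal,
    or raise SyntaxError when an underscore is not followed by a digit."""
    i, n = 0, len(text)
    while True:
        # Consume a run of digits (the do/while loop over isdigit()).
        while i < n and text[i] in ASCII_DIGITS:
            i += 1
        # Any character other than '_' ends the literal.
        if i >= n or text[i] != "_":
            return i
        i += 1  # step over the underscore
        # An underscore must be followed directly by a digit.
        if i >= n or text[i] not in ASCII_DIGITS:
            raise SyntaxError("invalid decimal literal")

print(scan_decimal("5_6"))  # 3
try:
    scan_decimal("5__6")
except SyntaxError as e:
    print(e.msg)  # invalid decimal literal
```

Unlike the real tokenizer, this sketch does not back up the offending character or record an error position; it only shows which inputs reach the "invalid decimal literal" branch (5__6 and 56_ do, 5_6 does not).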

From Python, I tried to get at the tokenizer's SyntaxError message:

In [12]: try: 
    ...:     eval('5__6') 
    ...: except SyntaxError as e: 
    ...:     print(e.args, e.filename, e.lineno, e.msg, e.text) 

('invalid token', ('<string>', 1, 2, '5__6')) <string> 1 invalid token 5__6

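Instead of "invalid decimal literal", the exception carries the generic message "invalid token". Is there any way to get the more specific message produced by the tokenizer?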
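When comparing what the interpreter reports with what tokenizer.c says, it may help to capture the exception fields together with the version of the interpreter that produced them, since the message text comes from whichever CPython build is actually running, and it can differ between versions. A minimal sketch; syntax_error_details is a hypothetical helper, and compile() in "eval" mode is used so the input is parsed but never evaluated:

```python
import sys

def syntax_error_details(source: str):
    """Hypothetical helper: parse `source` without running it and return the
    SyntaxError fields plus the interpreter version, or None if it parses."""
    try:
        compile(source, "<string>", "eval")
    except SyntaxError as e:
        return {
            "python": sys.version.split()[0],  # e.g. "3.7.3"
            "msg": e.msg,
            "lineno": e.lineno,
            "offset": e.offset,
            "text": e.text,
        }
    return None

print(syntax_error_details("5_6"))  # None: 5_6 is a valid literal
print(syntax_error_details("5__6"))
```

Running the same two calls under different interpreters shows whether the specific tokenizer message reaches Python code in that build.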
