CPython's tokenizer.c contains specific error messages for malformed tokens.
As an example, take the function that parses the tail of a decimal number. Parsing the number 5_6 should succeed, but parsing the number 5__6 should make the tokenizer report a SyntaxError with the message "invalid decimal literal":
static int
tok_decimal_tail(struct tok_state *tok)
{
    int c;

    while (1) {
        do {
            c = tok_nextc(tok);
        } while (isdigit(c));
        if (c != '_') {
            break;
        }
        c = tok_nextc(tok);
        if (!isdigit(c)) {
            tok_backup(tok, c);
            syntaxerror(tok, "invalid decimal literal");
            return 0;
        }
    }
    return c;
}
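To make the underscore rule in that loop concrete, here is a rough Python sketch of the same scan. It is an illustrative port and not CPython code: the name scan_decimal_tail, the DIGITS constant, and the string-index interface are assumptions made for this example, and it raises SyntaxError directly instead of recording an error on a tokenizer state.

```python
DIGITS = "0123456789"  # ASCII digits only, like C's isdigit() in the "C" locale


def scan_decimal_tail(text: str, pos: int = 0) -> int:
    """Scan digits and single underscores starting at `pos`.

    Hypothetical helper: returns the index of the first character that is
    not part of the run, or raises SyntaxError when an underscore is not
    followed by a digit.
    """
    n = len(text)
    while True:
        # Consume a run of digits.
        while pos < n and text[pos] in DIGITS:
            pos += 1
        # Anything other than an underscore ends the decimal run.
        if pos >= n or text[pos] != "_":
            return pos
        # Skip the underscore; the next character must be a digit.
        pos += 1
        if pos >= n or text[pos] not in DIGITS:
            raise SyntaxError("invalid decimal literal")
```

With this sketch, scan_decimal_tail("5_6") scans the whole string, while scan_decimal_tail("5__6") raises at the second underscore, which is the case the C code rejects.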
From Python, I tried to trigger this error and inspect the tokenizer's SyntaxError message:
In [12]: try:
    ...:     eval('5__6')
    ...: except SyntaxError as e:
    ...:     print(e.args, e.filename, e.lineno, e.msg, e.text)
('invalid token', ('<string>', 1, 2, '5__6')) <string> 1 invalid token 5__6
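The reported message is the generic "invalid token" rather than "invalid decimal literal". Is there any way to extract the specific SyntaxError message from the tokenizer?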
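One thing worth ruling out is a mismatch between the interpreter being run and the revision of tokenizer.c being read. The sketch below prints the interpreter version next to the message the interpreter actually produces; the helper name tokenizer_message is hypothetical, and the exact message text is assumed to depend on the CPython version, so no particular output is claimed here.

```python
import sys


def tokenizer_message(source: str) -> str:
    """Compile `source` as an expression and return the SyntaxError message.

    Hypothetical helper for illustration; returns "" if the source compiles.
    """
    try:
        compile(source, "<string>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return ""


if __name__ == "__main__":
    # Show which interpreter produced the message, for comparison with
    # the tokenizer.c revision being read.
    print(sys.version_info[:3], repr(tokenizer_message("5__6")))
```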
from Why doesn't the Python interpreter return the explicit SyntaxError message?