Thursday, 30 May 2019

Can etree.XMLParser in recover mode still throw a parse error?

I have a utility method that parses XML using a parser created as etree.XMLParser(recover=True), and I would like to cover its failure scenarios in a unit test. Apart from empty input, which raises lxml.etree.XMLSyntaxError, I can't seem to break the parser.

My question is: is it possible to construct a StringIO or BytesIO input for this parser such that the parser throws a parse error?

Here are some examples (tested with Python 3.5 and lxml 4.3.3):

from io import BytesIO
from lxml import etree


def parse(xml):
    parser = etree.XMLParser(recover=True)
    elem = etree.parse(BytesIO(xml), parser)
    print(etree.tostring(elem))


parse(b'<broken<')  # prints b'<broken/>'
parse(b'</lf|\\jf>')  # prints None (backslash escaped; same bytes as before)
parse('<?xml encoding="ascii"?><foo>æøå</foo>'.encode('utf-8'))  # prints b'<foo/>'
parse(b'')  # Throws lxml.etree.XMLSyntaxError
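
For the unit test itself, here is a minimal sketch of how the two observed behaviours could be pinned down. The helper names parse_with_log and raises_syntax_error are hypothetical, not part of lxml. It relies on the parser's error_log property, which should list whatever problems the last run recovered from; I have not verified its exact contents for these inputs, so the sketch only returns it for inspection.

```python
from io import BytesIO

from lxml import etree


def parse_with_log(xml):
    """Hypothetical helper: parse in recover mode, return (tree, error_log)."""
    parser = etree.XMLParser(recover=True)
    tree = etree.parse(BytesIO(xml), parser)
    # error_log holds the errors and warnings of the last parser run.
    return tree, parser.error_log


def raises_syntax_error(xml):
    """Hypothetical helper: True if parsing raises XMLSyntaxError."""
    try:
        parse_with_log(xml)
    except etree.XMLSyntaxError:
        return True
    return False


if __name__ == '__main__':
    # Empty input is the one failing case reported above.
    print(raises_syntax_error(b''))
    # Malformed but non-empty input is recovered instead of raising.
    tree, log = parse_with_log(b'<broken<')
    print(tree.getroot().tag)
    for entry in log:
        print(entry.line, entry.message)
```

If recover mode really does swallow everything except empty input, a test could assert on a non-empty error_log rather than on an exception, under the assumption that the log is populated in recover mode.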


