Sunday, 27 June 2021

Why does a search term with spaces not parse correctly in pyparsing?

My inputs look like key: "a word" or anotherkey: "a word (1234)". I am using the grammar below:

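import pyparsing as pp

# a word is any run of printable characters except ":"
word = pp.Word(pp.printables, excludeChars=":")
# a bracketed word may also contain spaces
word = ("[" + pp.Word(pp.printables + " ", excludeChars=":[]") + "]") | word
# a non-tag word is a word that is not followed by ":"
non_tag = word + ~pp.FollowedBy(":")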
# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)
# one or more non-tag words - use originalTextFor to get back
# a single string, including intervening white space
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]

When my input is like key: "value1" and hey you how are you? the query is parsed into the expected output, which is ([(['key', ':', '"value1"'], {}), 'and hey you how are you?'], {}). The problem occurs when the value after the key contains a space:

parser.parseString('key: "Microsoft windows (12932)" and hey you how are you?')
([(['key', ':', '"Microsoft'], {}), 'windows (12932)" and hey you how are you?'], {})

The value breaks between Microsoft and windows. I know pyparsing skips whitespace between tokens, but how can I solve this issue and have the value run to the end of the phrase, which is the closing double quote?
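One possible direction, sketched here under assumptions rather than as a confirmed fix: pp.Word(pp.printables) cannot contain a space, so the value stops at "Microsoft. Letting the value of a tag be a quoted string first, with a bare word as the fallback, keeps the whole quoted phrase together. The use of pp.quotedString (which also accepts single quotes) and the result name are assumptions that are not part of the original grammar.

```python
import pyparsing as pp

# Same building blocks as in the grammar above.
word = pp.Word(pp.printables, excludeChars=":")
word = ("[" + pp.Word(pp.printables + " ", excludeChars=":[]") + "]") | word
non_tag = word + ~pp.FollowedBy(":")

# Assumption: a tag value is either a quoted string (quotes are kept in the
# result) or a single bare word. The quoted form is tried first.
tag = pp.Group(word + ":" + (pp.quotedString | word))

phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]

result = parser.parseString('key: "Microsoft windows (12932)" and hey you how are you?')
print(result.asList())
```

With this change the quoted value stays in one token and the remaining words are collected as a single free-text phrase.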


EDIT-1 I tried to work around this problem by adding another word like below:

word = ('"' + pp.Word(pp.printables + " ", excludeChars=':"') + '"') | word

It works on queries like key: "windows server (23232)" but not on more complex queries like key1: value and key2: "windows server (1212)". Does anyone have a clue about this issue and how I can work around this behavior?
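For queries that mix quoted and unquoted values, a minimal sketch could put the quoted form only on the value side of a tag instead of redefining word. It assumes values use double quotes only, and it leaves out the bracketed word form for brevity. pp.QuotedString with unquoteResults=False is used so the quotes stay in the result; the names quoted and mixed are illustrative.

```python
import pyparsing as pp

word = pp.Word(pp.printables, excludeChars=":")
non_tag = word + ~pp.FollowedBy(":")

# Assumption: values are quoted with double quotes only. unquoteResults=False
# keeps the surrounding quotes in the parsed token.
quoted = pp.QuotedString('"', unquoteResults=False)

# Try the quoted form first, then fall back to a bare word.
tag = pp.Group(word + ":" + (quoted | word))
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]

mixed = parser.parseString('key1: value and key2: "windows server (1212)"')
print(mixed.asList())
```

Here key1 gets the bare word value, the connecting and becomes a free-text phrase, and key2 gets the full quoted string.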


EDIT-2 What do I expect? I need to extend my grammar so that a query like the one below:

key: "Microsoft windows (12932)" and hey you how are you?

is NOT parsed as:

([(['key', ':', '"Microsoft'], {}), 'windows (12932)" and hey you how are you?'], {})

It should be parsed as:

([(['key', ':', '"Microsoft windows (12932)"'], {}), 'and hey you how are you?'], {})

Such a query can also be combined with more keys and a free-text search, like below:

A free text search and key1: "Microsoft windows (12312)" and key2: "Sample2" or key3: "Another sample (121212)"

This should also get parsed like below:

part1: A free text search and
part2: ['key1', ':', '"Microsoft windows (12312)"']
part3: ['key2', ':', '"Sample2"']
part4: ['key3', ':', '"Another sample (121212)"']

NOTE: It is OK for me if and / or stay attached to the tokens. I just need to separate the free-text search from the key:value queries.
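The separation described above could be sketched roughly as follows. This is only an illustration under assumptions: split_query is a hypothetical helper, the tag value is assumed to be either a quoted string or a bare word, the bracketed word form is left out for brevity, and the closing quote after (12312) is assumed to be present in the input.

```python
import pyparsing as pp

word = pp.Word(pp.printables, excludeChars=":")
non_tag = word + ~pp.FollowedBy(":")
# Assumption: a value is a quoted string or, failing that, a single word.
tag = pp.Group(word + ":" + (pp.quotedString | word))
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]


def split_query(query):
    """Hypothetical helper: return (free-text phrases, key:value groups)."""
    free_text, tags = [], []
    for token in parser.parseString(query, parseAll=True):
        if isinstance(token, str):
            # phrases come back as plain strings from originalTextFor
            free_text.append(token)
        else:
            # tags come back as grouped results: [key, ':', value]
            tags.append(token.asList())
    return free_text, tags


query = ('A free text search and key1: "Microsoft windows (12312)" '
         'and key2: "Sample2" or key3: "Another sample (121212)"')
free_text, tags = split_query(query)
print(free_text)
print(tags)
```

In this sketch the connecting and / or end up in the free-text list, either attached to a phrase or as a phrase of their own, which matches the note above that this is acceptable.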


