Saturday, 17 October 2020

How can I prevent or trap StopIteration exception in the yield-calling function?

A generator-returning function (i.e. one with a yield statement in it) in one of our libraries fails some tests due to an unhandled StopIteration exception. For convenience, in this post I'll refer to this function as buggy.

I have not been able to find a way for buggy to prevent the exception (without affecting the function's normal operation). Similarly, I have not found a way to trap the exception (with a try/except) within buggy.

(Client code using buggy can trap this exception, but this happens too late, because the code that has the information necessary to properly handle the condition leading to this exception is the buggy function.)

The actual code and test case I am working with are far too complicated to post here, so I have created a very simple, but also extremely artificial toy example that illustrates the problem.

First, the module with the buggy function:

# mymod.py

import csv  # essential!

def buggy(csvfile):
    with open(csvfile) as stream:

        reader = csv.reader(stream)

        # how to test *here* if either stream is at its end?

        for row in reader:
            yield row

As indicated by the comment, the use of the csv module (from the Python 3.x standard library) is an essential feature of this problem [1].

The next file for the example is a script that is meant to stand in for "client code". In other words, this script's "real purpose" beyond this example is largely irrelevant. Its role in the example is to provide a simple, reliable way to elicit the problem with the buggy function. (Some of its code could be repurposed for a test case in a test suite, for example.)

#!/usr/bin/env python3

# myscript.py

import sys
import mymod

def print_row(row):
    print(*row, sep='\t')

def main(csvfile, mode=None):
    if mode == 'first':
        print_row(next(mymod.buggy(csvfile)))
    else:
        for row in mymod.buggy(csvfile):
            print_row(row)

if __name__ == '__main__':
    main(*sys.argv[1:])

The script takes the path to a CSV file as a mandatory argument, and an optional second argument. If the second argument is omitted, or is anything other than the string "first", the script prints the contents of the CSV file to stdout in TSV format. If the second argument is the string "first", only the first row is printed this way.

The StopIteration exception I am trying to trap arises when the myscript.py script is invoked with an empty file and the string "first" as arguments [2].

Here is an example of this code in action:

% cat ok_input.csv
1,2,3
4,5,6
7,8,9
% ./myscript.py ok_input.csv
1   2   3
4   5   6
7   8   9
% ./myscript.py ok_input.csv first
1   2   3
% cat empty_input.csv
# no output (of course)
% ./myscript.py empty_input.csv
# no output (as desired)
% ./myscript.py empty_input.csv first
Traceback (most recent call last):
  File "./myscript.py", line 19, in <module>
    main(*sys.argv[1:])
  File "./myscript.py", line 13, in main
    print_row(next(mymod.buggy(csvfile)))
StopIteration
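
The same failure can be reproduced without the shell session. The following is a minimal sketch, assuming an empty temporary file stands in for empty_input.csv. The helper name first_row_or_error is hypothetical and exists only for this illustration; the body of buggy is repeated so that the sketch is self-contained.

```python
import csv
import os
import tempfile


def buggy(csvfile):
    # Same body as mymod.buggy, repeated here so the sketch runs on its own.
    with open(csvfile) as stream:
        reader = csv.reader(stream)
        for row in reader:
            yield row


def first_row_or_error(csvfile):
    # Hypothetical helper: mimics the "first" mode of myscript.py and
    # reports whether StopIteration escapes from next().
    try:
        return next(buggy(csvfile))
    except StopIteration:
        return "StopIteration"


# Create an empty temporary file that plays the role of empty_input.csv.
fd, empty_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

try:
    result = first_row_or_error(empty_path)
finally:
    os.remove(empty_path)

print(result)
```

Something along these lines could serve as the starting point for a test case in a test suite.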

Q: How can I prevent or trap this StopIteration exception in the lexical scope of the buggy function?


IMPORTANT: Please keep in mind that, in the example given above, the myscript.py script is a stand-in for "client code", and is therefore outside of our control. This means that any approach that would require changing the myscript.py script would not solve the actual real-world problem, and therefore would not be an acceptable answer to this question.

One important difference between the simple example shown above and our actual situation is that in our case, the problematic input stream does not come from an empty file. The problem arises in cases where buggy (or, rather, its real-world counterpart) reaches the end of this stream "too early", so to speak.

I think it may be enough if I could test whether either stream is at its end, before the for row in reader: line, but I have not figured out a way to do this either. Testing whether the length of the value returned by stream.read(1) is 0 or 1 will tell me if stream is at its end, but in the latter case stream's internal pointer will be left pointing one character too far into csvfile's content. (Neither stream.seek(-1, 1) nor stream.tell() works at this point.)
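
To make the stream.read(1) idea concrete, here is a minimal sketch, assuming stream is a seekable text file on disk that has not yet been iterated over. The helper name at_end is hypothetical. It records the position with stream.tell() before reading and restores it with an absolute stream.seek(), instead of the relative stream.seek(-1, 1). This may not carry over to other kinds of streams.

```python
import os
import tempfile


def at_end(stream):
    # Hypothetical helper: report whether a seekable text stream is at its
    # end, leaving the read position where it was.
    position = stream.tell()   # remember the current position
    char = stream.read(1)      # '' means there was nothing left to read
    stream.seek(position)      # absolute seek back to the remembered spot
    return char == ''


# Try it on an empty file and on a file with one CSV line.
results = {}
for name, content in (("empty", ""), ("nonempty", "1,2,3\n")):
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w") as out:
            out.write(content)
        with open(path) as stream:
            ended = at_end(stream)
            rest = stream.read()   # the full content should still be readable
        results[name] = (ended, rest)
    finally:
        os.remove(path)

print(results)
```

Note that knowing the stream is at its end does not by itself change what the caller's next() sees; the sketch only shows a check that leaves the position unchanged.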


Lastly, to anyone who would like to post an answer to this question: it would be most efficient if you were to take advantage of the example code I have provided above to test your proposal before posting it.


EDIT: One variation of mymod.py that I tried was this:

import csv  # essential!

def buggy(csvfile):
    with open(csvfile) as stream:

        reader = csv.reader(stream)

        try:
            firstrow = next(reader)
        except StopIteration:
            firstrow = None

        if firstrow is not None:
            yield firstrow

        for row in reader:
            yield row

This variation fails with pretty much the same error message as does the original version.
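
The following self-contained sketch reruns this variation on an empty temporary file (standing in for empty_input.csv). The events list and the name buggy_variation are hypothetical additions, used only to record which branches run inside the generator and what the caller observes.

```python
import csv
import os
import tempfile

events = []  # hypothetical instrumentation, not part of the real module


def buggy_variation(csvfile):
    with open(csvfile) as stream:
        reader = csv.reader(stream)
        try:
            firstrow = next(reader)
        except StopIteration:
            # Record that the reader's StopIteration was trapped here.
            events.append("caught inside generator")
            firstrow = None
        if firstrow is not None:
            yield firstrow
        for row in reader:
            yield row
    # Reached when the generator body runs to completion.
    events.append("generator finished")


fd, empty_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

try:
    next(buggy_variation(empty_path))
except StopIteration:
    events.append("caller saw StopIteration")
finally:
    os.remove(empty_path)

print(events)
```

On an empty file, all three events are recorded in the order in which they appear in the code: the except clause inside the generator runs, the generator body finishes without yielding, and the caller's next() still raises StopIteration.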

When I first read @mcernak's proposal, I thought that it was pretty similar to the variation above, and therefore expected it to fail too. Then I was pleasantly surprised to discover that this is not the case! Therefore, as of now, there is one definite candidate to get the bounty. That said, I would love to understand why the variation above fails to trap the exception, while @mcernak's succeeds.


[1] The actual case I'm dealing with is legacy code; switching from the csv module to some alternative is not an option for us in the short term.

[2] Please disregard entirely the question of what this demonstration script's "right response" should be when it gets invoked with an empty file and the string "first" as arguments. The particular combination of inputs that elicits the StopIteration exception in this post's demonstration does not represent the real-world condition that causes our code to emit the problematic StopIteration exception. Therefore, the "correct response" of the demonstration script to the empty file plus "first" string combination, whatever that may be, is irrelevant to the real-world problem I am dealing with.



